Brewing Elixir

Intro to Elixir Applications on Kubernetes

Brewing Elixir — Sat, 09 Dec 2023 16:43:58 GMT

Welcome to this new series about running Elixir applications on K8s (short for Kubernetes). We'll explore the world of Kubernetes through the eyes of an Elixir programmer to achieve even higher availability, reliability and robustness by leveraging most tools in the K8s toolbox in a way that plays nicely with Elixir/OTP and Phoenix applications.

But before we continue let's define some base requirements to get the most out of the series.

Requirements

This series assumes some familiarity with the following technologies:

Elixir programming language and Phoenix web framework: We'll dig just enough into any of these items to prove a certain feature or situation, so having a basic understanding of both should be enough to get through the content. E.g. being able to generate and run a Phoenix application and knowing how an endpoint gets called up to a controller's function should be enough.
Container platform: A basic understanding of what containers are and what benefits they offer is important to get a better sense of why Kubernetes exists and how it complements Elixir. So, if you have written a Dockerfile and built and run an image, you already have a good base to continue.
Kubernetes concepts: The following concepts are necessary to progress smoothly through the series' articles: Pods, Deployments, Services, Nodes, Secrets, Control/Data plane, Namespaces, kubectl.

If this sounds like too much, you are most certainly right. Throughout this series I'll do my best to keep the cognitive load to a minimum, explain each concept as we introduce it, and point to learning resources in case you need more background.

This combination of tools offers a great deal of flexibility and power, but as we learned from Uncle Ben's most famous quote: with great power comes great ~~responsibility~~ potential complexity.

If you are down to get your hands wet and your skin slightly burnt (as we are going on a sailing adventure with K8s 😉) please keep reading as we're going to level up your deployment game.

Docker + Kubernetes + Elixir + Phoenix = 🚀

Some people might ask: Why run Elixir/Phoenix applications on Kubernetes if the BEAM already offers somewhat similar features like self-healing (i.e. supervision trees)? Or wouldn't running in a container with fewer assigned vCPUs go against letting the BEAM use all available cores on a host (more schedulers would be available, increasing the chances of having more concurrent jobs running in parallel)? Or even, does adding K8s into the mix (assuming we are running our app on a VM or directly on bare metal) provide any extra benefit that outweighs the extra complexity introduced?

Instead of answering each question I would like to describe some of the benefits K8s provides that are not Elixir related:

Efficient and secure rollout (and rollback) of new versions: Because you are dealing with containers you get greater boot speeds compared to VMs which reduces the rollout times significantly, and most importantly rollbacks, creating a safe deployment environment to run applications on.
Horizontal scaling: Scale the number of containers, and the hosts running them, at a much higher speed than scaling VMs.
Efficient and controlled resource usage: By leveraging bin-packing and fine-tuning resource requirements you can efficiently run applications on a set of hosts.
Effectively run Distributed Elixir: By incorporating libraries like libcluster we can have clusters of Erlang nodes formed automatically, with the nodes communicating efficiently within the cluster.
Use a heterogeneous set of nodes to run different types of applications on them. Think of running AI applications on GPU-optimized VMs while having the rest of the applications running on regular VMs, all orchestrated by the same cluster.
Run your application locally, on several cloud platforms like AWS, GCP, Azure and others, or even set up your on-premise clusters, while using the same mental model and most descriptors ready to be reused.
And much more: run several application versions at the same time, organize them, manage configuration, orchestrate storage, extend and customize the cluster behavior, etc.

So K8s becomes an abstraction layer between our cluster of hosts and our containers offering a more reliable abstraction to run our application. We can still get all the benefits of the BEAM but they operate at a higher level than K8s so most of the time they complement each other.

That's a lot 😮💨 But bear with me as I promise that if you continue and give the tutorial a try you'll unlock a new set of possibilities for your Elixir applications, and you'll be able to answer the initial questions by yourself.

Let's go back to basics to start learning through experimentation on how we can get all these benefits.

Run a simple Phoenix app on K8s

To grasp how a Phoenix app runs on K8s we'll start by creating a new project to run it on our local host by using a local Kubernetes cluster distribution provided by Docker Desktop.

This post uses the following tools and versions:

Erlang: 26.1.2
Elixir: 1.15.7 (compiled with Erlang/OTP 26)
Phoenix: 1.7.10
Docker Desktop: Docker version 24.0.6-rd
kubectl: v1.28.4

Each tool listed has a link to its installation doc. For Erlang/Elixir I recommend the asdf version manager.

To be able to run Kubernetes locally you'll need to enable it in Docker Desktop by opening Settings > Kubernetes and clicking on Enable Kubernetes.

This will restart the Docker Desktop application but once it completes you'll have a ready-to-use local K8s environment. To check that's true you can run:

$ kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2

$ kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
docker-desktop   Ready    control-plane   16h   v1.28.2

If you get anything different (besides the versions), check that Docker is running (e.g. run docker ps to list containers).

In case you feel adventurous there are other great options to run K8s locally like Docker's Kubernetes extension, Rancher Desktop, Microk8s and many others, but they are not required to complete this tutorial as we'll be using Docker Desktop.

Creating the app

The first thing we need is our Phoenix app so let's generate one (without ecto as we won't use it for now):

mix phx.new myapp --no-ecto
cd myapp

Next, we'll make a few changes to the project so it behaves like a "real" application. Open your editor of choice and make the following changes:

Add a new route to receive POST requests with an optional name parameter and then use it to return "Hello #{name}" as text with an HTTP 200 status code.

lib/myapp_web/router.ex

defmodule MyappWeb.Router do
  # ...

  scope "/api", MyappWeb do
    pipe_through :api

    post "/hello", HelloController, :hello
  end

  # ...
end

lib/myapp_web/controllers/hello_controller.ex

defmodule MyappWeb.HelloController do
  use MyappWeb, :controller

  def hello(conn, params) do
    name = Map.get(params, "name", "World")
    conn |> put_status(200) |> text("Hello #{name}")
  end
end

Let's give it a try by running it and testing http://localhost:4000 with curl (or your HTTP client of choice).

# Start the server
mix phx.server

# Reach the route using curl (or your HTTP tool of choice)
curl -X POST http://localhost:4000/api/hello -d 'name=BrewingElixir'

Nice! We got Hello BrewingElixir, which proves our app works locally, so we can proceed with the next steps.
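The parameter handling in the controller boils down to a Map.get/3 call with a default value. Here is a minimal sketch of that logic in isolation, assuming a hypothetical GreetingSketch module (not part of the generated app) that takes the params map and returns the response body, so it can be run as a plain script without Phoenix:

```elixir
defmodule GreetingSketch do
  # Hypothetical stand-in for the controller action: receives the params map
  # Phoenix would pass in and returns the text body of the response.
  def message(params) when is_map(params) do
    # Falls back to "World" when no "name" parameter was sent
    name = Map.get(params, "name", "World")
    "Hello #{name}"
  end
end

IO.puts(GreetingSketch.message(%{"name" => "BrewingElixir"}))
IO.puts(GreetingSketch.message(%{}))
```

Calling the endpoint without the -d 'name=...' argument should therefore produce the fallback greeting.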

Build a container image

We are entering the world of containers now, and because our application's release is custom we need to define a Dockerfile to instruct Docker on what our app's image will contain. Luckily for us, we can use the release generator task from the Phoenix project to get a ready-to-use Dockerfile. Just run:

mix phx.gen.release --docker

Let's slow down to inspect and describe what the Dockerfile provides. It is very important to understand what the image will contain, as it is the foundation of what will get executed. This same image is what will get used and managed by K8s once we deploy it.

# 1. Build-time configurable arguments
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=26.1.2
ARG DEBIAN_VERSION=bullseye-20231009-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

# 2. The base image used to build the release.
# The file has two stages to leave "build time" files out
# of the runtime and slim down the final image.
# Ref: https://docs.docker.com/build/building/multi-stage/
FROM ${BUILDER_IMAGE} as builder

# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv
COPY lib lib
COPY assets assets

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
  apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/myapp ./

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

CMD ["/app/bin/server"]

The file is ready to use but keep in mind this is where you'll come to add system dependencies required by our application. E.g. Installing ffmpeg if you intend to use it for video manipulation. In other cases, you'll want to use a different "base image" for your container image to increase security and/or reduce image size. Those are also actions that will be applied to this same file.

Where K8s truly excels is in running multiple Pods from different images side by side, on the same underlying nodes, to easily validate that each one works correctly (think canary or blue/green deployments).

Continuing with our application, let's build the docker image based on the Dockerfile we've just seen by running:

docker build -t myapp .

This should effectively build the image and store it with the tag latest. You can verify this by running:

docker images myapp

Output:

REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
myapp        latest    68e102c0eb9d   About a minute ago   125MB

Sweet! We have our image ready and it is only 125MB! 😅 You are probably wondering: Is that the best we can do? It is not, but it should be enough for now. If you are curious about what to use when aiming at a small image, check out Alpine Linux at Docker Hub. Images start at 5MB, which is 4% of our current image size!

Now let's run it to verify the image is good.

docker run -p 4000:4000 myapp:latest

Oh no! The output starts like this:

** (RuntimeError) environment variable SECRET_KEY_BASE is missing. You can generate one by calling: mix phx.gen.secret

Ah! That's because we are running a release which executes config/runtime.exs during start-up which ends up raising an exception when SECRET_KEY_BASE is not provided. Let's do as suggested and provide the value to docker like this:

SECRET_KEY_BASE=$(mix phx.gen.secret)
docker run -p 4000:4000 -e SECRET_KEY_BASE=$SECRET_KEY_BASE myapp:latest

Output:

13:59:38.371 [info] Running MyappWeb.Endpoint with cowboy 2.10.0 at :::4000 (http)
13:59:38.372 [info] Access MyappWeb.Endpoint at https://example.com

Congratulations! You have a Phoenix application running within a Docker container.
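The start-up failure we hit before providing the variable comes from a guard in config/runtime.exs that reads the environment and raises when the value is absent. Here is a minimal sketch of that kind of guard, assuming a hypothetical RuntimeConfigSketch module that takes the environment as a map so it can be exercised without touching the real one (the generated file reads System.get_env/1 directly):

```elixir
defmodule RuntimeConfigSketch do
  # Hypothetical helper mirroring the kind of check a release performs at boot.
  # `env` defaults to the real environment, a map of string keys and values.
  def fetch_secret_key_base!(env \\ System.get_env()) do
    Map.get(env, "SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      You can generate one by calling: mix phx.gen.secret
      """
  end
end

# With the variable present the value is returned as-is
IO.puts(RuntimeConfigSketch.fetch_secret_key_base!(%{"SECRET_KEY_BASE" => "example-value"}))
```

Because the check runs when the release boots rather than at compile time, the same image can be started with a different secret in every environment.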

You probably noticed the -p 4000:4000 parameter when running docker. It instructs Docker to connect a port on the host to a port in the container, in the form host_port:container_port. You can pick a different host port if you like (ideally between 1024 and 65535, since lower ports are usually privileged), as long as it maps to the port the app listens on inside the container, which is 4000 for this application.

In another terminal run the following curl command to verify the app's code gets executed:

curl -X POST http://localhost:4000/api/hello -d 'name=BrewingElixirFromDocker'

Output:

Hello BrewingElixirFromDocker

Great! The app works as expected. You can stop the container by typing ctrl+c.

At this point, you might have noticed you had to:

Manually start/stop the app.
Provide the needed configuration to docker to let the app start correctly.
Set up the host and container ports to allow access to the service.

which is fine for testing apps locally, but what would happen if you need to perform a deployment with zero downtime? Or if the universe has a glitch that makes the BEAM unexpectedly crash? 💥 At this moment you would have to start looking for other tools that would help solve these issues. But look no further! That's one of the many features Kubernetes offers, so let's finally test this thing 😎

Deploy to Kubernetes

To deploy your application to K8s you'll use a combination of the following resources:

Deployment: provides declarative updates for Pods (and ReplicaSets). This is where we define what docker image to use.
Service: used to expose our application running in your cluster behind a single outward-facing endpoint. In simple terms, it connects a deployment to one or more ports.
ConfigMaps: an object used to store non-confidential data in key-value pairs.
Secrets: similar to ConfigMaps but are specifically intended to hold confidential data.

These resources are all defined as YAML files that get applied by kubectl, which instructs K8s on what resources to create or update. This is key, as K8s allows you to declaratively (and sometimes imperatively) define resources and the state you want for them. I can't stress enough how useful and important this is. It allows you to define the "end state" of the resource, even when something is happening mid-flight. K8s will "figure out" which steps are needed to get to the desired state.
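To make the "desired state" idea a bit more tangible, here is a toy sketch of comparing a desired replica count with the observed one and deriving the action to take. It is only an illustration under simplifying assumptions: ReconcileSketch and plan/2 are hypothetical names, and the real K8s controllers are far more involved than this.

```elixir
defmodule ReconcileSketch do
  # Toy illustration only: given the desired and the observed number of
  # replicas, return the action that would close the gap.
  def plan(desired, actual) when desired > actual, do: {:start, desired - actual}
  def plan(desired, actual) when desired < actual, do: {:stop, actual - desired}
  def plan(same, same), do: :noop
end

IO.inspect(ReconcileSketch.plan(3, 1))
IO.inspect(ReconcileSketch.plan(1, 3))
IO.inspect(ReconcileSketch.plan(2, 2))
```

The point of the sketch is that you only describe the target (for instance, 3 replicas), and the steps are derived from whatever the current state happens to be.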

Continuing with the setup, go ahead and create a directory called infra under the root folder of our Phoenix project.

mkdir infra

You'll place all K8s descriptors there for now. In more sophisticated deployments these files could be included in a separate repository with Terraform and Ansible files in case that's the IaC preference of the team/company.

Now create a deployment descriptor. This is usually a YAML file that declares what images to use within the Pods, the naming used within the cluster, what arguments to pass and many other configurations to instruct K8s on how to orchestrate the deployment. So go ahead and create:

infra/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 4000

You can see how the name of the deployment is defined as well as the image and ports used by the container.

With the following command, you'll instruct your Kubernetes cluster through kubectl to read the deployment descriptor and apply the desired state:

kubectl apply -f infra/deployment.yaml

Output:

deployment.apps/myapp-deployment created

This command returns almost instantaneously but the resources are still getting created. To check the state run:

kubectl get deploy

Output:

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
myapp-deployment   0/1     1            0           7s

Or check the Pods associated with the deployment:

kubectl get pod

Output:

NAME                                READY   STATUS             RESTARTS   AGE
myapp-deployment-7cc9745dcb-bpkt6   0/1     ImagePullBackOff   0          3m

Something looks off: the READY column shows 0/1 for the deployment, and the same happens when we list the Pods. The STATUS column of the latest output reports ImagePullBackOff. This is expected: the cluster can't find the image because it is not stored in a registry. A registry is an image repository that K8s uses to look for images. By default, K8s will look for images at DockerHub and will only be able to access them if they are public or the cluster has credentials configured to access private images.

For the sake of simplicity, we'll stick with Docker's product and create an account at DockerHub. Once your account is ready, create a private repository for your docker image.

Your namespace will be different, so remember to replace brewingelixir with the name of your namespace.

To be able to push to the right registry we need to tag our existing image (or build it with the desired tag). To achieve that run:

docker tag myapp:latest brewingelixir/myapp:0.0.1

A good practice for image tags is using Semantic Versioning (or some combination of semver + Git SHA). This way we keep the image builds unique and immutable. This will get us a more important benefit in K8s as this will help cache images in the nodes which speeds up rollouts and rollbacks.
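As a sketch of the "semver + Git SHA" idea, the helper below composes such a tag. ImageTagSketch, build/3 and the sample SHA are hypothetical and only here for illustration. The two parts are joined with a hyphen because image tags only allow letters, digits, underscores, periods and hyphens, so the + that Semantic Versioning uses for build metadata is not an option.

```elixir
defmodule ImageTagSketch do
  # Hypothetical helper: builds "<repo>:<version>-<short sha>".
  def build(repo, version, git_sha) do
    # Version.parse!/1 raises if the version is not valid semver
    %Version{} = Version.parse!(version)
    short_sha = String.slice(git_sha, 0, 7)
    "#{repo}:#{version}-#{short_sha}"
  end
end

# The SHA below is a made-up example value
IO.puts(ImageTagSketch.build("brewingelixir/myapp", "0.0.1", "3f2a9c1d8e7b6a5f"))
```

In a real pipeline the SHA would typically come from something like git rev-parse HEAD.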

The docker tag command creates a new image name and tag. You can compare the image IDs by running:

docker images | grep myapp

Output:

brewingelixir/myapp  0.0.1      366738717276   17 hours ago    125MB
myapp                latest     68e102c0eb9d   27 hours ago    125MB

You'll notice two images, one named myapp and another one similar to this one: brewingelixir/myapp. Finally, push this image to the DockerHub registry:

docker push brewingelixir/myapp:0.0.1

At this point your app is stored in DockerHub ready to be pulled by your Kubernetes cluster. In more sophisticated deployments you'll need to configure some credentials to instruct K8s on what to use to pull private images from other registries. But that's something we don't need to think about for the local version.

Continue with editing infra/deployment.yaml to update the image value:

...
        image: brewingelixir/myapp:0.0.1
...

And apply the change with kubectl:

kubectl apply -f infra/deployment.yaml

And let's check again if the pod is running:

kubectl get po

Output:

NAME                                READY   STATUS   RESTARTS      AGE
myapp-deployment-546cbb5f96-9ptjb   0/1     Error    1 (14s ago)   16s

We now have a different error 🤔 Let's inspect the events by executing:

kubectl describe pod myapp-deployment-546cbb5f96-9ptjb

The output should look similar to the following one:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  87s                default-scheduler  Successfully assigned default/myapp-deployment-546cbb5f96-9ptjb to docker-desktop
  Normal   Pulled     43s (x4 over 87s)  kubelet            Container image "brewingelixir/myapp:0.0.1" already present on machine
  Normal   Created    43s (x4 over 87s)  kubelet            Created container myapp
  Normal   Started    43s (x4 over 87s)  kubelet            Started container myapp
  Warning  BackOff    13s (x6 over 83s)  kubelet            Back-off restarting failed container myapp in pod myapp-deployment-546cbb5f96-9ptjb_default(e5f972bc-d5fa-47df-b59f-e6e18ba31520)

It seems like K8s could find the docker image, but it is reporting Back-off restarting failed container myapp. Maybe the container's logs will give some extra insight:

kubectl logs myapp-deployment-546cbb5f96-9ptjb

Output:

ERROR! Config provider Config.Reader failed with:
** (RuntimeError) environment variable SECRET_KEY_BASE is missing.
You can generate one by calling: mix phx.gen.secret

    /app/releases/0.1.0/runtime.exs:31: (file)
    (elixir 1.15.7) src/elixir.erl:396: :elixir.eval_external_handler/3
    (stdlib 5.1.1) erl_eval.erl:750: :erl_eval.do_apply/7
    (stdlib 5.1.1) erl_eval.erl:494: :erl_eval.expr/6
    (stdlib 5.1.1) erl_eval.erl:136: :erl_eval.exprs/6
    (elixir 1.15.7) src/elixir.erl:375: :elixir.eval_forms/4
    (elixir 1.15.7) lib/module/parallel_checker.ex:112: Module.ParallelChecker.verify/1
    (elixir 1.15.7) lib/code.ex:543: Code.validated_eval_string/3

Runtime terminating during boot ({#{message=><<101,110,118,105,114,111,110,109,101,110,116,32,118,97,114,105,97,98,108,101,32,83,69,67,82,69,84,95,75,69,89,95,66,65,83,69,32,105,115,32,109,105,115,115,105,110,103,46,10,89,111,117,32,99,97,110,32,103,101,110,101,114,97,116,101,32,111,110,101,32,98,121,32,99,97,108,108,105,110,103,58,32,109,105,120,32,112,104,120,46,103,101,110,46,115,101,99,114,101,116,10>>,'__struct__'=>'Elixir.RuntimeError','__exception__'=>true},[{elixir_eval,'__FILE__',1,[{file,"/app/releases/0.1.0/runtime.exs"},{line,31}]},{elixir,eval_external_handler,3,[{file,"src/elixir.erl"},{line,396},{error_info,#{module=>'Elixir.Exception'}}]},{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,750}]},{erl_eval,expr,6,[{file,"erl_eval.erl"},{line,494}]},{erl_eval,exprs,6,[{file,"erl_eval.erl"},{line,136}]},{elixir,eval_forms,4,[{file,"src/elixir.erl"},{line,375}]},{'Elixir.Module.ParallelChecker',verify,1,[{file,"lib/module/parallel_checker.ex"},{line,112}]},{'Elixir.Code',validated_eval_string,3,[{
Crash dump is being written to: erl_crash.dump...done

Oh! We forgot to set the value of the SECRET_KEY_BASE env var. This can easily be solved by creating a secret resource. But first, create a key using the handy phx.gen.secret mix task.

SECRET_KEY_BASE=$(mix phx.gen.secret)
echo -n $SECRET_KEY_BASE | base64

The output of the last command is the Base64 representation of the generated value. K8s expects the values under a Secret's data field to be Base64-encoded, which lets a secret hold arbitrary bytes (including control characters) without creating a syntax error in the YAML file.
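If you prefer to stay in Elixir, the same encoding can be produced with the standard library's Base module. The sketch below uses a placeholder value rather than a real secret, and SecretEncodingSketch is a hypothetical module name. Keep in mind that Base64 is an encoding, not encryption: anyone who can read the encoded value can decode it.

```elixir
defmodule SecretEncodingSketch do
  # Encode a value the way a Secret's data field expects it
  def encode(value) when is_binary(value), do: Base.encode64(value)

  # Decode it back; raises ArgumentError if the input is not valid Base64
  def decode!(encoded) when is_binary(encoded), do: Base.decode64!(encoded)
end

placeholder = "not-a-real-secret"
encoded = SecretEncodingSketch.encode(placeholder)

IO.puts(encoded)
IO.puts(SecretEncodingSketch.decode!(encoded))
```

The round trip at the end is a quick way to confirm that nothing (such as a stray newline) sneaked into the encoded value.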

Next, create a secret descriptor with the name infra/secret.yaml

Important: The names of these files have no relation with the resource they are creating.

apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
data:
  key_base: c3Z3b3loQWt0MlJ2bytvRlBwMU4zMjhmNlhBVkRzaTE4cmVoZHRaUGlEMWJ1b0w5Y25sQnJwZjhRWnhETm5ScA==

To effectively create the secret in K8s run:

kubectl apply -f infra/secret.yaml

To verify the resource exists execute:

kubectl get secret

Output:

NAME           TYPE     DATA   AGE
myapp-secret   Opaque   1      17h

The last piece of this puzzle is connecting this secret to the deployment so the app knows where to fetch the value from. So edit infra/deployment.yaml to add the env element at the same level as the image:

...
    spec:
      containers:
      - name: myapp
        image: brewingelixir/myapp:0.0.1
        ports:
        - containerPort: 4000
        env:
        - name: SECRET_KEY_BASE
          valueFrom:
            secretKeyRef:
              name: myapp-secret
              key: key_base

You can see how SECRET_KEY_BASE is the name of the env var that will get extracted from myapp-secret secret under the key key_base.

You can save the change and apply by running: kubectl apply -f infra/deployment.yaml

Run kubectl get pods again and you'll see the status is now Running!

NAME                                READY   STATUS    RESTARTS   AGE
myapp-deployment-78dd444d79-sdmpq   1/1     Running   0          1m

This is great, but how do we access the application? That's a task for a Service. So let's create one and associate it with the deployment.

Create infra/service.yaml and fill it with:

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: LoadBalancer

Apply the change:

kubectl apply -f infra/service.yaml

Output:

service/myapp-service created

And check the service's resource state:

kubectl get svc

Output:

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes      ClusterIP      10.96.0.1      <none>        443/TCP          51m
myapp-service   LoadBalancer   10.110.24.84   localhost     4000:32660/TCP   8m45s

Let's give the app a try by going to http://localhost:4000. And also try the curl command one more time:

curl -X POST http://localhost:4000/api/hello -d 'name=BrewingElixirFromK8s'

Output:

Hello BrewingElixirFromK8s

Awesome! You now have a Phoenix app running on K8s! Next, let's try a few K8s tricks to get a sense of the power of this powerhouse.

Perform a rollout

To get a sense of how fast and easy it is to perform a rollout, we'll simulate creating a new version and deploying it. To do so, edit the router to add an /echo route.

lib/myapp_web/router.ex

...
  scope "/api", MyappWeb do
    pipe_through :api

    post "/hello", HelloController, :hello
    post "/echo", HelloController, :echo
  end
...

Then add a new function within the HelloController to return the params as JSON.

lib/myapp_web/controllers/hello_controller.ex

defmodule MyappWeb.HelloController do
  # Some existing code...

  def echo(conn, params) do
    conn |> put_status(200) |> json(params)
  end
end

After you verify that this works as expected by running mix phx.server and hitting it with curl, you can build and tag the new image to push it to the registry:

docker build -t brewingelixir/myapp:0.0.2 .
docker push brewingelixir/myapp:0.0.2

This time we built the image with the final tag from the start, which saves you from having to perform the separate tag step.

Finally, edit the deployment descriptor to update it with the new tag version.

infra/deployment.yaml

...
        image: brewingelixir/myapp:0.0.2
...

and apply the changes to the deployment:

kubectl apply -f infra/deployment.yaml

To watch the deployment or pod replacement in near real time you can run:

kubectl get deploy -w
# or
kubectl get pod -w

To verify the rollout is complete run:

kubectl rollout status deploy myapp-deployment

Expected output:

deployment "myapp-deployment" successfully rolled out

In case you want to verify the image currently used by the deployment is the same one you have just applied then run:

kubectl describe deploy myapp-deployment

It will provide all the runtime details of the myapp-deployment deployment object. Look for Image to verify the name and tag are the right ones.

"Houston, we have a problem" a.k.a. Rollback ASAP!

Let's imagine this deployment starts causing issues and you need to perform a rollback as fast as possible!

To go back to the immediately previous version just run:

kubectl rollout undo deployment/myapp-deployment

Verify the version is back to 0.0.1 by running:

kubectl describe deploy myapp-deployment | grep Image

Great! That was a really fast rollback. Doesn't that feel great?!

To go back to 0.0.2 just run kubectl apply -f infra/deployment.yaml again or create a new tag with the "fixes" so the app works as expected this time 🫡.

Scale out our app

K8s has the concept of replicas which are more instances of the same pod running in the cluster. They are normally independent of each other but associated with the service to let this one redistribute the load between them.

$ kubectl scale deployment/myapp-deployment --replicas=10
$ kubectl get po
NAME                                READY   STATUS    RESTARTS   AGE
myapp-deployment-8674fb85bc-8qcjp   1/1     Running   0          2s
myapp-deployment-8674fb85bc-clvgv   1/1     Running   0          2s
myapp-deployment-8674fb85bc-fdv5t   1/1     Running   0          2s
myapp-deployment-8674fb85bc-h5bmb   1/1     Running   0          3m19s
myapp-deployment-8674fb85bc-ptn4c   1/1     Running   0          2s
myapp-deployment-8674fb85bc-pzl26   1/1     Running   0          2s
myapp-deployment-8674fb85bc-snnpl   1/1     Running   0          2s
myapp-deployment-8674fb85bc-tpqnc   1/1     Running   0          2s
myapp-deployment-8674fb85bc-v7lrq   1/1     Running   0          2s
myapp-deployment-8674fb85bc-xwqvm   1/1     Running   0          2s

Wow, that was fast! You can even go to 0 if that's needed.

kubectl scale deployment/myapp-deployment --replicas=0

Manual scaling is fine if you have a predictable load that won't change significantly over time. In case you need to deal with significant highs and lows over time then you can use a combination of:

Horizontal Pod Autoscaler (HPA): adjusts the number of replicas of an application based on resources and/or custom metrics.
Vertical Pod Autoscaler (VPA): adjusts the resource requests and limits of a container.
Cluster Autoscaler (CA): adjusts the number of nodes in the cluster when pods fail to schedule or when nodes are underutilized.
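To give a feel for what "adjusts the number of replicas" means for the HPA, the Kubernetes documentation describes its core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies just that formula. HpaSketch and its argument names are hypothetical, and it deliberately ignores everything else the real autoscaler takes into account (tolerances, minimum and maximum replica bounds, stabilization windows).

```elixir
defmodule HpaSketch do
  # Simplified version of the documented HPA formula:
  #   desired = ceil(current_replicas * current_metric / target_metric)
  def desired_replicas(current_replicas, current_metric, target_metric)
      when current_replicas > 0 and target_metric > 0 do
    ceil(current_replicas * (current_metric / target_metric))
  end
end

# 4 replicas averaging 90% CPU with a 60% target would need 6 replicas
IO.inspect(HpaSketch.desired_replicas(4, 90, 60))

# 2 replicas averaging 50% CPU with a 100% target could shrink to 1
IO.inspect(HpaSketch.desired_replicas(2, 50, 100))
```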

But that's something for a future post in this series 😉

Note: The code for this post can be found here.

Conclusion and what's next in the series

If you've deployed applications to Kubernetes in the past (e.g. Go, Node, Java, etc.) you might be thinking: This looks very similar to what needs to be done to deploy those non-Elixir applications. And you'd be right! The process is pretty much agnostic to the application running inside the container. It only becomes slightly different when you aim at running Distributed Erlang applications, where you need to let the application discover other Erlang nodes. Luckily, we can avoid dealing with that situation as long as our application is fine without it, e.g. by using shared databases, Phoenix PubSub through a Redis cluster, etc.

In summary, with a couple of commands we end up with a scalable deployment for an application that performs safe rollouts and rollbacks in an environment that closely resembles the ones running on cloud providers. Isn't that great?!

Next in this series, we'll explore deploying this same app to the Internet, starting from simple K8s distributions like Rancher's K3s and moving to full-featured distributions like EKS, which is AWS's managed Kubernetes solution.

CLI apps in Elixir. Part 2

Brewing Elixir — Sun, 26 Nov 2023 20:02:20 GMT

In this post, we'll explore each tool described in Part 1 to see for ourselves the benefits and limitations of each alternative with the hope we'll end up with enough knowledge to decide which one fits best for each use case.

A nice approach to easily compare alternatives is building the same app with each tool. That way you can easily spot the similarities and differences between them.

For the sake of simplicity, you are going to build a simplified version of the wc command. The name wc is short for "word count", and the command counts newlines, words, characters and a few more things. The core features we'll implement are:

Parse command line arguments.
Read from stdin and output to stdout.
Support reading a single file when provided as an argument.
Return the stats for newline, word and grapheme (this is not standard but we'll do it this way because it is nicer with UTF-8 files).

We'll ignore showing comprehensive help, well-formatted error messages, reading multiple files when provided and any other feature defined in its man page.

Now let's get into the code 🧑💻

Business logic

To make things easier to read and understand let's create a POEM (Plain Old Elixir Module(*)). We'll be able to use this module across implementations to focus on the differences.

(*) I've just made this up but took some inspiration from the Java world where classes holding only business logic are called POJOs (Plain Old Java Objects).

Here's the definition of WC, a module that holds the logic to perform a subset of features offered by wc, specifically it counts graphemes, words and lines.

defmodule WC do
  def run(args) do
    args
    |> parse_options()
    |> execute()
  end

  def parse_options(args) do
    OptionParser.parse(args,
      aliases: [l: :lines, w: :words, c: :chars],
      switches: [chars: :boolean, words: :boolean, lines: :boolean]
    )
  end

  def execute(options) do
    {file, opts} =
      case options do
        {opts, [], _} ->
          {:stdio, opts}

        {opts, [file | _], _} ->
          {file, opts}
      end

    case read_file(file) do
      {:ok, content} ->
        content
        |> count_content()
        |> print_results(file, opts)

      {:error, :file_not_found} ->
        IO.puts("File not found: #{file}")
        System.halt(1)
    end
  end

  @default_opts [lines: true, words: true, chars: true]

  def print_results(results, file, []) do
    print_results(results, file, @default_opts)
  end

  def print_results(results, file, opts) do
    result =
      Enum.reduce(@default_opts, "", fn {key, _}, acc ->
        if opts[key] do
          acc <> "\t#{results[key]}"
        else
          acc
        end
      end)

    if file == :stdio do
      IO.puts(result <> " " <> "\n")
    else
      IO.puts(result <> " " <> file <> "\n")
    end
  end

  def count_content(content) do
    content
    |> String.graphemes()
    |> Enum.reduce(%{lines: 0, words: 0, chars: 0}, fn char, acc ->
      cond do
        char == "\n" ->
          %{acc | lines: acc.lines + 1, chars: acc.chars + 1, words: acc.words + 1}

        char in [" ", "\t"] ->
          %{acc | words: acc.words + 1, chars: acc.chars + 1}

        true ->
          %{acc | chars: acc.chars + 1}
      end
    end)
  end

  def read_file(:stdio) do
    {:ok, IO.read(:stdio, :all)}
  end

  def read_file(file) do
    if File.exists?(file) do
      File.read(file)
    else
      {:error, :file_not_found}
    end
  end
end

Here's a summary of its features:

Supports -l to count lines, -w to count words and -c to count graphemes.
The first argument after the options should be a path to an existing file.
When no file is provided it reads from stdin.
When no option is provided it assumes the caller wants all stats (all switches are on).
Returns an error code of 1 when the file doesn't exist and 0 if the execution was successful.

Note: This is a naive implementation that takes some shortcuts to simplify the code for readability while still having some utility when running some examples.
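One of those shortcuts is worth seeing in isolation: count_content/1 adds a word for every space, tab or newline, so two consecutive spaces count as an extra word. Here's a minimal sketch of a stricter count, assuming we are happy to split on any run of whitespace. The WCSketch module is a hypothetical name used only for this illustration and is not part of the app built in this post.

```elixir
# Hypothetical alternative to WC.count_content/1, for illustration only.
defmodule WCSketch do
  def count(content) do
    %{
      # A line is counted for every newline grapheme.
      lines: content |> String.graphemes() |> Enum.count(&(&1 == "\n")),
      # String.split/1 splits on runs of whitespace and drops empty strings.
      words: content |> String.split() |> length(),
      # String.length/1 counts graphemes, not bytes.
      chars: String.length(content)
    }
  end
end

WCSketch.count("a  b\n")
#=> %{lines: 1, words: 2, chars: 5} (the naive version reports 3 words here)
```

For input where words are separated by exactly one whitespace character both versions agree, so the outputs shown in the rest of the post stay the same.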

Implementations

For testing purposes let's create a file named sample.txt with the following content:

This is one
simple text
file
1234
end line is this

From here on we'll focus only on the differences of each alternative. The full code can be found in this GitHub repo.

Also, we'll be using the $ character before a shell command to indicate it runs as a non-root user, but most importantly to differentiate a command from its output within the same code block.

Elixir Scripts

# Assume the previous WC module is included here. E.g.
# defmodule WC do
# ...

args = System.argv()
WC.run(args)

Let's call this file wc.exs and run a few examples:

Default run

$ elixir wc.exs sample.txt
    5    11    51 sample.txt

Use a CLI pipe

$ cat sample.txt | elixir wc.exs
    5    11    51

Pass specific parameters

$ elixir wc.exs -l sample.txt
    5 sample.txt

Here you can see how the script gets interpreted by the elixir command, which passes everything after the wc.exs file name to the script as its arguments. Notice that running the app this way requires both Elixir to be installed and the source code to be available.
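To make the argument handling more concrete, here's a small sketch of what OptionParser.parse/2 returns for a few argument lists, using the same aliases and switches as WC.parse_options/1. The spec variable is only a name picked for this example.

```elixir
# Same aliases and switches as WC.parse_options/1.
spec = [
  aliases: [l: :lines, w: :words, c: :chars],
  switches: [chars: :boolean, words: :boolean, lines: :boolean]
]

# The result is always {parsed_switches, remaining_args, invalid_switches}.
with_file = OptionParser.parse(["-l", "sample.txt"], spec)
#=> {[lines: true], ["sample.txt"], []}

two_flags = OptionParser.parse(["-l", "-w"], spec)
#=> {[lines: true, words: true], [], []}

no_args = OptionParser.parse([], spec)
#=> {[], [], []}
```

The empty remaining_args list is what makes WC.execute/1 fall back to stdin, and the empty switches list is what triggers the "all stats" default.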

Mix Run

Once the project starts requiring more structure and code distribution, the de facto standard tool to use is Mix. So let's create an app using mix and reuse wc.exs by promoting it to a .ex file. Also copy the sample.txt file into the project, only for convenience.

mix new app1
cp wc.exs app1/lib/wc.ex
cp sample.txt app1/
cd app1

You should edit app1/lib/wc.ex by removing the last two lines and placing them in a new file called run.exs :

args = System.argv()
WC.run(args)

Now let's run the application:

$ mix run run.exs sample.txt
    5    11    51 sample.txt

Awesome! You can leverage Mix features to easily organize and improve your projects. You still need the source code to run it this way but this is a quick way to run scripts from a Mix project. Let's improve this by using Mix releases.

Mix Releases

Create another mix project like you did before but call it app2 to have a fresh start.

mix new app2
cp wc.exs app2/lib/wc.ex
cp sample.txt app2/
cd app2

Remove the last 2 lines of wc.ex as before and create a module under lib/cli.ex with the following content:

defmodule CLI do
  def run do
    args = System.argv()
    WC.run(args)
  end
end

This module will be the starting point for the app.

Next, you need to configure the project. For demo purposes, you'll create a tarball file of the project to be able to distribute it as a single file. So let's edit mix.exs and add:

  def project do
    [
      ...
      releases: releases()
    ]
  end

  def releases do
    [
      app2: [
        include_executables_for: [:unix],
        applications: [runtime_tools: :permanent],
        steps: [:assemble, :tar]
      ]
    ]
  end

To build a release run:

MIX_ENV=prod mix release

We provided MIX_ENV=prod to build a release optimized for production use. If you don't pass the environment variable it will use dev by default.

The app is ready. Let's use eval and pass the Module.Function as the first argument and the rest will be provided to the CLI app as its arguments.

$ _build/prod/rel/app2/bin/app2 eval "CLI.run" -l sample.txt
    5 sample.txt

Even stdin will work:

$ cat sample.txt | _build/prod/rel/app2/bin/app2 eval "CLI.run" -lw
    5    11

Note: There's no filename in the output because it uses stdin as the source of information to parse.

This is all great and you can find the tarball containing the CLI app in _build/prod/app2-0.1.0.tar.gz. However, the person who will run this in their host still needs to uncompress and untar it (i.e. tar xvzf _build/prod/app2-0.1.0.tar.gz) to use it. In other words it isn't a single executable that you can pass around.

Let's check the next two options to address this final limitation while maintaining all the great features you collected so far.

Escript

Once again create a new project and reuse wc.exs like you did so far:

mix new app3
cp wc.exs app3/lib/wc.ex
cp sample.txt app3/
cd app3

Note: Remember to remove the last 2 lines used to execute the module's function.

Next, set up the project to use escript and instruct which module should be used to kick off the app. Modify mix.exs to include:

  def project do
    [
      ...
      escript: escript()
    ]
  end

  def escript do
    [main_module: CLI]
  end

Create a file under lib/cli.ex with:

defmodule CLI do
  def main(args) do
    # No need to call System.argv() as it is provided by escript
    # as an argument to this function
    WC.run(args)
  end
end

To build the project use the escript.build task:

$ MIX_ENV=prod mix escript.build
Generated app3 app
Generated escript app3 with MIX_ENV=prod

Success! You have a single binary file representing your CLI app. Let's check its type and then test it!

$ file app3
app3: a /usr/bin/env escript script executable (binary data)

$ ./app3 sample.txt
    5    11    51 sample.txt

Very cool! This single file can be easily distributed as long as the limitations described in Part 1 don't affect your use case. In case some do then prepare your hot sauce because you'll need it for the next tasty solution 🔥🌯.

Burrito

Until now all alternatives were part of the standard Elixir distribution but thanks to the great work of the community and Burrito maintainers we now have a full-featured solution to build and distribute single binary apps for Elixir: https://github.com/burrito-elixir/burrito

Burrito requires Zig to be installed as well as xz so make sure you have them installed:

$ whereis xz

Let's set up a fresh app and reuse the WC module:

mix new app4
cp wc.exs app4/lib/wc.ex
cp sample.txt app4/
cd app4

Burrito is an external dependency so you'll need to add it to mix.exs under deps :

  defp deps do
    [
      {:burrito, github: "burrito-elixir/burrito"}
    ]
  end

And then fetch the dependency package using mix:

mix deps.get

Now let's set it up in mix.exs

def project do
  [
    # ... other project configuration
    releases: releases()
  ]
end

def releases do
  [
    app4: [
      steps: [:assemble, &Burrito.wrap/1],
      burrito: [
        targets: [
          macos: [os: :darwin, cpu: :x86_64],
          linux: [os: :linux, cpu: :x86_64]
        ]
      ]
    ]
  ]
end

Sweet! Burrito leverages Mix releases which means you get all their benefits plus the ones from Burrito.

Next, you need to define a starting point for the app, so edit mix.exs again and add the following change to it:

def application do
  [
    ...
    mod: {CLI, []}
  ]
end

CLI is just a module name so let's create it under lib/cli.ex

defmodule CLI do
  use Application

  def start(_type, _args) do
    args = Burrito.Util.Args.get_arguments()
    WC.run(args)
    System.halt(0)
  end
end

To build the artifact let's run:

MIX_ENV=prod mix release

The targets can be found under the burrito_out directory within the current project. Without specifying a target you end up building all of them listed in your mix configuration file.

To test the app run:

$ ./burrito_out/app4_macos -l sample.txt
    5 sample.txt

Awesome! Let's check the file's type:

$ file burrito_out/*
burrito_out/app4_linux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.0.0, stripped
burrito_out/app4_macos: Mach-O 64-bit executable x86_64

Beautiful! That looks like executables for specific OS and architectures. For more details check out the Preparation and Requirements section of their readme.

🚨 Important: Burrito will install the app based on its mix version. If you change your code and run mix release without uninstalling the app, the previously installed version gets executed, not the current one. So make sure you either:

a. Bump the version in mix.exs

b. Uninstall the current version: burrito_out/app4_macos maintenance uninstall

Hope this last tip saves you some time or headaches 😉

Summary

All options are valid and useful, but in general Escript or Burrito are what you want to use when building non-trivial single-binary CLI apps in Elixir. If in doubt, start with a single .exs file and see how far you can get until you start needing more sophisticated solutions.

This concludes the second part of this series. Hope you enjoyed it and found it useful! 🍺

CLI apps in Elixir. Part 1

Brewing Elixir — Wed, 22 Nov 2023 21:46:24 GMT

From a programmer's perspective, one of the simplest and most flexible ways to interact with a computer is through a terminal. By only using plain text to get input and provide outputs, the program's interface not only becomes easy to reason about but also simple to reuse by other programs. This last feature is one of the main ideas behind the Unix Philosophy, specifically the rules of Modularity and Composition.

These types of programs are called command-line interface applications or CLI apps. They get started from a Shell process (e.g. Bash) through a Terminal (usually a virtual one like iTerm). In the case of Elixir, there are a couple of ways to kick off a CLI app but every one of them ends up creating a Beam process.

To better understand the default interfaces here's a high-level diagram of the CLI app running:

From the diagram we can see how the process (our program being executed) can interact with the operator (the person executing the CLI app) by using stdin, stdout and stderr. The first one is used to get data into the process, either typed on the keyboard or streamed from another file, and the other two (stdout and stderr) output data. The programmer decides which stream to use in each case, which usually ends up being stderr for general errors and stdout for anything else.

Thanks to the Shell we can pipe (connect) a process's stdout to another process's stdin to create a processing pipeline and through composition complete more complex tasks.

There are other ways to interact with the process that range from simple and common ones like OS signals (e.g. pressing ctrl-c in the terminal sends an INT signal to the foreground job) to more sophisticated IPC mechanisms. With the latter, we can interact with other processes in the same host or even remote ones, allowing our program to perform more types of tasks.

CLI apps requirements

Now that we have a mental model of what a CLI program running from a Shell looks like we can start thinking about common CLI requirements to control what the process will do.

Get initial parameters: When a program starts it needs initial parameters to decide how to run (or not). These parameters can be provided by CLI command arguments, system environment variables, configuration files or any other mechanism programmed into the app (e.g. pull configuration from a well-known configuration server).
Run in the foreground: CLI tools are normally run by human operators (or indirectly via another script or program using it) and are normally expected to run in the foreground.
Prompt input interactively: Some CLI applications might need to get input interactively as they progress through their tasks. E.g. ask for root password to do sensitive tasks or ask for further configuration options.
Interact with the File System: Depending on the application, having access to the FS is important because large amounts of data are usually faster and more convenient to handle in files than using the standard streams.
Interact with other processes: With the shell or other running processes within the same host or remote ones.
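The first requirement can be sketched in a few lines of Elixir. This is only an illustration under a few assumptions: the Params module, the --level flag and the MYCLI_LEVEL environment variable are hypothetical names made up for this example, and the precedence (flag, then environment, then default) is just one common convention.

```elixir
# Hypothetical module: resolves initial parameters for a CLI app.
defmodule Params do
  # `env` defaults to the real environment but can be injected for testing.
  def resolve(argv, env \\ System.get_env()) do
    {opts, rest, _invalid} =
      OptionParser.parse(argv, strict: [level: :string], aliases: [l: :level])

    # Precedence: CLI flag first, then environment variable, then a default.
    level = opts[:level] || Map.get(env, "MYCLI_LEVEL", "info")

    %{level: level, files: rest}
  end
end

from_flag = Params.resolve(["--level", "debug", "a.txt"], %{"MYCLI_LEVEL" => "warn"})
#=> %{level: "debug", files: ["a.txt"]}

from_env = Params.resolve(["a.txt"], %{"MYCLI_LEVEL" => "warn"})
#=> %{level: "warn", files: ["a.txt"]}

# Errors and diagnostics usually go to stderr so they don't pollute a pipeline.
IO.puts(:stderr, "resolved level: #{from_env.level}")
```

In a real script the call would simply be Params.resolve(System.argv()).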

This is not an exhaustive list and each application can require a different set of features to achieve its goals. In the next section we'll explore what tools exist in the Elixir ecosystem, what each option offers and what are their main downsides.

Tools to build CLI apps in Elixir

The main focus of Erlang/Elixir and specifically the Beam is building and running highly concurrent, fault-tolerant distributed systems, not CLI applications. However, that doesn't mean it doesn't offer a good starting point to cover use cases where a CLI app is needed.
In this section we'll go from the default solutions included by Elixir/Erlang to external projects that can be used to build full-featured CLI apps.

Elixir Scripts

The simplest tool to get the job done is Elixir Scripts. These are .exs files that are interpreted by the elixir command. E.g.

elixir my_cli.exs

It doesn't need any project structure (i.e. Mix project) and thanks to the Mix.install/2 addition (since Elixir 1.12) it can install external projects as part of the script execution. For inspiration check out this repo.

This simplicity comes at a cost that can be acceptable depending on the use case, but in general scripts impose the following restrictions and limitations:

a. Needs the source code to run

b. It needs elixir to be installed on the host

c. Code organization doesn't scale well

d. Doesn't leverage Mix which is the default build tool for Elixir projects which provides tasks for creating, compiling, and testing Elixir projects, managing its dependencies, and much more.

One good use case for this type of solution is one-off tasks where the person who wrote the code is the same one who would execute it.

Mix Run

To easily enhance exs scripts we can create a Mix project and place the script within the project to let it use the modules defined in the project. From here it is easy to add dependencies, set up supervision trees, organize code in modules, add unit tests and much more.

To run the script from the project's context we just need to execute:

cd project_name
mix run my_cli.exs

my_cli.exs has access to modules defined under lib/ and all dependencies defined in the project.

This alternative solves downsides c. and d. of Elixir Scripts, but a. and b. still apply. But don't despair, this is something we can address with some of the alternatives described next.

Mix Releases

A release is a self-contained artifact that contains compiled code for the current project.

From the docs:

Once a release is assembled, it can be packaged and deployed to a target, as long as the target runs on the same operating system (OS) distribution and version as the machine running the mix release command.

This means they don't even require Erlang or Elixir in the running hosts because it includes the Erlang VM and its runtime by default. They don't even require the source code by default which can be convenient for some cases.

This is great! We've eliminated all downsides from Elixir Scripts but there's one limitation to be aware of: releases are optimized to run Elixir/Beam applications, not CLI ones. This means they work great as daemons but have limited support for foreground CLI apps. There are two workarounds to slightly overcome these limitations:

a. Eval a function:

bin/RELEASE_NAME eval "IO.puts(:hello)"

b. Call a remote function:

bin/RELEASE_NAME rpc "IO.puts(:hello)"

In both cases, we can leverage all the benefits from the Beam and the environment where the process is running but stdin, stdout and stderr as well as the command arguments can become a bit challenging to work with. Also, the eval function doesn't start any application within the program by default, and the rpc function requires the release to be running to be able to execute successfully. For more details please check out the docs.

Escript

Similar limitations already existed for CLI apps even before Mix Releases or even Elixir existed! The solution is called Escript and is originally available in Erlang.

Luckily for us we can build escripts from Mix projects to create a single, largely self-contained executable! An escript doesn't require Elixir to be installed because Elixir is embedded in it by default. However, it does require Erlang/OTP to be installed on the host.

Setting up and running escript is as easy as configuring mix.exs, defining an entry module with a main/1 function, and then executing the following command to build the artifact:

mix escript.build

From here, a single file will be available to be used as an executable. E.g. Assuming the app is called example_app we type:

./example_app

This looks perfect but it has one downside: it doesn't support projects or dependencies that need to store or read from the priv directory. The well-known tzdata library is one of them. There are workarounds to overcome this limitation, but it does leave the feeling we can probably do better.

Burrito

To overcome most if not all limitations, and be able to produce a single binary artifact, there's a fantastic OSS library called Burrito. It lets you wrap your meaty app so you can delight your CLI app users!

From the README.md:

Burrito is our answer to the problem of distributing Elixir CLI applications across varied environments, where we cannot guarantee that the Erlang runtime is installed, and where we lack the permissions to install it ourselves.

Burrito uses Mix releases so we get all their benefits as well as a self-extracting archive. It creates a native binary for macOS, Linux, and Windows (*).

In the next part of this series we'll explore Burrito in depth to build a complete CLI app.

(*) This can be configured and cross compilation depends on the build host.

Honorable mentions

Even though these are not strictly speaking alternatives to build CLI apps they are mentioned here because they are alternatives to achieve some of the goals a CLI app can do. And in some cases they are better alternatives for niche cases. E.g. Mix tasks.

Mix tasks and archive: Even though these are not tools to create general-purpose scripts they are great alternatives when we need to extend mix tooling to improve our development workflow. Tasks can live within the same mix project and be reused by other programmers within the team and can also leverage archiving to install tasks globally.
Livebook: yeah, you heard me right! This is a fantastic environment where you can code and run Elixir scripts, use Kino to display charts, run machine learning models, and organize the code for a human to understand it step by step just to name a few.
Docker: When we want to have full control of the environment where our app will run without requiring changes to the running environment we can always count on Docker. In recent years it has become ubiquitous so any CLI app that can accept the extra delay of running its app via docker can wrap the CLI with it and distribute it as a docker image.

Coming up Next

In Part 2 we'll use each tool described here to implement a well-known CLI app and see for ourselves where the benefits and limitations of each alternative are, with the hope you'll end up with enough knowledge to decide what tool fits best for your future use cases 🚀

Unlocking the Power of Elixir's Enumerables

Brewing Elixir — Sun, 12 Nov 2023 21:03:26 GMT

Dealing with data structures is at the core of any programming activity and high-level languages like Elixir provide well-structured constructs in the standard library to easily work with them.

In this post, we'll go through how the Enum and Stream modules work with data types like List, Map and Stream through the use of the Enumerable and Collectable protocols to provide a batteries-included system that can also be reused and extended for other data structures.

High-level overview

To understand how all the pieces work together we first need to define which those pieces are:

Enumerables: Technically speaking these are any data type that implements the Enumerable protocol. We can think of them as collections that share a common way of being accessed.
Enum and Stream utility modules: Group functions to interact with enumerables mainly through the Enumerable and Collectable protocols. They have clear tradeoffs that lead to module separation.
List, Map, Stream data structures: These are the modules defining the data types and the specific functions to work with them.

From here we can organize these abstractions by separating which modules use the enumerable via protocol functions and which types implement the protocol.

Here we can see in the diagram how Stream and Enum utility functions don't access the types (List, Map, Function, etc) directly when dealing with them. This separation helps achieve two key extensibility benefits:

Utility functions can be reused by any Enumerable: Which means they don't need to know more about the data type than what the protocol requires.
Any data type can implement the Enumerable protocol to be reusable by the Utility functions.
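To illustrate the second benefit, here's a minimal sketch of a custom data type implementing the Enumerable protocol. The Countdown struct is a hypothetical type invented for this example. It delegates the traversal to a range and returns {:error, __MODULE__} from the optional-optimization callbacks so Enum falls back to the default, reduce-based algorithms.

```elixir
# Hypothetical data type: counts down from `from` to 1.
defmodule Countdown do
  defstruct from: 0
end

defimpl Enumerable, for: Countdown do
  # Traversal is delegated to a descending range (empty when from < 1).
  def reduce(%Countdown{from: from}, acc, fun) do
    Enumerable.reduce(from..1//-1, acc, fun)
  end

  # Returning {:error, __MODULE__} tells Enum to use its generic,
  # reduce-based implementation for these operations.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end

countdown = %Countdown{from: 3}

Enum.to_list(countdown)
#=> [3, 2, 1]

Enum.map(countdown, &(&1 * 2))
#=> [6, 4, 2]

countdown |> Stream.map(&(&1 + 1)) |> Enum.take(2)
#=> [4, 3]
```

Nothing in Enum or Stream knows about Countdown, yet every utility function works with it through the protocol.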

In general, protocols were designed to achieve this separation and reusability. That means the Collectable protocol, which deals with collecting values into a data structure (traversing one is Enumerable's job), has similar separations and benefits.

But protocols don't need to live in isolation from each other. The relation between Enumerable and Collectable is well explained in the docs. Here's an extract of the core parts:

The Enumerable protocol is useful to take values out of a collection. To support a wide range of values, the functions provided by the Enumerable protocol do not keep shape. It was designed to support infinite collections, resources and other structures with fixed shape.
The Collectable module was designed to fill the gap left by the Enumerable protocol. If the functions in Enumerable are about taking values out, then a Collectable is about collecting those values into a structure.
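A short sketch of what "do not keep shape" means in practice: mapping over a map gives back a list, and a Collectable puts the values back into the structure we want. The prices map is just sample data for this example.

```elixir
prices = %{apple: 1, pear: 2}

# Enumerable side: values are taken out, the map shape is lost.
doubled = Enum.map(prices, fn {fruit, price} -> {fruit, price * 2} end)
# doubled is a list of tuples such as [apple: 2, pear: 4]
# (don't rely on the order in which a map is traversed)

# Collectable side: values are collected back into a structure.
as_map = Enum.into(doubled, %{})
#=> %{apple: 2, pear: 4}

as_set = Enum.into([1, 2, 2, 3], MapSet.new())
# as_set contains 1, 2 and 3 (duplicates are dropped by the MapSet)
```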

To learn more about which modules implement the Enumerable and Collectable protocols we can run iex -S mix from our mix project and then:

iex> Enumerable.__protocol__(:impls)
{:consolidated, [Date.Range, File.Stream, Function, GenEvent.Stream, HashDict, HashSet,
  IO.Stream, List, Map, MapSet, Range, Stream]}
iex> Collectable.__protocol__(:impls)
{:consolidated, [BitString, File.Stream, HashDict, HashSet, IO.Stream, List, Map, MapSet,
  Mix.Shell]}

Now that we have a general understanding of the organization of enumerables we can continue with the main modules used to interact with them (besides their own module functions).

Enum and Stream

Elixir defines these two modules with functions to work with enumerables and collectables interchangeably most of the time. The key difference lies in the way functions return results.

Enum: focuses on eager operations. This means most functions included in this module will process the collection and return the final result right away.
Stream: operations are lazy, allowing processing functions to get chained together to process each element as needed.

Enum: Eager operations

To better understand what eagerness implies here's a simple example with calls to Enum functions linked together with pipe operators.

result = 1..100_000
       |> Enum.map(fn item -> item * 10 end)
       |> Enum.filter(fn item -> item > 10 end)
       |> Enum.map(fn item -> (item + 3) / 2 end)
       |> Enum.reduce(fn item, acc -> acc + item end)
       |> dbg()

Each function operates on the result of the previous call, which holds the fully computed result of that intermediate operation.

By appending dbg() at the end of the pipe we can see how these intermediate lists are created.

1..100_000 #=> 1..100000
|> Enum.map(fn item -> item * 10 end) #=> [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, ...]
|> Enum.filter(fn item -> item > 10 end) #=> [20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, ...]
|> Enum.map(fn item -> (item + 3) / 2 end) #=> [11.5, 16.5, 21.5, 26.5, 31.5, 36.5, 41.5, 46.5, 51.5, 56.5, 61.5, 66.5, 71.5, 76.5, 81.5, 86.5, 91.5, 96.5, 101.5, 106.5, 111.5, 116.5, 121.5, 126.5, 131.5, 136.5, 141.5, 146.5, 151.5, 156.5, 161.5, 166.5, 171.5, 176.5, 181.5, 186.5, 191.5, 196.5, 201.5, 206.5, 211.5, 216.5, 221.5, 226.5, 231.5, 236.5, 241.5, 246.5, 251.5, 256.5, ...]
|> Enum.reduce(fn item, acc -> acc + item end) #=> 25000399993.5

Now imagine having to deal with collections of millions of records. From there it is easy to see how working with large lists and multiple eager calls can lead to high memory usage, even more so on multitenant systems.

A good rule of thumb when working with Enum is: Use it by default unless you know you'll deal with very large collections or memory consumption gets affected by the way long pipelines transform data. In that case, profile your application and evaluate how using lazy operations via the Stream module functions behave.

For a complete list of Enum functions please check the cheatsheet and the module's docs.

Stream: lazy operations

Stream module functions are lazy and exist to solve some of the problems Enum creates due to its eager nature. Besides that it also provides other features not available in Enum like infinite collections.

To see the difference in action we'll take the previous Enum example and rewrite it using Stream functions:

result = 1..100_000
       |> Stream.map(fn item -> item * 10 end)
       |> Stream.filter(fn item -> item > 10 end)
       |> Stream.map(fn item -> (item + 3) / 2 end)
       |> Enum.reduce(fn item, acc -> acc + item end)
       |> dbg()

The final function needs to be an eager one (From the Enum module or Stream.run/1) to execute the stream.

The output of each pipe operation can be visualized here:

1..100_000 #=> 1..100000
|> Stream.map(fn item -> item * 10 end) #=> #Stream<[enum: 1..100000, funs: [#Function<48.53678557/1 in Stream.map/2>]]>
|> Stream.filter(fn item -> item > 10 end) #=> #Stream<[
  enum: 1..100000,
  funs: [#Function<48.53678557/1 in Stream.map/2>,
   #Function<40.53678557/1 in Stream.filter/2>]
]>
|> Stream.map(fn item -> (item + 3) / 2 end) #=> #Stream<[
  enum: 1..100000,
  funs: [#Function<48.53678557/1 in Stream.map/2>,
   #Function<40.53678557/1 in Stream.filter/2>,
   #Function<48.53678557/1 in Stream.map/2>]
]>
|> Enum.reduce(fn item, acc -> acc + item end) #=> 25000399993.5

25000399993.5

We can see how there aren't any intermediate collection results on each pipe operation but we still get the same result as before. You probably noticed what is going on already so there's no need to explain that streams are like chainable functions that get executed (in order) for each element of the original enumerable 😉.
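For the curious, here's a small sketch that makes the per-element execution order visible. It records every call as a message sent to the current process, so we can compare the order of calls in the eager and lazy versions. The Trace module and the note function are hypothetical helpers written only for this demonstration.

```elixir
# Hypothetical helper: drains {:trace, event} messages in arrival order.
defmodule Trace do
  def flush(acc \\ []) do
    receive do
      {:trace, event} -> flush([event | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end

parent = self()
note = fn event -> send(parent, {:trace, event}) end

map_fun = fn item ->
  note.({:map, item})
  item * 10
end

filter_fun = fn item ->
  note.({:filter, item})
  item > 10
end

# Eager: the whole list is mapped before filtering starts.
eager_result = [1, 2] |> Enum.map(map_fun) |> Enum.filter(filter_fun)
eager_order = Trace.flush()
#=> [map: 1, map: 2, filter: 10, filter: 20]

# Lazy: each element goes through map and filter before the next one starts.
lazy_result = [1, 2] |> Stream.map(map_fun) |> Stream.filter(filter_fun) |> Enum.to_list()
lazy_order = Trace.flush()
#=> [map: 1, filter: 10, map: 2, filter: 20]
```

Both pipelines return [20]; only the order in which the functions run differs.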

That's all very cool but Stream shines when it comes to:

Running a function concurrently on each element in an enumerable: By using Task.async_stream/3, Task.Supervisor.async_stream/6 or Task.Supervisor.async_stream_nolink/6 depending on the application requirements.
Emitting a sequence of values from a resource: By using Stream.iterate/2, Stream.resource/3, Stream.unfold/2 and others we can compute or fetch values to create our stream.
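As a quick taste of the second point, here's a sketch of two infinite sequences. Nothing is computed until an eager function asks for values, so taking only a few elements is safe.

```elixir
# Stream.unfold/2: the function returns {value_to_emit, next_accumulator}.
fibonacci = Stream.unfold({0, 1}, fn {a, b} -> {a, {b, a + b}} end)

first_eight = Enum.take(fibonacci, 8)
#=> [0, 1, 1, 2, 3, 5, 8, 13]

# Stream.iterate/2: each value is computed from the previous one.
powers_of_two = Stream.iterate(1, &(&1 * 2)) |> Enum.take(5)
#=> [1, 2, 4, 8, 16]
```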

For a complete list of functions please check the module's docs.

Use cases

When it comes to choosing between Enum and Stream, a good rule of thumb is to start with Enum by default and evaluate Stream when collections are large and pipelines are long. Nevertheless, there are also cases where Stream is the best initial choice, and we'll look at 3 of them.

Case 1: File processing

A very common use case for Streams involves: reading a file, doing some processing per line and finally writing the results to another one.

orig_file = "/path/to/file"
dest_file = "/path/to/other/file"

File.stream!(orig_file)
|> Stream.map(&String.replace(&1, "#", "%"))
|> Stream.into(File.stream!(dest_file))
|> Stream.run()

We can use this template to process log files, JSON Lines, CSV, TSV and any other line-oriented file. File.stream!/3 also accepts modes to instruct the stream to uncompress or compress the data, which comes in very handy to deal with even larger files at the cost of CPU cycles.
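To try the template without touching real data, here's a self-contained sketch that creates its own input in the system's temporary directory, runs the same pipeline and cleans up afterwards. The file names are arbitrary and generated only for this example.

```elixir
dir = System.tmp_dir!()
id = System.unique_integer([:positive])
orig_file = Path.join(dir, "stream_in_#{id}.txt")
dest_file = Path.join(dir, "stream_out_#{id}.txt")

# Sample input: three lines, two of them containing a "#".
File.write!(orig_file, "# first\n# second\nthird\n")

# Same shape as the template: read lazily, transform per line, write lazily.
orig_file
|> File.stream!()
|> Stream.map(&String.replace(&1, "#", "%"))
|> Stream.into(File.stream!(dest_file))
|> Stream.run()

output = File.read!(dest_file)
#=> "% first\n% second\nthird\n"

File.rm!(orig_file)
File.rm!(dest_file)
```

Only one line at a time needs to be held in memory, which is why the same pipeline works for files much larger than the available RAM.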

Case 2: Processing enumerables concurrently

Sometimes our application can leverage concurrency by splitting operations and running them potentially in parallel. With streams, we can collect initial parameters into an enumerable and pass them to Task.async_stream/3 to let it process them.

["resource1", "resource2", "resource3"]
|> Task.async_stream(fn item ->
  fetch(item)
end)
|> Enum.to_list()

Here it will process 3 resources and call fetch for each of them concurrently. The default max concurrency equals the number of online schedulers, which most of the time maps 1:1 to the number of cores or virtual CPUs the hardware or VM has. Assuming this is running on a 2 vCPU VM, only 2 of them will run concurrently and the third will wait for its turn until one of the two running tasks completes. By default each task times out after 5 seconds, and when that happens the process that spawned the tasks exits.
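Since fetch above is not defined in this post, here's a runnable sketch with a trivial function instead, so we can see the shape of what Task.async_stream/3 emits: every result comes wrapped in an {:ok, value} tuple and, by default, in the same order as the input.

```elixir
results =
  1..4
  |> Task.async_stream(fn n -> n * n end, max_concurrency: 2)
  |> Enum.to_list()
#=> [ok: 1, ok: 4, ok: 9, ok: 16]

# Unwrap the values once we know every task succeeded.
squares = Enum.map(results, fn {:ok, value} -> value end)
#=> [1, 4, 9, 16]

# This is the value used when :max_concurrency is not given.
default_max = System.schedulers_online()
```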

The beauty of this function lies in its simplicity and configurability where we can control the max concurrency, timeouts, processing order and what to do during timeouts. E.g.

1..100_000
|> Task.async_stream(
  fn item -> process(item) end,
  ordered: false,
  max_concurrency: 1000,
  timeout: 60_000,
  on_timeout: :kill_task
)
|> Stream.reject(fn
  {:exit, _} -> false
  _ -> true
end)
|> Enum.to_list()

With on_timeout: :kill_task a timed-out task no longer takes the caller down. Instead it shows up in the results as an {:exit, :timeout} tuple next to the usual {:ok, value} ones. In this example we reject every non-error result, simulating a case where we only care about the side effects but still want to know how many errors happened. Depending on your needs you can adapt how to process the results through Stream or Enum.
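Here's a small sketch of the on_timeout: :kill_task behaviour that can be run as is. The sleep durations are arbitrary values chosen so that the first task finishes well within the 200 ms timeout and the second one clearly doesn't.

```elixir
results =
  [50, 500]
  |> Task.async_stream(
    fn ms ->
      # Simulate work that takes `ms` milliseconds.
      Process.sleep(ms)
      ms
    end,
    timeout: 200,
    on_timeout: :kill_task
  )
  |> Enum.to_list()
#=> [ok: 50, exit: :timeout]

# Separate successful results from the ones that were killed.
{finished, timed_out} = Enum.split_with(results, &match?({:ok, _}, &1))
```

Without the on_timeout option the same timeout would make the calling process exit instead of producing a result we can inspect.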

Case 3: Remote Resource as a stream

Sometimes we can find resources that can be easily abstracted as streams so that callers can pull values as needed. For instance, here's a sample where we abstract a particular resource that offers simple sequential access.

Stream.resource(
  fn -> %{url: "http://example.com/some/resource", index: 0} end,
  fn %{url: url, index: index} = resource ->
    case fetch(url, index) do
      {:ok, %{next_index: next_index} = result} ->
        {[result], %{resource | index: next_index}}

      _ ->
        {:halt, resource}
    end
  end,
  fn _ -> :ok end
)

In real cases the index will take the form of a cursor but the idea is the same: The resource can now be accessed as a stream to leverage every function that handles them.
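To make this runnable without a network, here is a sketch that replaces the remote call with an in-memory stand-in. The FakeRemote module, its page contents and the shape of the fetch result are all hypothetical; only Stream.resource/3 itself comes from the standard library.

```elixir
defmodule FakeRemote do
  # Hypothetical paged data standing in for a remote resource.
  @pages %{0 => ["a", "b"], 1 => ["c"], 2 => ["d", "e"]}

  # Stand-in for a network call: returns a page and the next index.
  def fetch(_url, index) do
    case Map.fetch(@pages, index) do
      {:ok, items} -> {:ok, %{items: items, next_index: index + 1}}
      :error -> {:error, :not_found}
    end
  end

  def stream(url) do
    Stream.resource(
      fn -> %{url: url, index: 0} end,
      fn %{url: url, index: index} = state ->
        case fetch(url, index) do
          {:ok, %{items: items, next_index: next}} ->
            {items, %{state | index: next}}

          _ ->
            {:halt, state}
        end
      end,
      fn _state -> :ok end
    )
  end
end

first_three = FakeRemote.stream("http://example.com/some/resource") |> Enum.take(3)
# first_three == ["a", "b", "c"]
all_items = FakeRemote.stream("http://example.com/some/resource") |> Enum.to_list()
# all_items == ["a", "b", "c", "d", "e"]
```

Because the stream is lazy, taking three items only requests the first two pages.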

Conclusion

The true power of Elixir enumerables comes from the combination of reusable utility modules, well-defined protocols and existing data types ready to be used. Armed with these tools you can take on most tasks easily and when the abstractions are not enough you can easily extend or build on top of them to suit your needs.

To conclude this post, here are some general recommendations to keep in mind when working with these abstractions.

Use Enum by default: When in doubt start with this module and move to Streams as necessary.
Go for Stream for large data sets or resources that can be abstracted to emit values and will benefit from being treated as a stream.
If you are still unsure and don't want to throw money at the problem, then profile to understand where the potential bottlenecks are. Finally, create alternative solutions and benchmark them.
Anti-patterns:
- Using Streams from the start: Stream is not a silver bullet. Start simple unless you know the use case works better in general with Streams.
- Using Streams for everything to prevent scaling issues: This is a form of premature optimization that could also cause the opposite in some cases.
- Reinventing the wheel by ignoring Enumerable, Collectable, Stream, Enum and others: For simple solutions it is fine to do so, but if you find yourself reimplementing some of these functions for your data types then start thinking about how to implement Enumerable and Collectable to give superpowers to your key data structures.
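As a rough illustration of that last point, this sketch implements Enumerable for a hypothetical Countdown struct. The struct and its semantics are invented for the example, and the optional callbacks simply fall back to the default reduce-based algorithms.

```elixir
defmodule Countdown do
  # Hypothetical data type: counts down from `from` to 0.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  # Returning {:error, __MODULE__} asks Enum to fall back to reduce/3.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _value), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}

  def reduce(%Countdown{from: from}, acc, fun), do: do_reduce(from, acc, fun)

  defp do_reduce(_n, {:halt, acc}, _fun), do: {:halted, acc}
  defp do_reduce(n, {:suspend, acc}, fun), do: {:suspended, acc, &do_reduce(n, &1, fun)}
  defp do_reduce(n, {:cont, acc}, _fun) when n < 0, do: {:done, acc}
  defp do_reduce(n, {:cont, acc}, fun), do: do_reduce(n - 1, fun.(n, acc), fun)
end

countdown = %Countdown{from: 3}
as_list = Enum.to_list(countdown)
# as_list == [3, 2, 1, 0]
doubled = Enum.map(countdown, &(&1 * 2))
# doubled == [6, 4, 2, 0]
first_two = countdown |> Stream.map(&(&1 + 1)) |> Enum.take(2)
# first_two == [4, 3]
```

With a single reduce/3 implementation in place, the struct works with both Enum and Stream functions.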

I hope you liked this post and hope you subscribe to my newsletter 💌
