Sequences

2020-12-14

It was the first day of my first Erlang-based system in production. I had invested a sensible amount of time to test and stabilize it. The i’s were dotted, the t’s were crossed, and I felt confident that it would work reasonably well. The system broke in production within the first few hours of its life.

The breakage was caused by excessive usage of the ++ operator. I was iteratively building a large list by appending new chunks to its end, which is extremely inefficient in Erlang. Why did I do it then? Because I didn’t know better :-) I incorrectly assumed, based on my experience with other languages, that in the expression list_a ++ list_b only list_b is iterated (which would have been fine). In reality it is list_a that gets copied on every append.
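To make the difference concrete, here’s a minimal sketch of the two approaches (illustrative code, not the original system):

# O(n²): every ++ copies the entire left operand (the accumulated list),
# so the total work grows quadratically with the sequence length.
slow = Enum.reduce(1..100_000, [], fn el, acc -> acc ++ [el] end)

# O(n): prepend each element (constant time), then reverse once at the end.
fast =
  1..100_000
  |> Enum.reduce([], fn el, acc -> [el | acc] end)
  |> Enum.reverse()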

This incident took me down the path of prepends, right folds, and reverses, and taught me that dealing with sequences in Erlang is very different from what I had seen before. After a couple of other issues I realized that choosing a data structure to represent a sequence is highly context-sensitive in Erlang, much more so than in many other languages I had encountered until that point. For better or worse (probably worse, more on that later), BEAM languages lack a solid all-rounder, a structure which could be a sensible default choice for most cases.

For the purpose of this article, the term sequence means an ordered collection of things. For example, a list [:foo, :bar, :baz] is a sequence because the structure preserves the information that foo comes before bar, which in turn comes before baz. On the other hand, a MapSet containing those same elements will not preserve that order (in fact, the internal order in this particular case will be bar, baz, foo).

Sequences can be used in a bunch of different ways. For example, sometimes we might need to walk the sequence from start to end, in the given order. Other times we might want to work on elements based on their position (e.g. get the 3rd element, or replace the 5th one). This is known as a random-access operation, and it’s going to play an important role in choosing an optimal data structure to represent our sequence.

In this post I’ll go over a couple (but not all) of the data structures that can be used to model a sequence. In particular we’ll take a look at lists, arrays, maps, and tuples, discuss their trade-offs, and compare their performance through a set of simple and fairly naive benchmarks. The results we’ll obtain won’t exactly be scientific-paper grade, but we should get some basic intuition about which structure works best in different scenarios :-)

Lists

A frequent choice for representing a sequence, lists deceptively resemble arrays from many other languages. However, lists are nothing like arrays, and if you treat them as such you might run into problems. Think of lists as singly linked lists, and the trade-offs should become clearer.

Prepending an element to the list is very fast and creates zero garbage, which is the reason why lists are the fastest option for iteratively building a sequence whose size is not known upfront. The same holds for reading and popping the list head (the first element). Getting the next element is a matter of a single pointer dereference, so walking the entire list (or some prefix of it) is also very efficient. Finally, it’s worth noting that lists receive some syntactic love from the language, most notably with first-class support for pattern matching, which often leads to nice, expressive code.
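As a small illustration of the kind of work lists excel at, here’s a sketch of a front-to-back walk using pattern matching on the head and tail:

# Summing a list front to back; each step only dereferences the tail pointer.
defmodule Walk do
  def sum(list), do: sum(list, 0)

  defp sum([], acc), do: acc
  defp sum([head | tail], acc), do: sum(tail, acc + head)
end

Walk.sum([1, 2, 3])  #=> 6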

This is pretty much the complete list of things lists are good at. They basically suck at everything else :-) A random-access read (fetching the n-th element) boils down to a sequential scan. A random-access write additionally has to rebuild the entire prefix up to and including the element being changed. The length function (and consequently also Enum.count) walks the entire list to count the elements.

Consequently, lists are not a general purpose sequence data structure, and treating them as such may get you into trouble. To be clear, doing an occasional iterative operation, e.g. Enum.at, length, or even List.replace_at doesn’t necessarily have to be a problem, especially if the list is small. On the other hand, performing frequent random-access operations against a larger list inside a tight loop is probably best avoided.

Arrays

Somewhat less well-known, the :array module from Erlang’s stdlib offers fast random-access operations, and can also be used to handle sparse arrays. The memory overhead of an array is also significantly smaller compared to lists. Arrays are the only data structure presented here which is implemented entirely in Erlang code. Internally, an array is represented as a tree of small tuples. As we’ll see from the benchmarks, small tuples have excellent random-access read & write performance. Relying on them allows :array to offer pretty good all-round performance. I wonder if the results would be even better if :array was implemented natively.
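Here’s a brief sketch of the :array API as used from Elixir (note the Erlang convention of taking the subject as the last argument):

arr = :array.from_list([:a, :b, :c])

:array.get(1, arr)            #=> :b (indices are zero-based)
arr = :array.set(1, :x, arr)  # returns a new array with :x at index 1

# Eager ordered traversal; the fold function also receives the index.
:array.foldl(fn _index, value, acc -> [value | acc] end, [], arr)
|> Enum.reverse()             #=> [:a, :x, :c]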

Maps

A sequence can be represented with a map where keys are element indices. Here’s a basic sketch:

# initialization
seq = %{}

# append
seq = Map.put(seq, map_size(seq), value)

# random-access read
Map.fetch!(seq, index)

# random-access write
Map.put(seq, index, new_value)

# sequential walk
Enum.each(
  0..(map_size(seq) - 1),
  fn index -> do_something_with(Map.fetch!(seq, index)) end
)

At first glance using a general-purpose key-value structure might seem hacky, but in my experience it can work quite well for moderately sized sequences. Map-powered sequences are my frequent choice if random-access operations, especially reads, are called for, and I’ve had good experiences with them, not only for basic one-dimensional sequences, but also for matrices (e.g. by using {x, y} for keys) and sparse arrays.
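For example, here’s a hypothetical sketch of a small 2-d grid stored as a map keyed by {x, y} coordinates:

# A 3x3 grid initialized to zeros.
grid =
  for x <- 0..2, y <- 0..2, into: %{} do
    {{x, y}, 0}
  end

grid = Map.put(grid, {1, 2}, 42)  # random-access write
Map.fetch!(grid, {1, 2})          #=> 42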

On the flip side, maps will suck where lists excel. Building a sequence incrementally is much slower. The same holds for sequential traversal through the entire sequence. However, maps will suck at these things much less than lists suck at random access. In addition, building a sequence and walking it are frequently one-off operations, while random access is more often performed inside a loop. Therefore, maps may provide a better overall performance, but only if you need random access. Otherwise, it’s probably better to stick with lists.

It’s also worth mentioning that maps will introduce a significantly higher memory overhead (about 3x more than arrays).

I personally consider maps to be an alternative to arrays. More often than not I start with maps for a couple of reasons:

  1. Maps are slightly faster at reading from “medium sized” sequences (around 10k elements).
  2. They can elegantly handle a wider range of scenarios (e.g. matrices, negative indices, prepends).
  3. The interface is more “Elixiry”, while :array functions (like many other Erlang functions) take the subject (array) as the last argument.

Tuples

Tuples are typically used to throw a couple of values together, e.g. in Erlang records, keywords/proplists, or ok/error tuples. However, they can also be an interesting choice to handle random-access reads from a constant sequence, i.e. a sequence that, once built, doesn’t change. Here’s how we can implement a tuple-based sequence:

# We're building the complete list (which is fast), and then convert it into
# a tuple with a single fast call.
seq = build_list() |> List.to_tuple()

# random-access read
elem(seq, index)

# iteration
Enum.each(
  0..(tuple_size(seq) - 1),
  fn index -> do_something_with(elem(seq, index)) end
)

As we’ll see from the benchmark, random-access read from a tuple is a very fast operation. Moreover, compared to other structures, the memory overhead of tuples is much smaller (about 20% less than arrays, 2x less than lists, and 3.7x less than maps). On the flip side, modifying a single element copies the entire tuple, and is therefore pretty slow, except for very small tuples. Finally, it’s also worth mentioning that the maximum tuple size is 16,777,215 elements, so tuples won’t work for unbounded collections.
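For completeness, here’s what such a (costly) write looks like; this is just a sketch, and the copy happens even for a single-element change:

seq = {1, 2, 3}
put_elem(seq, 1, :x)  #=> {1, :x, 3} (the whole tuple is copied)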

Benchmarking

We’ll compare the performance of these data structures in the following scenarios: iterative building, sequential walk, random-access reads, and random-access writes. The benchmarking code can be found here. The results have been obtained on an i7-8565U CPU.

Before we start analyzing the results, I want to make a couple of important points. First, these benches are not very detailed, so don’t consider them as some ultimate proof of anything.

Moreover, bear in mind that the data structure is only a part of the story. Often the bulk of the processing will take place outside of the generic data structure code. For example, while iterating a sequence, the processing time of each element will likely dominate over the iteration time, so switching to a more efficient data structure might not lead to any relevant improvements in the grand scheme of things.

Sometimes a problem-specific optimization can lead to much more drastic improvements. For example, suppose the program is doing 1M random-access operations. If, taking the specific problem into account, we can change the algorithm to reduce that to e.g. 20 operations, we could get radical improvements, to the point where the choice of the data structure isn’t relevant anymore.

Therefore, treat these results carefully, and always measure in the context of the actual problem. Just because A is 1000x faster than B, doesn’t mean that replacing B with A will give you any significant performance gains.

Building a sequence

Let’s first compare the performance of building a sequence incrementally. Here are the results:

Incremental build benchmark

The measurements are performed on various sizes: 1, 2, 3, …, 10, 20, 30, …, 100, …, 1M. For the sake of better identification of each measurement, a few points on each line are highlighted (circles, squares, …).

The results demonstrate that lists are the fastest option for dynamically building a sequence. The 2nd- and 3rd-best options also owe their results to lists. In both of these cases we first build a list, and then convert it to the target structure with List.to_tuple and :array.from_list respectively. List.to_tuple is particularly fast since it’s implemented natively. On my machine it takes a few milliseconds to transform a list of a million elements into a tuple.

Gradually growing arrays or maps is going to be slower, with maps being particularly bad, taking almost a second to build a sequence of 1M elements. However, I’m usually not too worried about it. In the lower-to-medium range (up to about 10k elements) growing a map will be reasonably fast. When it comes to larger sequences, I personally try to avoid them if possible. If I need to deal with hundreds of thousands or millions of “things”, I usually look for other options, such as streaming or chunking, to keep the memory usage stable. Another alternative is ETS tables, which might work faster in such cases since they are non-persistent and reside off-heap (I discussed this a while ago in this post).

Memory usage

It’s also worth checking the memory usage of each structure. I used :erts_debug.size on a sequence of 100k :ok elements, and got the following results:

  • tuple: 100k words
  • array: 123k words
  • list: 200k words
  • map: 377k words

Unsurprisingly, tuples have the smallest footprint, with arrays adding some 20% of extra overhead. A list requires one additional word per item, and finally a map takes up quite a lot of extra space.
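Here’s a rough sketch of how such a measurement can be reproduced (:erts_debug.size is an undocumented debugging helper that returns the flat size of a term in words, so treat the numbers as approximations):

seq = List.duplicate(:ok, 100_000)
map = Map.new(Enum.with_index(seq), fn {value, index} -> {index, value} end)

:erts_debug.size(List.to_tuple(seq))     # tuple
:erts_debug.size(:array.from_list(seq))  # array
:erts_debug.size(seq)                    # list
:erts_debug.size(map)                    # map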

Walking a collection

Next up let’s see how long it takes to sequentially walk through the entire sequence. Note that tuples and maps don’t support ordered traversal, so we have to resort to random-access reads (get the 1st element, then the 2nd, …).

This test uses a plain tail recursion to sum all the elements in the sequence. The times therefore include more than just the iteration, but the overhead should be small enough not to affect the results:

Sequential walk benchmark

Here we get the same ranking, with lists coming out on top. Coupled with the previous benchmark, this demonstrates the strengths of lists. You’re gonna have a hard time finding another structure which is as fast as lists at incremental growth and sequential iteration for sequences of an unknown size. If that is your nail, lists are probably your best hammer.

Tuples come very close, but growing them dynamically will still require building a list. That said, it’s worth noting that walking a tuple-powered sequence may even beat lists in some circumstances. The thing is that you can scan the tuple equally fast in both directions (front to end, or end to front). On the other hand, walking the list in the reverse order will require a body recursion which will add a bit of extra overhead, just enough to be slower than a tail-recursive tuple iteration.

Arrays also show pretty good performance, owing to their first-class support for iteration through foldl and foldr. Both perform equally well, so an array can also be efficiently traversed in both directions. On the flip side, both functions are eager, and there’s no support for lazy iteration. In such cases you’ll either have to use positional reads, or resort to throwing a result from inside the fold for an early exit.

Maps are significantly slower than the rest of the pack, but not necessarily terrible in the medium range, which we can see if we plot the same data using logarithmic scale with base 10:

Sequential walk benchmark (logscale)

Up to 10k elements, a full sequential map traversal will run in less than one millisecond. This is still slower than other data structures, but it might suffice in many cases. It’s also worth noting that maps can be iterated much faster using :maps.iterator and :maps.next. Such iteration is as fast as array iteration, but it’s not ordered, and is therefore not included in this benchmark.
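For reference, here’s a minimal sketch of such an unordered traversal (assuming numeric values, as in the benchmark):

# The visiting order is undefined, so this doesn't qualify as an ordered walk.
defmodule MapWalk do
  def sum(map), do: do_sum(:maps.next(:maps.iterator(map)), 0)

  defp do_sum(:none, acc), do: acc
  defp do_sum({_key, value, iter}, acc), do: do_sum(:maps.next(iter), acc + value)
end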

Random-access reads

Let’s turn our attention to random-access reads. This benchmark measures the time it takes to read every element through a positional read. Here are the results:

Random-access read benchmark

Note that the y-axis value represents the time it takes to read the entire sequence, not just one element, which for tuples and maps is effectively the same as the sequential walk benchmark.

Also note that this graph uses log-10 scale for both axes, which allows us to better see the results for smaller sequences. This scale affects the shape of the curves. Note how the green line starts to ascend faster in the hundreds area. On a linear scale this would look like a standard U-shaped curve skyrocketing near the left edge of the graph.

This benchmark confirms that tuples are the fastest option for positional reads. Given how well they did in the previous two tests, they turn out to be quite a compelling option. This will change in the final test, but it’s worth noting that as long as you don’t need writes, tuples can give you fast random-access reads and sequential scans (both ways), plus they have the smallest memory footprint of the pack (though to build a tuple you’ll need to create a list first).

The 2nd and 3rd place are occupied by maps and arrays. In the low-to-mid range maps are somewhat faster, while arrays take over in the lower 100k range. In throughput terms, you can expect about a few million reads/sec from arrays and maps, and a few dozen million reads/sec from tuples.

The results for lists are a great example of why we should think about performance contextually. For larger sequences lists are terrible at random access. However, for very small collections (10 elements), the difference is less striking and absolute times might often be acceptable. Of course, if you need to do a lot of such reads, other options will still work better, but it’s still possible that lists might suffice. For example, keyword lists in Elixir are lists, and they work just fine in practice.

Random-access writes

Finally, let’s see how these structures perform at random-access write operations:

Random-access write benchmark

Just like with reads, this graph uses log-10 scale for both axes, with y-axis representing the time it takes to write to every element of the sequence.

On the tiny end (10 elements), tuples are the fastest. This is probably why they are the prevalent option in Erlang for bundling a small, fixed-size collection of values (e.g. Erlang records are tuples). I’ve had a few situations where replacing a small map with a tuple (e.g. %{x: x, y: y} with {x, y}) brought significant improvements. That said, the difference usually doesn’t matter, so when I model structured data I still start with maps, using tuples only in exceptional situations where maximum speed is needed.

As soon as the sequence becomes slightly larger, the performance of tuples degrades significantly, while arrays and maps become much better choices, with arrays being consistently better. If frequent random-access writes on medium or larger sequences are what you need, arrays are likely the best option.

Finally, it’s worth noting that for very small sequences (10 elements or less), lists are reasonably fast at positional writes. Their performance might even suffice for a 100-element sequence, taking less than a microsecond for a single random-access write. Past that point you’ll probably be better off using something else.

Other operations

I conveniently skipped some other operations, such as inserts, deletes, joins, and splits. Generally, these amount to O(n) operations for all of the mentioned structures. If such actions are performed infrequently, or on a small collection, the performance of the presented structures might suffice. Otherwise, you’ll need to look for something else.

For example, if you need to frequently insert items in the middle of a sequence, gb_trees could be a good choice. Implementing a priority queue can be as easy as using {priority, System.unique_integer([:monotonic])} for keys.
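Here’s a minimal sketch of that idea (the module and function names are illustrative, not an existing library):

defmodule PQueue do
  def new, do: :gb_trees.empty()

  # The monotonic unique integer keeps keys unique and preserves insertion
  # order among items with the same priority.
  def push(queue, priority, item) do
    key = {priority, System.unique_integer([:monotonic])}
    :gb_trees.insert(key, item, queue)
  end

  def pop(queue) do
    if :gb_trees.is_empty(queue) do
      :empty
    else
      {{_priority, _id}, item, rest} = :gb_trees.take_smallest(queue)
      {item, rest}
    end
  end
end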

If you need a FIFO queue consider using :queue, which will give you amortized constant time for prepend, append, and pop operations.
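A quick sketch of the :queue basics:

q = :queue.new()
q = :queue.in(:first, q)   # append to the rear
q = :queue.in(:second, q)

{{:value, :first}, q} = :queue.out(q)  # pop from the front
q = :queue.in_r(:urgent, q)            # prepend to the front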

Sometimes you’ll need to resort to hand-crafted data structures which are tailor-made to efficiently solve the particular problem you’re dealing with. For example, suppose we’re receiving chunks of items, where some chunks need to be prepended and others need to be appended. We could use the following approach:

# append seq2 to seq1
[seq1, seq2]

# prepend seq2 to seq1
[seq2, seq1]

If leaf elements can themselves be lists, you can use a tuple to combine two sequences ({seq1, seq2}) instead. In either case this is a constant-time operation which requires no reshuffling of the structure’s internals. The final structure will be a tree that can be traversed with a body recursion, which should be roughly as fast as walking a flat list. If the input elements are strings which must be joined into a single big string, you can reach for an iolist (a deeply nested list of binaries or bytes). See this post by Nathan Long for more details.
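Here’s one possible shape of the ordered traversal of the list-based variant, assuming the leaves are not lists themselves:

defmodule DeepWalk do
  # Body recursion: descend into the head before moving on to the tail,
  # which visits the leaves in order.
  def each([], _fun), do: :ok

  def each([head | tail], fun) do
    each(head, fun)
    each(tail, fun)
  end

  def each(leaf, fun), do: fun.(leaf)
end

DeepWalk.each([[1, 2], [3, [4, 5]]], &IO.inspect/1)  # prints 1, 2, 3, 4, 5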

Summary

As you can see, the list of options is pretty big, and there’s no clear sensible default choice that fits all the cases. I understand that this can be frustrating, so let me try to provide some basic guidelines:

  1. If you don’t need random-access operations, use lists.
  2. For frequent random-access operations inside a loop, consider maps or arrays.
  3. If the sequence size is fixed and very small, tuples could be the best option.
  4. Tuples might also be a good choice if you’re only doing random-access reads.

Also keep in mind that the performance of the structure often won’t matter, for example if you’re dealing with a small sequence and a small number of operations. Aim for the reading experience first, and optimize the code only if needed. Finally, see if you can replace random-access operations with a chain of forward-only transformations, in which case lists will work well.

It would be nice if we could somehow simplify this decision process, perhaps with a good all-round data structure that wouldn’t necessarily be the best fit for everything, but would be good enough at most things. One potential candidate could be Relaxed Radix Balanced Trees (aka RRB-Trees), the data structure behind Clojure’s vectors. You can read more about RRB-Trees in this paper and this blog series. With fast times for operations such as append, random access, join, and split, RRB-Trees look very interesting. Unfortunately, I’m not aware of an existing implementation for BEAM languages.

No data structure can perfectly fit all the scenarios, so I don’t expect RRB-Trees to magically eliminate the need for lists, arrays, or maps. We will still need to use different structures in different scenarios, considering their strengths and weaknesses. That said, I think that RRB-Trees could potentially simplify the initial choice of the sequence type in many cases, reducing the chance of beginner mistakes like the one mentioned at the start of the article.

Operating systems via development

2020-06-24

About two years ago I decided to add HTTPS support to this site, using automatic certification via Let’s Encrypt. All the articles on the subject relied on a tool called certbot. A couple of variations were mentioned, some requiring the tool to run while the site is down, others using an nginx + certbot combination. It seemed that installing and running some additional external tool(s) in production was mandatory.

At that point The Erlangelist was a standalone Elixir-powered system which required no external programs. It seemed that I now had to start worrying about setting up additional services and interacting with them using their custom DSLs. This would complicate operations, and create a disconnect between production and development. Any changes to the certification configuration would need to be tested directly in production, or alternatively I’d have to set up a staging server. Either way, testing of certification would be done manually.

Unhappy with this state, I started working on site_encrypt, a library which takes a different approach to automatic certification:

  1. site_encrypt is a library dependency, not an external tool. You’re not required to install any OS-level package to use it.
  2. The certification process and periodical renewal are running in the same OS process as the rest of the system. No other OS processes need to be started.
  3. Everything is configured in the same project where the system is implemented.
  4. Interaction with site_encrypt is done via Elixir functions and data. No yaml, ini, json, or other kind of DSL is required.
  5. It’s trivial to run the certification locally, which reduces the differences between prod and local dev.
  6. Support for automatic testing of the certification is provided. There’s no need to set up staging machines, or make changes directly on the production system.

This is an example of what I call “integrated operations”. Instead of being spread across a bunch of yamls, inis, jsons, and bash scripts, somehow all glued together at the OS level, most of the operations work is done in development, i.e. the same place where the rest of the system is implemented, using the same language. Such an approach significantly reduces the technical complexity of the system. The Erlangelist is mostly implemented in Elixir, with only a few administrative tasks, such as installation of OS packages, user creation, port forwarding rules, and similar provisioning tasks, being done outside of Elixir.

This also simplifies local development. The instructions to start the system locally are very simple:

  1. Install build tools (Elixir, Erlang, nodejs)
  2. Fetch dependencies
  3. Invoke a single command to start the system

The locally started system will be extremely close to the production version. There is almost nothing of significance running in production which is not running locally. The only two differences of note I can think of are:

  1. Ports 80/443 are forwarded in prod
  2. The prod version uses Let’s Encrypt for certification, while the local version uses a local CA server (more on this later).

Now, this may not sound like much for a simple blog host, but behind the scenes The Erlangelist is a bit more than a simple request responder:

  1. The Erlangelist system runs two separate web servers. The public facing server is the one you use to read this article. Another internal server uses the Phoenix Live Dashboard to expose some metrics.
  2. A small hand-made database is running which collects, aggregates, and persists the reading stats, periodically removing older stats from the disk.
  3. The system periodically renews the certificate.
  4. Locally and on CI, another web server which acts as a local certificate authority (CA) is running.

In other words, The Erlangelist is more than just a blog, a site, a server, or an app. It’s a system consisting of multiple activities which collectively work together to support the full end-user service, as well as the operational aspects of the system. All of these activities are running concurrently. They don’t block each other, or crash each other. The system utilizes all CPU cores of its host machine. For more details on how this works take a look at my talk The soul of Erlang and Elixir.

Let’s take a closer look at site_encrypt.

Certification

Let’s Encrypt supports automatic certification via the ACME (Automatic Certificate Management Environment) protocol. This protocol describes the conversation between the client, which is a system wanting to obtain the certificate for some domain, and the server, which is the certificate authority (CA) that can create such a certificate. In an ACME conversation, our system asks the CA to provide the certificate for some domain, and the CA asks us to prove that we’re the owners of that domain. The CA gives us some random bytes, and then makes a request at our domain, expecting to get those same bytes in return. This is also called a challenge. If we successfully respond to the challenge, the CA will create the certificate for us. The real story is of course more involved, but this simplified version hopefully gives you the basic idea.

This conversation is an activity of the system. It’s a job which needs to be occasionally done to allow the system to provide the full service. If we don’t do the certification, we don’t have a valid certificate, and most people won’t use the site. Likewise, if I decide to shut the site down, the certification serves no purpose anymore.

In such situations my preferred approach is to run this activity together with the rest of the system. The less fragmented the system is, the easier it is to manage. Running some part of the system externally is fine if there are stronger reasons, but I don’t see such reasons in this simple scenario.

site_encrypt makes this task straightforward. Add a library dep, fill in some blanks, and you’re good to go. The certification configuration is provided by defining the certification function:

def certification do
  SiteEncrypt.configure(
    client: :native,
    domains: ["mysite.com", "www.mysite.com"],
    emails: ["contact@mysite.com", "another_contact@mysite.com"],
    db_folder: "/folder/where/site_encrypt/stores/files",
    directory_url: directory_url()
  )
end

This code looks pretty declarative, but it is executable code, not just a collection of facts. And that means that we have a lot of flexibility to shape the configuration data however we want. For example, if we want to make the certification parameters configurable by the system operator, say via a yaml file, nothing stops us from invoking load_configuration_from_yaml() instead of hardcoding the data. Say we want to make only some parameters configurable (e.g. domains and emails), while leaving the rest hardcoded. We can simply do Keyword.merge(load_some_params_from_yaml(), hardcoded_data). Supporting other kinds of config sources, like etcd or a database, is equally straightforward. You can always build declarative on top of imperative, while the opposite requires some imagination and trickery, such as running external configuration generators, and good luck managing that in production :-)
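As an illustration, here’s one possible shape of such a merge. The load_some_params_from_yaml/0 helper is hypothetical, and the option names follow the example above:

def certification do
  defaults = [
    client: :native,
    db_folder: "/folder/where/site_encrypt/stores/files",
    directory_url: directory_url()
  ]

  # Operator-provided settings (e.g. domains and emails) override the defaults.
  SiteEncrypt.configure(Keyword.merge(defaults, load_some_params_from_yaml()))
end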

It’s also worth mentioning that site_encrypt internally ships with two lower-level modules, a sort of plumbing to this porcelain. There is a mid-level module which provides workflow-related operations, such as “create an account”, or “perform the certification”, and a lower-level module which provides basic ACME client operations. These modules can be used when you want a finer grained control over the certification process.

Reducing the dev-production mismatch

There’s one interesting thing happening in the configuration presented earlier:

def certification do
  SiteEncrypt.configure(
    # ...
    directory_url: directory_url()
  )
end

The directory_url property defines the CA where site_encrypt will obtain the certificate. Instead of hardcoding this url, we’re invoking a function to compute it. This happens because we need to use different urls for production vs staging vs local development. Let’s take a look:

defp directory_url do
  case System.get_env("MODE", "local") do
    "production" -> "https://acme-v02.api.letsencrypt.org/directory"
    "staging" -> "https://acme-staging-v02.api.letsencrypt.org/directory"
    "local" -> {:internal, port: 4002}
  end
end

Here, we’re distinguishing production from staging from local development based on the MODE OS env var (easily replaceable with another source, owing to the programmable API). If the env var is not provided, we assume that the system is running locally.

On a production machine we go to the real CA, while for staging we use Let’s Encrypt’s staging site. But what about the {:internal, port: 4002} thing which we use in local development? If we pass this particular shape of data to site_encrypt, an internal ACME server will be started on the given port, a sort of a local mock of Let’s Encrypt. This server runs inside the same OS process as the rest of the system.

So locally, site_encrypt will start a mock of Let’s Encrypt, and it will use that mock to obtain the certificate. In other words, locally the system will certify itself. Here’s an example of this in action on a local version of The Erlangelist:

$ iex -S mix phx.server

[info]  Running Erlangelist.Web.Blog.Endpoint at 0.0.0.0:20080 (http)
[info]  Running Erlangelist.Web.Blog.Endpoint at 0.0.0.0:20443 (https)
[info]  Running local ACME server at port 20081
[info]  Creating new ACME account for domain theerlangelist.com
[info]  Ordering a new certificate for domain theerlangelist.com
[info]  New certificate for domain theerlangelist.com obtained
[info]  Certificate successfully obtained!

Testability

Since the local Erlangelist behaves exactly like the real one, we can test more of the system behaviour. For example, even in the local version HTTP requests are redirected to HTTPS. Here’s a test verifying this:

test "http requests are redirected to https" do
  assert redirected_to(Client.get("http://localhost/"), 301) ==
    "https://localhost/"
end

Likewise, redirection to www can also be tested:

test "theerlangelist.com is redirected to www.theerlangelist.com" do
  assert redirected_to(Client.get("https://theerlangelist.com/"), 301)
    == "https://www.theerlangelist.com/"
end

In contrast, external proxy rules, such as those defined in an Nginx configuration, are typically not tested, which means that a change in configuration might break something else in a way which is not obvious to the operator.

In addition, site_encrypt ships with a small helper for testing the certification. Here’s the relevant test:

test "certification" do
  clean_restart(Erlangelist.Web.Blog.Endpoint)
  cert = get_cert(Erlangelist.Web.Blog.Endpoint)
  assert cert.domains == ~w/theerlangelist.com www.theerlangelist.com/
end

During this test, the blog endpoint (i.e. the blog web server) will be restarted, with all previously existing certificates removed. During the restart, the endpoint will be certified via the local ACME server. This certification will go through the whole process, with no mocking (save for the fact that a local CA is used). HTTP requests will be made, some keys will be generated, the system will call CA, which will then concurrently make a request to the system, and ultimately the certificate will be obtained.

Once that’s all finished, the invocation of get_cert will establish an ssl connection to the blog server and fetch the certificate of the peer. Then we can assert the expected properties of the certificate.

Having such tests significantly increases my confidence in the system. Of course, there’s always a chance of something going wrong in production (e.g. if DNS isn’t correctly configured, and Let’s Encrypt can’t reach my site), but the possibility of errors is reduced, not only because of the tests, but also because a compiled language is used. For example, if I make a syntax error while changing the configuration, the code won’t even compile, let alone make it to production. If I make a typo, e.g. by specifying theerlangelist.org instead of theerlangelist.com, the certification test will fail. In contrast, external configurations are much harder to test, and so they typically end up being manually verified on staging, or in some cases only in production.

More automation

Beyond just obtaining the certificate, site_encrypt will periodically renew it. A periodic job is executed three times a day. This job checks the expiry date of the certificate, and starts the renewal process if the certificate expires within 30 days. In addition, every time a certificate is obtained, site_encrypt can optionally generate a backup of its data. When the system is starting, if the site_encrypt database folder isn’t present and the backup file exists, site_encrypt will automatically restore the database from the backup.

As a user of site_encrypt you have to do zero work to make this happen, which significantly reduces the amount of operational work required, bringing the bulk of it to the regular development.

For more elaborate backup scenarios, site_encrypt provides a callback hook. In your endpoint module you can define a function which is invoked after the certificate is obtained. You can use this function to e.g. store the cert in an arbitrary secure storage of your choice. Notice how this becomes a part of the regular system codebase, which is the most convenient and logical place to express such a task. The fact that this runs together with the rest of the system also means it’s testable. Testing that the new certificate is correctly stored to the desired storage is straightforward.

Tight integration

Since it runs in the same OS process, and is powered by the same language, site_encrypt can integrate much better with its client, which leads to some nice benefits. I mentioned earlier that certification is a conversation between our system and the CA server. When we’re using the certbot tool, this dialogue turns into a three-party conversation. Instead of our system asking for the certificate, we ask certbot to do this on our behalf. However, the CA verification request (aka challenge) still needs to be served by our site. Since certbot is an external tool, it treats our site as an opaque box. As a result, certbot doesn’t know when we’ve responded to the CA challenge, and so it has to be a bit more conservative. Namely, certbot will sleep for about three seconds before it starts polling the CA to see if the challenge has been answered.

The native Elixir ACME client runs in the same OS process, and so it can integrate much better. The ACME client is informed by the challenge handler that the challenge is fulfilled, and so it can use a much shorter delay to start polling the CA. In production this optimization isn’t particularly relevant, but on local dev, and especially in tests the difference becomes significant. The certification test via certbot takes about 6 seconds on my machine. The same test via the native client is about 800ms.

This tight integration offers some other interesting possibilities. With a few changes to the API, site_encrypt could support arbitrary storage for its database. It could also support coordination between multiple nodes, making it possible to implement distributed certification, where an arbitrary node in the cluster initiates the certification, while any other node can successfully respond to the challenge, including even nodes which came online after the challenge was started.

Operations

With the bulk of the system behaviour described in Elixir code, the remaining operational tasks done outside of Elixir are exclusively related to preparing the machine to run The Erlangelist. These tasks involve creating the necessary accounts, creating the folder structure, installing the required OS packages (essentially just Docker is needed), and setting up a single systemd unit for starting the container.

The production is dockerized, but the production docker image is very lightweight:

FROM alpine:3.11 as site

RUN apk --no-cache upgrade && apk add --no-cache ncurses

COPY --from=builder /opt/app/site/_build/prod/rel/erlangelist /erlangelist

VOLUME /erlangelist/lib/erlangelist-0.0.1/priv/db
VOLUME /erlangelist/lib/erlangelist-0.0.1/priv/backup

WORKDIR /erlangelist
ENTRYPOINT ["/erlangelist/bin/erlangelist"]

The key part is the COPY instruction which adds the built release of the system to the image (the builder stage of the multi-stage build, which compiles the release, is omitted here). The release contains all the compiled binaries, as well as a minimal Erlang runtime system, and is therefore pretty much self-contained, requiring only one small OS-level package to be installed.

Final thoughts

Some might argue that using certbot, optionally combined with Nginx or Caddy, is simple enough, and I wouldn’t completely disagree. It’s perfectly valid to reach for external products to solve a technical challenge not related to the business domain. Such products can help us solve our problem quickly and focus on our core challenges. On the other hand, I feel that we should be more critical of the problems introduced by such products. As I’ve tried to show in this simple example, the integrated operations approach reduces the number of moving parts and technologies used, bridges the gap between production and development, and improves the testability of the system. The implementation is simpler and at the same time more flexible, since the tool is driven by functions and data.

For this approach to work, you need a runtime that supports managing multiple system activities. BEAM, the runtime of Erlang and Elixir, makes this possible. For example, in many cases serving traffic directly with Phoenix, without having a reverse proxy in front of it, will work just fine. Features such as ETS tables or GenServer will reduce the need for tools like Redis. Running periodic jobs, regulating load, rate-limiting, pipeline processing, can all be done directly from Elixir, without requiring any external product.

Of course, there will always be cases where external tools make more sense. But there will also be many cases where the integrated approach will work just fine, especially in smaller systems not operating at the level of scale or complexity of Netflix, Twitter, Facebook, and similar. Having both options available allows us to start simple and move to an external tool only in more complicated scenarios.

This is the reason why I started the work on site_encrypt. The library is still incomplete and probably buggy, but these are issues that can be fixed with time and effort :-) I believe that the benefits of this approach are worth the effort, so I’ll continue the work on the library. I’d like to see more of such libraries appearing, giving us simpler options for challenges such as load balancing, proxying, or persistence. As long as there are technical challenges where running an external product is the only option, there is opportunity for simplification, and it’s up to us, the developers, to make that happen.

Periodic jobs in Elixir with Periodic

2020-01-27

One cool thing about BEAM languages is that we can implement periodic jobs without using external tools, such as cron. The implementation of the job can reside in the same project as the rest of the system, and run in the same OS process as the other activities in the system, which can help simplify development, testing, and operations.

There are various helper abstractions for running periodic jobs in BEAM, such as the built-in :timer module from Erlang stdlib, and 3rd party libraries such as erlcron, quantum, or Oban.

In this article I’ll present my own small abstraction called Periodic which is a part of the Parent library. I wrote Periodic almost two years ago, mostly because I wasn’t particularly satisfied with the available options. Compared to most other periodic schedulers, Periodic makes some different choices:

  • Scheduling is distributed. Each job uses its own dedicated scheduler process.
  • Cron expressions are not supported.
  • There is no out-of-the box support for fixed schedules.

These choices may seem controversial, but there are reasons for making them. Periodic is built to be easy to use in simple scenarios, flexible enough to power various involved cases, and simple to grasp and reason about. To achieve these properties, Periodic is deliberately designed as a small and focused abstraction. Concerns such as declarative interfaces, back-pressure and load regulation, fixed scheduling, and improved execution guarantees via persistence are left to the client code. This means that as clients of Periodic we sometimes have to invest some more work, but what we get in return is a simple and flexible abstraction.

Simple usage

A periodic job can be started as follows:

Periodic.start_link(
  run: fn -> IO.puts("Hello, World!") end,
  every: :timer.seconds(5)
)

# after 5 seconds
Hello, World!

# after 10 seconds
Hello, World!

# ...

Unlike most other periodic libraries out there, Periodic doesn’t use the cron syntax. Even after many years of working with it, I still find that syntax cryptic, unintuitive, and limited. In contrast, I believe that Periodic.start_link(run: something, every: x) is clearer at expressing the intention.

Periodic accepts a couple of options which allow you to control its behaviour, such as dealing with overlapping jobs, interpreting delays, or terminating jobs which run too long. These options make Periodic more convenient than the built-in :timer functionality, with a comparable ease of use. I’m not going to discuss those options in this post, but you can take a look at docs for more details.

The interface of Periodic is small. The bulk of the functionality is provided in a single module which exposes two functions: start_link for starting the process dynamically, and child_spec for building supervisor specs. Two additional modules are provided, one to assist with logging, and another to help with deterministic testing.

Controversially enough, Periodic doesn’t provide out-of-the-box support for fixed schedules (e.g. run every Wednesday at 5pm). This might seem like a big deficiency, but it’s in fact a deliberate design choice. I personally regard fixed scheduling as a nuanced challenge for which there is no one-size-fits-all solution, so it’s best to make the trade-offs explicit and leave the choices to the client. Of course, it’s perfectly possible to power fixed-schedule jobs with Periodic, and I’ll present some approaches later on in this article.

Flexibility

Since it is based on plain functions invoked at runtime, Periodic is as flexible as it gets. You don’t need to use app or OS envs, but you may use them if they suit your purposes. You don’t need to define a dedicated module (although it is advised for typical cases), use some library module to inject the boilerplate, or pass anything at compile-time. In fact, Periodic is very runtime friendly, supporting various elaborate scenarios, such as on-demand starting/stopping of scheduled jobs.

Another dimension of Periodic’s flexibility is its process model. In Periodic, each job is powered by its own scheduler process. This is one of the core ideas behind Periodic which sets it apart from most other BEAM periodic schedulers I’ve seen.

As a result of this approach, different jobs are separate children in the supervision tree, and so stopping an individual job is no different from stopping any other kind of process. If you know how to do that with OTP, then you know everything you need to know. If you don’t, you’ll need to learn these techniques, but that knowledge will be applicable in a wide range of scenarios outside of Periodic.

Using the supervision tree to separate runtime concerns gives us fine-grained control over job termination. Consider the following tree:

       MySystem
      /        \
    Db     CacheCleanup
   /  \
Repo  DbCleanup

In this system we run two periodic jobs (DbCleanup and CacheCleanup). If we want to stop the database part of the system, we can do that by stopping the Db supervisor, taking all db-related activities down, while keeping the cache cleanup alive.

Since schedulers are a part of the supervision tree, and a scheduler acts as a supervisor (courtesy of being powered by Parent.GenServer), various generic code that manipulates the process hierarchy will work with Periodic too. For example, if the job process is trapping exits, System.stop will wait for the job to finish, according to the job childspec (5 seconds by default).

Of course, this process design comes with some trade-offs. Compared to singleton scheduler strategies, Periodic will use twice the number of processes. This shouldn’t be problematic if the number of jobs is “reasonable”, but it might hurt you if you want to run millions of jobs. However, in such a case I don’t think that any generic periodic library will fit the challenge perfectly, and it’s more likely you’ll need to roll your own special implementation, perhaps using Parent.GenServer to help out with some mechanical concerns.

Speaking of Parent, it’s worth noting that this is the abstraction that handles supervisor aspect of the scheduler process, allowing the implementation of Periodic to remain focused and relatively small. The Periodic module currently has 410 LOC, 260 of which are user documentation. The code of Periodic is all about periodic execution concerns, such as ticking with Process.send_after, starting the execution, interpreting and handling user options, and emitting telemetry events. Such division of responsibilities makes both abstractions fairly easy to grasp and reason about, while enabling Parent.GenServer to be used in various other situations (see the Example section in the rationale document for details).

Fixed scheduling

Periodic doesn’t offer ready-made abstraction for fixed scheduling (e.g. run once a day at midnight). However, such behaviour can be easily implemented on top of the existing functionality using the :when filter. Here’s a basic sketch:

Periodic.start_link(
  every: :timer.minutes(1),
  when: fn -> match?(%Time{hour: 0, minute: 0}, Time.utc_now()) end,
  run: &run_job/0
)

The idea is to tick regularly in short intervals, and use the provided :when filter to decide if the job should be started.

Careful readers will spot some possible issues in this implementation. If the system (or the scheduler process) is down at the scheduled time, the job won’t be executed. Furthermore, it’s worth mentioning that Periodic doesn’t guarantee 100% interval precision. Though not very likely, it can (and occasionally will!) happen that in some interval a job is executed twice, while in another interval it’s not executed at all. Such situations will cause our daily job to be either skipped, or executed twice in the same minute. It’s worth noting that similar issues can be (and often are) present in other periodic scheduling systems, but at least in Periodic they are more explicit and clear, since they are present in our code, not in the internals of the abstraction.

If you don’t care about an occasional missed or extra beat, the basic take presented above will serve you just fine. In fact, if I wanted to do some daily nice-to-have cleanup, this is the version I’d start with. Perhaps the code is not as short as 0 0 * * *, but on the upside it’s more explicit about its intention and possible consequences, and it is quite flexible. Implementing more elaborate schedules, such as “run every 10 minutes during working hours, but once per hour otherwise”, is a matter of adapting the :when function.
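For example, here’s one possible :when filter for that schedule (UTC times, with working hours assumed to be 8:00-17:59):

Periodic.start_link(
  every: :timer.minutes(1),
  when: fn ->
    %Time{hour: hour, minute: minute} = Time.utc_now()
    if hour in 8..17, do: rem(minute, 10) == 0, else: minute == 0
  end,
  run: &run_job/0
)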

Abstracting

Our fixed scheduling code, while fairly short, might become a bit noisy and tedious if you want to run multiple fixed-schedule jobs. However, since the Periodic interface is based on plain functions and arguments, nothing prevents you from generalizing the approach, for example as follows:

defmodule NaiveDaily do
  def start_link(hour, minute, run_job) do
    Periodic.start_link(
      every: :timer.minutes(1),
      when:
        fn -> match?(%Time{hour: ^hour, minute: ^minute}, Time.utc_now()) end,
      run: run_job
    )
  end
end

And now, in your project you can do:

NaiveDaily.start_link(0, 0, &do_something/0)
NaiveDaily.start_link(8, 0, &do_something_else/0)

Taking this idea further, implementing a generic translator of cron syntax to Periodic should be possible and straightforward. In theory, Parent, the host library of Periodic, could ship with such abstractions, and one day some such helpers might be added to the library. For the time being though, I’m content with keeping the library small and focused, and I’ll consider expanding it after gathering some data from the usage in the wild.

Improving execution guarantees

Our basic naive implementation of the fixed scheduler gives us “maybe once” guarantees - a job will usually be executed once a day, occasionally it won’t be executed at all, while in some special circumstances it might be executed more than once in the same minute.

If we want to improve the guarantees, we need to expand the code. Luckily, since our approach is powered by a Turing-complete language, we can tweak the implementation to our needs. Here’s a basic sketch:

Periodic.start_link(
  every: :timer.minutes(1),
  when: fn -> not job_executed_today?() end,
  run: fn ->
    run_job()
    mark_job_as_executed_today()
  end
)

As the name suggests, job_executed_today?/0 has to somehow figure out if we already ran the job. A simple version can be powered by global in-memory data (e.g. using ETS), which should improve the chance of the job getting executed at least once a day, but it would also increase the chance of unwanted repeated executions.
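Here’s a hedged in-memory sketch of the two helpers, backed by a public named ETS table which is assumed to be created once at startup (e.g. :ets.new(:job_runs, [:named_table, :public]) in the application start callback):

defp job_executed_today?() do
  :ets.lookup(:job_runs, :daily_cleanup) == [{:daily_cleanup, Date.utc_today()}]
end

defp mark_job_as_executed_today() do
  :ets.insert(:job_runs, {:daily_cleanup, Date.utc_today()})
end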

If we opt to base the logic on some persistent storage (say a database), we can reduce the chance of repeated executions. Note however that an occasional duplicate might still happen if the system is shut down right after the job is executed, but before it’s marked as executed. In this case, we’ll end up executing the job again after the restart. This issue can only be eliminated in some special circumstances, such as:

  • The job manipulates the same database where we mark job as executed. In this case we can transactionally run the job and mark it as executed.
  • The target of the job supports idempotence, allowing us to safely rerun the job without producing duplicate side-effects.

Here’s a slightly more involved scenario, which I actually had to solve in real life. Suppose that we want to run a periodic cleanup during the night, but only if no other activity in the system is taking place. Moreover, while the job is running, all pending activities should wait. Here’s a basic sketch:

Periodic.start_link(
  on_overlap: :ignore,
  every: :timer.minutes(1),
  when: fn -> Time.utc_now().hour in 0..4 and not job_executed_today?() end,
  run: fn ->
    with_exclusive_lock(fn ->
      run_job()
      mark_job_as_executed_today()
    end)
  end
)

The implementation relies on some exclusive lock mechanism. In a simple version we can use :global.trans to implement a basic form of RW locking that permits regular activities to grab the lock simultaneously (readers), while the job is treated as a writer which grabs the lock exclusively. Also note the usage of the on_overlap: :ignore option, which makes sure we don’t run multiple instances of the job simultaneously.
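For reference, a minimal exclusive-only version of with_exclusive_lock/1 could be as simple as the following (the reader/writer behaviour described above would require additional bookkeeping on top of this):

# :global.trans/2 retries indefinitely by default, so this blocks until the
# cluster-wide lock named :nightly_cleanup is acquired, runs the function,
# and then releases the lock.
defp with_exclusive_lock(fun), do: :global.trans({:nightly_cleanup, self()}, fun)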

In a real-life scenario I used this approach, combined with ad-hoc persistence to a local file via :erlang.term_to_binary and its counterpart. The project was completely standalone, powered at runtime by a single BEAM instance, with nothing else running on the side.

This is a nice example of how we profit from the fact that the periodic execution is running together with the rest of the system. There’s a natural strong dependency between the job and other system activities, and we can model this dependency without needing to run external moving pieces, such as e.g. Redis. Our implementation is a straightforward representation of the problem, and it can even be easily tested!

The locking mechanism could also be used to ensure that the job is executed only on a single machine in the cluster:

Periodic.start_link(
  on_overlap: :ignore,
  every: :timer.minutes(1),
  when: &should_run?/0,
  run: fn ->
    with_exclusive_lock(fn ->
      # The repeated check makes sure the job hasn't been executed
      # on some other machine while we were waiting for the lock.
      if should_run?() do
        run_job()
        mark_job_as_executed()
      end
    end)
  end
)

In this version, with_exclusive_lock would be based on some shared locking mechanism, for example using database locks, or some distributed locking mechanism like :global.

Final thoughts

As an author, I’m admittedly very partial to Periodic. After all, I made it pretty much the way I wanted it. That said, I believe that it has some nice properties.

With a small and intention-revealing interface, simple process structure, and OTP compliance, I believe that Periodic is a compelling choice for running periodical jobs directly in BEAM. Assuming nothing about the preferences of different clients, sticking to plain functions, and using a simple process structure make Periodic very flexible, and allow clients to use it however they want to. Building specialized abstractions on top of Periodic, such as the sketched NaiveDaily is possible and straightforward.

The lack of dedicated support for fixed-time scheduling admittedly requires a bit more coding on the client part, but it also motivates the clients to consider the consequences and trade-offs. A naive solution, which should be roughly on par with what other similar libs are providing, is short and straightforward to implement. More demanding scenarios will require comparative effort in the code, but that’s something that can’t be avoided. On the plus side, all the approaches will share a similar pattern of when: &should_run?/0, run: &run/0, typically executed once a minute. Since the decision logic is implemented in Elixir, the client code has full freedom in the decision making process.

In summary, I hope that this article will motivate you to give Periodic a try. If you spot some problems or have some feature proposals, feel free to open up an issue on the project repo.

Rethinking app env

2018-05-21

What is app env, and what should we use it for? The Elixir docs state:

OTP provides an application environment that can be used to configure the application.

In my experience, an app env of a moderately complex system will typically contain the following things:

  • Things which are system configuration
  • Things which aren’t system configuration
  • Things which, when modified at runtime, affect the system behaviour
  • Things which, when modified at runtime, don’t affect the system behaviour
  • Things which vary across different mix environments
  • Things which don’t vary across different mix environments
  • Things which are essentially code (e.g. MFA triplets and keys which are implicitly connected to some module)

In other words, app env tends to degenerate into a bunch of key-values arbitrarily thrown into the same place. In this article I’ll try to reexamine the way we use app env and its closely related Elixir cousin, config scripts (config.exs and friends), and propose a different approach to configuring Elixir systems. The ideas I’ll present might sound heretical, so I should warn you upfront that at the time of writing this, it’s just my personal opinion, and not the community standard, nor the approach suggested by the Elixir core team.

However, if you keep an open mind, you might find that these ideas lead to some nice benefits:

  • Better organized configuration code
  • Complete flexibility to fetch configuration data from arbitrary sources
  • Much less bloat in config scripts and app env

There’s a long road ahead of us, so let’s kick off.

Live reconfiguration

Technically speaking, app env is a mechanism which allows us to keep some application specific data in memory. This data is visible to all processes of any app, and any process can change that data. Under the hood, the app env data sits in a publicly accessible ETS table named :ac_tab, so it has the same semantics as ETS.

So what is it really good for? Let’s see a simple example. Suppose we need to run a periodic job, and we want to support runtime reconfiguration. A simple implementation could look like this:

defmodule PeriodicJob do
  use Task

  def start_link(_arg), do: Task.start_link(&loop/0)

  defp loop() do
    config = Application.fetch_env!(:my_system, :periodic_job)
    Process.sleep(Keyword.fetch!(config, :interval))
    IO.puts(Keyword.fetch!(config, :message))

    loop()
  end
end

Notice in particular how we’re fetching the periodic job parameters from app env in every step of the loop. This allows us to reconfigure the behaviour at runtime. Let’s try it out:

iex> defmodule PeriodicJob do ... end

iex> Application.put_env(
        :my_system,
        :periodic_job,
        interval: :timer.seconds(1),
        message: "Hello, World!"
      )

iex> Supervisor.start_link([PeriodicJob], strategy: :one_for_one)

Hello, World!   # after 1 sec
Hello, World!   # after 2 sec
...

Now, let’s reconfigure the system:

iex> Application.put_env(
        :my_system,
        :periodic_job,
        interval: :timer.seconds(5),
        message: "Hi, World!"
      )

Hello, World!   # after at most 1 sec
Hi, World!      # 5 seconds later
Hi, World!      # 10 seconds later

So in this example, we were able to reconfigure a running system without restarting it. It’s also worth noting that you can do the same thing in a system running in production, either via a remote iex shell or using the :observer tool.

An important point is that this live reconfiguration works because the code doesn’t cache the app env data in a local variable. Instead, it refetches the configuration in every iteration. This is what gives us runtime configurability.

In contrast, if a piece of data is fetched from app env only once, then changing it at runtime won’t affect the behaviour of the system. Let’s see an example. Suppose we’re writing a web server, and want to configure it via app env. A simple plug-based code could look like this:

defmodule MySystem.Site do
  @behaviour Plug

  def child_spec(_arg) do
    Plug.Adapters.Cowboy2.child_spec(
      scheme: :http,
      plug: __MODULE__,
      options: [port: Application.fetch_env!(:my_system, :http_port)]
    )
  end

  ...
end

Let’s say that the HTTP port is initially set to 4000. We start the system, and try to reconfigure it dynamically by changing the port to 5000:

iex> Application.put_env(:my_system, :http_port, 5000)

Unsurprisingly, this will not affect the behaviour of the system. The system will still listen on port 4000. To force the change, you need to restart the parent supervisor. Why the parent supervisor, and not the process itself? Because in this case the app env is fetched in child_spec/1, which is only invoked while the parent is initializing.

So, in this plug example, the site can theoretically be dynamically reconfigured, but doing it is quite clumsy. You need a very intimate knowledge of the code to reapply the app env setting. So for all practical intents and purposes, the port app env setting is constant.

This begs the question: if an app env value is a constant which doesn’t affect the runtime behaviour, why keep it in app env in the first place? It’s one more layer of indirection, and so it has to be justified somehow.

Some possible reasons for doing it would be:

  1. Varying configuration between different mix envs (dev, test, prod)
  2. Consolidating system configuration into a single place
  3. Dependency library requires it

While the third scenario can’t be avoided, I believe that for the first two, app env and config scripts are far from perfect. To understand why, let’s look at some config scripts issues.

Context conflation

Suppose you need to use an external database in your system, say a PostgreSQL database, and you want to work with it via Ecto. Such a scenario is common enough that even Phoenix will by default generate the Ecto repo configuration for you when you invoke mix phx.new:

# dev.exs
config :my_system, MySystem.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "postgres",
  database: "my_system_dev",
  hostname: "localhost",
  pool_size: 10

# test.exs
config :my_system, MySystem.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "postgres",
  database: "my_system_test",
  hostname: "localhost",
  pool: Ecto.Adapters.SQL.Sandbox

# prod.secret.exs
config :my_system, MySystem.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "postgres",
  database: "my_system_prod",
  pool_size: 15

You get different database configurations in dev, test, and prod. The prod.secret.exs file is git-ignored so you can freely point it to the local database, without the fear of compromising production or committing production secrets.

At first glance this looks great. We have varying configuration for different mix envs, and we have a way of running a prod-compiled version locally. However, this approach is not without its issues.

One minor annoyance is that, since you can’t commit prod.secret.exs to the repo, every developer in the team will have to populate it manually. It’s not a big issue, but it is a bit clumsy. Ideally, the development setup would work out of the box.

A more important issue is the production setup. If you're running your system as an OTP release (which I strongly advise), you'll need to host the secret file on the build server, not the production server. If you want to manage a separate staging server which uses a different database, you'll need to somehow juggle multiple secret configs on the build server, and separately compile the system for staging and production.

The approach becomes unusable if you're deploying your system on client premises, which is a case we have at Aircloak (for a brief description of our system, see this post). In this scenario, the development team doesn't know the configuration parameters, while the system admins don't have access to the code, nor Elixir/Erlang know-how. Therefore, config scripts can't really work here.

Let's take a step back. The root cause of the mentioned problems is that by setting up different db parameters in different mix envs we're conflating compilation and runtime contexts. In my view, mix env (dev/test/prod) is a compilation concern which determines variations between compiled versions. So for example, in dev we might configure auto code recompiling and reloading, while in prod we'll turn that off. Likewise, in dev and test, we might disable some system services (e.g. fetching from a Twitter feed), or use fake replacements.

However, a mix env shouldn't assume anything about the execution context. I want to be able to run a prod-compiled version locally, so I can do some local verification or benchmarking, for example. Likewise, once I assemble an OTP release for prod, in addition to running it on a production box, I want to run it on a staging server using a separate database.

These are not scenarios which can be easily handled with config scripts, and so it follows that config scripts are not a good fit for specifying the differences between different execution contexts.

Config script execution time

A better way to specify these differences is to use an external configuration source, say an OS env, an externally supplied file, or a KV such as etcd.

Let’s say that we decided to keep connection parameters in an OS env. The configuration code could look like this:

# config.exs
config :my_system, MySystem.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: System.get_env("MY_SYSTEM_DB_USERNAME"),
  password: System.get_env("MY_SYSTEM_DB_PASSWORD"),
  database: System.get_env("MY_SYSTEM_DB_DATABASE"),
  hostname: System.get_env("MY_SYSTEM_DB_HOSTNAME")

# configure other variations in dev/test/prod.exs

We can then set different OS env vars on the target machines, compile a prod version once, and run it on different boxes using different databases.

However, this leads to another problem. In the current Elixir (1.6), config scripts are evaluated during compilation, not at runtime. So if you're using OTP releases and assemble them on a separate build server (which is a practice I recommend for any real-life project), this simply won't fly today. The env parameters are retrieved during compilation, not at runtime, and so you end up with the same problem.

Admittedly, the Elixir team has plans to move the execution of config scripts to runtime, which means that this issue will be solved in the future. However, if you need to fetch the data from an external source such as a json file or an etcd instance, then this change won't help you. It's essentially a chicken-and-egg problem: app env values need to be resolved before the apps are started, and so even at runtime, the config script needs to run before a dependency such as e.g. a JSON decoder or an etcd client is loaded. Consequently, if you need to fetch a value using a dependency library, config scripts are not the place to do it.

The thing is that config scripts are evaluated too soon. In the worst case, they’re evaluated during compilation on a build server, and in the best case they’re evaluated before any dependency is started. In contrast, the services of our system, such as repo, endpoint, or any other process, are started way later, sometimes even conditionally. Consequently, config scripts often force you to fetch the config values much sooner than the moment you actually need them.

Configuring at runtime

Given the issues outlined above, my strong opinion is that connection parameters to external services don't belong in config scripts at all. So where do we configure the connection then? Previously, this required some trickery, but luckily Ecto and Phoenix have recently added explicit support for runtime configuration in the shape of the init/2 callback.

So here’s one way how we could configure our database connection params:

defmodule MySystem.Repo do
  use Ecto.Repo, otp_app: :my_system

  def init(_arg, app_env_db_params), do:
    {:ok, Keyword.merge(app_env_db_params, db_config())}

  defp db_config() do
    [
      hostname: os_env!("MY_SYSTEM_DB_HOST"),
      username: os_env!("MY_SYSTEM_DB_USER"),
      password: os_env!("MY_SYSTEM_DB_PASSWORD"),
      database: os_env!("MY_SYSTEM_DB_NAME")
    ]
  end

  defp os_env!(name) do
    case System.get_env(name) do
      nil -> raise "OS ENV #{name} not set!"
      value -> value
    end
  end
end

With this approach, we have moved the retrieval of connection params to runtime. When the repo process is starting, Ecto will first read the app config (configured through a config script), and then invoke init/2 which can fill in the blanks. The big gain here is that init/2 runs at runtime, while your application is starting and after your dependencies have already been started. Therefore, you can now freely invoke System.get_env, or Jason.decode!, or EtcdClient.get, or anything else that suits your purposes.

Consolidating service configuration

One issue with the code above is that it’s now more difficult to use a different database in the test environment. This could be worked around with a System.put_env call placed in test_helper.exs. However, that approach won’t fly if the source of truth is a file or an etcd instance. What we really want is the ability to bypass the OS env check in test environment, and enforce the database name in a different way.
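For illustration, that workaround (which, as noted, only works while the source of truth is the OS env) would amount to something like this, reusing the env var name from the earlier example:

# test_helper.exs
System.put_env("MY_SYSTEM_DB_NAME", "my_system_test")
ExUnit.start()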

Config scripts give you a very convenient solution to this problem. You could provide the database name only in test.exs:

# test.exs

config :my_system, MySystem.Repo, database: "my_system_test"

And then adapt the configuration code:

defmodule MySystem.Repo do
  use Ecto.Repo, otp_app: :my_system

  def init(_arg, app_env_db_params), do:
    {:ok, Keyword.merge(app_env_db_params, db_config(app_env_db_params))}

  defp db_config(app_env_db_params) do
    [
      hostname: os_env!("MY_SYSTEM_DB_HOST"),
      username: os_env!("MY_SYSTEM_DB_USER"),
      password: os_env!("MY_SYSTEM_DB_PASSWORD"),
      database:
        Keyword.get_lazy(
          app_env_db_params,
          :database,
          fn -> os_env!("MY_SYSTEM_DB_NAME") end
        )
    ]
  end

  ...
end

While this will fix the problem, the solution leaves a lot to be desired. At this point, the database is configured in different config scripts and in the repo module. I personally find this quite confusing. To grasp the database configuration in a particular mix env, you need to consider at least three different files: config.exs, “#{Mix.env}.exs”, and the repo module source file. To make matters worse, the config files will be bloated with other unrelated configurations (e.g. endpoint settings), and the database configuration could even be dispersed throughout the config in the shape of:

# config.exs

config :my_system, MySystem.Repo, ...

# tens or hundreds of lines later

config :my_system, MySystem.Repo, ...

...

Let's consider why we even use config scripts in the first place. We already pulled database parameters to init/2, but why are other repo parameters still in config scripts? The reason is that it's very convenient to encode variations between mix envs through config scripts. You just put stuff in the desired "#{Mix.env}.exs" and you're good to go. However, you never get something for nothing, so you pay for this writing convenience by sacrificing the reading experience. Understanding the database configuration becomes much harder.

The reading experience would be better if the entire database configuration was consolidated in one place. Since we need to determine some parameters at runtime, init/2 has to be that place. But how can we encode variations between different mix envs? Luckily, this is fairly simple with a light touch of Elixir metaprogramming:

defp db_config() do
  [
    # ...
    database: db_name()
  ]
end

# ...

if Mix.env() == :test do
  defp db_name(), do: "my_system_test"
else
  defp db_name(), do: os_env!("MY_SYSTEM_DB_NAME")
end

This code is somewhat more elaborate, but it's now consolidated, and more explicit. This code clearly states that in the test env the database name is forced to a particular value, i.e. it's not configurable. In contrast, the previous version is more vague about its constraints, and so leaves room for mistakes. If you rename the repo module but forget to update the config script, you might end up running tests on your dev db and completely mess up your data.

It’s worth noting that you should only ever invoke Mix.env during compilation, so either at the module level (i.e. outside of named functions), or inside an unquote expression. Mix is not available at runtime, and even if it were, Mix.env can’t possibly give you a meaningful result. Remember, mix env is a compilation context, and so you can’t get it at runtime.
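To make the distinction concrete, here's a small illustrative snippet (mine, not from the article's codebase):

# Fine: Mix.env/0 is invoked while the module is being compiled, and only
# the resulting atom ends up in the generated code.
@mix_env Mix.env()

# Don't: this invokes Mix.env/0 at runtime, where Mix may not even be
# loaded (e.g. inside an OTP release).
def current_env(), do: Mix.env()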

If you dislike the if/else noise of the last attempt, you can introduce a simple helper macro:

# ...

defmacrop env_specific(config) do
  quote do
    unquote(
      Keyword.get_lazy(
        config,
        Mix.env(),
        fn -> Keyword.fetch!(config, :else) end
      )
    )
  end
end

# ...

defp db_config() do
  [
    # ...
    database: env_specific(test: "my_system_test", else: os_env!(...))
  ]
end

Notice that this doesn't change the semantics of the compiled code. Since env_specific is a macro, invoking it makes a compile-time decision to inject one piece of code or another (a constant value or a function call). So for example, in the test environment, the os_env!(…) call won't be executed, nor will it even make it into the compiled version. Consequently, you can freely invoke anything you want here, such as json decoding, or fetching from etcd for example, and it will be executed at runtime, only in the desired mix env.

As an added bonus, the env_specific macro requires that the value is specified for the current mix env, or that there’s an :else setting. The macro will complain at compile time if the value is not provided.

To summarize, with a touch of metaprogramming we achieved feature parity with config scripts, moved the retrieval of parameters to runtime, consolidated the repo configuration, and expressed variations between mix envs more clearly and with stronger guarantees. Not too shabby :-)

Consolidating system configuration

One frequent argument for app env and config scripts is that they allow us to consolidate all the parameters of the system in a single place. So, supposedly, config scripts become a go-to place which we can refer to when we want to see how some aspect of the system is configured.

However, as soon as you want to configure external services, such as a database, you’re left with two choices:

  1. Shoehorn configuration into config scripts
  2. Move configuration to runtime

In the first case, you’ll need to resort to all sorts of improvisations to make it work. As soon as you need to support multiple execution contexts, you’re in for a ride, and it won’t be fun :-). You might consider abandoning OTP releases completely, and just run iex -S mix in prod, which is IMO a very bad idea. Take my advice, and don’t go there :-)

This leaves you with the second option: some system parameters will be retrieved at runtime. And at this point, config script ceases to be the single place where system parameters are defined.

That’s not a bad thing though. To be honest, I think that config script is a poor place to consolidate configuration anyway. First of all, config scripts tend to be quite noisy, and contain all sorts of data, including the things which are not a part of system configuration at all.

Consider the following configuration generated by Phoenix:

config :my_system, MySystem.Repo, adapter: Ecto.Adapters.Postgres, ...

Is the database adapter really a configurable thing? Can you just change this to, say, MySQL and everything will magically work? In my opinion, unless you've explicitly worked to support this scenario it will fail spectacularly. Therefore, the adapter is not a parameter to your system, and hence it doesn't belong here.

As an aside, at Aircloak, due to the nature of our system, the database adapter is a configurable parameter. However, what’s configured is not an Ecto adapter, but rather a particular setting specific to our system. That setting will affect how the system works with the database, but the internal variations are way more complex. Supporting different databases required a lot of work, beyond just passing one Ecto adapter or the other. We needed to support this scenario, and so we invested the effort to make it happen. If you don’t have such needs, then you don’t need to invest that effort, and database adapter is not your system’s configuration parameter. In theory you can change it, in practice you can’t :-)

Here’s another example of a config bloat:

config :my_system, MySystemWeb.Endpoint,
  # ...
  render_errors: [view: MySystemWeb.ErrorView, accepts: ~w(html json)],
  pubsub: [name: MySystem.PubSub, adapter: Phoenix.PubSub.PG2],
  # ...

These things are not configuration parameters. While in theory we could construct a scenario where this needs to be configurable, in most cases it’s just YAGNI. These are the parameters of the library, not of the system, and hence they only add bloat to config scripts and app env.

Another problem is that config scripts tend to be populated in an arbitrary way. My personal sentiment is that their content becomes a function of arbitrary decisions made by different people at different points in time. At best, developers make a sensible effort to keep things which "feel" like system configuration in config scripts. More often, the content of config scripts is determined by the demands of libraries, the defaults established by the code generators, and the convenient ability to vary the values across different mix envs.

In summary, the place for the supposed consolidated system configuration will contain:

  • Some, but not all things which are system parameters
  • Some things which are not system parameters

Let's take a step back here and consider why we even want a consolidated system configuration.

One reason could be to make it easier for developers to find the parameters of the system. So if we need to determine database connection parameters, the HTTP port, or a logging level, we can just open up the config script, and find it there.

Personally, I have a hard time accepting this argument. First of all, the configuration IMO naturally belongs to the place which uses it. So if I’m interested in db connection parameters, I’d first look at the repo module. And if I want to know about the endpoint parameters, then I’d look at the endpoint module.

Such approach also makes it easier to grasp the configuration. When I read config scripts, I’m spammed with a bunch of unrelated data which I neither care about at the moment, nor can hold together in my head. In contrast, when I read a consolidated repo config in isolation, I can more easily understand it.

A more important reason for system config consolidation is to assist administration by external administrators. These people might not have access to the source code, or maybe they're not fluent in Elixir, so they can't consult the code to discover system parameters. However, for the reasons I've stated above, I feel that config scripts won't suffice for this task. As mentioned, database connection parameters will likely not be a part of the config script, and so the complete consolidation is already lost. In addition, if external admins are not fluent in Elixir, they could have problems understanding Elixir scripts, especially if they are more dynamic.

If you plan on assisting administration, consider using well understood external configuration sources, such as ini, env, or json files, or KVs such as etcd. If you do that, then app env will not be needed, and config scripts will not suffice anyway, so you’ll likely end up with some variation of the configuration style proposed above, which is performed at runtime.

As a real-life example, the system we're building at Aircloak is running on client premises, and has to be configured by the client's administrators. We don't have access to their secrets, and they don't have access to our source code. To facilitate administration, we fetch system parameters from a json file which has to be provided by the administrators. We've explicitly and carefully cherry picked the parameters which belong to system configuration. Everything else is an implementation detail, and so it doesn't cause bloat in the config. As a consequence, we know exactly which pieces of data can be provided in the configuration, and so we can validate the config file against a schema and fail fast if some key name is misspelled, or some data is not of the right type.
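As a rough sketch of that style (the file path, key names, and the Jason dependency are my own assumptions, not Aircloak's actual code), runtime retrieval from a json file could look something like this, with Map.fetch! providing a crude fail-fast check for missing keys:

defp db_config() do
  config =
    "/etc/my_system/config.json"
    |> File.read!()
    |> Jason.decode!()

  [
    hostname: Map.fetch!(config, "db_host"),
    username: Map.fetch!(config, "db_user"),
    password: Map.fetch!(config, "db_password"),
    database: Map.fetch!(config, "db_name")
  ]
end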

Configuring a Phoenix endpoint

Let’s take a look at a more involved example. This blog is powered by Phoenix, and the endpoint is completely configured at runtime. Therefore, the only endpoint-related config piece is the following:

config :erlangelist, Erlangelist.Web.Endpoint, []

The reason we need an empty config is that Phoenix requires it.

All of the endpoint parameters are provided in init/2:

defmodule Erlangelist.Web.Endpoint do
  # ...

  def init(_key, phoenix_defaults),
    do: {:ok, Erlangelist.Web.EndpointConfig.config(phoenix_defaults)}
end

Since there are a lot of parameters and significant variations between different mix envs, I’ve decided to move the code into another module, to separate plug chaining from configuration assembly. The function Erlangelist.Web.EndpointConfig.config/1 looks like this:

def config(phoenix_defaults) do
  phoenix_defaults
  |> DeepMerge.deep_merge(common_config())
  |> DeepMerge.deep_merge(env_specific_config())
  |> configure_https()
end

Starting with the default values provided by Phoenix, we’ll apply some common settings, and then env-specific settings, and finally do some https specific tuning (which is needed due to auto certification with Let’s Encrypt).

Note that I'm doing a deep merge here, since env-specific settings might partially overlap with the common ones. Since, AFAIK, deep merging is not available in the Elixir standard library, I've resorted to the DeepMerge library.

The common config determines the parameters which don’t vary between mix envs:

defp common_config() do
  [
    http: [compress: true, port: 20080],
    render_errors: [view: Erlangelist.Web.ErrorView, accepts: ~w(html json)],
    pubsub: [name: Erlangelist.PubSub, adapter: Phoenix.PubSub.PG2]
  ]
end

Notice how the http port is hardcoded. The reason is that it's the same in all mix envs, and on all host machines. It always has this particular value, and so it's a constant, not a config parameter. In production, requests arrive on port 80. However, this is configured outside of Elixir, by using iptables to forward port 80 to port 20080. Doing so allows me to run the Elixir system as a non-privileged user.

Since the variations between different envs are significant, I didn’t use the env_specific macro trick. Instead, I opted for the plain Mix.env based switch:

case Mix.env() do
  :dev -> defp env_specific_config(), do: # dev parameters
  :test -> defp env_specific_config(), do: # test parameters
  :prod -> defp env_specific_config(), do: # prod parameters
end

The complete version can be seen here.

This consolidation allows me to find the complete endpoint configuration in a single place - something which is not the case for config scripts. So now I can clearly see the differences between dev, test, and prod, without needing to simultaneously look at three different files, and a bunch of unrelated noise. It's worth repeating that this code has feature parity with config scripts. In particular, the dev- and the test-specific parameters won't make it into the prod-compiled version.

Supporting runtime configurability

For fun and experiment, I also added a bit of runtime configurability which allows me to change some behaviour of the system without restarting anything.

When this site is running, I keep some aggregated usage stats, so I can see the read count for each article. This is implemented in a quick & dirty way using :erlang.term_to_binary and storing the data to a file. I use a separate file for each day, and the system periodically deletes older files.

The relevant code sits in the Erlangelist.Core.UsageStats module, which is also responsible for its own configuration. The configuration specifies how often the in-memory data is flushed to disk, how often the cleanup code is invoked, and how many files are preserved during the cleanup. Here are the relevant pieces of the configuration code:

defmodule Erlangelist.Core.UsageStats do

  def start_link(_arg) do
    init_config()
    ...
  end

  defp init_config(),
    do: Application.put_env(:erlangelist, __MODULE__, config())

  defp config() do
    [
      flush_interval:
        env_specific(
          prod: :timer.minutes(1),
          else: :timer.seconds(1)
        ),
      cleanup_interval:
        env_specific(
          prod: :timer.hours(1),
          else: :timer.minutes(1)
        ),
      retention: 7
    ]
  end

  ...
end

Just like with endpoint and repo, the configuration is encapsulated in the relevant module. However, since I want to support dynamic reconfiguration, I’m explicitly storing this config into the app env before I start the process. Finally, I only ever access these parameters by directly invoking Application.fetch_env! (see here and here), without caching the values in variables. Therefore, changing any of these app env settings at runtime will affect the future behaviour of the system.
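To illustrate the reading side, such a lookup might be wrapped in a small helper like this (a sketch; the helper name is mine):

# Refetched on every call, so a runtime Application.put_env change takes
# effect on the next flush.
defp flush_interval(),
  do: Keyword.fetch!(Application.fetch_env!(:erlangelist, __MODULE__), :flush_interval)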

As a result of this style of configuration, the config scripts become very lightweight:

# config.exs
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:user_id]

config :erlangelist, Erlangelist.Web.Endpoint, []


# dev.exs
config :logger, level: :debug, console: [format: "[$level] $message\n"]
config :phoenix, :stacktrace_depth, 20


# test.exs
config :logger, level: :warn

And the full app env of the :erlangelist app is very small, consisting mostly of parameters which can affect the runtime behaviour of the system:

iex> Application.get_all_env(:erlangelist)
[
  {Erlangelist.Core.UsageStats,
   [flush_interval: 1000, cleanup_interval: 60000, retention: 7]},
  {Erlangelist.Web.Endpoint, []},
  {:included_applications, []}
]

Libraries and app env

Sometimes a dependency will require some app env settings to be provided, and so you’ll need to use config scripts. For example, the logging level of the Logger application is best configured in a config script. Logger actually supports runtime configuration, so you could set the logger level in your app start callback. However, at the point your app is starting, your dependencies are already started, so the setting might be applied too late. Thus such configuration is best done through a config script.

There are also many libraries, both Erlang and Elixir ones, which needlessly require the parameters to be provided via app config. If you're a library author, be very cautious about opting for such an interface. In most cases, a plain functional interface, where you take all the options as function parameters, will suffice. Alternatively (or additionally), you could support a callback similarly to Ecto and Phoenix, where you invoke the init callback function, allowing the clients to provide the configuration at runtime.

There are some cases where requiring app config is the best choice (a good example is the aforementioned Logger), but such scenarios are few and far between. More often than not, a plain functional interface will be a superior option. Besides keeping things simple, and giving maximum flexibility to your users, you’ll also be able to better document and enforce the parameter types via typespecs.
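As a hypothetical illustration (MyLib and its options are made up), the difference between the two interface styles boils down to:

# Preferred: the client passes options explicitly, so they can come from
# anywhere (OS env, an external file, etcd, ...):
MyLib.start_link(port: 4000, pool_size: 10)

# Instead of requiring clients to configure the library via app env:
#
#   config :my_lib, port: 4000, pool_size: 10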

I’d also like to caution against keeping code references in config scripts. MFAs or atoms which are implicitly tied to modules in the compiled code are an accident waiting to happen. If you rename the module, but forget to update the config, things will break. If you’re lucky, they will break abruptly in tests. If not, they will silently not work in prod, and you might face all sorts of strange issues which will be hard to troubleshoot.

If you're a library author, try not to force your users to set a Foo.Bar app env key and define the module of the same name. This is rarely a good approach, if ever. There will be occasional cases where e.g. a module needs to be provided via app config. A good example is plugging custom log backends into the logger. But, again, such situations are not common, so think hard before making that choice. In most cases taking functions or callback modules via parameters will be a better option.

Final thoughts

In my impression, Elixir projects tend to overuse config scripts and app env. The reasons are likely historic. As far as I remember, even pure Erlang libraries frequently required, or at least promoted, app envs for no particular technical reason.

I feel that this overuse is further amplified by Elixir config scripts, which are admittedly very convenient. They simplify the writing process, but they also make it easy to add bloat to app env. Consequently, we end up with config scripts which don’t describe the complete system configuration, but frequently contain things which are not configuration at all. Since they are executed at compile time, the config scripts can cause a lot of confusion, and will not work if you need to fetch parameters from other sources, such as OS env. Even if the Elixir team manages to move config execution to runtime, they will still likely be limited in what they can offer. Fetching from sources such as etcd or external files (json, ini) will require different solutions.

In my opinion, a better approach is to drive configuration retrieval at runtime, from the place which actually needs it. Fetch the site configuration in the endpoint module, and the repo configuration in the repo module. That will separate different configuration concerns, but will consolidate the parameters which naturally belong together. Most importantly, it will shift the configuration retrieval to runtime, giving you a much higher degree of flexibility.

Keep in mind that app env is just another data storage, and not a central place for all config parameters. That storage has its pros and cons, and so use it accordingly. If you read the data from app env only once during startup, then why do you need app env in the first place? If you’re copying the data from OS env to app env, why not just skip app env and always read it from an OS env instead? If you need to cache some parameters to avoid frequent roundtrips to an external storage, consider using a dedicated ets table or a caching library such as Cachex.

Since app env values can be changed at runtime, limit the app env usage to the pieces of data which can be used to change the system behaviour. Even in those cases, you'll likely be better off without config scripts. Define the configuration data in the place where it is used, not in a common k-v store.

When your dependency requires an app env during its startup, your best option is to provide it via a config script. If you use config scripts only in such cases, they will be much smaller and easier to grasp. If you feel that the library needlessly requires app env setting, contact the maintainers and see if it can be improved.

Be careful about using config scripts to vary the behaviour between different mix envs. You can achieve the same effect with a bit of Elixir metaprogramming. Doing so will help you consolidate your configuration, and keep the things which are not system parameters out of app env and config scripts. Keep in mind that Mix functions shouldn't be invoked at runtime, and that Mix.env has no meaning at runtime.

Make the distinction between compilation and execution contexts. If you compile with MIX_ENV=prod, you've compiled a production version, not a version that can only run on a production box. Prod-compiled code should be easy to run on a dev box, and on a staging machine. Consequently, variations between execution contexts are not variations between compilation contexts, and thus don't belong in config scripts, nor in any other mechanism relying on Mix.env.

Finally, if you do want to consolidate your system parameters to assist external administrators, consider using well understood formats, such as env, ini, or json files, or storages such as etcd. Cherry pick the parameters which are relevant, and leave out the ones which are implementation details. Doing so will keep your configuration in check, and make it possible to validate it during startup.

As a final parting gift, here is some recommended further reading:

Happy configuring! :-)

]]>
http://theerlangelist.com//article/rethinking_app_env
<![CDATA[To spawn, or not to spawn?]]> Tue, 4 Apr 17 00:00:00 +0000 To spawn, or not to spawn?

2017-04-04

That is indeed the question! Whether it is better to keep everything in a single process, or to have a separate process for every piece of state we need to manage? In this post I’ll talk a bit about using and not using processes. I’ll also discuss how to separate complex stateful logic from concerns such as temporal behaviour and cross process communication.

But before starting, since this is going to be a long article, I want to immediately share my main points:

  • Use functions and modules to separate thought concerns.
  • Use processes to separate runtime concerns.
  • Do not use processes (not even agents) to separate thought concerns.

The construct “thought concern” here refers to ideas which exist in our mind, such as order, order item, and product for example. If those concepts are more complex, it’s worth implementing them in separate modules and functions to separate different concerns and keep each part of our code focused and concise.

Using processes (e.g. agents) for this is a mistake I see people make frequently. Such an approach essentially sidesteps the functional part of Elixir, and instead attempts to simulate objects with processes. The implementation will very likely be inferior to the plain FP approach (or even an equivalent in an OO language). Keep in mind that there is a price associated with processes (memory and communication overhead). Therefore, reach for processes when there are some tangible benefits which justify that price. Code organization is not among those benefits, so that's not a good reason for using processes.

Processes are used to address runtime concerns - properties which can be observed in a running system. For example, you’ll want to reach for multiple processes when you want to prevent a failure of one job to affect other activities in the system. Another motivation is when you want to introduce a potential for parallelism, allowing multiple jobs to run simultaneously. This can improve your performance, and open up potential for scaling in both directions. There are some other, less common cases for using processes, but again - separation of thought concerns is not one of them.

An example

But how do we manage a complex state then, if not with agents and processes? Let me illustrate the idea through a simple domain model of a reduced and slightly modified version of the blackjack game. The code I'll show you (available here) powers a single round on the blackjack table.

A round is basically a sequence of hands, with each hand belonging to a different player. The round starts with the first hand. The player is initially given two cards and then makes a move: take one more card (a hit), or stop taking cards (a stand). In the former case, another card is given to the player. If the score of the player's hand is greater than 21, the player is busted. Otherwise, the player can take another move (hit or stand).

The score of the hand is the sum of all the values of the cards, with numerical ranks (2-10) having their respective values, while jack, queen, and king have the value of 10. An ace card can be valued as 1 or as 11, whichever gives a better (but not busted) score.

The hand is finished if the player stands or busts. When a hand is finished, the round moves to the next hand. Once all the hands have been played, the winners are non-busted hands with the highest score.

To keep things simple, I didn’t deal with concepts such as dealer, betting, insurance, splitting, multiple rounds, people joining or leaving the table.

Process boundaries

So, we need to keep track of different types of states which change over time: a deck of cards, hands of each player, and the state of the round. A naive take on this would be to use multiple processes. We could have one process per hand, another process for the deck of cards, and the "master" process that drives the entire round. I see people occasionally take a similar approach, but I'm not at all convinced that it's the proper way to go. The main reason is that the game is in its nature highly synchronized. Things happen one by one in a well defined order: I get my cards, I make one or more moves, and when I'm done, you're next. At any point in time, there's only one activity happening in a single round.

Using multiple processes to power a single round is therefore going to do more harm than good. With multiple processes, everything is concurrent, so you need to make additional effort to synchronize all the actions. You'll also need to pay attention to proper process termination and cleanup. If you stop the round process, you need to stop all the associated processes as well. The same should hold in the case of a crash: an exception in the round or the deck process should likely terminate everything (because the state is corrupt beyond repair). Maybe a crash of a single hand could be isolated, and that might improve fault-tolerance a bit, but I think this is too fine a level at which to worry about fault isolation.

So in this case, I see many potential downsides, and not a lot of benefits for using multiple processes to manage the state of a single round. However, different rounds are mutually independent. They have their own separate flows, hold their separate states, share nothing in common. Thus, managing multiple rounds in a single process is counterproductive. It will increase our error surface (failure of one round will take everything down), and possibly lead to worse performance (we're not using multiple cores), or bottlenecks (a long processing in a single round will paralyze all the others). There are clear wins if we're running different rounds in separate processes, so that decision is a no-brainer :-)

I frequently say in my talks that there's a huge potential for concurrency in complex systems, so we'll use a lot of processes. But to reap those benefits, we need to use processes where they make sense.

So, all things considered, I’m pretty certain that a single process for managing the entire state of a single round is the way to go. It would be interesting to see what would change if we introduced the concept of a table, where rounds are played perpetually, and players change over time. I can’t say for certain at this point, but I think it’s an interesting exercise in case you want to explore it :-)

Functional modeling

So, how can we separate different concerns without using multiple processes? By using functions and modules, of course. If we spread different parts of the logic across different functions, give those functions proper names, and maybe organize them into properly named modules, we can represent our ideas just fine, without needing to simulate objects with agents.

Let me show you what I mean by walking you through each part of my solution, starting with the simplest one.

A deck of cards

The first concept I want to capture is a deck of cards. We want to model a standard deck of 52 cards. We want to start with a shuffled deck, and then be able to take cards from it, one by one.

This is certainly a stateful concept. Every time we take a card, the state of the deck changes. Despite that, we can implement the deck with pure functions.

Let me show you the code. I decided to represent the deck as a list of cards, each card being a map holding a rank and a suit. I can generate all the cards during compilation:

@cards (
  for suit <- [:spades, :hearts, :diamonds, :clubs],
      rank <- [2, 3, 4, 5, 6, 7, 8, 9, 10, :jack, :queen, :king, :ace],
    do: %{suit: suit, rank: rank}
)

Now, I can add the shuffled/0 function to instantiate a shuffled deck:

def shuffled(), do:
  Enum.shuffle(@cards)

And finally, take/1, which takes the top card from the deck:

def take([card | rest]), do:
  {:ok, card, rest}
def take([]), do:
  {:error, :empty}

The take/1 function returns either {:ok, card_taken, rest_of_the_deck}, or {:error, :empty}. Such interface forces a client (a user of the deck abstraction) to explicitly decide how to deal with each case.

Here’s how we can use it:

deck = Blackjack.Deck.shuffled()

case Blackjack.Deck.take(deck) do
  {:ok, card, transformed_deck} ->
    # do something with the card and the transformed deck
  {:error, :empty} ->
    # deck is empty -> do something else
end

This is an example of what I like to call a “functional abstraction”, which is a fancy name for:

  • a bunch of related functions,
  • with descriptive names,
  • which exhibit no side-effects,
  • and are maybe extracted in a separate module

This to me is what corresponds to classes and objects in OO. In an OO language, I might have a Deck class with corresponding methods, here I have a Deck module with corresponding functions. Preferably (though not always worth the effort), functions only transform data, without dealing with temporal logic or side-effects (cross-process messaging, database, network requests, timeouts, …).

It’s less important whether these functions are sitting in a dedicated module. The code for this abstraction is quite simple and it’s used in only one place. Therefore, I could have also defined private shuffled_deck/0 and take_card/1 functions in the client module. This is in fact what I frequently do if the code is small enough. I can always extract later, if things become more complicated.

The important point is that the deck concept is powered by pure functions. No need to reach for an agent to manage a deck of cards.

The complete code of the module is available here.

A blackjack hand

The same technique can be used to manage a hand. This abstraction keeps track of cards in the hand. It also knows how to calculate the score, and determine the hand status (:ok or :busted). The implementation resides in the Blackjack.Hand module.

The module has two functions. We use new/0 to instantiate the hand, and then deal/2 to deal a card to the hand. Here’s an example that combines a hand and a deck:

# create a deck
deck = Blackjack.Deck.shuffled()

# create a hand
hand = Blackjack.Hand.new()

# draw one card from the deck
{:ok, card, deck} = Blackjack.Deck.take(deck)

# give the card to the hand
result = Blackjack.Hand.deal(hand, card)

The result of deal/2 will be in shape of {hand_status, transformed_hand}, where hand_status is either :ok or :busted.
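To give a feel for the scoring rules described earlier, here's a hedged sketch of how the score might be computed inside such a hand abstraction (an illustration, not necessarily the actual Blackjack.Hand code):

# Numerical ranks keep their value, face cards count as 10, an ace counts
# as 1 here and is upgraded to 11 below when that doesn't bust the hand.
defp value(%{rank: rank}) when rank in 2..10, do: rank
defp value(%{rank: rank}) when rank in [:jack, :queen, :king], do: 10
defp value(%{rank: :ace}), do: 1

defp score(cards) do
  base = cards |> Enum.map(&value/1) |> Enum.sum()

  if Enum.any?(cards, &(&1.rank == :ace)) and base + 10 <= 21,
    do: base + 10,
    else: base
end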

Blackjack round

This abstraction, powered by the Blackjack.Round module, ties everything together. It has the following responsibilities:

  • keeping the state of the deck
  • keeping the state of all the hands in a round
  • deciding who’s the next player to move
  • accepting and interpreting player moves (hit/stand)
  • taking cards from the deck and passing them to the current hand
  • computing the winner, once all the hands are resolved

The round abstraction will follow the same functional approach as deck and hand. However, there's an additional twist here, which concerns separation of the temporal logic. A round takes some time and requires interaction with players. For example, when the round starts, the first player needs to be informed about the first two cards they got, and then they need to be informed that it's their turn to make a move. The round then needs to wait until the player makes the move, and only then can it step forward.

My impression is that many people, experienced Erlangers/Elixirians included, would implement the concept of a round directly in a GenServer or :gen_statem. This would allow them to manage the round state and temporal logic (such as communicating with players) in the same place.

However, I believe that these two aspects need to be separated, since they are both potentially complex. The logic of a single round is already somewhat involved, and it can only get worse if we want to support additional aspects of the game, such as betting, splitting, or a dealer player. Communicating with players has its own challenges if we want to deal with netsplits, crashes, and slow or unresponsive clients. In these cases we might need to support retries, maybe add some persistence, event sourcing, or whatnot.

I don’t want to combine these two complex concerns together, because they’ll become entangled, and it will be harder to work with the code. I want to move temporal concerns somewhere else, and have a pure domain model of a blackjack round.

So instead I opted for an approach I don’t see that often. I captured the concept of a round in a plain functional abstraction.

Let me show you the code. To instantiate a new round, I need to call start/1:

{instructions, round} = Blackjack.Round.start([:player_1, :player_2])

The argument I need to pass is the list of player ids. These can be arbitrary terms, and will be used by the abstraction for various purposes:

  • instantiating a hand for each player
  • keeping track of the current player
  • issuing notifications to players

The function returns a tuple. The first element of the tuple is a list of instructions. In this example, it will be:

[
  {:notify_player, :player_1, {:deal_card, %{rank: 4, suit: :hearts}}},
  {:notify_player, :player_1, {:deal_card, %{rank: 8, suit: :diamonds}}},
  {:notify_player, :player_1, :move}
]

The instructions are the way the abstraction informs its client what needs to be done. As soon as we start the round, two cards are given to the first hand, and then the round instance waits for the player's move. So in this example, the abstraction instructs us to:

  • notify player 1 that it got 4 of hearts
  • notify player 1 that it got 8 of diamonds
  • notify player 1 that it needs to make a move

It is the responsibility of the client code to actually deliver these notifications to the concerned players. The client code can be, say, a GenServer, which will send messages to player processes. It will also wait for the players to report back when they want to interact with the game. This is temporal logic, and it's completely kept outside of the Round module.

The second element of the returned tuple, called round, is the state of the round itself. It's worth noting that this data is typed as opaque. This means that the client shouldn't read the data inside the round variable. Everything the client needs will be delivered in the instruction list.

Let’s take this round instance one step further, by taking another card as player 1:

{instructions, round} = Blackjack.Round.move(round, :player_1, :hit)

I need to pass the player id, so the abstraction can verify if the right player is making the move. If I pass the wrong id, the abstraction will instruct me to notify the player that it’s not their turn.

Here are the instructions I got:

[
  {:notify_player, :player_1, {:deal_card, %{rank: 10, suit: :spades}}},
  {:notify_player, :player_1, :busted},
  {:notify_player, :player_2, {:deal_card, %{rank: :ace, suit: :spades}}},
  {:notify_player, :player_2, {:deal_card, %{rank: :jack, suit: :spades}}},
  {:notify_player, :player_2, :move}
]

This list tells me that player 1 got 10 of spades. Since it previously had 4 of hearts and 8 of diamonds, the player is busted, and the round immediately moves to the next hand. The client is instructed to notify player 2 that it got two cards, and that it should make a move.

Let’s make a move on behalf of player 2:

{instructions, round} = Blackjack.Round.move(round, :player_2, :stand)

# instructions:
[
  {:notify_player, :player_1, {:winners, [:player_2]}}
  {:notify_player, :player_2, {:winners, [:player_2]}}
]

Player 2 didn’t take another card, and therefore its hand is completed. The abstraction immediately resolves the winner and instructs us to inform both players about the outcome.
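For illustration, resolving the winners from the finished hands might look something like this (a sketch; the actual data shapes in the Round module may differ):

defp winners(hands) do
  # keep only non-busted hands, paired with their scores
  scores =
    for {player_id, %{status: :ok, score: score}} <- hands,
      do: {player_id, score}

  case scores do
    [] ->
      []

    _ ->
      {_id, best} = Enum.max_by(scores, fn {_id, score} -> score end)
      for {player_id, ^best} <- scores, do: player_id
  end
end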

Let’s take a look at how Round builds nicely on top of Deck and Hand abstractions. The following function from the Round module takes a card from the deck, and gives it to the current hand:

defp deal(round) do
  {:ok, card, deck} =
    with {:error, :empty} <- Blackjack.Deck.take(round.deck), do:
      Blackjack.Deck.take(Blackjack.Deck.shuffled())

  {hand_status, hand} = Hand.deal(round.current_hand, card)

  round =
    %Round{round | deck: deck, current_hand: hand}
    |> notify_player(round.current_player_id, {:deal_card, card})

  {hand_status, round}
end

We take a card from the deck, optionally using the new deck if the current one is exhausted. Then we pass the card to the current hand, update the round with the new hand and deck status, add a notification instruction about the given card, and return the hand status (:ok or :busted) and the updated round. No extra process is involved in the process :-)

The notify_player invocation is a simple one-liner which pushes a lot of complexity away from this module. Without it, we’d need to send a message to some other process (say another GenServer, or a Phoenix channel). We’d have to find that process somehow, and consider cases when this process isn’t running. A lot of extra complexity would have to be bundled together with the code which models the flow of the round.

But thanks to the instructions mechanism, none of this happens, and the Round module stays focused on the rules of the game. The notify_player function will store the instruction entry. Then later, before returning, a Round function will pull all pending instructions, and return them separately, forcing the client to interpret those instructions.
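A hedged sketch of how that might look inside Round (the field and helper names are assumptions):

# Prepend the instruction to the round's accumulated list...
defp notify_player(round, player_id, data),
  do: update_in(round.instructions, &[{:notify_player, player_id, data} | &1])

# ...and before returning, hand the pending instructions (in order) to the
# client, together with the cleared round state.
defp instructions_and_state(round),
  do: {Enum.reverse(round.instructions), %Round{round | instructions: []}}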

As an added benefit, the code can now be driven by different kinds of drivers (clients). In the examples above, I drove it manually from the session. Another example is driving the code from tests. This abstraction can now be easily tested, without needing to produce or observe side-effects.

Process organization

With the basic pure model complete, it’s time to turn our attention to the process side of things. As I discussed earlier, I’ll host each round in a separate process. I believe this makes sense, since different rounds have nothing in common. Therefore, running them separately gives us better efficiency, scalability, and error isolation.

Round server

A single round is managed by the Blackjack.RoundServer module, which is a GenServer. An Agent could also serve the purpose here, but I’m not a fan of agents, so I’ll just stick with GenServer. Your preferences may differ, of course, and I totally respect that :-)

In order to start the process, we need to invoke the start_playing/2 function. This name is chosen instead of a more common start_link, since start_link by convention links to the caller process. In contrast, start_playing will start the round somewhere else in the supervision tree, and the process will not be linked to the caller.

The function takes two arguments: the round id, and the list of players. The round id is an arbitrary unique term which needs to be chosen by the client. The server process will be registered in an internal Registry using this id.

Each entry in the list of players is a map describing a client side of the player:

@type player :: %{id: Round.player_id, callback_mod: module, callback_arg: any}

A player is described with its id, a callback module, and a callback arg. The id is going to be passed to the round abstraction. Whenever the abstraction instructs the server to notify some player, the server will invoke callback_mod.some_function(some_arguments), where some_arguments will include round id, player id, callback_arg, and additional, notification-specific arguments.

The callback_mod approach allows us to support different kinds of players such as:

  • players connected through HTTP
  • players connected through a custom TCP protocol
  • a player in the iex shell session
  • automatic (machine) players

We can easily handle all these players in the same round. The server doesn’t care about any of that, it just invokes callback functions of the callback module, and lets the implementation do the job.

The functions which must be implemented in the callback module are listed here:

@callback deal_card(RoundServer.callback_arg, Round.player_id,
  Blackjack.Deck.card) :: any
@callback move(RoundServer.callback_arg, Round.player_id) :: any
@callback busted(RoundServer.callback_arg, Round.player_id) :: any
@callback winners(RoundServer.callback_arg, Round.player_id, [Round.player_id])
  :: any
@callback unauthorized_move(RoundServer.callback_arg, Round.player_id) :: any

These signatures reveal that the implementation can't manage its state in the server process. This is an intentional decision, which practically forces the players to run outside of the round process. This helps us keep the round state isolated. If a player crashes or disconnects, the round server still keeps running, and can handle the situation, for example by busting a player if they fail to move within a given time.

Another nice consequence of this design is that testing of the server is fairly straightforward. The test implements the notifier behaviour by sending itself messages from every callback. Testing then boils down to asserting/refuting particular messages, and invoking RoundServer.move/3 to make the move on behalf of the player.
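For illustration, such a test-side notifier might look roughly like this (a sketch based on the callback signatures listed earlier; the module name is mine):

defmodule Blackjack.TestNotifier do
  # The test passes its own pid as the callback_arg, so every notification
  # becomes a message that can be asserted with assert_receive.
  def deal_card(test_pid, player_id, card),
    do: send(test_pid, {player_id, {:deal_card, card}})

  def move(test_pid, player_id),
    do: send(test_pid, {player_id, :move})

  def busted(test_pid, player_id),
    do: send(test_pid, {player_id, :busted})

  def winners(test_pid, player_id, winners),
    do: send(test_pid, {player_id, {:winners, winners}})

  def unauthorized_move(test_pid, player_id),
    do: send(test_pid, {player_id, :unauthorized_move})
end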

Sending notifications

When a function from the Round module returns the instruction list, the server process will walk through the instructions and interpret them.

The notifications themselves are sent from separate processes. This is an example where we can profit from extra concurrency. Sending notifications is a task which is separate from managing the state of the round. The notification logic might be burdened by issues such as slow or disconnected clients, so it’s worth doing this work outside of the round process. Moreover, notifications to different players have nothing in common, so they can be sent from separate processes. However, we need to preserve the order of notifications for each player, so we need a dedicated notification process per player.

This is implemented in the Blackjack.PlayerNotifier module, a GenServer based process whose role is to send notifications to a single player. When we start the round server with the start_playing/2 function, a small supervision subtree is started which hosts the round server together with one notifier server per player in the round.

When the round server plays a move, it will get a list of instructions from the round abstraction. The server will then forward each instruction to the corresponding notifier server which will interpret the instruction and invoke a corresponding M/F/A to notify the player.
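
A rough sketch of that dispatch step inside the round server might look like this; the helper name and the notify/3 function are hypothetical:

defp handle_instructions(state, instructions) do
  Enum.each(instructions, fn
    {:notify_player, player_id, notification} ->
      # Forward the instruction to the player's dedicated notifier process,
      # which will translate it into a callback_mod invocation.
      Blackjack.PlayerNotifier.notify(state.round_id, player_id, notification)
  end)

  state
end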

Hence, if we need to notify multiple players, we’ll do it separately (and possibly in parallel). As a consequence, the total ordering of messages is not preserved. Consider the following sequence of instructions:

[
  {:notify_player, :player_1, {:deal_card, %{rank: 10, suit: :spades}}},
  {:notify_player, :player_1, :busted},
  {:notify_player, :player_2, {:deal_card, %{rank: :ace, suit: :spades}}},
  {:notify_player, :player_2, {:deal_card, %{rank: :jack, suit: :spades}}},
  {:notify_player, :player_2, :move}
]

It might happen that player_2 messages arrive before player_1 is informed that it’s busted. But that’s fine, since those are two different players. The ordering of messages for each player is of course preserved, courtesy of the player-specific notifier server processes.

Before parting, I want to drive my point home again: owing to the design and functional nature of the Round module, all of this notification complexity is kept outside of the domain model. Likewise, the notification part is not concerned with the domain logic.

The blackjack service

The picture is completed in the form of the :blackjack OTP application (the Blackjack module). When you start the application, a couple of locally registered processes are started: an internal Registry instance (used to register round and notifier servers), and a :simple_one_for_one supervisor which will host a process subtree for each round.

This application is now basically a blackjack service that can manage multiple rounds. The service is generic and doesn’t depend on a particular interface. You can use it with Phoenix, Cowboy, Ranch (for plain TCP), elli, or whatever else suits your purposes. You implement a callback module, start the client processes, and start the round server.

You can see an example in the Demo module, which implements a simple auto player, a GenServer powered notifier callback, and the startup logic which starts a round with five players:

$ iex -S mix
iex(1)> Demo.run

player_1: 4 of spades
player_1: 3 of hearts
player_1: thinking ...
player_1: hit
player_1: 8 of spades
player_1: thinking ...
player_1: stand

player_2: 10 of diamonds
player_2: 3 of spades
player_2: thinking ...
player_2: hit
player_2: 3 of diamonds
player_2: thinking ...
player_2: hit
player_2: king of spades
player_2: busted

...

Here’s what the supervision tree looks like when we have five simultaneous rounds, each with five players:

Supervision tree

Conclusion

So, can we manage a complex state in a single process? We certainly can! Simple functional abstractions such as Deck and Hand allowed me to separate concerns of a more complex round state without needing to resort to agents.

That doesn’t mean we need to be conservative with processes though. Use processes wherever they make sense and bring some clear benefits. Running different rounds in separate processes improves scalability, fault-tolerance, and the overall performance of the system. The same applies to the notification processes. These are different runtime concerns, so there’s no need to run them in the same runtime context.

If the temporal and/or domain logic is complex, consider separating them. The approach I took allowed me to implement a more involved runtime behaviour (concurrent notifications) without complicating the business flow of the round. This separation also puts me in a nice spot, since I can now evolve both aspects separately. Adding support for the dealer, splits, insurance, and other business concepts should not affect the runtime aspect significantly. Likewise, supporting netsplits, reconnects, player crashes, or timeouts should not require changes to the domain logic.

Finally, it’s worth keeping the end goal in mind. While I didn’t go there (yet), I always planned for this code to be hosted in some kind of a web server. So some decisions are taken to support this scenario. In particular, the implementation of RoundServer, which takes a callback module for each player, allows me to hook up with different kinds of clients powered by various technologies. This keeps the blackjack service agnostic of particular libraries and frameworks (save for standard libraries and OTP of course), and completely flexible.

]]>
http://theerlangelist.com//article/spawn_or_not
<![CDATA[Reducing the maximum latency of a bound buffer]]> Mon, 19 Dec 16 00:00:00 +0000 Reducing the maximum latency of a bound buffer

2016-12-19

Recently I came across two great articles on the Pusher blog: Low latency, large working set, and GHC’s garbage collector: pick two of three and Golang’s Real-time GC in Theory and Practice. The articles tell the story of how Pusher engineers reimplemented their message bus. The first take was done in Haskell. During performance tests they noticed some high latencies in the 99th percentile range. After they pared down the code, they were able to prove that these spikes were caused by GHC’s stop-the-world garbage collector coupled with a large working set (the number of in-memory objects). The team then experimented with Go and got much better results, owing to Go’s concurrent garbage collector.

I highly recommend both articles. The Pusher test is a great benchmarking example, as it is focused on solving a real challenge and evaluating a technology based on whether it’s suitable for the job. This is the kind of evaluation I prefer. Instead of comparing technologies through shallow synthetic benchmarks, such as passing a token through a ring, or benching a web server which returns “200 OK”, I find it much more useful to make a simple implementation of the critical functionality, and then see how it behaves under the desired load. This should provide the answer to the question “Can I solve X efficiently using Y?”. This is the approach I took when I first evaluated Erlang. A 12-hour test of a simulation of the real system under 10 times the expected load convinced me that the technology was more than capable of handling what I needed.

Challenge accepted

Reading the Pusher articles made me wonder how well an Elixir implementation would perform. After all, the underlying Erlang VM (BEAM) has been built with low and predictable latency in mind, so coupled with other properties such as fault-tolerance, massive concurrency, scalability, and support for distributed systems, it seems like a compelling choice for the job.

So let me define the challenge, based on the original articles. I’ll implement a FIFO buffer that can handle two operations: push, and pull. The buffer is bound by some maximum size. If the buffer is full, a push operation will overwrite the oldest item in the queue.

The goal is to reduce the maximum latency of push and pull operations on a very large buffer (max 200k items). It’s important to keep this final goal in mind. I care about smoothing out latency spikes of buffer operations. I care less about which language gives me better worst-case GC pauses. While the root issue of the Pusher challenge is caused by long GC pauses, that doesn’t mean I can solve it only by moving to another language. As I’ll demonstrate, relying on a few tricks in Elixir/Erlang, we can keep the large working set out of the GC’s way and bring the max latency into the microsecond range.

Measuring

To measure the performance, I decided to run the buffer in a separate GenServer powered process. You can see the implementation here.

The measurements are taken using Erlang’s tracing capabilities. A separate process is started, which sets up the trace of the buffer process. It receives start/finish times of push and pull operations as well as buffer’s garbage collections. It collects those times, and when asked, produces the final stats. You can find the implementation here.

Tracing will cause some slowdowns. The whole bench seems to take about 2x longer when tracing is used. I can’t say how much it affects the reported times, but I don’t care that much. If I’m able to get good results with tracing turned on, then the implementation should suffice when tracing is turned off :-)

For those of you not familiar with Erlang, the word process here refers to an Erlang process - a lightweight concurrent program that runs in the same OS process and shares nothing with other Erlang processes. At the OS level, we still have just one OS process, but inside it multiple Erlang processes are running separately.

These processes have nothing in common, share no memory, and can only communicate by sending each other messages. In particular, each process has its own separate heap, and is garbage collected separately from other processes. Therefore, whatever data is allocated by the tracer process code will not put any GC pressure on the buffer. Only the data we’re actually pushing to the buffer will be considered during the buffer’s GC, and can thus affect the latency of buffer operations. This approach demonstrates a great benefit of Erlang. By running different things in separate processes, we can prevent GC pressure in one process from affecting others in the system. I’m not aware of any other lightweight concurrency platform which provides such guarantees.

The test first starts with a brief “stretch” warmup. I create the buffer with the maximum capacity of 200k items (the number used in the Pusher benches). Then, I push 200k items, then pull all of them, and then again push 200k items. At the end of the warmup, the buffer is at its maximum capacity.

Then the bench starts. I’m issuing 2,000,000 requests in cycles of 15 pushes followed by 5 pulls. The buffer therefore mostly operates in the “overflow” mode. In total, 1,000,000 pushes are performed on the full buffer, with a further 500,000 pushes on a nearly full buffer. The items being pushed are 1024-byte Erlang binaries, and each item is different from the others, meaning the test will create 1,500,000 distinct items.

The bench code resides here. The full project is available here. I’ve benched it using Erlang 19.1 and Elixir 1.3.4, which I installed with the asdf version manager. The tests are performed on my 2011 iMac (3,4 GHz Intel Core i7).

Functional implementation

First, I’ll try what I consider an idiomatic approach in Elixir and Erlang - a purely functional implementation based on the :queue module. According to the docs, this module implements a double-ended FIFO queue in an efficient manner, with most operations having an amortized O(1) running time. The API of the module provides most of the things needed. I can use :queue.in/2 and :queue.out/1 to push/pull items. There is no direct support for setting the maximum size, but it’s fairly simple to implement this on top of the :queue module. You can find my implementation here.
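
To make the idea concrete, here’s a minimal sketch of a bound buffer on top of :queue; the function names and the shape of the state are my assumptions, not necessarily what the linked implementation uses:

defmodule Buffer.Queue do
  def new(max_size), do: {0, max_size, :queue.new()}

  # When the buffer is full, drop the oldest item before pushing the new one.
  def push({size, max_size, queue}, item) when size == max_size do
    {_, queue} = :queue.out(queue)
    {size, max_size, :queue.in(item, queue)}
  end

  def push({size, max_size, queue}, item),
    do: {size + 1, max_size, :queue.in(item, queue)}

  def pull({size, max_size, queue}) do
    case :queue.out(queue) do
      {{:value, item}, queue} -> {{:ok, item}, {size - 1, max_size, queue}}
      {:empty, queue} -> {:empty, {size, max_size, queue}}
    end
  end
end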

When I originally read the Pusher articles, I was pretty certain that such an implementation would lead to some larger latency spikes. While there’s no stop-the-world GC in Erlang, there is still stop-the-process GC. An Erlang process starts with a fairly small heap (~2 KB), and if it needs to allocate more than that, the process is GC-ed and its heap is possibly expanded. For more details on GC, I recommend this article and this one.

In our test, this means that the buffer process will pretty soon expand to a large heap which needs to accommodate 200k items. Then, as we push more items and create garbage, the GC will have a lot of work to do. Consequently, we can expect some significant GC pauses which will lead to latency spikes. Let’s verify this:

$ mix buffer.bench -m Buffer.Queue

push/pull (2000000 times, average: 6.9 μs)
  99%: 17 μs
  99.9%: 32 μs
  99.99%: 74 μs
  99.999%: 21695 μs
  100%: 37381 μs
  Longest 10 (μs): 27134 27154 27407 27440 27566 27928 28138 28899 33616 37381

gc (274 times, average: 8514.46 μs)
  99%: 22780 μs
  99.9%: 23612 μs
  99.99%: 23612 μs
  99.999%: 23612 μs
  100%: 23612 μs
  Longest 10 (μs): 21220 21384 21392 21516 21598 21634 21949 22233 22780 23612

Buffer process memory: 35396 KB
Total memory used: 330 MB

There’s a lot of data here, so I’ll highlight a few numbers I find most interesting.

I’ll start with the average latency of buffer operations. Averages get a bad reputation these days, but I still find them a useful metric. The observed average latency of 6.9 microseconds tells me that this implementation can cope with roughly 145,000 operations/sec without lagging, even if the buffer is completely full. If I can tolerate some latency variations, and don’t expect requests at a higher rate, then the :queue implementation should suit my needs.

Looking at the latency distributions, we can see that the max latency is ~ 37 milliseconds. That might be unacceptable, or it may be just fine, depending on the particular use case. It would be wrong to broadly extrapolate that this :queue powered buffer always sucks, or to proclaim that it works well for all cases. We can only interpret these numbers if we know the specifications and requirements of the particular problem at hand.

If you look more closely at the latency distribution of push/pull operations, you’ll see that the latency grows rapidly between four and five nines, where it transitions from the two-digit microsecond range into the two-digit millisecond range. With 2M operations, that means fewer than 200 of them experience latency spikes. Again, whether that’s acceptable or not depends on the constraints of the particular problem.

The printed GC stats are related only to the buffer process. We can see that 274 GCs took place in that process, with high-percentile latencies in the two-digit millisecond range. Unsurprisingly, there seems to be a strong correlation between GC times and the latency spikes which start in the 4-5 nines percentile range.

Finally, notice how the buffer process heap size is 35 MB. You might expect it to be closer to 200 MB, since the buffer holds 200k items, each being 1024 bytes. However, in this bench, the items are so called refc binaries, which means they are stored on a separate heap. The buffer process heap holds only references to these binaries, not the data itself.

Of course, the buffer process still has 200k live references on its heap, together with any garbage from the removed messages, and this is what causes the latency spikes. So if I were just looking at worst-case GC times and comparing them to other languages, Erlang wouldn’t fare well, and I might wrongly conclude that it’s not suitable for the job.

ETS based implementation

However, I can work around the GC limitation with ETS tables. ETS tables come in a couple of shapes, but for this article I’ll keep it simple by saying they can serve as a mutable in-memory key-value store. When it comes to semantics, ETS tables don’t bring anything new to the table (no pun intended). You could implement the same functionality using plain Erlang processes and data structures.

However, ETS tables have a couple of interesting properties which can make them perform very well in some cases. First, ETS table data is stored in a separate memory space, outside of the process heap. Hence, if we use an ETS table to store items, the buffer process doesn’t need to hold a lot of live references anymore, which should reduce its GC times. Moreover, the data in ETS tables is released immediately on removal. This means that the large set of items never has to be garbage collected at all.

My implementation of an ETS based buffer is modeled on Pusher’s Go implementation. Basically, I’m using an ETS table to simulate a mutable array, by storing (index, value) pairs in the table. I’m maintaining two indices: one determines where the next item will be pushed, the other does the same for the pull operation. Initially they both start at zero. Each push stores a (push_index, value) pair in the table, and increases the push index by one. If the push index reaches the maximum buffer size, it’s set back to zero. Likewise, when pulling the data, I read the value associated with the pull_index key, and then increment the pull index. If the buffer is full, a push operation will overwrite the oldest value and increment both indices, thus making sure that the next pull operation will read from the proper location. The full implementation is available here.
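
Here’s a rough sketch of that idea; again, the interface and the state shape are assumptions on my part, not necessarily those of the linked implementation:

defmodule Buffer.Ets do
  def new(max_size) do
    table = :ets.new(:buffer, [:set, :private])
    %{table: table, max_size: max_size, push_index: 0, pull_index: 0, size: 0}
  end

  def push(%{size: size, max_size: max_size} = buffer, item) do
    # When the buffer is full, push_index equals pull_index, so this insert
    # overwrites the oldest item.
    :ets.insert(buffer.table, {buffer.push_index, item})
    buffer = %{buffer | push_index: rem(buffer.push_index + 1, max_size)}

    if size == max_size do
      # We overwrote the oldest item, so advance the pull index as well.
      %{buffer | pull_index: rem(buffer.pull_index + 1, max_size)}
    else
      %{buffer | size: size + 1}
    end
  end

  def pull(%{size: 0} = buffer), do: {:empty, buffer}

  def pull(buffer) do
    [{_, item}] = :ets.lookup(buffer.table, buffer.pull_index)
    # Deleting the entry releases its memory immediately, with no GC involved.
    :ets.delete(buffer.table, buffer.pull_index)

    {{:ok, item},
     %{buffer |
       pull_index: rem(buffer.pull_index + 1, buffer.max_size),
       size: buffer.size - 1}}
  end
end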

Let’s see how it performs:

$ mix buffer.bench -m Buffer.Ets

push/pull (2000000 times, average: 6.53 μs)
  99%: 27 μs
  99.9%: 37 μs
  99.99%: 50 μs
  99.999%: 66 μs
  100%: 308 μs
  Longest 10 (μs): 76 80 83 86 86 96 106 186 233 308

gc (97062 times, average: 5.16 μs)
  99%: 10 μs
  99.9%: 20 μs
  99.99%: 30 μs
  99.999%: 44 μs
  100%: 44 μs
  Longest 10 (μs): 30 30 34 34 34 39 43 44 44 44

Buffer process memory: 30 KB
Total memory used: 312 MB

The average time of 6.53 microseconds is not radically better than the :queue powered implementation. However, the latency spikes are now much smaller. The longest observed latency is 308 microseconds, while at five nines we’re already in the two-digit microsecond range. In fact, out of 2,000,000 pushes, only 4 of them took longer than 100 microseconds. Not too shabby :-)

Full disclosure: these results are the best ones I got after a couple of runs. On my machine, the max latency sometimes goes slightly above 1ms, while other numbers don’t change significantly. In particular, 99.999% is always below 100 μs.

Looking at the GC stats, you can see a large increase in the number of GCs of the buffer process. In the :queue implementation, the buffer process triggered 274 GCs, while in this implementation we observe about 97,000 GCs. What’s the reason for this? Keep in mind that the buffer process still manages some data in its own heap. This includes the indices for the next push/pull operation, as well as temporary references to items which have just been pushed/pulled. Since a lot of requests arrive at the buffer process, it’s going to generate a lot of garbage. However, given that the buffer items are stored in the separate heap of the ETS table, the buffer process will never maintain a large live set. This corresponds to Pusher’s conclusions. The GC spike problem is not related to the amount of generated garbage, but rather to the size of the live working set. In this implementation we reduced that set, keeping the buffer process heap pretty small. Consequently, although we’ll trigger a lot of GCs, these will be pretty short. The longest observed GC of the buffer process took only 44 microseconds.

Final thoughts

Given Erlang’s stop-the-process GC properties, we might sometimes experience large pauses in some processes. However, there are some options at our disposal which can help us trim down large spikes. The main trick to control these pauses is to keep the process heap small. A large active heap coupled with frequent incoming requests is going to put more pressure on the GC, and latency is going to increase.

In this particular example, using ETS helped me reduce the heap size of the buffer process. Although the number of GCs has increased dramatically, the GC pauses were pretty short, keeping the overall latency stable. While Erlang is certainly not the fastest platform around, it allows me to keep my latency predictable. I build the system, fine-tune it to reach the desired performance, and I can expect fewer surprises in production.

It’s worth mentioning two other techniques that might help you reduce GC spikes. One approach is to split the process that manages a big heap into multiple processes with smaller working sets. This will split one big GC into many smaller ones, and possibly remove the spikes.

In some cases you can capitalize on the fact that the process memory is immediately released when the process terminates. If you need to perform a one-off job which allocates a lot of temporary memory, you can consider using Process.spawn which allows you to explicitly preallocate a larger heap when starting the process. That might completely prevent GC from happening in that process. You do your calculation, spit out the result, and finally terminate the process so all of its memory gets immediately reclaimed without ever being GC-ed.
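
For example, something along these lines (a hedged sketch; the heap size, given in words, and the computation are arbitrary placeholders):

caller = self()

Process.spawn(
  fn ->
    # The preallocated heap is large enough that the work below never
    # triggers a GC in this throwaway process.
    send(caller, {:result, Enum.sum(1..10_000_000)})
  end,
  min_heap_size: 10_000_000
)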

Finally, if you can’t make some critical part of your system efficient in Erlang, you can always resort to in-process C with NIFs, or out-of-process anything with ports, keeping Elixir/Erlang as your main platform and the “controller plane” of the system. Many options are on the table, which gives me a lot of confidence that I’ll be able to handle any challenge I encounter, no matter how tricky it might be.

]]>
http://theerlangelist.com//article/reducing_maximum_latency
<![CDATA[Observing low latency in Phoenix with wrk]]> Sun, 12 Jun 16 00:00:00 +0000 Observing low latency in Phoenix with wrk

2016-06-12

Recently there were a couple of questions on Elixir Forum about observed performance of a simple Phoenix based server (see here for example). People reported some unspectacular numbers, such as a throughput of only a few thousand requests per second and a latency in the area of a few tens of milliseconds.

While such results are decent, a simple server should be able to give us better numbers. In this post I’ll try to demonstrate how you can easily get some more promising results. I should immediately note that this is going to be a shallow experiment. I won’t go into deeper analysis, and I won’t deal with tuning of VM or OS parameters. Instead, I’ll just pick a few low-hanging fruits, and rig the load test by providing the input which gives me good numbers. The point of this post is to demonstrate that it’s fairly easy to get (near) sub-ms latencies with a decent throughput. Benching a more real-life like scenario is more useful, but also requires a larger effort.

Building the server

I’m going to load test a simple JSON API:

$ curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"a": 1, "b": 2}' \
    localhost:4000/api/sum

{"result":3}

It’s not spectacular but it will serve the purpose. The server code will read and decode the body, then perform the computation, and produce an encoded JSON response. This makes the operation mostly CPU bound, so under load I expect to see CPU usage near 100%.

So let’s build the server. First, I’ll create a basic mix skeleton:

$ mix phoenix.new bench_phoenix --no-ecto --no-brunch --no-html

I don’t need ecto, brunch, or html support, since I’ll be exposing only a simple API interface.

Next, I need to implement the controller:

defmodule BenchPhoenix.ApiController do
  use BenchPhoenix.Web, :controller

  def sum(conn, %{"a" => a, "b" => b}) do
    json(conn, %{result: a + b})
  end
end

And add a route:

defmodule BenchPhoenix.Router do
  use BenchPhoenix.Web, :router

  pipeline :api do
    plug :accepts, ["json"]
  end

  scope "/api", BenchPhoenix do
    pipe_through :api

    post "/sum", ApiController, :sum
  end
end

Now I need to change some settings to make the server perform better. In prod.exs, I’ll increase the logger level to :warn:

config :logger, level: :warn

By default, the logger level is set to :info meaning that each request will be logged. This leads to a lot of logging under load, which will cause the Logger to start applying back pressure. Consequently, logging will become a bottleneck, and you can get crappy results. Therefore, when measuring, make sure to avoid logging all requests, either by increasing the logger level in prod, or by changing the log level of the request to :debug in your endpoint (with plug Plug.Logger, log: :debug).

Another thing I’ll change is the value of the max_keepalive Cowboy option. This number specifies the maximum number of requests that can be served on a single connection. The default value is 100, meaning that the test would have to open new connections frequently. Increasing this value to something large will allow the test to establish the connections only once and reuse them throughout the entire test. Here’s the relevant setting in prod.exs:

config :bench_phoenix, BenchPhoenix.Endpoint,
  http: [port: 4000,
    protocol_options: [max_keepalive: 5_000_000]
  ],
  url: [host: "example.com", port: 80],
  cache_static_manifest: "priv/static/manifest.json"

Notice that I have also hardcoded the port setting to 4000 so I don’t need to specify it through the environment.

I also need to tell Phoenix to start the server when the system starts:

config :bench_phoenix, BenchPhoenix.Endpoint, server: true

I plan to run the system as the OTP release. This is a recommended way of running Erlang in production, and it should give me better performance than iex -S mix. To make this work, I need to add exrm as a dependency:

defp deps do
  [..., {:exrm, "~> 1.0"}]
end

Finally, I need to set up the load-test script. I’ll be using the wrk tool, so I’ll create the wrk.lua script:

request = function()
  a = math.random(100)
  b = math.random(100)
  wrk.method = "POST"
  wrk.headers["Content-Type"] = "application/json"
  wrk.body = '{"a":' .. a .. ',"b":' .. b .. '}'
  return wrk.format(nil, "/api/sum")
end

And that’s it! The server is now ready to be load tested. You can find the complete code here.

Running the test

I’ll be running tests on my 2011 iMac:

Model Name: iMac
Model Identifier: iMac12,2
Processor Name: Intel Core i7
Processor Speed:  3,4 GHz
Number of Processors: 1
Total Number of Cores:  4
Memory: 8 GB

Let’s start the OTP release:

$ MIX_ENV=prod mix do deps.get, compile, release && \
    rel/bench_phoenix/bin/bench_phoenix foreground

First, I’ll quickly verify that the server works:

$ curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"a": 1, "b": 2}' \
    localhost:4000/api/sum

{"result":3}

And now I’m ready to start the test:

$ wrk -t12 -c12 -d60s --latency -s wrk.lua "http://localhost:4000"

The parameters here are rigged to make the results attractive. I’m using just enough connections (the number was chosen after a couple of trial runs) to get close to the server’s max capacity. Adding more connections would cause the test to issue more work than the server can cope with, and the latency would consequently suffer. If you’re running the test on your own machine, you might need to tweak these numbers a bit to get the best results.

Let’s see the output:

Running 1m test @ http://localhost:4000
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   477.31us  123.80us   3.05ms   75.66%
    Req/Sec     2.10k   198.83     2.78k    62.43%
  Latency Distribution
     50%  450.00us
     75%  524.00us
     90%  648.00us
     99%    0.87ms
  1435848 requests in 1.00m, 345.77MB read
Requests/sec:  23931.42
Transfer/sec:      5.76MB

I’ve observed a throughput of ~ 24k requests/sec, with 99th percentile latency below 1ms, and the maximum observed latency at 3.05ms. I also started htop and confirmed that all cores were near 100% usage, meaning the system was operating near its capacity.

For good measure, I also ran a 5 minute test, to verify that the results are consistent:

Running 5m test @ http://localhost:4000
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   484.19us  132.26us  12.98ms   76.35%
    Req/Sec     2.08k   204.89     2.80k    70.10%
  Latency Distribution
     50%  454.00us
     75%  540.00us
     90%  659.00us
     99%    0.89ms
  7090793 requests in 5.00m, 1.67GB read
Requests/sec:  23636.11
Transfer/sec:      5.69MB

The results seem similar to the 1 minute run, with a somewhat worrying difference in the maximum latency, which is now ~13ms.

It’s also worth verifying how the latency is affected when the system is overloaded. Let’s use a few more connections:

$ wrk -t100 -c100 -d1m --latency -s wrk.lua "http://localhost:4000"

Running 1m test @ http://localhost:4000
  100 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.40ms   24.24ms 118.92ms   90.16%
    Req/Sec   256.22    196.73     2.08k    74.35%
  Latency Distribution
     50%    4.50ms
     75%    9.35ms
     90%   36.13ms
     99%  100.02ms
  1462818 requests in 1.00m, 352.26MB read
Requests/sec:  24386.51
Transfer/sec:      5.87MB

Looking at htop, I observed that the CPU is fully maxed out, so the system is completely using all the available hardware and operating at its maximum capacity. The reported latencies are considerably larger now, since we’re issuing more work than the system can handle on the given machine.

Assuming the code is optimized, the solution could be to scale up and put the system on a more powerful machine, which should restore the latency. I don’t have such a machine available, so I wasn’t able to verify this.

It’s also worth considering guarding the system against overload by making it refuse more work than it can handle. Although that doesn’t seem like a perfect solution, it can allow the system to operate within its limits and thus keep the latency within bounds. This approach would make sense if you have some fixed upper bound on the acceptable latency. Accepting requests which can’t be served within the given time frame doesn’t make much sense, so it’s better to refuse them upfront.

Conclusion

I’d like to stress again that this was a pretty shallow test. The main purpose was to prove that we can get some nice latency numbers with a fairly small amount of effort. The results look promising, especially since they were obtained on my personal box, where both the load tester and the server were running, as well as other applications (mail client, browser, editor, …).

However, don’t be tempted to jump to conclusions too quickly. A more exhaustive test would require a dedicated server, tuning of OS parameters, and playing with the emulator flags such as +K and +s. It’s also worth pointing out that synthetic tests can easily be misleading, so be sure to construct an example which resembles the real use case you’re trying to solve.

]]>
http://theerlangelist.com//article/phoenix_latency
<![CDATA[Phoenix is modular]]> Mon, 22 Feb 16 00:00:00 +0000 Phoenix is modular

2016-02-22

A few days ago I saw this question on #elixir-lang channel:

Have any of you had the initial cringe at the number of moving parts Phoenix needs to get you to just “Hello World” ?

Coincidentally, on that same day I received a mail where a developer briefly touched on Phoenix:

I really like Elixir, but can’t seem to find happiness with Phoenix. Too much magic happening there and lots of DSL syntax, diverts from the simplicity of Elixir while not really giving a clear picture of how things work under the hood. For instance, they have endpoints, routers, pipelines, controllers. Can we not simplify endpoints, pipelines and controllers into one thing - say controllers…

I can sympathize with such sentiments. When I first looked at Phoenix, I was myself overwhelmed by the amount of concepts one needs to grasp. But after spending some time with the framework, it started making sense to me, and I began to see the purpose of all these concepts. I quickly became convinced that Phoenix provides reasonable building blocks which should satisfy most typical needs.

Furthermore, I’ve learned that Phoenix is actually quite modular. This is nice because we can trim it down to our own preferences (though in my opinion that’s usually not needed). In fact, it is possible to run a Phoenix powered server without a router, controller, view, and template. In this article I’ll show you how, and then I’ll provide some tips on learning Phoenix. But first, I’ll briefly touch on the relationship between Phoenix and Plug.

Phoenix and Plug

Phoenix owes its modularity to Plug. Many Phoenix abstractions, such as endpoint, router, or controller, are implemented as plugs, so let’s quickly recap the idea of Plug.

When a request arrives, the Plug library will create a Plug.Conn struct (aka conn). This struct bundles various fields describing the request (e.g. the IP address of the client, the path, headers, cookies) together with the fields describing the response (e.g. status, body, headers). Once the conn struct is initialized, Plug will call our function to handle the request. The task of our code is to take the conn struct and return the transformed version of it with populated output fields. The Plug library then uses the underlying HTTP library (for example Cowboy) to return the response. There are some fine-print variations to this concept, but they’re not relevant for this discussion.

So essentially, our request handler is a function that takes a conn and transforms it. In particular, each function that takes two arguments (a conn and arbitrary options) is called a plug. Additionally, a plug can be a module that implements two functions: init/1, which prepares the options, and call/2, which takes a conn and options and returns the transformed conn.
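
As a tiny illustrative example (not taken from Phoenix or Plug), a module plug could look like this:

defmodule MyApp.ExamplePlug do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # Transform the conn, here by adding a response header, and return it.
    put_resp_header(conn, "x-example", "hello")
  end
end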

Request handlers can be implemented as a chain of such plugs, with the help of Plug.Builder. Since a plug is basically a function, your request handler boils down to a chain of functions threading the conn struct. Each function takes a conn, does its own processing, and produces a transformed version of it. Then the next function in the chain is invoked to do its job.

Each plug in the chain can do various tasks, such as logging (Plug.Logger), converting the input (for example Plug.Head which transforms a HEAD request into GET), or producing the output (e.g. Plug.Static which serves files from the disk). It is also easy to write your own plugs, for example to authenticate users, or to perform some other custom action. For example, for this site I implemented a plug which counts visits, measures the processing time, and sends stats to graphite. Typically, the last function in the chain will be the “core” handler which performs some request-specific processing, such as data manipulation, or some computation, and produces the response.

When it comes to Phoenix, endpoint, router, and controllers are all plugs. Your request arrives to the endpoint which specifies some common plugs (e.g. serving of static files, logging, session handling). By default, the last plug listed in the endpoint is the router where request path is mapped to some controller, which is itself yet another plug in the chain.

Trimming down Phoenix

Since all the pieces in Phoenix are plugs, and plugs are basically functions, nothing stops you from removing any part out of the chain. The only thing you need for a basic Phoenix web app is the endpoint. Let’s see an example. I’ll create a simple “Hello World” web server based on Phoenix. This server won’t rely on router, controllers, views, and templates.

First, I need to generate a new Phoenix project with mix phoenix.new simple_server --no-ecto --no-brunch --no-html. The options specify I want to omit Ecto, Brunch, and HTML views from the generated project. This already makes the generated code thinner than the default version.

There are still some pieces that can be removed, and I’ve done that in this commit. The most important change is that I’ve purged all the plugs from the endpoint, reducing it to:

defmodule SimpleServer.Endpoint do
  use Phoenix.Endpoint, otp_app: :simple_server
end

All requests will end up in an endpoint which does nothing, so every request will result in a 500 error. This is a consequence of removing all the default stuff. There are no routers, controllers, views, or templates anymore, and there’s no default behaviour. The “magic” has disappeared and it’s up to us to recreate it manually.

Handling a request can now be as simple as:

defmodule SimpleServer.Endpoint do
  use Phoenix.Endpoint, otp_app: :simple_server

  plug :render

  def render(conn, _opts) do
    Plug.Conn.send_resp(conn, 200, "Hello World!")
  end
end

And there you have it! A Phoenix-powered “Hello World” in less than 10 lines of code. Not so bad :-)

Reusing desired Phoenix pieces

Since Phoenix is modular, it’s fairly easy to reintroduce some parts of it if needed. For example, if you want to log requests, you can simply add the following plugs to your endpoint:

plug Plug.RequestId
plug Plug.Logger

If you want to use the Phoenix router, you can add plug MyRouter where MyRouter is built on top of Phoenix.Router. Perhaps you prefer the Plug router? Simply implement MyRouter as Plug.Router.

Let’s see a different example. Instead of shaping strings manually, I’ll reuse Phoenix templates support, so I can write EEx templates.

First, I’ll create the web/templates/index.html.eex file:

<html>
  <body>
    Hello World!
  </body>
</html>

Then, relying on Phoenix.Template, I’ll compile all templates from the web/templates folder into a single module:

defmodule SimpleServer.View do
  use Phoenix.Template, root: "web/templates"
end

Now, I can call SimpleServer.View.render("index.html") to produce the output string:

defmodule SimpleServer.Endpoint do
  use Phoenix.Endpoint, otp_app: :simple_server

  plug :render

  def render(conn, _opts) do
    conn
    |> Plug.Conn.put_resp_content_type("text/html")
    |> Plug.Conn.send_resp(200, SimpleServer.View.render("index.html"))
  end
end

Finally, I need to set the encoder for the HTML format in config.exs:

# config.exs

config :phoenix, :format_encoders, html: Phoenix.HTML.Engine

# ...

And that’s it! The output is now rendered through a precompiled EEx template. And still, no router, controller, or Phoenix view has been used. You can find the complete solution here.

It’s worth noting that by throwing most of the default stuff out, we also lost many benefits of Phoenix. This simple server doesn’t serve static files, log requests, handle sessions, or parse the request body. Live reload also won’t work. You can of course reintroduce these features if you need them.

What’s the point?

To be honest, I usually wouldn’t recommend this fully sliced-down approach. My impression is that the default code generated with mix phoenix.new is a sensible start for most web projects. Sure, you have to spend some time understanding the flow of a request, and the roles of the endpoint, router, view, and template, but I think it will be worth the effort. At the end of the day, as Chris has frequently said, Phoenix aims to provide the “batteries included” experience, so the framework is bound to have some inherent complexity. I wouldn’t say it’s super complex though. You need to take some time to let it sink in, and you’re good to go. It’s a one-off investment, and not a very expensive one.

That being said, if you have simpler needs, or you’re overwhelmed by many different Phoenix concepts, throwing some stuff out might help. Hopefully it’s now obvious that Phoenix is quite tunable. Once you understand Plug it’s fairly easy to grasp how a request is handled in Phoenix. Tweaking the server to your own needs is just a matter of removing the plugs you don’t want. In my opinion, this is the evidence of a good and flexible design. All the steps are spelled out for you in your project’s code, so everything is explicit and you can tweak it as you please.

Learning tips

Learning Phoenix is still not a small task, especially if you’re new to Elixir and OTP. If your Elixir journey starts with Phoenix, you’ll need to learn the new language, adapt to functional programming, understand BEAM concurrency, become familiar with OTP, and learn Plug, Phoenix, and probably Ecto. While none of these tasks is rocket science, there’s obviously quite a lot of ground to cover. Taking on so many new things at once can overwhelm even the best of us.

So what can be done about it?

One possible approach is a full “bottom-up”, where you focus first on Elixir, learn its building blocks and familiarize yourself with functional programming. Then you can move to vanilla processes, then to OTP behaviours (most notably GenServer and Supervisor), and finally OTP applications. Once you gain some confidence there, you “only” need to understand Plug and Phoenix specifics, which should be easier if you built solid foundations. I’m not suggesting you need to fully master one phase before moving to the next one. But I do think that building some solid understanding of basic concepts will make it easier to focus on the next stage.

The benefit of this approach is that you get a steady incremental progress. Understanding concurrency is easier if you don’t have to wrestle with the language. Grasping Phoenix is easier if you’re already confident with Elixir, OTP, and Plug. The downside is that you’ll reach the final goal at the very end. You’re probably interested in Phoenix because you want to build scalable, distributed, real-time web servers, but you’ll spend a lot of time transforming lists with plain recursion, or passing messages between processes, before you’re even able to handle a basic request. It takes some commitment to endure this first period.

If you prefer to see some tangible results immediately, you could consider a “two-pass bottom-up” approach. In this version, you first go through the excellent official getting started guides on the Elixir and Phoenix sites. These should get you up to speed more swiftly than reading a few-hundred-page book (or several), though you won’t get as much depth. On the plus side, you’ll be able to experiment and prototype much earlier in the learning process. Then you can start refining your knowledge in the second pass, perhaps by reading some books, watching videos, or reading the official docs.

There are of course many other strategies you can take, so it’s up to you to choose what works best for you. Whichever way you choose, don’t be overwhelmed by the amount of material. Try to somehow split the learning path into smaller steps, and take new topics gradually. It’s hard if not impossible to learn everything at once. It’s a process that takes some time, but in my opinion, the effort is definitely worth the gain. I’m a very happy customer of Erlang/OTP/Elixir/Phoenix, and I don’t think any other stack can give me the same benefits.

]]>
http://theerlangelist.com//article/phoenix_is_modular
<![CDATA[Driving Phoenix sockets]]> Mon, 25 Jan 16 00:00:00 +0000 Driving Phoenix sockets

2016-01-25

A few months ago, we witnessed the Phoenix team establishing 2 million simultaneous connections on a single server. In the process, they also discovered and removed some bottlenecks. The whole effort is documented in this excellent post. This achievement is definitely great, but reading the story raises a question: do we really need a bunch of expensive servers to study the behaviour of our system under load?

In my opinion, many issues can be discovered and tackled on a developer’s machine, and in this post, I’ll explain how. In particular, I’ll discuss how to programmatically “drive” a Phoenix socket, talk a bit about the transport layer, and cap it off by creating a half million of Phoenix sockets on my dev machine and explore the effects of process hibernation on memory usage.

The goal

The main idea is fairly simple. I’ll develop a helper SocketDriver module, which will allow me to create a Phoenix socket in a separate Erlang process, and then control it by sending it channel-specific messages.

Assuming we have a Phoenix application with a socket and a channel, we’ll be able to create a socket in a separate process by invoking:

iex(1)> {:ok, socket_pid} = SocketDriver.start_link(
          SocketDriver.Endpoint,
          SocketDriver.UserSocket,
          receiver: self
        )

The receiver: self bit specifies that all outgoing messages (the ones sent by the socket to the other side) will be sent as plain Erlang messages to the caller process.

Now I can ask the socket process to join the channel:

iex(2)> SocketDriver.join(socket_pid, "ping_topic")

Then, I can verify that the socket sent the response back:

iex(3)> flush

{:message,
 %Phoenix.Socket.Reply{payload: %{"response" => "hello"},
  ref: #Reference<0.0.4.1584>, status: :ok, topic: "ping_topic"}}

Finally, I can also push a message to the socket and verify the outgoing message:

iex(4)> SocketDriver.push(socket_pid, "ping_topic", "ping", %{})

iex(5)> flush
{:message,
 %Phoenix.Socket.Message{event: "pong", payload: %{}, ref: nil,
  topic: "ping_topic"}}
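
For reference, a channel behaving like the one driven above could be as simple as the following sketch (the module name is an assumption; the actual code lives in the demo project linked later):

defmodule SocketDriver.PingChannel do
  use Phoenix.Channel

  # Joining replies with the payload seen in the flush above.
  def join("ping_topic", _payload, socket),
    do: {:ok, %{"response" => "hello"}, socket}

  # Every "ping" results in an outgoing "pong" message.
  def handle_in("ping", _payload, socket) do
    push(socket, "pong", %{})
    {:noreply, socket}
  end
end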

With such a driver I can now easily create a bunch of sockets from the iex shell and play with them. Later on you’ll see a simple demo, but let’s first explore how such a driver can be developed.

Possible approaches

Creating and controlling sockets can easily be done with the help of the Phoenix.ChannelTest module. Using macros and functions, such as connect/2, subscribe_and_join/4 and push/3, you can easily create sockets, join channels, and push messages. After all, these macros are made precisely for the purpose of programmatically driving sockets in unit tests.

This approach should work nicely in unit tests, but I’m not sure it’s appropriate for load testing. The most important reason is that these functions are meant to be invoked from within the test process. This is actually perfect for unit tests, but in a load test I’d like to be closer to the real thing. Namely I’d like to run each socket in a separate process, and at that point the amount of housekeeping I need to do increases, and I’m practically implementing a Phoenix socket transport (I’ll explain what this means in a minute).

In addition, Phoenix.ChannelTest seems to rely on some internals of sockets and channels, and its functions create one %Socket{} struct per each connected client, something which is not done by currently existing Phoenix transports.

So instead, I’ll implement SocketDriver as a partial Phoenix transport, namely a GenServer that can be used to create and control a socket. This will allow me to be closer to existing transports. Moreover, it’s an interesting exercise to learn something about Phoenix internals. Finally, such a socket driver can be used beyond load testing purposes, for example to expose different access points which can exist outside of Cowboy and Ranch.

Sockets, channels, transports, and socket driver

Before going further, let’s discuss some terminology.

The idea of sockets and channels is pretty simple, yet very elegant. A socket is an abstracted long-running connection between the client and the server. Messages can be wired through websocket, long polling, or practically anything else.

Once the socket is established, the client and the server can use it to hold multiple conversations on various topics. These conversations are called channels, and they amount to exchanging messages and managing channel-specific state on each side.

The corresponding process model is pretty reasonable. One process is used for the socket, and one for each channel. If a client opens 2 sockets and joins 20 topics on each socket, we’ll end up with 42 processes: 2 * (1 socket process + 20 channel processes).

A Phoenix socket transport is the thing that powers the long running connection. Owing to transports, Phoenix.Socket, Phoenix.Channel, and your own channels, can safely assume they’re operating on a stateful, long-running connection regardless of how this connection is actually powered.

You can implement your own transports, and thus expose various communication mechanisms to your clients. On the flip side, implementing a transport is somewhat involved, because various concerns are mixed in this layer. In particular, a transport has to:

  • Manage a two-way stateful connection
  • Accept incoming messages and dispatch them to channels
  • React to channel messages and dispatch responses to the client
  • Manage the mapping of topics to channel processes in a HashDict (and usually the reverse mapping as well)
  • Trap exits, react to exits of channel processes
  • Provide adapters for underlying http server libraries, such as Cowboy

In my opinion that’s a lot of responsibilities bundled together, which makes the implementation of a transport more complex than it should be, introduces some code duplication, and makes transports less flexible than they could be. I shared these concerns with Chris and José, so there are chances this might be improved in the future.

As it is, if you want to implement a transport, you need to tackle the points above, save possibly one: in case your transport doesn’t need to be exposed through an http endpoint, you can skip the last point, i.e. you don’t need to implement a Cowboy (or some other web library) adapter. This effectively means you’re not a Phoenix transport anymore (because you can’t be accessed through the endpoint), but you’re still able to create and control a Phoenix socket. This is what I’m calling a socket driver.

The implementation

Given the list above, the implementation of SocketDriver is fairly straightforward, but somewhat involved, so I’ll refrain from step-by-step explanation. You can find the full code here, with some basic comments included.

The gist of it is, you need to invoke some Phoenix.Socket.Transport functions at proper moments. First, you need to invoke connect/6 to create the socket. Then, for every incoming message (i.e. a message that was sent by the client), you need to invoke dispatch/3. In both cases, you’ll get some channel-specific response which you must handle.

Additionally, you need to react to messages sent from channel processes and the PubSub layer. Finally, you need to detect terminations of channel processes and remove corresponding entries from your internal state.

I should mention that this SocketDriver uses a non-documented Phoenix.ChannelTest.NoopSerializer - a serializer that doesn’t encode/decode messages. This will keep things simple, but it also means the encoding/decoding work is excluded from the test.

Creating 500k sockets & channels

With SocketDriver in place, we can now easily create a bunch of sockets locally. I’ll do this in the prod environment to mimic production more closely.

A basic Phoenix server with a simple socket/channel can be found here. I need to compile it in prod (MIX_ENV=prod mix compile), and then I can start it with:

MIX_ENV=prod PORT=4000 iex --erl "+P 10000000" -S mix phoenix.server

The --erl "+P 10000000" option increases the maximum number of processes to 10 million. I plan to create 500k sockets, so I need a bit more than a million processes, but to be on the safe side, I’ve chosen a much larger number. Creating sockets is now as simple as:

iex(1)> for i <- 1..500_000 do
          # Start the socket driver process
          {:ok, socket} = SocketDriver.start_link(
            SocketDriver.Endpoint,
            SocketDriver.UserSocket
          )

          # join the channel
          SocketDriver.join(socket, "ping_topic")
        end

It takes about a minute on my machine to create all these sockets and then I can fire up the observer. Looking at the System tab, I can see that about a million processes are running, as expected:

Observer results

I should also mention that I’ve changed the default logger level setting to :warn in prod. By default, this setting is :info which will dump a bunch of logs to the console. This in turn might affect the throughput of your load generator, so I raised this level to mute needless messages.

Also, to make the code runnable out of the box, I removed the need for the prod.secret.exs file. Obviously a very bad practice, but this is just a demo, so we should be fine. Just keep in mind to avoid building anything production-grade on top of my (or your own) hacky experiments :-)

Hibernating processes

If you take a closer look at the image above, you’ll see that the memory usage of about 6GB is somewhat high, though I wouldn’t call it excessive for so many sockets. I’m not sure whether the Phoenix team has done any memory optimizations yet, so there’s a possibility this overhead might be reduced in future versions.

As it is, let’s see whether process hibernation can help us reduce this memory overhead. Note that this is a shallow experiment, so don’t draw any hard conclusions. This will be more like a simple demo of how we can quickly gain some insights by creating a bunch of sockets on our dev box, and explore various routes locally.

First a bit of theory. You can reduce the memory usage of the process by hibernating it with :erlang.hibernate/3. This will trigger the garbage collection of the process, shrink the heap, truncate the stack, and put the process in the waiting state. The process will be awoken when it receives a message.

When it comes to GenServer, you can request hibernation by appending the :hibernate atom to most return tuples in your callback functions. So for example, instead of {:ok, state} or {:reply, response, state}, you can return {:ok, state, :hibernate} and {:reply, response, state, :hibernate} from the init/1 and handle_call/3 callbacks.
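
For illustration, a GenServer that hibernates after every callback might look like this sketch (module and messages are made up):

defmodule HibernatingServer do
  use GenServer

  def init(state), do: {:ok, state, :hibernate}

  def handle_call(:get, _from, state),
    do: {:reply, state, state, :hibernate}

  def handle_cast({:put, new_state}, _state),
    do: {:noreply, new_state, :hibernate}
end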

Hibernation can help reduce the memory usage of processes which are not frequently active. You pay some CPU price, but you get some memory in return. Like most other things in life, hibernation is a tool, not a silver bullet.

So let’s see whether we can gain something by hibernating the socket and channel processes. First, I’ll modify SocketDriver by adding :hibernate to its init, handle_cast, and handle_info callbacks. With these changes, I get the following results:

Observer results

This is about 40% less memory used, which seems promising. It’s worth mentioning that this is not a conclusive test. I’m hibernating my own socket driver, so I’m not sure whether the same saving would happen in the websocket transport, which is not GenServer based. However, I’m somewhat more certain that hibernating might help with long polling, where a socket is driven by a GenServer process, which is similar to SocketDriver (in fact, I consulted Phoenix code a lot while developing SocketDriver).

In any case, these tests should be retried with real transports, which is one reason why this experiment is somewhat contrived and non-conclusive.

Regardless, let’s move on and try to hibernate the channel processes. I modified deps/phoenix/lib/phoenix/channel/server.ex to make the channel processes hibernate. After recompiling the deps and creating 500k sockets, I noticed an additional memory saving of 800MB:

Observer results

After hibernating sockets and channels, the memory usage is reduced by more than 50%. Not too shabby :-)

Of course, it’s worth repeating that hibernation comes with a price, which is CPU usage. By hibernating, we force a full garbage collection to run immediately, so it should be used carefully and the effects on performance should be measured.

Also, let me stress again that this is a very shallow test. At best these results can serve as an indication, a clue as to whether hibernation might help. Personally, I think it’s a useful hint. In a real system the state of your channels might be more complex, and they might perform various transformations. Thus, in some cases, occasional hibernation might bring some nice savings. Therefore, I think Phoenix should allow us to request hibernation of our channel processes through callback tuples.

Conclusion

The main point of this article is that by driving Phoenix sockets, you can quickly gain some insights on how your system behaves under a more significant load. You can start the server, kick off some synthetic loader, and observe the system’s behaviour. You can gather feedback and try some alternatives more quickly, and in the process you don’t need to shell out tons of money for beefy servers, nor spend a lot of time tweaking the OS settings to accommodate a lot of open network sockets.

Of course, don’t mistake this for a full test. While driving sockets can help you get some insights, it doesn’t paint the whole picture, because network I/O is bypassed. Moreover, since the loader and the server are running on the same machine, thus competing for the same resources, the results might be skewed. An intensive loader might affect the performance of the server.

To get the whole picture, you’ll probably want to run final end-to-end tests on a production-like server with separate client machines. But you can do this less often and be more confident that you’ve handled most problems before moving to the more complicated stage of testing. In my experience, a lot of low-hanging fruit can be picked by exercising the system locally.

Finally, don’t put too much faith in synthetic tests, because they will not be able to completely simulate the chaotic and random patterns of real life. That doesn’t mean such tests are useless, but they’re definitely not conclusive. As the old saying goes: “There’s no test like production!” :-)

]]>
http://theerlangelist.com//article/driving_phoenix_sockets
<![CDATA[Elixir 1.2 and Elixir in Action]]> Wed, 6 Jan 16 00:00:00 +0000 Elixir 1.2 and Elixir in Action

2016-01-06

Elixir 1.2 is out, and this is the second minor release since Elixir in Action has been published, so I wanted to discuss some consequences of new releases on the book’s material.

Since the book focuses on concurrency and OTP principles, most of its content is still up to date. OTP is at this point pretty stable and not likely to change significantly. Moreover, Elixir 1.2 is mostly backwards compatible, meaning that the code written for earlier 1.x versions should compile and work on 1.2. In some cases, minor modifications might be required, in which case the compiler should emit a corresponding error or warning.

All that said, some information in the book is not completely accurate anymore, so I’d like to point out a few things.

Updated code examples

Some minor changes had to be made to make the code examples work, most notably relaxing the versioning requirement in mix.exs. You can find the 1.2 compliant code examples here.

Deprecating Dict and Set

Elixir 1.2 requires Erlang 18.x which brings a couple of big changes. You can see the highlights here, but in the context of EiA, the most important improvement deals with maps which now perform well for large datasets. Consequently, HashDict and HashSet are becoming redundant.

Therefore the Elixir core team decided to deprecate the following k-v and set related modules: Dict, Set, HashDict, and HashSet. These modules are soft deprecated, meaning that they will in fact still work, but their usage is discouraged as they are marked for subsequent removal. If you’re developing for Elixir 1.2+ you’re encouraged to use plain maps for k-v structures, and the new MapSet type (internally also powered by maps) for sets.
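For illustration, here’s a quick hedged sketch of the replacements (nothing specific to the book’s code, just the general shape):

# key-value: plain maps instead of HashDict
entries = %{}
entries = Map.put(entries, 1, "some value")
Map.get(entries, 1)
# => "some value"

# sets: MapSet instead of HashSet
set = MapSet.new([:foo, :bar])
MapSet.member?(set, :foo)
# => true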

There’s one important caveat: if your code must work on Elixir 1.0 and 1.1, then you should in fact still prefer HashDict and HashSet. The reason is that the older Elixir version can run on Erlang 17, so if your code uses large maps, the performance might suffer.

For Elixir in Action, this means that all the code that’s using HashDict should be changed to use maps. Most notably, the Todo.List abstraction should internally use maps to maintain the entries field. You can see the changes in this commit.

Protocol consolidation

Starting with Elixir 1.2, protocols are consolidated by default in all build environments (you can change this by setting consolidate_protocols: false in the project config). As a result, the subsection “Protocol Consolidation” (page 326) becomes redundant. With new Elixir you don’t need to worry about consolidation.
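If for some reason you still want to turn consolidation off, a minimal sketch of the relevant part of mix.exs might look like this (the option name comes from the paragraph above; the rest of the project list is elided):

def project do
  [
    # ...
    consolidate_protocols: false
  ]
end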

Embedded build and permanent applications

This change has been introduced way back in Elixir 1.0.4, and there’s a nice post by José Valim on the subject.

The gist of the story is that two mix properties are introduced which allow you to configure the “embedded build” and “permanent applications”. By default, new projects generated with Elixir 1.0.4+ will have these properties set to true for prod mix environment. If you generated your project before 1.0.4, you should add the following options to your mix.exs (in the project/0):

def project do
  [
    # ...
    build_embedded: Mix.env == :prod,
    start_permanent: Mix.env == :prod
  ]
end

When the :build_embedded option is set to true, the target folder will not contain symlinks. Instead, all data that needs to be in that folder (e.g. the content of the priv folder) will be copied.

The start_permanent option, if set to true, will cause the OTP application to be started as permanent. If the application crashes, that is if the top-level supervisor terminates, the whole BEAM node will be terminated. This makes sense in production, because it allows you to detect the application crash from another OS process, and do something about it.

As José explains, it’s sensible to set both options to true for the production environment. In contrast, you probably want to leave them turned off during development for convenience.

That’s all folks :-)

Yep, nothing else in the book is affected by the new changes to Elixir. However, many cool features have been introduced since Elixir 1.0, such as the with special form, or the mix profile.fprof task. Therefore, I suggest reading through the changelogs of recent releases :-)

Happy coding!

]]>
http://theerlangelist.com//article/eia_elixir_12
<![CDATA[Open-sourcing Erlangelist]]> Sun, 1 Nov 15 00:00:00 +0000 Open-sourcing Erlangelist

2015-11-01

It is a great pleasure to announce that The Erlangelist is now open-sourced. Last week I made the switch to the completely rewritten version of the site, and with this post I’m also making the repository public.

Changelog

From the end user’s perspective there aren’t many changes:

  • Comments are now powered by Disqus.
  • Some of the past articles are not migrated to the new site. You can still find them at the old blogger site.
  • All articles are now licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
  • Privacy policy is now included.
  • The UI went through some minor cosmetic changes (though sadly it still reflects how much I suck at UI/UX).
  • Article links have changed, but old URLs still work. If your site points to this blog (thanks for the plug), the link should still work (even if it points to a non-migrated article).

Internals

The Erlangelist site was previously hosted on Blogger, but now it’s fully rewritten from scratch and self-hosted on a cheap VPS. I plan on writing more in-depth posts in the future, but here’s a general technical overview.

The site is powered by Elixir and Phoenix. All requests are accepted directly in the Erlang VM, i.e. there’s no nginx or something similar in front.

PostgreSQL is used to store some data about each request, so I can later get some basic server usage stats (views, visitors, referers).

In addition, I’m running a local instance of the freegeoip.net site, which allows me to determine your geolocation. Currently, only your country information is used. This is stored in the database request log, because I’d like to know where my visitors come from. In addition, I use this information to explicitly ask EU based users to allow usage of cookies.

Finally, Graphite is used to visualize some general stats. I use collectd to gather system metrics, and Exometer for some basic Erlang VM information.

All of the components (save collectd) are running inside Docker containers which are started as systemd units. The initial server setup is done with Ansible, and the deploy is performed with git push to the server.

Why?

First of all, I want to point out the obvious: implementing a web server from scratch is clearly a wrong approach to write a blog.

It requires a lot of time and energy to deal with the whole stack: backend implementation, frontend & UI, server administration, deployment, monitoring, and whatnot. And still, the final product is in many ways, if not all, inferior to alternatives such as Medium, GitHub pages, or Blogger. Such solutions allow you to focus on your writing without worrying about anything else. Furthermore, they are surely more stable and battle-tested, and offer better reliability and higher capacity.

My implementation of the Erlangelist site lacks in all of these properties, being riddled with all the typical developer sins: NIH, over- and under-engineering, ad-hoc hacky shortcuts, home-grown patterns, wheel reinvention, poor testing, bash scripts (full confession: I love bash scripts), and many more. Also, hosting the blog on a single cheap VPS doesn’t really boost its availability.

So, why did I do it then? Because I wanted to try out Phoenix and The Erlangelist was a good lab rat candidate. It’s a pretty simple service and it’s used in real life. Much to my surprise, people read these articles, and occasionally even mention some of them in their own posts. On the other hand, the blog receives only a few hundred views per day, so the site is really not highly loaded, nor super critical. Occasional shorter downtime shouldn’t cause much disturbance, and it might even go completely unnoticed.

The challenge was thus manageable in the little extra time I was able to spare, and so far the system seems to be doing well. As an added bonus, now I’m able to see all the requests, something that was not possible on Blogger. One thing I immediately learned after switching to the new site is that people seem to use the RSS feed of the blog. I had no idea this was still a thing, and almost forgot to port the feed. Thanks to Janis Miezitis for providing a nice RSS on Phoenix guide.

The experience

I had a great time implementing this little site. The server code is fairly simple, and doesn’t really use a lot of Phoenix. The site boils down to hosting a few static pages, so there wasn’t much need for fancy features such as channels. Regardless, I was quite impressed with what I’ve seen so far. It was pretty easy to get started, especially owing to the excellent online guides by Lance Halvorsen.

Working on the “devops” tasks related to producing and deploying the release was another interesting thing. This is where I spent most of the effort, but I’ve also learned a lot in the process.

So altogether, the experience so far has been pretty nice, and I’m very excited that this blog is finally powered by the same technology it promotes :-)

]]>
http://theerlangelist.com//article/opensourcing_erlangelist
<![CDATA[Outside Elixir: running external programs with ports]]> Tue, 18 Aug 15 00:00:00 +0000 Outside Elixir: running external programs with ports

2015-08-18

Occasionally it might be beneficial to implement some part of the system in something other than Erlang/Elixir. I see at least two reasons for doing this. First, it might happen that a library for some particular functionality is not available, or not as mature as its counterparts in other languages, and creating a proper Elixir implementation might require a lot of effort. Another reason could be raw CPU speed, something which is not Erlang’s forte, although in my personal experience that rarely matters. Still, if there are strong speed requirements in some CPU-intensive part of the system, and every microsecond is important, Erlang might not suffice.

There may be other situations where Erlang is not the best tool for the job. Still, that’s not necessarily a reason to dismiss it completely. Just because it’s not suitable for some features doesn’t mean it’s not a good choice to power most of the system. Moreover, even if you stick with Erlang, you can still resort to other languages to implement some parts of it. Erlang provides a couple of techniques to do this, but in my personal opinion the most compelling option is to start external programs from Erlang via ports. This is the approach I’d consider first, and then turn to other alternatives in some special cases. So in this article I’ll talk about ports, and before parting I’ll briefly mention other options and discuss some trade-offs.

Basic theory

An Erlang port is a process-specific resource. It is owned by some process and that process is the only one that can talk to it. If the owner process terminates, the port will be closed. You can create many ports in the system, and a single process can own multiple ports. It’s worth mentioning that a process can hand over the ownership of the port to another process.
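As an aside, here’s a hedged sketch of such a handover via Port.connect/2, a thin wrapper around :erlang.port_connect/2 (new_owner_pid is an assumed pid of some other process, and cat is just a stand-in external program):

port = Port.open({:spawn, "cat"}, [:binary])

# make another process the new owner; messages from the port now go to new_owner_pid
Port.connect(port, new_owner_pid)

# the old owner remains linked to the port unless it explicitly unlinks
Process.unlink(port)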

Examples of ports are file handles and network sockets, which are connected to the owner process and closed if that process terminates. This allows proper cleanup in a well-structured OTP application. Whenever you take down some part of the supervision tree, all resources owned by the terminated processes will be closed.

From the implementation standpoint, ports come in two flavors. They can either be powered by a code which runs directly in the VM itself (port driver), or they can run as an external OS process outside of the BEAM. Either way, the principles above hold and you use mostly the same set of functions exposed in the Port module - tiny wrappers around port related functions from the :erlang module. In this article I’ll focus on ports as external processes. While not the fastest option, I believe this is often a sensible approach because it preserves fault-tolerance properties.

Before starting, I should also mention the Porcelain library, by Alexei Sholik, which can simplify working with ports in some cases. You should definitely check it out, but in this article I will just use the Port module to avoid the extra layer of abstraction.

First take

Let’s see a simple example. In this exercise we’ll introduce support for running Ruby code from the Erlang VM. Under the hood, we’ll start a Ruby process from Erlang and send it Ruby commands. The process will eval those commands and optionally send back responses to Erlang. We’ll also make the Ruby interpreter stateful, allowing Ruby commands to share the same state. Of course, it will be possible to start multiple Ruby instances and achieve isolation as well.

The initial take is simple. To run an external program via a port, you need to open the port via Port.open/2, providing a command to start the external program. Then you can use Port.command/2 to issue requests to the program. If the program sends something back, the owner process will receive a message. This closely resembles the classic message-passing approach.

On the other side, the external program uses standard input/output to talk to its owner process. Basically, it needs to read from stdin, decode the input, do its stuff, and optionally print the response on stdout which will result in a message back to the Erlang process. When the program detects EOF on stdin, it can assume that the owner process has closed the port.

Let’s see this in action. First, we’ll define the command to start the external program, in this case a Ruby interpreter:

cmd = ~S"""
  ruby -e '
    STDOUT.sync = true
    context = binding

    while (cmd = gets) do
      eval(cmd, context)
    end
  '
"""

This is a simple program that reads lines from stdin and evals them in the same context, thus ensuring that the side effect of the previous commands is visible to the current one. The STDOUT.sync = true bit ensures that whatever we output is immediately flushed, and thus sent back to the owner Erlang process.

Now we can start the port:

port = Port.open({:spawn, cmd}, [:binary])

The second argument contains port options. For now, we’ll just provide the :binary option to specify that we want to receive data from the external program as binaries. We’ll use a couple more options later on, but you’re advised to read the official documentation to learn about all the available options.

Assuming you have a Ruby interpreter somewhere in the path, the code above should start a corresponding OS process, and you can now use Port.command/2 to talk to it:

Port.command(port, "a = 1\n")
Port.command(port, "a += 2\n")
Port.command(port, "puts a\n")

This is fairly straightforward. We just send some messages to the port, inserting newlines to make sure the other side gets them (since it uses gets to read line by line). The Ruby program will eval these expressions (since we’ve written it that way). In the very last expression, we print the contents of the variable. This last statement will result in a message to the owner process. We can receive this message as usual:

receive do
  {^port, {:data, result}} ->
    IO.puts("Elixir got: #{inspect result}")
end

# Elixir got: "3\n"

The full code is available here.

Program termination

It’s worth noting again that a port is closed when the owner process terminates. In addition, the owner process can close the port explicitly with Port.close/1. When a port is closed the external program is not automatically terminated, but the pipes used for communication will be closed. When the external program reads from stdin it will get EOF and can do something about it, for example terminate.

This is what we already do in our Ruby program:

while (cmd = gets) do
  eval(cmd, context)
end

By stopping the loop when gets returns nil we ensure that the program will terminate when the port is closed.

There are a few caveats though. Notice how we eval inside the loop. If the code in cmd takes a long time to run, the external program might linger after the port is closed. This is simply due to the fact that the program is busy processing the current request, so it can’t detect that the other side has closed the port. If you want to ensure immediate termination, you can consider doing processing in a separate thread, while keeping the main thread focused on the communication part.

Another issue is the fact that closing the port closes both pipes. This may present a problem if you want to directly use tools which produce their output only after they receive EOF. In the context of a port, when this happens, both pipes are already closed, so the tool can’t send anything back via stdout. There are quite a few discussions on this issue (see here for example). Essentially, you shouldn’t worry about it if you implement your program to act as a server which waits for requests, does some processing, and optionally spits out the result. However, if you’re trying to reuse a program which is not originally written to run as a port, you may need to wrap it in some custom script, or resort to libraries which offer some workarounds, such as the aforementioned Porcelain.

Packing messages

The communication between the owner process and the port is by default streamed, which means there are no guarantees about message chunks, so you need to somehow parse messages yourself, character by character.

In the previous example the Ruby code relies on newlines to serve as command separators (by using gets). This is a quick solution, but it prevents us from running multiline commands. Moreover, when receiving messages in Elixir, we don’t have any guarantees about chunking. Data is streamed back to us as it is printed, so a single message might contain multiple responses, or a single response might span multiple messages.

A simple solution for this is to include the information about the message size in the message itself. This can be done by providing the {:packet, n} option to Port.open/2:

port = Port.open({:spawn, cmd}, [:binary, {:packet, 4}])

Each message sent to the port will start with n bytes (in this example 4) which represent the byte size of the rest of the message. The size is encoded as an unsigned big-endian integer.
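Just for intuition, this is what the framing looks like, expressed with Elixir’s binary syntax. You never need to write this yourself, since {:packet, 4} does it for you; the snippet only illustrates the wire format:

payload = "puts a"
framed = <<byte_size(payload)::32-unsigned-big, payload::binary>>
# framed is equal to <<0, 0, 0, 6, "puts a">>

# decoding the same frame back:
<<length::32-unsigned-big, message::binary-size(length)>> = framed
# message is now "puts a"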

The external program then needs to read this 4-byte integer, and then read the corresponding number of bytes to obtain the message payload:

def receive_input
  encoded_length = STDIN.read(4)                # get message size
  return nil unless encoded_length

  length = encoded_length.unpack("N").first     # convert to int
  STDIN.read(length)                            # read message
end

Now we can use receive_input in the eval loop:

while (cmd = receive_input) do
  eval(cmd, context)
end

These changes allow the Elixir client to send multi-line statements:

Port.command(port, "a = 1")
Port.command(port, ~S"""
  while a < 10 do
    a *= 3
  end
""")

When the Ruby program needs to send a message back to Erlang, it must also include the size of the message:

def send_response(value)
  response = value.inspect
  STDOUT.write([response.bytesize].pack("N"))
  STDOUT.write(response)
  true
end

Elixir code can then use send_response to make the Ruby code return something. To prove that responses are properly chunked, let’s send two responses:

Port.command(port, ~S"""
  send_response("response")
  send_response(a)
""")

Which will result in two messages on the Elixir side:

receive do
  {^port, {:data, result}} ->
    IO.puts("Elixir got: #{inspect result}")
end

receive do
  {^port, {:data, result}} ->
    IO.puts("Elixir got: #{inspect result}")
end

# Elixir got: "\"response\""
# Elixir got: "27"

The complete code is available here.

Encoding/decoding messages

The examples so far use plain strings as messages. In more involved scenarios you may need to deal with various data types. There’s no special support for this. Essentially a process and a port exchange byte sequences, and it is up to you to implement some encoding/decoding scheme to facilitate data typing. You can resort to popular formats such as JSON for this purpose.

In this example, I’ll use Erlang’s External Term Format (ETF). You can easily encode/decode any Erlang term to ETF via :erlang.term_to_binary/1 and :erlang.binary_to_term/1. A nice benefit of this is that you don’t need any third party library on the Elixir side.
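For example, here’s a quick round trip (the exact bytes of the encoded term depend on the Erlang/OTP version, so they’re not shown here):

binary = :erlang.term_to_binary({:eval, "a = 1"})
# binary now holds the ETF-encoded representation of the tuple

:erlang.binary_to_term(binary)
# => {:eval, "a = 1"}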

Let’s see this in action. Instead of plain strings, we’ll send {:eval, command} tuples to the Ruby side. The Ruby program will execute the command only if it receives an :eval tagged tuple. In addition, when responding back, we’ll again send the message as a tuple in the form of {:response, value}, where value will also be an Erlang term.

On the Elixir side we’ll introduce a helper lambda to send {:eval, command} tuples to the port. It will simply pack the command into a tuple and encode it to ETF binary:

send_eval = fn(port, command) ->
  Port.command(port, :erlang.term_to_binary({:eval, command}))
end

The function can then be used as:

send_eval.(port, "a = 1")
send_eval.(port, ~S"""
  while a < 10 do
    a *= 3
  end
""")
send_eval.(port, "send_response(a)")

On the Ruby side, we need to decode ETF byte sequence. For this, we need to resort to some 3rd party library. After a quick (and very shallow) research, I opted for erlang-etf. We need to create a Gemfile with the following content:

source "https://rubygems.org"

gem 'erlang-etf'

And then run bundle install to fetch gems.

Now, in our Ruby code, we can require necessary gems:

require "bundler"
require "erlang/etf"
require "stringio"

Then, we can modify the receive_input function to decode the byte sequence:

def receive_input
  # ...

  Erlang.binary_to_term(STDIN.read(length))
end

The eval loop now needs to check that the input message is a tuple and that it contains the :eval atom as the first element:

while (cmd = receive_input) do
  if cmd.is_a?(Erlang::Tuple) && cmd[0] == :eval
    eval(cmd[1], context)
  end
end

Then we need to adapt the send_response function to encode the response message as {:response, value}:

def send_response(value)
  response = Erlang.term_to_binary(Erlang::Tuple[:response, value])
  # ...
end

Going back to the Elixir side, we now need to decode the response message with :erlang.binary_to_term/1:

receive do
  {^port, {:data, result}} ->
    IO.puts("Elixir got: #{inspect :erlang.binary_to_term(result)}")
end

# Elixir got: {:response, 27}

Take special note of how the received value is now an integer (previously it was a string). This happens because the response is now encoded to ETF on the Ruby side.

The complete code is available here.

Bypassing stdio

Communication via stdio is somewhat unfortunate. If we want to print something in the external program, perhaps for debugging purposes, the output will just be sent back to Erlang. Luckily, this can be avoided by instructing Erlang to use file descriptors 3 and 4 for communication with the program. Possible caveat: I’m not sure if this feature will work on Windows.

The change is simple enough. We need to provide the :nouse_stdio option to Port.open/2:

port = Port.open({:spawn, cmd}, [:binary, {:packet, 4}, :nouse_stdio])

Then, in Ruby, we need to open files 3 and 4, making sure that the output file is not buffered:

@input = IO.new(3)
@output = IO.new(4)
@output.sync = true

Finally, we can simply replace references to STDIN and STDOUT with @input and @output respectively. The code is omitted for the sake of brevity.

After these changes, we can print debug messages from the Ruby process:

while (cmd = receive_input) do
  if cmd.is_a?(Erlang::Tuple) && cmd[0] == :eval
    puts "Ruby: #{cmd[1]}"
    res = eval(cmd[1], context)
    puts "Ruby: => #{res.inspect}\n\n"
  end
end

puts "Ruby: exiting"

Which gives the output:

Ruby: a = 1
Ruby: => 1

Ruby:   while a < 10 do
    a *= 3
  end
Ruby: => nil

Ruby: send_response(a)
Ruby: => true

Elixir got: {:response, 27}
Ruby: exiting

The code is available here.

Wrapping the port in a server process

Since the communication with the port relies heavily on message passing, it’s worth managing the port inside a GenServer. This gives us some nice benefits:

  • The server process can provide an abstract API to its clients. For example, we could expose RubyServer.cast and RubyServer.call. The first operation just issues a command without producing output. The second one will instruct the Ruby program to invoke send_response and send the response back. In addition, the server process will handle the response message by notifying the client process. The coupling between Erlang and the program remains in the code of the server process.
  • The server process can include an additional unique id in each request issued to the port. The Ruby program will include this id in the response message, so the server can reliably match the response to a particular client request.
  • The server process can be notified if the Ruby program crashes, and in turn crash itself.

Let’s see an example usage of such a server:

{:ok, server} = RubyServer.start_link

RubyServer.cast(server, "a = 1")
RubyServer.cast(server, ~S"""
  while a < 10 do
    a *= 3
  end
""")

RubyServer.call(server, "Erlang::Tuple[:response, a]")
|> IO.inspect

# {:response, 27}

Of course, nothing stops you from creating another Ruby interpreter:

{:ok, another_server} = RubyServer.start_link
RubyServer.cast(another_server, "a = 42")
RubyServer.call(another_server, "Erlang::Tuple[:response, a]")
|> IO.inspect

# {:response, 42}

These two servers communicate with different interpreter instances so there’s no overlap:

RubyServer.call(server, "Erlang::Tuple[:response, a]")
|> IO.inspect

# {:response, 27}

Finally, a crash in the Ruby program will be noticed by the GenServer which will in turn crash itself:

RubyServer.call(server, "1/0")

# ** (EXIT from #PID<0.48.0>) an exception was raised:
#     ** (ErlangError) erlang error: {:port_exit, 1}
#         ruby_server.ex:43: RubyServer.handle_info/2
#         (stdlib) gen_server.erl:593: :gen_server.try_dispatch/4
#         (stdlib) gen_server.erl:659: :gen_server.handle_msg/5
#         (stdlib) proc_lib.erl:237: :proc_lib.init_p_do_apply/3

The implementation is mostly a rehash of the previously mentioned techniques, so I won’t explain it here. The only new thing is providing the :exit_status option to Port.open/2. With this option, we ensure that the owner process will receive the {port, {:exit_status, status}} message, and can do something about the port crash. You’re advised to try and implement such a GenServer yourself, or analyze my basic solution. A small sketch of handling the exit message is given below.
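For illustration, here’s a hedged sketch of how the exit message could be handled in the GenServer; the state shape (a map with a :port key) is my own assumption, and stopping the server is just one of several reasonable reactions:

# in the GenServer which owns the port
def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
  # the external program terminated, so take this server down as well
  {:stop, {:port_exit, status}, state}
end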

Alternatives to ports

Like everything else, ports come with some associated trade-offs. The most obvious one is the performance hit due to encoding and communicating via pipes. If the actual processing in the port is very short, this overhead might not be tolerable. With a lot of hand waving I’d say that ports are more appropriate when the external program will do some “significant” amount of work, something that’s measured at least in milliseconds.

In addition, ports are coupled to the owner (and vice-versa). If the owner stops, you probably want to stop the external program. Otherwise the restarted owner will start another instance of the program, while the previous instance won’t be able to talk to Erlang anymore.

If these issues are relevant for your specific case, you might consider some alternatives:

  • Port drivers (sometimes called linked-in drivers) have characteristics similar to ports, but there is no external program involved. Instead, the code, implemented in C/C++, is running directly in the VM.
  • NIFs (native implemented functions) can be used to implement Erlang functions in C and run them inside the BEAM. Unlike port drivers, NIFs are not tied to a particular process.
  • It is also possible to make your program look like an Erlang node. Some helper libraries are provided for C and Java. Your Erlang node can then communicate with the program, just like it would do with any other node in the cluster.
  • Of course, you can always go the “microservices” style: start a separate program, and expose some HTTP interface so your Erlang system can talk to it.

The first two alternatives might give you significant speed improvements at the cost of safety. An unhandled exception in a NIF or port driver will crash the entire BEAM. Moreover, both NIFs and port drivers run in scheduler threads, so you need to keep your computations short (<= 1ms), otherwise you may end up compromising the scheduler. This can be worked around with threads and usage of dirty schedulers, but the implementation might be significantly more involved.

The third option provides looser coupling between two parties, allowing them to restart separately. Since distributed Erlang is used, you should still be able to detect crashes of the other side.

A custom HTTP interface is more general than an Erlang-like node (since it doesn’t require an Erlang client), but you lose the ability to detect crashes. If one party needs to detect that the other party has crashed, you’ll need to roll your own health checking (or reuse some 3rd party component for that).

I’d say that nodes and separate services seem suitable when two parties are more like peers, and each one can exist without the other. On the other hand, ports are more interesting when the external program makes sense only in the context of the whole system, and should be taken down if some other part of the system terminates.

As you can see, there are various options available, so I think it’s safe to say that Erlang is not an island. Moving to Erlang/Elixir doesn’t mean you lose the ability to implement some parts of the system in other languages. So if for whatever reasons you decide that something else is more suitable to power a particular feature, you can definitely take that road and still enjoy the benefits of Erlang/Elixir in the rest of your system.

]]>
http://theerlangelist.com//article/outside_elixir
<![CDATA[Optimizing a function with the help of Elixir macros]]> Thu, 6 Aug 15 00:00:00 +0000 Optimizing a function with the help of Elixir macros

2015-08-06

Author: Tallak Tveide

Today I have a pleasure of hosting a post by Tallak Tveide, who dived into Elixir macros, came back alive, and decided to share his experience with us. This is his story.

In this blog post we will cover optimizing an existing function for certain known inputs, using macros. The function that we are going to optimize is 2D matrix rotation. The problem was chosen for its simplicity. When I first used these techniques there were a few extra complexities that have been left out, please keep this in mind if the code seems like overkill.

If you are unfamiliar with macros, this blog post may be difficult to read. In that case one tip is to read Saša Jurić’s articles about Elixir macros first, then revisit this post.

Two dimensional vector rotation

We want to take a vector {x, y} and apply any number of translate and rotate transforms on it. We want to end up with code looking like:

transformed_point =
  {10.0, 10.0}
  |> rotate(90.0)
  |> translate(1.0, 1.0)

The translate function would simply look something like this:

defmodule TwoD do
  def translate({x, y}, dx, dy) do
    {x + dx, y + dy}
  end
end

And then the rotate function might look like:

defmodule TwoD do
  @deg_to_rad :math.pi / 180.0

  def rotate({x, y}, angle) do
    radians = angle * @deg_to_rad
    { x * :math.cos(radians) - y * :math.sin(radians),
      x * :math.sin(radians) + y * :math.cos(radians) }
  end
end

The first subtle macro magic is already happening at this point. We are precalculating the module attribute @deg_to_rad at compile time to avoid calling :math.pi and performing a division at runtime.

I have left out translate from here on for clarity.

The idea

When I first started to look at these transforms, most of my rotations were in multiples of 90 degrees. For these operations, :math.sin(x) and :math.cos(x) will return the values -1.0, 0.0 or 1.0, and the rotate function is reduced to reordering and changing signs of the vector tuple values in {x, y}.

If we spelled the code out, it would look something like this:

defmodule TwoD do
  def rotate({x, y}, 90.0), do: {-y, x}
  def rotate({x, y}, 180.0), do: {-x, -y}
  # ... more optimized versions here

  # failing an optimized match, use the generic rotate
  def rotate({x, y}, angle) do
    radians = angle * @deg_to_rad
    { x * :math.cos(radians) - y * :math.sin(radians),
      x * :math.sin(radians) + y * :math.cos(radians) }
  end
end

For this particular problem, the code above, without macros, is the most readable and maintainable, and is also as efficient as any other code.

The first attempt

There are basically just four variants at [0, 90, 180, 270] degrees that are interesting to us as sin and cos are cyclic. Our initial approach will select one of these four variants based on a parameter, and then inject some code into the TwoD module:

  defmodule TwoD.Helpers do
    @deg_to_rad :math.pi / 180.0

    def rotate({x, y}, angle) do
      radians = angle * @deg_to_rad
      { x * :math.cos(radians) - y * :math.sin(radians),
        x * :math.sin(radians) + y * :math.cos(radians) }
    end

    defmacro def_optimized_rotate(angle_quoted) do
      # angle is still code, so it must be evaluated to get a number
      {angle, _} = Code.eval_quoted(angle_quoted)

      x_quoted = Macro.var(:x, __MODULE__)
      y_quoted = Macro.var(:y, __MODULE__)
      neg_x_quoted = quote do: (-unquote(Macro.var(:x, __MODULE__)))
      neg_y_quoted = quote do: (-unquote(Macro.var(:y, __MODULE__)))

      # normalize to 0..360; must add 360 in case of negative angle values
      normalized = angle |> round |> rem(360) |> Kernel.+(360) |> rem(360)

      result_vars_quoted = case normalized do
        0 ->
          [x_quoted, y_quoted]
        90 ->
          [neg_y_quoted, x_quoted]
        180 ->
          [neg_x_quoted, neg_y_quoted]
        270 ->
          [y_quoted, neg_x_quoted]
        _ ->
          raise "Optimized angles must be right or straight"
      end

      # at last return a quoted function definition
      quote do
        def rotate({x, y}, unquote(angle * 1.0)) do
          {unquote_splicing(result_vars_quoted)}
        end
      end
    end
  end

  defmodule TwoD do
    require TwoD.Helpers

    # Optimized versions of the code
    TwoD.Helpers.def_optimized_rotate(-270)
    TwoD.Helpers.def_optimized_rotate(-180)
    TwoD.Helpers.def_optimized_rotate(-90)
    TwoD.Helpers.def_optimized_rotate(0)
    TwoD.Helpers.def_optimized_rotate(90)
    TwoD.Helpers.def_optimized_rotate(180)
    TwoD.Helpers.def_optimized_rotate(270)

    def rotate(point, angle), do: TwoD.Helpers.rotate(point, angle)
  end

The rotate function has been moved to the TwoD.Helpers module, and then replaced with a simple forwarding call. It will be useful when we later want to test our optimized function against the unoptimized one.

When I first implemented def_optimized_rotate I was caught a bit off guard, as the parameters to the macro are not available as the simple numbers that I passed in. The parameter angle_quoted is actually passed as a block of code. So in order for the macro to be able to precalculate the result, we have to add {angle, _} = Code.eval_quoted(angle_quoted) at the top of our macro to expand the code for the number into an actual value.

Please note that I would not recommend using Code.eval_quoted for reasons that will hopefully become clear later.

For this particular problem, I am quite happy spelling out all the seven values that I want to optimize. But if there were many more interesting optimizations (for instance if the rotation was in 3D), spelling all of these out is not a good option. Let’s wrap the macro call in a for comprehension instead.

Inserting dynamic module definitions

Before writing the for comprehension, let’s look at how a function may be defined dynamically. We’ll start by making a function that simply returns its name, but that name is assigned to a variable at compile time, before the function is defined:

defmodule Test do
  function_name = "my_test_function"

  def unquote(function_name |> String.to_atom)() do
    unquote(function_name)
  end
end

And when run it in IEx, we get:

iex(2)> Test.my_test_function
"my_test_function"

The thing to note is that when we are defining a module, we are in a way already inside an implicit quote statement, and that we may use unquote to expand dynamic code into our module. The first unquote inserts an atom containing the function name, the second inserts the return value.

Actually, I have yet to see unquote used like this in a module definition. Normally you would prefer to use module attributes as often as possible, as they will automatically unquote their values. On the other hand, it seems unquote offers a bit more flexibility.

defmodule Test do
  @function_name "my_test_function"

  def unquote(@function_name |> String.to_atom)() do
    @function_name
  end
end

Our next step is to let the for comprehension enumerate all the angles that we want to optimize. Our TwoD module now looks like this:

defmodule TwoD do
  require TwoD.Helpers

  @angles for n <- -360..360, rem(n, 90) == 0, do: n

  # Optimized versions of the code
  for angle <- @angles, do: TwoD.Helpers.def_optimized_rotate(angle)

  # This general purpose implementation will serve any other angle
  def rotate(point, angle), do: TwoD.Helpers.rotate(point, angle)
end

This introduces a new problem to our code. Our macro def_optimized_rotate now receives the quoted reference to angle which is not possible to evaluate in the macro context. Actually our first implementation implicitly required that the angle parameter be spelled out as a number. It seems wrong that the user of our macro has to know that the parameter must have a particular form.

This is the first time we will see a pattern with macro programming, and one reason to be wary of using macros: The macro might work well in one instance, but changes made in code outside of the macro could easily break it. To paraphrase a saying: The code is far from easy to reason about.

Delaying the macro logic

If the mountain will not come to Muhammad, Muhammad must go to the mountain.

There are two ways to use the angle values from the for comprehension in our macro:

  • move the for comprehension into our macro, thus hardcoding the optimized angles
  • inject everything into the resulting module definition

We’ll choose the latter option because I think it is clearer that the optimized angles are stated in the TwoD module rather than in the macro.

There is no way to evaluate the code in the macro parameter correctly inside the macro. Instead we must move all the code into a context where the parameter may be evaluated correctly.

defmodule TwoD.Helpers do
  @deg_to_rad :math.pi / 180.0

  def rotate({x, y}, angle) do
    radians = angle * @deg_to_rad
    { x * :math.cos(radians) - y * :math.sin(radians),
      x * :math.sin(radians) + y * :math.cos(radians) }
  end

  defmacro def_optimized_rotate(angle) do
    quote(bind_quoted: [angle_copy: angle], unquote: false) do
      x_quoted = Macro.var(:x, __MODULE__)
      y_quoted = Macro.var(:y, __MODULE__)
      neg_x_quoted = quote do: (-unquote(Macro.var(:x, __MODULE__)))
      neg_y_quoted = quote do: (-unquote(Macro.var(:y, __MODULE__)))

      # normalize to 0..360; must add 360 in case of negative angle values
      normalized = angle_copy |> round |> rem(360) |> Kernel.+(360) |> rem(360)

      result_vars_quoted = case normalized do
        0 ->
          [x_quoted, y_quoted]
        90 ->
          [neg_y_quoted, x_quoted]
        180 ->
          [neg_x_quoted, neg_y_quoted]
        270 ->
          [y_quoted, neg_x_quoted]
        _ ->
          raise "Optimized angles must be right or straight"
      end

      def rotate({unquote_splicing([x_quoted, y_quoted])}, unquote(1.0 * angle_copy)) do
        {unquote_splicing(result_vars_quoted)}
      end
    end
  end
end

Compared to the initial rotate function, this code is admittedly quite dense. This is where I gradually realize why everyone warns against macro overuse.

The first thing to note is that all the generated code is contained inside a giant quote statement. Because we want to insert unquote calls into our result (to be evaluated inside the module definition), we have to use the option unquote: false.

We may no longer use unquote to insert the angle parameter quoted. To mend this, we add the option bind_quoted: [angle_copy: angle]. The result of adding the bind_quoted option is best shown with an example:

iex(1)> angle = quote do: 90 * 4.0
{:*, [context: Elixir, import: Kernel], [90, 4.0]}

iex(2)> Macro.to_string(quote(bind_quoted: [angle_copy: angle]) do
...(2)> rot_x = TwoD.Helpers.prepare_observed_vector {1, 0}, angle_copy, :x
...(2)> # more code
...(2)> end) |> IO.puts
(
  angle_copy = 90 * 4.0
  rot_x = TwoD.Helpers.prepare_observed_vector({1, 0}, angle_copy, :x)
)
:ok

bind_quoted is really quite simple. It just adds an assignment before any other code. This also has the benefit of ensuring that the parameter code is only evaluated once. Seems we should be using bind_quoted rather than inline unquoting in most circumstances.

As we don’t really use the angle in the macro anymore, we no longer need Code.eval_quoted. I admit using it was a bad idea in the first place.

This is the second time the macro stopped working due to changes in the calling code. It seems the first version of our macro worked more or less by accident. The code:

def rotate({x, y}, unquote(angle_copy)) do
  {unquote_splicing(result_vars_quoted)}
end

had to be replaced with:

def rotate({unquote_splicing([x_quoted, y_quoted])}, unquote(angle_copy)) do
  {unquote_splicing(result_vars_quoted)}
end

The reason for this is that, due to macro hygiene, the quoted code for the result did not map directly to {x, y}.

This does the trick, and the code now works as intended.

Testing

To test the code, we will compare the output of our optimized function and the generic implementation. The test might look like this:

# in file test/two_d_test.exs
defmodule TwoD.Tests do
  use ExUnit.Case, async: true
  alias TwoD.Helpers, as: H

  @point {123.0, 456.0}

  def round_point({x, y}), do: {round(x), round(y)}

  test "optimized rotates must match generic version" do
    assert (TwoD.rotate(@point, -270.0) |> round_point) ==
      (H.rotate(@point, -270.0) |> round_point)

    assert (TwoD.rotate(@point, 0.0) |> round_point) ==
      (H.rotate(@point, 0.0) |> round_point)

    assert (TwoD.rotate(@point, 90.0) |> round_point) ==
      (H.rotate(@point, 90.0) |> round_point)
  end

  test "the non right/straight angles should still work" do
    assert (TwoD.rotate(@point, 85.0) |> round_point) ==
      (H.rotate(@point, 85.0) |> round_point)
  end
end

Benchmarking the results

A final difficulty remains: we are still not sure whether our optimized code is actually running, or the generic implementation is still handling all function calls.

If the optimization is working, a benchmark should show us that. In any event it is useful to measure whether the optimization is worthwhile. I decided to use the benchwarmer package for this. The mix.exs file is modified to include:

  defp deps do
    [
      { :benchwarmer, "~> 0.0.2" }
    ]
  end

And then we’ll add a simple benchmark script like this:

# in file lib/mix/tasks/benchmark.ex
defmodule Mix.Tasks.Benchmark do
  use Mix.Task

  def run(_) do
    IO.puts "Checking optimized vs unoptimized"
    Benchwarmer.benchmark(
      [&TwoD.Helpers.rotate/2, &TwoD.rotate/2], [{123.0, 456.0}, 180.0]
    )

    IO.puts "Checking overhead of having optimizations"
    Benchwarmer.benchmark(
      [&TwoD.Helpers.rotate/2, &TwoD.rotate/2], [{123.0, 456.0}, 182.0]
    )
  end
end

in turn giving us:

$ mix benchmark
Checking optimized vs unoptimized
*** &TwoD.Helpers.rotate/2 ***
1.6 sec   524K iterations   3.18 μs/op

*** &TwoD.rotate/2 ***
1.4 sec     2M iterations   0.71 μs/op

Checking overhead of having optimizations
*** &TwoD.Helpers.rotate/2 ***
1.3 sec     1M iterations   1.34 μs/op

*** &TwoD.rotate/2 ***
1.8 sec     1M iterations   1.78 μs/op

I find it a bit interesting that we are getting a 4X speedup for the straight and right angles, while at the same time the general purpose call is 20% slower. Neither of these results should come as a big surprise.

In conclusion, this technique is worthwhile if you have a slow computation that is mostly called with a specific range of arguments. It also seems wise to factor in the loss of readability.

You may browse the complete source code at GitHub.

Thanks

Thanks to @mgwidmann for pointing out that unquote is so useful inside a module definition.

Thanks to Saša Jurić for getting me through difficult compiler issues, and then helping me out with the code examples and text.

]]>
http://theerlangelist.com//article/tallakt_macros
<![CDATA[Beyond Task.Async]]> Fri, 31 Jul 15 00:00:00 +0000 Beyond Task.Async

2015-07-31

In this post I’ll talk about less typical patterns of parallelization with tasks. Arguably, the most common use case for tasks is to start some jobs concurrently with Task.async and then collect the results with Task.await. By doing this we might run separate jobs in parallel, and thus perform the total work more efficiently. This can be done very elegantly with async/await, without much overhead in the code.

However, async/await has some properties which may not be suitable in some cases, so you might need a different approach. That is the topic of this post, but first, let’s quickly recap the basic async/await pattern.

Parallelizing with async/await

Async/await makes sense when we need to perform multiple independent computations and aggregate their results into the total output. If computations take some time, we might benefit by running them concurrently, possibly reducing the total execution time from sum(computation_times) to max(computation_times).

The computation can be any activity such as database query, a call to a 3rd party service, or some CPU bound calculation. In this post, I’ll just use a contrived stub:

defmodule Computation do
  def run(x) when x > 0 do
    :timer.sleep(x)  # simulates a long-running operation
    x
  end
end

This “computation” takes a positive integer x, sleeps for x milliseconds, and returns the number back. It’s just a simulation of a possibly long running operation.

Now, let’s say that we need to aggregate the results of multiple computations. Again, I’ll introduce a simple stub:

defmodule Aggregator do
  def new, do: 0
  def value(aggregator), do: aggregator

  def add_result(aggregator, result) do
    :timer.sleep(50)
    aggregator + result
  end
end

This is just a simple wrapper which sums input numbers. In real life, this might be a more involved aggregator that somehow combines results of multiple queries into a single “thing”.

Assuming that different computations are independent, there is potential to run them concurrently, and this is where tasks come in handy. For example, let’s say we need to run this computation for ten different numbers:

defmodule AsyncAwait do
  def run do
    :random.seed(:os.timestamp)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) end)
    |> Enum.map(&Task.async(fn -> Computation.run(&1) end))
    |> Enum.map(&Task.await/1)
    |> Enum.reduce(Aggregator.new, &Aggregator.add_result(&2, &1))
    |> Aggregator.value
  end
end

This is a fairly simple technique. First, we generate some random input and start a task to handle each element. Then, we await the results of each task and reduce the responses into the final value. This allows us to improve the running time, since computations might run in parallel. The total time should be the time of the longest-running computation plus the fixed penalty of 500 ms (10 * 50 ms) to include each result into the total output. In this example it shouldn’t take longer than 1500 ms to get the final result.

Properties of async/await

Async/await is very elegant and brings some nice benefits, but it also has some limitations.

The first problem is that we await on results in the order we started the tasks. In some cases, this might not be optimal. For example, imagine that the first task takes 500 ms, while all others take 1 ms. This means that we’ll process the results of short-running tasks only after we handle the slow task. The total execution time in this example will be about 1 second. From the performance point of view, it would be better if we would take results as they arrive. This would allow us to aggregate most of the results while the slowest task is still running, reducing the execution time to 550 ms.

Another issue is that it’s not easy to enforce a global timeout. You can’t easily say, “I want to give up if all the results don’t arrive in 500 ms”. You can provide a timeout to Task.await (it’s five seconds by default), but this applies only to a single await operation. Hence, a five-second timeout actually means we might end up waiting 50 seconds for ten tasks to time out.

Finally, you should be aware that async/await pattern takes the all-or-nothing approach. If any task or the master process crashes, all involved processes will be taken down (unless they’re trapping exits). This happens because Task.async links the caller and the spawned task process.

In most situations, these issues won’t really matter, and async/await combo will be perfectly fine. However, sometimes you might want to change the default behaviour.

Eliminating await

Let’s start by making the “master” process handle results in the order of arrival. This is fairly simple if we rely on the fact that Task.async reports the result back to the caller process via a message. We can therefore receive a message, and check if it comes from one of our tasks. If so, we can add the result to the aggregator.

To do this, we can rely on Task.find/2 that takes the list of tasks and the message, and returns either {result, task} if the message corresponds to the task in the list, or nil if the message is not from a task in the given list:

defmodule AsyncFind do
  def run do
    :random.seed(:os.timestamp)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) end)
    |> Enum.map(&Task.async(fn -> Computation.run(&1) end))
    |> collect_results
  end

  defp collect_results(tasks, aggregator \\ Aggregator.new)

  defp collect_results([], aggregator), do: Aggregator.value(aggregator)
  defp collect_results(tasks, aggregator) do
    receive do
      msg ->
        case Task.find(tasks, msg) do
          {result, task} ->
            collect_results(
              List.delete(tasks, task),
              Aggregator.add_result(aggregator, result)
            )

          nil ->
            collect_results(tasks, aggregator)
        end
    end
  end
end

Most of the action happens in collect_results. Here, we loop recursively, waiting for a message to arrive. Then we invoke Task.find/2 to determine whether the message comes from a task. If yes, we delete the task from the list of pending tasks, aggregate the response and resume the loop. The loop stops when there are no more pending tasks in the list. Then, we simply return the aggregated value.

In this example I’m using an explicit receive, but in production you should be careful about it. If the master process is a server, such as a GenServer or Phoenix.Channel, you should let the underlying behaviour receive messages, and invoke Task.find/2 from the handle_info callback. For the sake of brevity, I didn’t take that approach here, but as an exercise you could try to implement it yourself. A minimal sketch of that approach follows.
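Here’s a minimal hedged sketch, with the pending tasks and the aggregator kept in the server state (the module name and state shape are my own, not part of the original example):

defmodule Collector do
  use GenServer

  # state is assumed to be %{tasks: [...], aggregator: ...}
  def handle_info(msg, %{tasks: tasks, aggregator: aggregator} = state) do
    case Task.find(tasks, msg) do
      {result, task} ->
        new_state = %{state |
          tasks: List.delete(tasks, task),
          aggregator: Aggregator.add_result(aggregator, result)
        }
        {:noreply, new_state}

      nil ->
        {:noreply, state}
    end
  end
end

Once the tasks list becomes empty, the server can report the aggregated value to whoever requested the work; that part is omitted here.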

One final note: by receiving results as they arrive we lose the ordering. In this case, where we simply sum numbers, this doesn’t matter. If you must preserve the ordering, you’ll need to include additional order info with each result, and then sort the results after they are collected, as sketched below.
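For example, a hedged sketch of the tagging part (inputs is a hypothetical list; here I’m using plain await just to keep the snippet short, but the same tagging works with the arrival-order collection above):

inputs = [300, 100, 200]   # hypothetical inputs

ordered_results =
  inputs
  |> Enum.with_index
  |> Enum.map(fn {x, index} ->
       Task.async(fn -> {index, Computation.run(x)} end)
     end)
  |> Enum.map(&Task.await/1)
  |> Enum.sort_by(fn {index, _result} -> index end)
  |> Enum.map(fn {_index, result} -> result end)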

Handling timeouts

Once we moved away from Task.await, the master process becomes more flexible. For example, we can now easily introduce a global timeout. The idea is simple: after the tasks are started, we can use Process.send_after/3 to send a message to the master process after some time:

defmodule Timeout do
  def run do
    # exactly the same as before
  end

  defp collect_results(tasks) do
    timeout_ref = make_ref
    timer = Process.send_after(self, {:timeout, timeout_ref}, 900)
    try do
      collect_results(tasks, Aggregator.new, timeout_ref)
    after
      :erlang.cancel_timer(timer)
      receive do
        {:timeout, ^timeout_ref} -> :ok
        after 0 -> :ok
      end
    end
  end

  # ...
end

Here, we create the timer, and a reference which will be a part of the timeout message. Then we enqueue the timeout message to be sent to the master process after 900 ms. Including the reference in the message ensures that the timeout message will be unique for this run, and will not interfere with some other message.

Finally, we start the receive loop and return its result.

Take special note of the after block where we cancel the timer to avoid sending a timeout message if all the results arrive on time. However, since the timer works concurrently with the master process, it is still possible that the message was sent just before we canceled the timer, but after all the results were already collected. Thus, we do a receive with a zero timeout to flush the message if it’s already in the queue.

With this setup in place, we now need to handle the timeout message:

defp collect_results([], aggregator, _), do: {:ok, Aggregator.value(aggregator)}
defp collect_results(tasks, aggregator, timeout_ref) do
  receive do
    {:timeout, ^timeout_ref} ->
      {:timeout, Aggregator.value(aggregator)}

    msg ->
      case Task.find(tasks, msg) do
        {result, task} ->
          collect_results(
            List.delete(tasks, task),
            Aggregator.add_result(aggregator, result),
            timeout_ref
          )

        nil -> collect_results(tasks, aggregator, timeout_ref)
      end
  end
end

The core change here is in lines 4-5 where we explicitly deal with the timeout. In this example, we just return what we currently have. Depending on the particular use case, you may want to do something different, for example raise an error.

Explicitly handling errors

The next thing we’ll tackle is error handling. Task.async is built in such a way that if something fails, everything fails. When you start the task via async, the process will be linked to the caller. This holds even if you use Task.Supervisor.async. As a result, if some task crashes, the master process will crash as well, taking down all other tasks.

If this is not a problem, then Task.async is a perfectly valid solution. However, sometimes you may want to explicitly deal with errors. For example, you might want to just ignore failing tasks, reporting back whatever succeeded. Or you may want to keep the tasks running even if the master process crashes.

There are two basic ways you can go about it: catch errors in the task, or use Task.Supervisor with start_child.

Catching errors

The simplest approach is to encircle the task code with a try/catch block:

Task.async(fn ->
  try do
    {:ok, Computation.run(...)}
  catch _, _ ->
    :error
  end
end)

Then, when you receive results, you can explicitly handle each case, ignoring :error results. The implementation is mostly mechanical and left to you as an exercise; a minimal sketch of the collecting side is given below.
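For example, assuming tasks holds the tasks started with the try/catch wrapper above, the collecting side might look roughly like this:

results = Enum.map(tasks, &Task.await/1)

total =
  results
  |> Enum.reduce(Aggregator.new, fn
       {:ok, result}, aggregator -> Aggregator.add_result(aggregator, result)
       :error, aggregator -> aggregator
     end)
  |> Aggregator.value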

I’ve occasionally seen some concerns that catching is not the Erlang/Elixir way, so I’d like to touch on this. If you can do something meaningful with an error, catching is a reasonable approach. In this case, we want to collect all the successful responses, so ignoring failed ones is completely fine.

So catching is definitely a simple way of explicitly dealing with errors, but it’s not without shortcomings. The main issue is that catch doesn’t handle exit signals. Thus, if the task links to some other process, and that other process terminates, the task process will crash as well. Since the task is linked to the master process, this will cause the master process to crash, and in turn crash all other tasks. The link between the caller and the task also means that if the master process crashes, for example while aggregating, all tasks will be terminated.

To overcome this, we can either make all processes trap exits, or remove the link between processes. Trapping exits might introduce some subtle issues (see here for some information), so I’ll take the second approach.

Replacing async

The whole issue arises because async links the caller and the task process, which ensures the “all-or-nothing” property. This is a perfectly fine decision, but it’s not necessarily suitable for all cases. I wonder whether linking should be made optional, but I don’t have a strong opinion at the moment. Update: Task.Supervisor.async_nolink has been introduced in Elixir 1.2, which allows you to start a task which is not linked to the caller.
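
For completeness, here’s roughly how the 1.2 approach might look. This is just a sketch and not part of the original example: it assumes a Task.Supervisor registered as :task_supervisor is already running, and reuses the Computation.run placeholder from earlier (arg stands for whatever input the computation needs):

task = Task.Supervisor.async_nolink(:task_supervisor, fn -> Computation.run(arg) end)

# Since the task is not linked, its crash won't take us down. We have to use
# Task.yield/2 (or collect the reply message manually) instead of Task.await/2.
case Task.yield(task, 900) || Task.shutdown(task) do
  {:ok, result}    -> result    # the task returned a result
  {:exit, _reason} -> :error    # the task crashed
  nil              -> :timeout  # no reply within 900 ms
end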

As it is, Task.async currently establishes a link, and if we want to avoid this, we need to reimplement async ourselves. Here’s what we’ll do:

  • Start a Task.Supervisor and use Task.Supervisor.start_child to start tasks.
  • Manually implement sending of the return message from the task to the caller.
  • Have the master process monitor tasks so it can be notified about potential crashes. Explicitly handle such messages by removing the crashed task from the list of tasks we await on.

The first point allows us to run tasks in a different part of the supervision tree from the master. Tasks and the master process are no longer linked, and failure of one process doesn’t cause failure of others.

However, since we’re not using async anymore, we need to manually send the return message to the caller process.

Finally, using the monitor ensures that the master process will be notified if some task crashes and can stop awaiting on their results.

This requires more work, but it provides stronger guarantees. We can now be certain that:

  • A failing task won’t crash anyone else.
  • The master process will be informed about the task crash and can do something about it.
  • Even a failure of master process won’t cause tasks to crash.

If the third property doesn’t suit your purposes, you can simply place the master process and the task supervisor under the same common supervisor, with a one_for_all or rest_for_one strategy.
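
As a sketch, using the Supervisor.Spec helpers available at the time, that common supervisor could look something like this (Master is a hypothetical GenServer playing the role of the master process):

import Supervisor.Spec

children = [
  supervisor(Task.Supervisor, [[name: :task_supervisor]]),
  worker(Master, [])
]

# With :one_for_all, a crash of either child terminates the other one as well,
# so a crashed master also takes down the tasks running under :task_supervisor.
Supervisor.start_link(children, strategy: :one_for_all)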

This is what I like about the Erlang fault-tolerance approach. There are various options with strong guarantees. You can isolate crashes, but you can also connect failures if needed. Some scenarios may require more work, but the implementation is still straightforward. Supporting these scenarios without process isolation and crash propagation would be harder, and you might end up reinventing parts of Erlang.

Let’s implement this. The top-level run/0 function is now changed a bit:

defmodule SupervisedTask do
  def run do
    :random.seed(:os.timestamp)
    Task.Supervisor.start_link(name: :task_supervisor)

    work_ref = make_ref

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) - 500 end)
    |> Enum.map(&start_computation(work_ref, &1))
    |> collect_results(work_ref)
  end

  # ...
end

First, a named supervisor is started. This is a quick hack to keep the example short. In production, this supervisor should of course reside somewhere in the supervision hierarchy.

Then, a work reference is created, which will be included in task response messages. Finally, we generate some random numbers and start our computations. Notice the :random.uniform(1000) - 500. This ensures that some numbers will be negative, which will cause some tasks to crash.

Tasks now have to be started under the supervisor:

defp start_computation(work_ref, arg) do
  caller = self

  # Start the task under the named supervisor
  {:ok, pid} = Task.Supervisor.start_child(
    :task_supervisor,
    fn ->
      result = Computation.run(arg)

      # Send the result back to the caller
      send(caller, {work_ref, self, result})
    end
  )

  # Monitor the started task
  Process.monitor(pid)
  pid
end

Finally, we need to expand the receive loop to handle :DOWN messages, which we’ll receive when the task terminates:

defp collect_results(tasks, work_ref) do
  timeout_ref = make_ref
  timer = Process.send_after(self, {:timeout, timeout_ref}, 400)
  try do
    collect_results(tasks, work_ref, Aggregator.new, timeout_ref)
  after
    :erlang.cancel_timer(timer)
    receive do
      {:timeout, ^timeout_ref} -> :ok
      after 0 -> :ok
    end
  end
end

defp collect_results([], _, aggregator, _), do: {:ok, Aggregator.value(aggregator)}
defp collect_results(tasks, work_ref, aggregator, timeout_ref) do
  receive do
    {:timeout, ^timeout_ref} ->
      {:timeout, Aggregator.value(aggregator)}

    {^work_ref, task, result} ->
      collect_results(
        List.delete(tasks, task),
        work_ref,
        Aggregator.add_result(aggregator, result),
        timeout_ref
      )

    {:DOWN, _, _, pid, _} ->
      if Enum.member?(tasks, pid) do
        # Handling task termination. In this case, we simply delete the
        # task from the list of tasks, and wait for other tasks to finish.
        collect_results(List.delete(tasks, pid), work_ref, aggregator, timeout_ref)
      else
        collect_results(tasks, work_ref, aggregator, timeout_ref)
      end
  end
end

This is mostly straightforward, with the major change being the new clause that handles :DOWN messages. It’s worth mentioning that we’ll receive a :DOWN message even if the task doesn’t crash. However, this message will arrive after the response message has been sent back, so the master process will first handle the response message. Since we remove the task from the list, the subsequent :DOWN message of that task will be ignored. This is not super efficient, and we could have improved this by doing some extra bookkeeping and demonitoring the task after it returns, but I refrained from this for the sake of brevity.
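
If you wanted that extra bookkeeping, a possible sketch (not part of the original code) is to keep the monitor reference around and demonitor with the :flush option once the result arrives, which also removes a :DOWN message that may already sit in the mailbox. The receive loop would then have to track {pid, monitor_ref} pairs instead of plain pids:

defp start_computation(work_ref, arg) do
  caller = self

  {:ok, pid} = Task.Supervisor.start_child(
    :task_supervisor,
    fn -> send(caller, {work_ref, self, Computation.run(arg)}) end
  )

  # Keep the monitor reference together with the pid
  {pid, Process.monitor(pid)}
end

# When the result message of a task arrives, stop monitoring it
defp stop_monitoring({_pid, monitor_ref}),
  do: Process.demonitor(monitor_ref, [:flush])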

In any case, we can now test it. If I start SupervisedTask.run, I’ll see some errors logged (courtesy of Logger), but I’ll still get whatever is collected. You can also try it yourself. The code is available here.

Reducing the boilerplate

As we moved to more complex patterns, our master process became way more involved. The plain async/await version has only 12 lines of code, while the final implementation has 66. The master process is burdened with a lot of mechanics, such as keeping references, starting the timer, and handling received messages. There’s a lot of potential to extract some of that boilerplate, so we can keep the master process more focused.

There are different approaches to extracting the boilerplate. If a process has to behave in a special way, you can consider creating a generic OTP-like behaviour that powers the process. The concrete implementation then just has to fill in the blanks by providing necessary callback functions.

However, in this particular case, I don’t think creating a behaviour is a good option. The thing is that the master process might already be powered by a behaviour, such as GenServer or Phoenix.Channel. If we implement our generic code as a behaviour, we can’t really combine it with another behaviour. Thus, we’ll always need to have one more process that starts all these tasks and collects their results. This may result in excessive message passing, and have an impact on performance.

An alternative is to implement a helper module that can be used to start tasks and process task related messages. For example, we could have the following interface for starting tasks:

runner = TaskRunner.run(
  [
    {:supervisor1, {SomeModule, :some_function, args}},
    {:supervisor2, {AnotherModule, :some_function, other_args}},
    {:supervisor3, fn -> ... end},
    # ...
  ],
  timeout
)

Under the hood, TaskRunner would start tasks under the given supervisors, set up the work and timer references, and send the timeout message to the caller process. By allowing different tasks to run under different supervisors, we have more flexibility. In particular, this allows us to start different tasks on different nodes.

The responsibility of receiving messages now lies with the caller process. It has to receive messages either via receive or, for example, in the handle_info callback. When the process gets a message, it first has to pass it to TaskRunner.handle_message, which will return one of the following:

  • nil - a message is not task runner specific, feel free to handle it yourself
  • {{:ok, result}, runner} - a result arrived from a task
  • {{:task_error, reason}, runner} - a task has crashed
  • {:timeout, runner} - timeout has occurred

Finally, we’ll introduce a TaskRunner.done?/1 function, which can be used to determine whether all tasks have finished.

This is all we need to make various decisions in the client process. The previous example can now be rewritten as:

defmodule TaskRunnerClient do
  def run do
    :random.seed(:os.timestamp)
    Task.Supervisor.start_link(name: :task_supervisor)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) - 500 end)
    |> Enum.map(&{:task_supervisor, {Computation, :run, [&1]}})
    |> TaskRunner.run(400)
    |> handle_messages(Aggregator.new)
  end


  defp handle_messages(runner, aggregator) do
    if TaskRunner.done?(runner) do
      {:ok, Aggregator.value(aggregator)}
    else
      receive do
        msg ->
          case TaskRunner.handle_message(runner, msg) do
            nil -> handle_messages(runner, aggregator)

            {{:ok, result}, runner} ->
              handle_messages(runner, Aggregator.add_result(aggregator, result))

            {{:task_error, _reason}, runner} ->
              handle_messages(runner, aggregator)

            {:timeout, _runner} ->
              {:timeout, Aggregator.value(aggregator)}
          end
      end
    end
  end
end

This is less verbose than the previous version, and the receive loop is now focused only on handling success, errors, and timeouts, without worrying about how these situations are detected.

The code is still more involved than the simple async/await pattern, but it offers more flexibility. You can support various scenarios, such as stopping on the first success, or reporting the timeout back to the user while letting the tasks finish their jobs. If this flexibility is not important for your particular scenarios, then this approach is overkill, and async/await should do just fine.

I will not describe the implementation of TaskRunner as it is mostly a refactoring of the code from SupervisedTask. You’re advised to try and implement it yourself as an exercise. A basic (definitely not complete or tested) take can be found here.
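
To give a rough idea, here’s my own quick sketch of what such a module might look like, under the interface assumptions stated above. It is an illustration, not the implementation linked to, and timer cancellation and message flushing are omitted for brevity:

defmodule TaskRunner do
  defstruct [:work_ref, :timeout_ref, :tasks]

  def run(task_specs, timeout) do
    work_ref = make_ref
    timeout_ref = make_ref
    Process.send_after(self, {:timeout, timeout_ref}, timeout)

    tasks = Enum.map(task_specs, &start_task(&1, work_ref))
    %TaskRunner{work_ref: work_ref, timeout_ref: timeout_ref, tasks: tasks}
  end

  def done?(%TaskRunner{tasks: tasks}), do: tasks == []

  # A result message from one of our tasks
  def handle_message(%TaskRunner{work_ref: ref} = runner, {ref, pid, result}),
    do: {{:ok, result}, remove_task(runner, pid)}

  # The timeout message sent by the timer started in run/2
  def handle_message(%TaskRunner{timeout_ref: ref} = runner, {:timeout, ref}),
    do: {:timeout, runner}

  # A :DOWN message for one of our tasks means the task has crashed
  def handle_message(%TaskRunner{tasks: tasks} = runner, {:DOWN, _, _, pid, reason}) do
    if Enum.member?(tasks, pid),
      do: {{:task_error, reason}, remove_task(runner, pid)},
      else: nil
  end

  # Everything else is not ours
  def handle_message(_runner, _message), do: nil

  defp start_task({supervisor, job}, work_ref) do
    caller = self
    fun = normalize(job)

    {:ok, pid} = Task.Supervisor.start_child(
      supervisor,
      fn -> send(caller, {work_ref, self, fun.()}) end
    )

    Process.monitor(pid)
    pid
  end

  defp normalize(fun) when is_function(fun), do: fun
  defp normalize({module, function, args}), do: fn -> apply(module, function, args) end

  defp remove_task(runner, pid),
    do: %TaskRunner{runner | tasks: List.delete(runner.tasks, pid)}
end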

Parting words

While this article focuses on tasks, in a sense they serve more as an example to illustrate concurrent thinking in Erlang.

Stepping away from Task.await and receiving messages manually allowed the master process to be more flexible. Avoiding links between the master and the tasks decoupled their lifetimes, and gave us better error isolation. Using monitors made it possible to detect failures and perform some special handling. Pushing everything to a helper module, without implementing a dedicated behaviour, gave us generic code that can be used in different types of processes.

These are, in my opinion, the more important takeaways of this article. In the future the Elixir team may introduce additional support for tasks which will make most of these techniques unnecessary. But the underlying reasoning should be applicable in many other situations, not necessarily task-related.

]]>
http://theerlangelist.com//article/beyond_taskasync
<![CDATA[Speaking at ElixirConf EU]]> Mon, 26 Jan 15 00:00:00 +0000 Speaking at ElixirConf EU

2015-01-26

I’m very excited that my talk on high-availability got accepted for the first European Elixir conference.

For the past three years, I have been evangelizing Erlang and Elixir here in Croatia, at our local annual WebCampZg event. In addition, I invested a lot of effort writing the book on Elixir which is now in its final stages.

All this work has been motivated by my own positive experience using Erlang and Elixir in production. Some four years ago, I started using Erlang almost by chance, and it helped me immensely in building a long-polling based HTTP push server together with the supporting data provider system. The more I worked with Erlang, the more I got fascinated with how useful it is when it comes to building server-side systems. Ultimately, it became my tool of choice for development of complex backend systems that need to provide reliable service.

I reached for Elixir two years ago, when I started this blog, hoping it would help me showcase the magic behind Erlang to OO developers. I was really surprised with the level of maturity and integration with Erlang, even at that early stage. Pretty soon, I started introducing Elixir in production, and discovered it further boosts my productivity.

Two years later, Elixir 1.0 is out, the ecosystem is growing, and we have great libraries such as Phoenix and Ecto, leveraging Elixir to further improve developer productivity.

Moreover, there’s a lot of learning material available. In addition to excellent online getting started guides and reference, there are three published books, with Elixir in Action almost finished, and two more in the making. The present looks great, and the future is even more promising. These are exciting times, and a great chance to jump aboard and start using Elixir and Erlang.

So if you happen to be interested, grab a ticket for ElixirConfEU while it’s available. Hope I’ll see you there!

While I’m in the announcement mode, I’ll also mention that we’re starting a local FP group here in Zagreb, Croatia. Details about the introductory drinkup can be found here, so if you happen to live nearby, come and visit us for some functional chat.

]]>
http://theerlangelist.com//article/elixirconf2015
<![CDATA[Conway's Game of Life in Elixir]]> Mon, 24 Nov 14 00:00:00 +0000 Conway's Game of Life in Elixir

2014-11-24

About a month ago, on the Elixir Quiz site, there was a Conway’s Game of Life challenge. While I didn’t find the time to participate in the challenge, I played with the problem recently, and found it very interesting.

So in this post, I’m going to break down my solution to the problem. If you’re not familiar with Game of Life rules, you can take a quick look here.

My solution is simplified in that I deal only with square grids. It’s not very hard to extend it to work for any rectangle, but I wanted to keep things simple.

Functional abstraction

The whole game revolves around the grid of cells which are in some state (dead or alive), and there are clear rules that determine the next state of each cell based on the current state of its neighbours. Thus, I’ve implemented the Conway.Grid module that models the grid. Let’s see how the module will be used.

The initial grid can be created with Conway.Grid.new/1:

# Creates 5x5 grid with random values
grid = Conway.Grid.new(5)

# Creates grid from the given cell data
grid = Conway.Grid.new([
  [1, 1, 0],
  [0, 1, 0],
  [1, 1, 0]
])

As can be deduced from the second example, a cell state can be either zero (not alive) or one (alive).

Once the grid is instantiated, we can move it a step forward with Conway.Grid.next/1:

grid = Conway.Grid.next(grid)

Finally, we can query the grid’s size, and the value of each cell:

Conway.Grid.size(grid)

# Returns 0 or 1 for the cell at the given location
Conway.Grid.cell_status(grid, x, y)

This is all we need to manipulate the grid and somehow display it.

This is a simple decoupling technique. The game logic is contained in a single module, but the “driving” part of the game, i.e. the code that repeatedly moves the game forward, is left out.

This allows us to use the core game module in different contexts. In my example, I’m using Conway.Grid from a simplistic terminal client, but it’s easy to use the module from a GenServer for example to push updates to various connected clients, or from unit tests to verify that state transition works properly.

Another nice benefit of this approach is that we can use :erlang.term_to_binary/1 to serialize the structure and persist the grid state, and then later deserialize it and resume playing the grid.
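
For illustration, saving and later restoring a grid might look something like this (the file name is just a placeholder):

# Persist the current grid state
File.write!("grid.bin", :erlang.term_to_binary(grid))

# Later, restore it and keep playing
grid =
  File.read!("grid.bin")
  |> :erlang.binary_to_term
  |> Conway.Grid.next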

This is what I like to call a functional abstraction. Notice in the previous examples how we use Conway.Grid without knowing its internal representation. The module abstracts away its internal details. In particular, as clients, we don’t care what data type the module uses internally. All we know is that the creator and updater functions return a “grid”, and all functions from Conway.Grid know how to work with that grid.

The module thus abstracts some concept, and does so relying on a pure functional (immutable) data structure. Hence, a functional abstraction.

Note: Frequently, the term type is used for this. I’m not a particular fan of this terminology. To me, the only true Elixir types are the ones supported by BEAM. All others, such as HashDict, HashSet, Range, Erlang’s :gb_trees, and even structs, are somehow composed from those basic types.

Choosing the data representation

Update: As Greg and leikind pointed out in comments, the approach I’m taking here is neither efficient nor flexible, because I’m keeping and processing all cells, instead of dealing only with live ones. You can find the alternative version, where only live cells are kept in a HashSet here. The nice thing is that the change was simple, due to abstraction of the Conway.Grid. The module interface remained the same.

In any case, let’s start implementing Conway.Grid. The most important decision is how to represent the grid data. Given the game rules, we have the following needs:

  • random access to cells (their states)
  • incremental building of the grid

We need the first property to access neighbour cells when determining the next state of each cell. The second property is needed since in each step we fully rebuild the grid based on the current state of each cell.

In BEAM, tuples are a good fit for random access (which is an O(1) operation), but they are poor for incremental building. Modifying a tuple (almost always) results in (shallow) copying of all tuple elements. This can hurt performance and increase memory usage.
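
As a quick illustration of that copying:

row = {1, 1, 0}

# put_elem/3 returns a new tuple; the unchanged elements are copied into it
put_elem(row, 2, 1)
# {1, 1, 1}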

In contrast, lists are crappy for random access, but they are efficient for incremental building, if we’re either prepending new elements to the head, or building the list in a body-recursive way.

However, we can use different approaches in different situations. In particular, we can:

  • Maintain a 2D grid as a tuple of tuples. This gives us an O(1) random access complexity.
  • Build a new grid as a list of lists. Once the new grid is built, convert it to a tuple of tuples via List.to_tuple/1.

List.to_tuple/1 will be efficient (though still O(n)), since it is implemented in C, and does its job by preallocating the tuple and populating it from the list. Thus, we avoid extra copying of tuples.

Performance-wise, this is probably not the optimal implementation, but I think it’s a reasonable first attempt that still keeps the code simple and clear.

So to recap, our grid will be implemented as a tuple of tuples:

{
  {1, 1, 0},
  {0, 1, 0},
  {1, 1, 0}
}

This is all the data we need, since we can efficiently derive the grid size from the data via Kernel.tuple_size/1. It’s still worth making our Conway.Grid a struct, so we can gain pattern matching, possible polymorphism, and easier extensibility.

Hence, the skeleton of the module will look like:

defmodule Conway.Grid do
  defstruct data: nil

  ...
end

Now we can start implementing the module.

Constructing the grid

Recall from usage examples that our “constructor” function is overloaded. It either takes a grid dimension and creates the randomly populated grid, or it takes a list of lists with prepopulated data.

Let’s solve the latter case first:

def new(data) when is_list(data) do
  %Conway.Grid{data: list_to_data(data)}
end

defp list_to_data(data) do
  data
  |> Enum.map(&List.to_tuple/1)     # convert every inner list
  |> List.to_tuple                  # convert the outer list
end

Now, we can do the random population. We’ll first implement a helper generic function for creating the grid data:

defp new_data(size, producer_fun) do
  for y <- 0..(size - 1) do
    for x <- 0..(size - 1) do
      producer_fun.(x, y)
    end
  end
  |> list_to_data
end

Here, we take the desired size, and produce a square list of lists, calling the producer_fun lambda for each element. Then, we just pass it to list_to_data/1 to convert to a tuple of tuples. This genericity of new_data/2 will allow us to reuse the code when moving the grid to the next state.

For the moment, we can implement the second clause of new/1:

def new(size) when is_integer(size) and size > 0 do
  %Conway.Grid{
    data: new_data(size, fn(_, _) -> :random.uniform(2) - 1 end)
  }
end

Next, let’s implement two getter functions for retrieving the grid size and the state of each cell:

def size(%Conway.Grid{data: data}), do: tuple_size(data)

def cell_status(grid, x, y) do
  grid.data
  |> elem(y)
  |> elem(x)
end

Shifting the state

The only thing remaining is to move the grid to the next state. Let’s start with the interface function:

def next(grid) do
  %Conway.Grid{grid |
    data: new_data(size(grid), &next_cell_status(grid, &1, &2))
  }
end

As mentioned earlier, we reuse the existing new_data/2 function. We just provide a different lambda which will generate new cell states based on the current grid state.

Implementation of next_cell_status/3 embeds the game rules:

def next_cell_status(grid, x, y) do
  case {cell_status(grid, x, y), alive_neighbours(grid, x, y)} do
    {1, 2} -> 1
    {1, 3} -> 1
    {0, 3} -> 1
    {_, _} -> 0
  end
end

Here I’ve resorted to a case branch, because I think it’s the most readable approach in this case. I’ve experimented with moving this branching to a separate multiclause, but then it was less clear what is being pattern-matched.

Counting alive neighbours

Now we move to the most complex part of the code: calculating the number of alive neighbours. For this, we have to get the state of each surrounding cell, and count the number of those which are alive.

In this example, I’ve decided to use the for comprehension, because it has nice support for multiple generators and rich filters.

However, for emits its results into a collectable, and we need a single integer (the count of alive neighbours). Therefore, I’ve implemented a simple Sum collectable. It allows us to collect an enumerable of numbers into an integer containing their sum.
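
The Sum module itself is not listed in this article. A minimal take on such a collectable could look something like the following sketch (this is my reconstruction, not necessarily identical to the original):

defmodule Sum do
  defstruct value: 0

  def value(%Sum{value: value}), do: value

  defimpl Collectable, for: Sum do
    def into(%Sum{value: initial}) do
      collector = fn
        acc, {:cont, x} -> acc + x            # add every collected element
        acc, :done      -> %Sum{value: acc}   # wrap the final sum into the struct
        _acc, :halt     -> :ok                # collection was interrupted
      end

      {initial, collector}
    end
  end
end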

The idea is then to use for to filter all alive neighbours, emit value 1 for each such neighbour, and collect those 1s into a Sum instance:

defp alive_neighbours(grid, cell_x, cell_y) do
  # 1. Iterate all x,y in -1..+1 area
  for x <- (cell_x - 1)..(cell_x + 1),
      y <- (cell_y - 1)..(cell_y + 1),
      (
        # take only valid coordinates
        x in 0..(size(grid) - 1) and
        y in 0..(size(grid) - 1) and

        # don't include the current cell
        (x != cell_x or y != cell_y) and

        # take only alive cells
        cell_status(grid, x, y) == 1
      ),
      # collect to Sum
      into: %Sum{}
  do
    1   # add 1 for every alive neighbour
  end
  |> Sum.value    # get the sum value
end

I did the initial implementation of this with nested Enum.reduce/3 calls, and I wasn’t as pleased with it. This solution actually takes more LOC, but I find it easier to understand. There are many other ways of implementing this counting, but to me this approach seems pretty readable. YMMV of course.

Update: Tallak Tveide rightfully asked why not just pipe the result of for into Enum.sum/1 (note that Enum.count/1 works as well). This will work, and quite possibly perform just fine. However, when I was first writing this particular function, I asked myself why I would want to create an intermediate enumerable just to count its size. This is why I made the Sum collectable. It’s probably over-engineering / micro-optimizing for this case, but I found it an interesting exercise. As an added benefit, I have a generic Sum collectable which I can use in any of my code whenever I need to count the number of filtered items.

In any case, we’re done. The simple implementation of Conway’s Game of Life is finished. We have a nice functional abstraction and a basic terminal client. Give it a try on your machine. Just paste the complete code into the iex shell, or run it with elixir conway.ex.

]]>
http://theerlangelist.com//article/conway
<![CDATA[Understanding Elixir Macros, Part 6 - In-place Code Generation]]> Sun, 6 Jul 14 00:00:00 +0000 Understanding Elixir Macros, Part 6 - In-place Code Generation

2014-07-06

Today’s post is the last one in the macro series. Before starting, I’d like to extend kudos to Björn Rochel who already improved on deftraceable macro in his Apex library. Björn discovered that the blog version of deftraceable doesn’t handle default args (arg \\ def_value) properly, and implemented a fix.

In the meantime, let’s wrap up this macro saga. In today’s post, probably the most involved one in the entire series, I’m going to discuss some aspects of in-place code generation, and the consequences it may have on our macros.

Generating code in the module

As I mentioned way back in part 1, macros are not the only meta-programming mechanism in Elixir. It is also possible to generate the code directly in the module. To refresh your memory, let’s see the example:

defmodule Fsm do
  fsm = [
    running: {:pause, :paused},
    running: {:stop, :stopped},
    paused: {:resume, :running}
  ]

  # Dynamically generating functions directly in the module
  for {state, {action, next_state}} <- fsm do
    def unquote(action)(unquote(state)), do: unquote(next_state)
  end
  def initial, do: :running
end

Fsm.initial
# :running

Fsm.initial |> Fsm.pause
# :paused

Fsm.initial |> Fsm.pause |> Fsm.pause
# ** (FunctionClauseError) no function clause matching in Fsm.pause/1

Here, we’re dynamically generating function clauses directly in the module. This allows us to metaprogram against some input (in this case a keyword list), and generate the code without writing a dedicated macro.

Notice in the code above how we use unquote to inject variables into the function clause definition. This is perfectly in sync with how macros work. Keep in mind that def is also a macro, and a macro always receives its arguments quoted. Consequently, if you want a macro argument to receive the value of some variable, you must use unquote when passing that variable. It doesn’t suffice to simply call def action, because the def macro receives a quoted reference to action rather than the value stored in the variable action.

You can of course call your own macros in such dynamic way, and the same principle will hold. There is an unexpected twist though - the order of evaluation is not what you might expect.

Order of expansion

As you’d expect, the module-level code (the code that isn’t a part of any function) is evaluated in the expansion phase. Somewhat surprisingly, this will happen after all macros (save for def) have been expanded. It’s easy to prove this:

iex(1)> defmodule MyMacro do
          defmacro my_macro do
            IO.puts "my_macro called"
            nil
          end
        end

iex(2)> defmodule Test do
          import MyMacro

          IO.puts "module-level expression"
          my_macro
        end

# Output:
my_macro called
module-level expression

See from the output how my_macro is called before IO.puts, even though the corresponding IO.puts call precedes the macro call. This proves that the compiler first resolves all “standard” macros. Then the module generation starts, and it is in this phase that module-level code, together with calls to def, is evaluated.

Module-level friendly macros

This has some important consequences on our own macros. For example, our deftraceable macro could also be invoked dynamically. However, this currently won’t work:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          fsm = [
            running: {:pause, :paused},
            running: {:stop, :stopped},
            paused: {:resume, :running}
          ]

          for {state, {action, next_state}} <- fsm do
            # Using deftraceable dynamically
            deftraceable unquote(action)(unquote(state)), do: unquote(next_state)
          end
          deftraceable initial, do: :running
        end

** (MatchError) no match of right hand side value: :error
    expanding macro: Tracer.deftraceable/2
    iex:13: Test (module)

This fails with a somewhat cryptic and not very helpful error. So what went wrong? As mentioned in the previous section, macros are expanded before in-place module evaluation starts. For us this means that deftraceable is called before the outer for comprehension is even evaluated.

Consequently, even though it is invoked from a comprehension, deftraceable will be invoked exactly once. Moreover, since the comprehension is not yet evaluated, the inner variables state, action, and next_state are not present when our macro is called.

How can this even work? Essentially, our macro will be called with quoted unquotes - head and body will contain ASTs that represent unquote(action)(unquote(state)) and unquote(next_state) respectively.

Now, recall that in the current version of deftraceable, we make some assumptions about input in our macro. Here’s a sketch:

defmacro deftraceable(head, body) do
  # Here, we are assuming how the input head looks like, and perform some
  # AST transformations based on those assumptions.

  quote do
    ...
  end
end

And that’s our problem. If we call deftraceable dynamically, while generating the code in-place, then such assumptions no longer hold.

Deferring code generation

When it comes to macro execution, it’s important to distinguish between the macro context and the caller’s context:

defmacro my_macro do
  # Macro context: the code here is a normal part of the macro, and runs when
  # the macro is invoked.

  quote do
    # Caller's context: generated code that runs in place where the macro is
    # invoked.
  end
end

This is where things get a bit tricky. If we want to support module-level dynamic calls of our macros, we shouldn’t assume anything in the macro context. Instead, we should defer the code generation to the caller’s context.

To say it in code:

defmacro deftraceable(head, body) do
  # Macro context: we shouldn't assume anything about the input AST here

  quote do
    # Caller's context: we should transfer input AST here, and then make our
    # assumptions here.
  end
end

Why can we make assumptions in the caller’s context? Because this code will run after all macros have been expanded. For example, remember that even though our macro is invoked from inside a comprehension, it will be called only once. However, the code generated by our macro will run in the comprehension - once for each element.

So this approach amounts to deferring the final code generation. Instead of immediately generating the target code, we generate intermediate module-level statements that will generate the final code. These intermediate statements will run at the latest possible moment of expansion, after all other macros have been resolved:

defmodule Test do
  ...

  for {state, {action, next_state}} <- fsm do
    # After deftraceable is expanded, here we'll get a plain code that
    # generates target function. This code will be invoked once for
    # every step of the for comprehension. At this point, we're in the
    # caller's context, and have an access to state, action, and next_state
    # variables and can properly generate corresponding function.
  end

  ...
end

Before implementing the solution, it’s important to note that this is not a universal pattern, and you should consider whether you really need this approach.

If your macro is not meant to be used on a module-level, then you should probably avoid this technique. Otherwise, if your macro is called from inside function definition, and you move the generation to the caller’s context, you’ll essentially move the code execution from compile-time to run-time, which can affect performance.

Moreover, even if your macro is running on a module-level, this technique won’t be necessary as long as you don’t make any assumptions about the input. For example, in part 2, we made a simulation of Plug’s get macro:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      unquote(body[:do])
    end
  end
end

Even though this macro works on a module-level it doesn’t assume anything about the format of the AST, simply injecting input fragments in the caller’s context, sprinkling some boilerplate around. Of course, we’re expecting here that body will have a :do option, but we’re not assuming anything about the specific shape and format of body[:do] AST.

To recap, if your macro is meant to be called on a module-level, this could be the general pattern:

defmacro ... do
  # Macro context:
  # Feel free to do any preparations here, as long as you don't assume anything
  # about the shape of the input AST

  quote do
    # Caller's context:
    # If you're analyzing and/or transforming input AST you should do it here.
  end
end

Since the caller context is module-level, this deferred transformation will still take place in compilation time, so there will be no runtime performance penalties.

The solution

Given this discussion, the solution is relatively simple, but explaining it is fairly involved. So I’m going to start by showing you the end result (pay attention to comments):

defmodule Tracer do
  defmacro deftraceable(head, body) do
    # This is the most important change that allows us to correctly pass
    # input AST to the caller's context. I'll explain how this works a
    # bit later.
    quote bind_quoted: [
      head: Macro.escape(head, unquote: true),
      body: Macro.escape(body, unquote: true)
    ] do
      # Caller's context: we'll be generating the code from here

      # Since the code generation is deferred to the caller context,
      # we can now make our assumptions about the input AST.

      # This code is mostly identical to the previous version
      #
      # Notice that these variables are now created in the caller's context.
      {fun_name, args_ast} = Tracer.name_and_args(head)
      {arg_names, decorated_args} = Tracer.decorate_args(args_ast)

      # Completely identical to the previous version.
      head = Macro.postwalk(head,
        fn
          ({fun_ast, context, old_args}) when (
            fun_ast == fun_name and old_args == args_ast
          ) ->
            {fun_ast, context, decorated_args}
          (other) -> other
      end)

      # This code is completely identical to the previous version
      # Note: however, notice that the code is executed in the same context
      # as previous three expressions.
      #
      # Hence, the unquote(head) here references the head variable that is
      # computed in this context, instead of macro context. The same holds for
      # other unquotes that are occurring in the function body.
      #
      # This is the point of deferred code generation. Our macro generates
      # this code, which then in turn generates the final code.
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  # Identical to the previous version, but functions are exported since they
  # must be called from the caller's context.
  def name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  def name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end

  def decorate_args([]), do: {[],[]}
  def decorate_args(args_ast) do
    for {arg_ast, index} <- Enum.with_index(args_ast) do
      arg_name = Macro.var(:"arg#{index}", __MODULE__)

      full_arg = quote do
        unquote(arg_ast) = unquote(arg_name)
      end

      {arg_name, full_arg}
    end
    |> Enum.unzip
  end
end

Let’s try the macro:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          fsm = [
            running: {:pause, :paused},
            running: {:stop, :stopped},
            paused: {:resume, :running}
          ]

          for {state, {action, next_state}} <- fsm do
            deftraceable unquote(action)(unquote(state)), do: unquote(next_state)
          end
          deftraceable initial, do: :running
        end

iex(3)> Test.initial |> Test.pause |> Test.resume |> Test.stop

iex(line 15) Elixir.Test.initial() = :running
iex(line 13) Elixir.Test.pause(:running) = :paused
iex(line 13) Elixir.Test.resume(:paused) = :running
iex(line 13) Elixir.Test.stop(:running) = :stopped

As you can see, the change is not very complicated. We managed to keep most of our code intact, though we had to do some trickery with the bind_quoted option of quote and with Macro.escape:

quote bind_quoted: [
  head: Macro.escape(head, unquote: true),
  body: Macro.escape(body, unquote: true)
] do
  ...
end

Let’s take a closer look at what this means.

bind_quoted

Remember that our macro is generating a code that will generate the final code. Somewhere in the first-level generated code (the one returned by our macro), we need to place the following expression:

def unquote(head) do ... end

This expression will be invoked in the caller’s context (the client module), and its task is to generate the function. As mentioned in comments, it’s important to understand that unquote(head) here references the head variable that exists in the caller’s context. We’re not injecting a variable from the macro context, but the one that exists in the caller’s context.

However, we can’t generate such expression with plain quote:

quote do
  def unquote(head) do ... end
end

Remember how unquote works. It injects the AST that is in the head variable in place of the unquote call. This is not what we want here. What we want is to generate the AST representing the call to unquote which will then be executed later, in the caller’s context, and reference the caller’s head variable.

This can be done by providing unquote: false option:

quote unquote: false do
  def unquote(head) do ... end
end

Here, we will generate the code that represents an unquote call. If this code is injected in the proper place, where the variable head exists, we’ll end up calling the def macro, passing whatever is in the head variable.

So it seems that unquote: false is what we need, but there is a downside that we can’t access any variable from the macro context:

foo = :bar
quote unquote: false do
  unquote(foo)    # <- won't work because of unquote: false
end

Using unquote: false effectively blocks immediate AST injection, and treats unquote as any other function call. Consequently, we can’t inject something into the target AST. And here’s where bind_quoted comes in handy. By providing bind_quoted: bindings we can disable immediate unquoting, while still binding whatever data we want to transfer to the caller’s context:

quote bind_quoted: [
  foo: ...,
  bar: ...
] do
  unquote(whatever)  # <- works like with unquote: false

  foo  # <- accessible due to bind_quoted
  bar  # <- accessible due to bind_quoted
end

Injecting the code vs transferring data

Another problem we’re facing is that the contents we’re passing from the macro to the caller’s context is by default injected, rather than transferred. So, whenever you do unquote(some_ast), you’re injecting one AST fragment into another one you’re building with a quote expression.

Occasionally, we want to transfer the data, instead of injecting it. Let’s see an example. Say we have some triplet that we want to transfer to the caller’s context:

iex(1)> data = {1, 2, 3}
{1, 2, 3}

Now, let’s try to transfer it using typical unquote:

iex(2)> ast = quote do IO.inspect(unquote(data)) end
{{:., [], [{:__aliases__, [alias: false], [:IO]}, :inspect]}, [], [{1, 2, 3}]}

This seems to work. Let’s try and eval the resulting ast:

iex(3)> Code.eval_quoted(ast)
** (CompileError) nofile: invalid quoted expression: {1, 2, 3}

So what happened here? The thing is that we didn’t really transfer our {1,2,3} triplet. Instead, we injected it into the target AST. Injection means that {1,2,3} is itself treated as an AST fragment, which is obviously wrong.

What we really want in this case is data transfer. In the code generation context, we have some data that we want to transfer to the caller’s context. And this is where Macro.escape helps. By escaping a term, we can make sure that it is transferred rather than injected. When we call unquote(Macro.escape(term)), we’ll inject an AST that describes the data in term.

Let’s try this out:

iex(3)> ast = quote do IO.inspect(unquote(Macro.escape(data))) end
{{:., [], [{:__aliases__, [alias: false], [:IO]}, :inspect]}, [],
 [{:{}, [], [1, 2, 3]}]}

iex(4)> Code.eval_quoted(ast)
{1, 2, 3}

As you can see, we were able to transfer the data untouched.

Going back to our deferred code generation, this is exactly what we need. Instead of injecting into the target AST, we want to transfer the input AST, completely preserving its shape:

defmacro deftraceable(head, body) do
  # Here we have head and body AST
  quote do
    # We need that same head and body AST here, so we can generate
    # the final code.
  end
end

By using Macro.escape/1 we can ensure that input AST is transferred untouched back to the caller’s context where we’ll generate the final code.

As discussed in previous section, we’re using bind_quoted, but the same principle holds:

quote bind_quoted: [
  head: Macro.escape(head, unquote: true),
  body: Macro.escape(body, unquote: true)
] do
  # Here we have exact data copies of head and body from
  # the macro context.
end

Escaping and unquote: true

Notice the deceptively simple unquote: true option that we pass to Macro.escape. This is the hardest thing to explain here. To be able to understand it, you must be confident about how the AST is passed to the macro, and returned back to the caller’s context.

First, remember how we call our macro:

deftraceable unquote(action)(unquote(state)) do ... end

Now, since a macro actually receives its arguments quoted, the head argument will be equivalent to the following:

# This is what the head argument in the macro context actually contains
quote unquote: false do
  unquote(action)(unquote(state))
end

Remember that Macro.escape preserves data, so when you transfer a variable into some other AST, the contents remain unchanged. Given the shape of the head above, this is the situation we’ll end up with after our macro is expanded:

# Caller's context
for {state, {action, next_state}} <- fsm do
  # Here is our code that generates function. Due to bind_quoted, here
  # we have head and body variables available.

  # Variable head is equivalent to
  #   quote unquote: false do
  #     unquote(action)(unquote(state))
  #   end

  # What we really need is for head to be equivalent to:
  #   quote do
  #     unquote(action)(unquote(state))
  #   end
end

Why do we need the second form of quoted head? Because this AST is now shaped in the caller’s context, where we have action and state variables available. And the second expression will use the contents of these variables.

And this is where unquote: true option helps. When we call Macro.escape(input_ast, unquote: true), we’ll still (mostly) preserve the shape of the transferred data, but the unquote fragments (e.g. unquote(action)) in the input AST will be resolved in the caller’s context.

So to recap, a proper transport of the input AST to the caller’s context looks like this:

defmacro deftraceable(head, body) do
  quote bind_quoted: [
    head: Macro.escape(head, unquote: true),
    body: Macro.escape(body, unquote: true)
  ] do
    # Generate the code here
  end
  ...
end

This wasn’t so hard, but it takes some time to grok what exactly happens here. Try to make sure you’re not just blindly doing escapes (and/or unquote: true) without understanding that this is what you really want. After all, there’s a reason this is not the default behaviour.

When writing a macro, think about whether you want to inject some AST, or transport the data unchanged. In the latter case, you need Macro.escape. If the data being transferred is an AST that might contain unquote fragments, then you probably need to use Macro.escape with unquote: true.

Recap

This concludes the series on Elixir macros. I hope you found these articles interesting and educating, and that you have gained more confidence and understanding of how macros work.

Always remember - macros amount to plain composition of AST fragments during expansion phase. If you understand the caller’s context and macro inputs, it shouldn’t be very hard to perform the transformations you want either directly, or by deferring when necessary.

This series has by no means covered all possible aspects and nuances. If you want to learn more, a good place to start is the documentation for the quote/2 special form. You’ll also find some useful helpers in the Macro and Code modules.

Happy meta-programming!

]]>
http://theerlangelist.com//article/macros_6
<![CDATA[Understanding Elixir Macros, Part 5 - Reshaping the AST]]> Sun, 29 Jun 14 00:00:00 +0000 Understanding Elixir Macros, Part 5 - Reshaping the AST

2014-06-29

Last time I presented a basic version of deftraceable macro that allows us to write traceable functions. The final version of the macro has some remaining issues, and today we’ll tackle one of those - arguments pattern matching.

Today’s exercise should demonstrate that we have to carefully consider our assumptions about the possible inputs our macros can receive.

The problem

As I hinted the last time, the current version of deftraceable doesn’t work with pattern matched arguments. Let’s demonstrate the problem:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable div(_, 0), do: :error
        end
** (CompileError) iex:5: unbound variable _

So what happened? The deftraceable macro blindly assumes that input arguments are plain variables or constants. Hence, when you call deftraceable div(a, b), do: … the generated code will contain:

passed_args = [a, b] |> Enum.map(&inspect/1) |> Enum.join(",")

This will work as expected, but if one argument is an anonymous variable (_), then we generate the following code:

passed_args = [_, 0] |> Enum.map(&inspect/1) |> Enum.join(",")

This is obviously not correct, and therefore we get the unbound variable error.

So what’s the solution? We shouldn’t assume anything about input arguments. Instead, we should take each argument into a dedicated variable generated by the macro. Or to say it with code, if our macro is called with:

deftraceable fun(pattern1, pattern2, ...)

We should generate the function head:

def fun(pattern1 = arg1, pattern2 = arg2, ...)

This allows us to take argument values into our internal temp variables, and print the contents of those variables.

The solution

So let’s implement this. First, I’m going to show you the top-level sketch of the solution:

defmacro deftraceable(head, body) do
  {fun_name, args_ast} = name_and_args(head)

  # Decorates input args by adding "= argX" to each argument.
  # Also returns a list of argument names (arg1, arg2, ...)
  {arg_names, decorated_args} = decorate_args(args_ast)

  head = ??   # Replace original args with decorated ones

  quote do
    def unquote(head) do
      ... # unchanged

      # Use temp variables to make a trace message
      passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

      ... # unchanged
    end
  end
end

First, we extract name and args from the head (we resolved this in previous article). Then we have to inject = argX into the args_ast and take back the modified args (which we’ll put into decorated_args).

We also need pure names of generated variables (or more exactly their AST), since we’ll use these to collect argument values. The variable arg_names will essentially contain quote do [arg_1, arg_2, …] end which can be easily injected into the tree.

So let’s implement the rest. First, let’s see how we can decorate arguments:

defp decorate_args(args_ast) do
  for {arg_ast, index} <- Enum.with_index(args_ast) do
    # Dynamically generate quoted identifier
    arg_name = Macro.var(:"arg#{index}", __MODULE__)

    # Generate AST for patternX = argX
    full_arg = quote do
      unquote(arg_ast) = unquote(arg_name)
    end

    {arg_name, full_arg}
  end
  |> Enum.unzip
end

Most of the action takes place in the for comprehension. Essentially, we go through the AST fragment of each input argument, and compute the temp name (the quoted argX), relying on the Macro.var/2 function which can transform an atom into a quoted variable with the name of that atom. The second argument to Macro.var/2 ensures that the variable is hygienic. Although we’ll inject arg0, arg1, … variables into the caller context, the caller won’t see these variables. In fact, a user of deftraceable can freely use these names for some local variables without interfering with the temps introduced by our macro.

Finally, at the end of the comprehension we return a tuple consisting of the temp’s name and the quoted full pattern (e.g. _ = arg0, or 0 = arg1). The Enum.unzip call after the comprehension ensures that decorate_args returns the result in the form of {arg_names, decorated_args}.
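
To make this a bit more concrete, here’s roughly what Macro.var/2 gives us (the exact metadata may differ):

Macro.var(:arg0, Tracer)
# {:arg0, [], Tracer}

Macro.var(:arg0, Tracer) |> Macro.to_string
# "arg0"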

With the decorate_args helper ready, we can pass the input arguments and get the decorated ones back, together with the names of the temp variables. Now we need to inject these decorated arguments into the head of the function, in place of the original arguments. In particular, we must perform the following steps:

  1. Walk recursively through the AST of the input function head.
  2. Find the place where function name and arguments are specified.
  3. Replace original (input) arguments with the AST of decorated arguments

This task can be reasonably simplified if we rely on Macro.postwalk/2 function:

defmacro deftraceable(head, body) do
  {fun_name, args_ast} = name_and_args(head)

  {arg_names, decorated_args} = decorate_args(args_ast)

  # 1. Walk recursively through the AST
  head = Macro.postwalk(
    head,

    # This lambda is called for each element in the input AST and
    # has a chance of returning alternative AST
    fn
      # 2. Pattern match the place where function name and arguments are
      # specified
      ({fun_ast, context, old_args}) when (
        fun_ast == fun_name and old_args == args_ast
      ) ->
        # 3. Replace input arguments with the AST of decorated arguments
        {fun_ast, context, decorated_args}

      # Some other element in the head AST (probably a guard)
      #   -> we just leave it unchanged
      (other) -> other
    end
  )

  ... # unchanged
end

Macro.postwalk/2 walks the AST recursively, and calls the provided lambda for each node, after all of the node’s descendants have been visited. The lambda receives the AST of the element, and there we have a chance of returning something else instead of that node.

So what we do in this lambda is basically a pattern match where we’re looking for the {fun_name, context, args}. As explained in part 3, this is the quoted representation of the expression some_fun(arg1, arg2, …). Once we encounter the node that matches this pattern, we just replace input arguments with new (decorated) ones. In all other cases, we simply return the input AST, leaving the rest of the tree unchanged.

This is somewhat convoluted, but it solves our problem. Here’s the final version of the trace macro:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    {arg_names, decorated_args} = decorate_args(args_ast)

    head = Macro.postwalk(head,
      fn
        ({fun_ast, context, old_args}) when (
          fun_ast == fun_name and old_args == args_ast
        ) ->
          {fun_ast, context, decorated_args}
        (other) -> other
      end)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end

  defp decorate_args([]), do: {[],[]}
  defp decorate_args(args_ast) do
    for {arg_ast, index} <- Enum.with_index(args_ast) do
      # dynamically generate quoted identifier
      arg_name = Macro.var(:"arg#{index}", __MODULE__)

      # generate AST for patternX = argX
      full_arg = quote do
        unquote(arg_ast) = unquote(arg_name)
      end

      {arg_name, full_arg}
    end
    |> Enum.unzip
  end
end

Let’s try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable div(_, 0), do: :error
          deftraceable div(a, b), do: a/b
        end

iex(3)> Test.div(5, 2)
iex(line 6) Elixir.Test.div(5,2) = 2.5

iex(4)> Test.div(5, 0)
iex(line 5) Elixir.Test.div(5,0) = :error

As you can see, it’s possible, and not extremely complicated, to get into the AST, tear it apart, and sprinkle it with some custom injected code. On the downside, the code of the resulting macro gets increasingly complex, and it becomes harder to analyze.

This concludes today’s session. Next time I’m going to discuss some aspects of in-place code generation.

]]>
http://theerlangelist.com//article/macros_5
<![CDATA[Understanding Elixir Macros, Part 4 - Diving Deeper]]> Mon, 23 Jun 14 00:00:00 +0000 Understanding Elixir Macros, Part 4 - Diving Deeper

2014-06-23

In the previous installment, I’ve shown you some basic ways of analyzing the input AST and doing something about it. Today we’ll take a look at some more involved AST transformations. This will mostly be a rehash of already explained techniques. The aim is to show that it’s not very hard to go deeper into the AST, though the resulting code can easily become fairly complex and somewhat hacky.

Tracing function calls

In this article, we’ll create a deftraceable macro that allows us to define traceable functions. A traceable function works just like a normal function, but whenever we call it, debug information is printed. Here’s the idea:

defmodule Test do
  import Tracer

  deftraceable my_fun(a,b) do
    a/b
  end
end

Test.my_fun(6,2)

# => test.ex(line 4) Test.my_fun(6,2) = 3

This example is of course contrived. You don’t need to devise such a macro, because Erlang already has very powerful tracing capabilities, and there’s an Elixir wrapper available. However, the example is interesting because it will demand some deeper AST transformations and techniques.

Before starting, I’d like to mention again that you should carefully consider whether you really need such constructs. Macros such as deftraceable introduce another thing every code maintainer needs to understand. Looking at the code, it’s not obvious what happens behind the scenes. If everyone devises such constructs, each Elixir project will quickly turn into a soup of custom language extensions. It will be hard even for experienced developers to understand the flow of the underlying code that heavily relies on complex macros.

All that said, there will be cases suitable for macros, so you shouldn’t avoid them just because someone claims that macros are bad. For example, if we didn’t have tracing facilities in Erlang, we’d need to devise some kind of macro to help us with it (not necessarily similar to the example above, but that’s another discussion), or our code would suffer from a lot of boilerplate.

In my opinion, boilerplate is bad because the code becomes ridden with bureaucratic noise, and is therefore harder to read and understand. Macros can certainly help in reducing cruft, but before reaching for them, consider whether you can resolve duplication with run-time constructs (functions, modules, protocols).

With that long disclaimer out of the way, let’s write deftraceable. First, it’s worth manually generating the corresponding code.

Let’s recall the usage:

deftraceable my_fun(a,b) do
  a/b
end

The generated code should look like:

def my_fun(a, b) do
  file = __ENV__.file
  line = __ENV__.line
  module = __ENV__.module
  function_name = "my_fun"
  passed_args = [a,b] |> Enum.map(&inspect/1) |> Enum.join(",")

  result = a/b

  loc = "#{file}(line #{line})"
  call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
  IO.puts "#{loc} #{call}"

  result
end

The idea is simple. We fetch various data from the compiler environment, then compute the result, and finally print everything to the screen.

The code relies on the __ENV__ special form, which can be used to inject all sorts of compile-time information (e.g. the line number and file) into the final AST. __ENV__ is a struct, and whenever you use it in code, it will be expanded at compile time to the appropriate value. Hence, wherever we write __ENV__.file, the resulting bytecode will contain the (binary) string constant with the containing file name.
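
To get a feel for this, you can try __ENV__ in an ordinary function as well. Here’s a minimal sketch, with a hypothetical module name:

defmodule EnvDemo do
  # __ENV__.file, __ENV__.line, and __ENV__.module are expanded at compile
  # time, so this function simply returns constants baked into the bytecode.
  def where_am_i do
    {__ENV__.file, __ENV__.line, __ENV__.module}
  end
end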

Now we need to build this code dynamically. Let’s see the basic outline:

defmacro deftraceable(??) do
  quote do
    def unquote(head) do
      file = __ENV__.file
      line = __ENV__.line
      module = __ENV__.module
      function_name = ??
      passed_args = ?? |> Enum.map(&inspect/1) |> Enum.join(",")

      result = ??

      loc = "#{file}(line #{line})"
      call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
      IO.puts "#{loc} #{call}"

      result
    end
  end
end

Here I placed question marks (??) in places where we need to dynamically inject AST fragments, based on the input arguments. In particular, we have to deduce function name, argument names, and function body from the passed parameters.

Now, when we call a macro deftraceable my_fun(…) do … end, the macro receives two arguments - the function head (function name and argument list) and a keyword list containing the function body. Both of these will of course be quoted.

How do I know this? I actually don’t. I usually gain this knowledge by trial and error. Basically, I start by defining a macro:

defmacro deftraceable(arg1) do
  IO.inspect arg1
  nil
end

Then I try to call the macro from some test module or from the shell. If the argument numbers are wrong, an error will occur, and I’ll retry by adding another argument to the macro definition. Once I get the result printed, I try to figure out what the arguments represent, and then start building the macro.

The nil at the end of the macro ensures we don’t generate anything (well, we generate nil which is usually irrelevant to the caller code). This allows me to further compose fragments without injecting the code. I usually rely on IO.inspect and Macro.to_string/1 to verify intermediate results, and once I’m happy, I remove the nil part and see if the thing works.
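
For deftraceable, that probing ends up looking something like this (a sketch — the exact inspected output depends on where the macro is invoked from):

defmacro deftraceable(head, body) do
  # print both quoted arguments so we can study their structure
  IO.inspect head
  IO.inspect body
  nil
end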

In our case deftraceable receives the function head and the body. The function head will be an AST fragment in the format I’ve described last time ({function_name, context, [arg1, arg2, …]}).
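
You can verify that shape by quoting a plain head in the shell (the metadata contents may differ depending on where you quote):

quote do my_fun(a, b) end
# => {:my_fun, [], [{:a, [], Elixir}, {:b, [], Elixir}]}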

So we need to do the following:

  • Extract function name and arguments from the quoted head
  • Inject these values into the AST we’re returning from the macro
  • Inject function body into that same AST
  • Print trace info

We could use pattern matching to extract function name and arguments from this AST fragment, but as it turns out there is a helper Macro.decompose_call/1 that does exactly this. Given these steps, the final version of the macro looks like this:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    # Extract function name and arguments
    {fun_name, args_ast} = Macro.decompose_call(head)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        # Inject function name and arguments into AST
        function_name = unquote(fun_name)
        passed_args = unquote(args_ast) |> Enum.map(&inspect/1) |> Enum.join(",")

        # Inject function body into the AST
        result = unquote(body[:do])

        # Print trace info
        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end
end

Let’s try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable my_fun(a,b) do
            a/b
          end
        end

iex(3)> Test.my_fun(10,5)
iex(line 4) Test.my_fun(10,5) = 2.0   # trace output
2.0

It seems to be working. However, I should immediately point out that there are a couple of problems with this implementation:

  • The macro doesn’t handle guards well
  • Pattern matching arguments will not always work (e.g. when using _ to match any term)
  • The macro doesn’t work when dynamically generating code directly in the module.

I’ll explain each of these problems one by one, starting with guards, and leaving the remaining issues for future articles.

Handling guards

All problems with deftraceable stem from the fact that we’re making some assumptions about the input AST. That’s dangerous territory, and we must be careful to cover all cases.

For example, the macro assumes that head contains just the name and the arguments list. Consequently, deftraceable won’t work if we want to define a traceable function with guards:

deftraceable my_fun(a,b) when a < b do
  a/b
end

In this case, our head (the first argument of the macro) will also contain the guard information, and will not be parsable by Macro.decompose_call/1. The solution is to detect this case and handle it in a special way.

First, let’s discover how this head is quoted:

iex(1)> quote do my_fun(a,b) when a < b end
{:when, [],
 [{:my_fun, [], [{:a, [], Elixir}, {:b, [], Elixir}]},
  {:<, [context: Elixir, import: Kernel],
   [{:a, [], Elixir}, {:b, [], Elixir}]}]}

So essentially, our guard head has the shape of {:when, _, [name_and_args, …]}. We can rely on this to extract the name and arguments using pattern matching:

defmodule Tracer do
  ...
  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end
  ...

And of course, we need to call this function from the macro:

defmodule Tracer do
  ...
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    ... # unchanged
  end
  ...
end

As you can see, it’s possible to define additional private functions and call them from your macro. After all, a macro is just a function, and when it is called, the containing module is already compiled and loaded into the VM of the compiler (otherwise, the macro couldn’t be running).

Here’s the full version of the macro:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(args_ast) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end
end

Let’s try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable my_fun(a,b) when a<b do
            a/b
          end

          deftraceable my_fun(a,b) do
            a/b
          end
        end

iex(3)> Test.my_fun(5,10)
iex(line 4) Test.my_fun(5,10) = 0.5
0.5

iex(4)> Test.my_fun(10, 5)
iex(line 7) Test.my_fun(10,5) = 2.0

The main point of this exercise was to illustrate that it’s possible to deduce something from the input AST. In this example, we managed to detect and handle a function guard. Obviously, the code becomes more involved, since it relies on the internal structure of the AST. In this case, the code is relatively simple, but as you’ll see in future articles, where I’ll tackle remaining problems of deftraceable, things can quickly become messy.

]]>
http://theerlangelist.com//article/macros_4
<![CDATA[Understanding Elixir Macros, Part 3 - Getting into the AST]]> Sun, 15 Jun 14 00:00:00 +0000 Understanding Elixir Macros, Part 3 - Getting into the AST

2014-06-15

It’s time to continue our exploration of Elixir macros. Last time I covered some essential theory, and today I’ll step into less documented territory and discuss some details of the Elixir AST.

Tracing function calls

So far you have seen only basic macros that take input AST fragments and combine them together, sprinkling some additional boilerplate around and/or between the input fragments. Since we don’t analyze or parse the input AST, this is probably the cleanest (or least hacky) style of macro writing, which results in fairly simple macros that are reasonably easy to understand.

However, in some cases we will need to parse input AST fragments to get some specific information. A simple example is ExUnit assertions. For example, the expression assert 1+1 == 2+2 will fail with an error:

Assertion with == failed
code: 1+1 == 2+2
lhs:  2
rhs:  4

The macro assert accepts the entire expression 1+1 == 2+2 and is able to extract individual sub-expressions of the comparison, printing their corresponding results if the entire expression returns false. To do this, the macro code must somehow split the input AST into separate parts and compute each sub-expression separately.

In more involved cases even richer AST transformations are called for. For example, with ExActor you can write this code:

defcast inc(x), state: state, do: new_state(state + x)

which translates to roughly the following:

def inc(pid, x) do
  :gen_server.cast(pid, {:inc, x})
end

def handle_cast({:inc, x}, state) do
  {:noreply, state+x}
end

Just like assert, the defcast macro needs to dive into the input AST fragment and detect individual sub-fragments (e.g. the function name and individual arguments). Then, ExActor performs an elaborate transformation, reassembling these sub-parts into more complex code.

Today, I’m going to show you some basic techniques of building such macros, and I’ll continue with more complex transformations in subsequent articles. But before doing this, I should advise you to carefully consider whether your code needs to be based on macros. Though very powerful, macros have some downsides.

First, as you’ll see in this series, the code can quickly become much more involved than “plain” run-time abstractions. You can easily end up with many nested quote/unquote calls and weird pattern matches that rely on the undocumented format of the AST.

In addition, proliferation of macros may make your client code extremely cryptic, since it will rely on custom, non-standard idioms (such as defcast from ExActor). It can become harder to reason about the code, and understand what exactly happens underneath.

On the plus side, macros can be very helpful for removing boilerplate (as the ExActor example hopefully demonstrated), and have the power of accessing information that is not available at run-time (as you should see from the assert example). Finally, since they run during compilation, macros make it possible to optimize some code by moving calculations to compile-time.

So there will definitely be cases that are suited for macros, and you shouldn’t be afraid of using them. However, you shouldn’t choose macros only to gain some cute DSL-ish syntax. Before reaching for macros, you should consider whether your problem can be solved efficiently in run-time, relying on “standard” language abstractions such as functions, modules, and protocols.

Discovering the AST structure

At the moment of writing this there is very little documentation on the AST structure. However, it’s easy to explore and play with the AST in a shell session, and this is how I usually discover the AST format.

For example, here’s what a quoted reference to a variable looks like:

iex(1)> quote do my_var end
{:my_var, [], Elixir}

Here, the first element represents the name of the variable. The second element is a context keyword list that contains some metadata specific to this particular AST fragment (e.g. imports and aliases). Most often you won’t be interested in the context data. The third element usually represents the module where the quoting happened, and is used to ensure hygiene of quoted variables. If this element is nil then the identifier is not hygienic.
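
If you ever need to build such a triplet yourself, Macro.var/2 does exactly that — passing nil as the context produces an unhygienic variable reference:

Macro.var(:my_var, Elixir)
# => {:my_var, [], Elixir}

Macro.var(:my_var, nil)
# => {:my_var, [], nil}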

A simple expression looks a bit more involved:

iex(2)> quote do a+b end
{:+, [context: Elixir, import: Kernel], [{:a, [], Elixir}, {:b, [], Elixir}]}

This might look scary, but it’s reasonably easy to understand if I show you the higher-level pattern:

{:+, context, [ast_for_a, ast_for_b]}

In our example, ast_for_a and ast_for_b follow the shape of a variable reference you’ve seen earlier (e.g. {:a, [], Elixir}). More generally, quoted arguments can be arbitrarily complex, since they describe the expression of each argument. Essentially, the AST is a deeply nested structure of simple quoted expressions such as the ones I’m showing you here.
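
You can see that nesting by quoting a slightly larger expression (the metadata may vary in your shell):

quote do a + b * c end
# => {:+, [context: Elixir, import: Kernel],
#      [{:a, [], Elixir},
#       {:*, [context: Elixir, import: Kernel],
#        [{:b, [], Elixir}, {:c, [], Elixir}]}]}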

Let’s take a look at a function call:

iex(3)> quote do div(5,4) end
{:div, [context: Elixir, import: Kernel], [5, 4]}

This resembles the quoted + operation, which shouldn’t come as a surprise knowing that + is actually a function. In fact, all binary operators will be quoted as function calls.

Finally, let’s take a look at a quoted function definition:

iex(4)> quote do def my_fun(arg1, arg2), do: :ok end
{:def, [context: Elixir, import: Kernel],
 [{:my_fun, [context: Elixir], [{:arg1, [], Elixir}, {:arg2, [], Elixir}]},
  [do: :ok]]}

While this looks scary, it can be simplified by looking at important parts. Essentially, this deep structure amounts to:

{:def, context, [fun_call, [do: body]]}

with fun_call having the structure of a function call (which you’ve just seen).

As you can see, there usually is some reason and sense behind the AST. I won’t go through all possible AST shapes here, but the approach to discovery is to play in iex and quote simpler forms of the expressions you’re interested in. This is a bit of reverse engineering, but it’s not exactly rocket science.

Writing assert macro

For a quick demonstration, let’s write a simplified version of the assert macro. This is an interesting macro because it literally reinterprets the meaning of comparison operators. Normally, when you write a == b you get a boolean result. However, when this expression is given to the assert macro, a detailed output is printed if the expression evaluates to false.

I’ll start simple, by supporting only == operator in the macro. To recap, when we call assert expected == required, it’s the same as calling assert(expected == required), which means that our macro receives a quoted fragment that represents comparison. Let’s discover the AST structure of this comparison:

iex(1)> quote do 1 == 2 end
{:==, [context: Elixir, import: Kernel], [1, 2]}

iex(2)> quote do a == b end
{:==, [context: Elixir, import: Kernel], [{:a, [], Elixir}, {:b, [], Elixir}]}

So our structure is essentially {:==, context, [quoted_lhs, quoted_rhs]}. This should not be surprising if you remember the examples shown in the previous section, where I mentioned that binary operators are quoted as two-argument function calls.

Knowing the AST shape, it’s relatively simple to write the macro:

defmodule Assertions do
  defmacro assert({:==, _, [lhs, rhs]} = expr) do
    quote do
      left = unquote(lhs)
      right = unquote(rhs)

      result = (left == right)

      unless result do
        IO.puts "Assertion with == failed"
        IO.puts "code: #{unquote(Macro.to_string(expr))}"
        IO.puts "lhs: #{left}"
        IO.puts "rhs: #{right}"
      end

      result
    end
  end
end

The first interesting thing happens in line 2. Notice how we pattern match on the input expression, expecting it to conform to some structure. This is perfectly fine, since macros are functions, which means you can rely on pattern matching, guards, and even have multi-clause macros. In our case, we rely on pattern matching to take each (quoted) side of the comparison expression into corresponding variables.

Then, in the quoted code, we reinterpret the == operation by computing the left- and right-hand sides individually (lines 4 and 5), and then the entire result (line 7). Finally, if the result is false, we print detailed information (lines 9-14).

Let’s try it out:

iex(1)> defmodule Assertions do ... end
iex(2)> import Assertions

iex(3)> assert 1+1 == 2+2
Assertion with == failed
code: 1 + 1 == 2 + 2
lhs: 2
rhs: 4

Generalizing the code

It’s not much harder to make the code work for other operators:

defmodule Assertions do
  defmacro assert({operator, _, [lhs, rhs]} = expr)
    when operator in [:==, :<, :>, :<=, :>=, :===, :=~, :!==, :!=, :in]
  do
    quote do
      left = unquote(lhs)
      right = unquote(rhs)

      result = unquote(operator)(left, right)

      unless result do
        IO.puts "Assertion with #{unquote(operator)} failed"
        IO.puts "code: #{unquote(Macro.to_string(expr))}"
        IO.puts "lhs: #{left}"
        IO.puts "rhs: #{right}"
      end

      result
    end
  end
end

There are only a couple of changes here. First, in the pattern-match, the hard-coded :== is replaced with the operator variable (line 2).

I’ve also introduced (or to be honest, copy-pasted from the Elixir source) guards specifying the set of operators for which the macro works (line 3). There is a special reason for this check. Remember how I earlier mentioned that quoted a + b (and any other binary operation) has the same shape as quoted fun(a,b). Consequently, without these guards, every two-argument function call would end up in our macro, and this is something we probably don’t want. Using this guard limits the allowed inputs to known binary operators.

The interesting thing happens in line 9. Here I make a simple generic dispatch to the operator using unquote(operator)(left, right). You might think that I could have instead used left unquote(operator) right, but this wouldn’t work. The reason is that the operator variable holds an atom (e.g. :==). Thus, this naive quoting would produce left :== right, which is not even proper Elixir syntax.

Keep in mind that while quoting, we don’t assemble strings, but AST fragments. So instead, when we want to generate binary operation code, we need to inject a proper AST, which (as explained earlier) is the same as a two-argument function call. Hence, we can simply generate the function call unquote(operator)(left, right).
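
You can convince yourself of this in the shell by building the call form with a dynamically injected operator and converting the result back to a string (the intermediate AST metadata will vary):

operator = :==
ast = quote do unquote(operator)(left, right) end
Macro.to_string(ast)
# => "left == right"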

With this in mind, I’m going to finish today’s session. It was a bit shorter, but slightly more complex. Next time, I’m going to dive a bit deeper into the topic of AST parsing.

]]>
http://theerlangelist.com//article/macros_3
<![CDATA[Understanding Elixir Macros, Part 2 - Micro Theory]]> Wed, 11 Jun 14 00:00:00 +0000 Understanding Elixir Macros, Part 2 - Micro Theory

2014-06-11

This is the second part of the mini-series on Elixir macros. Last time I discussed compilation phases and Elixir AST, finishing with a basic example of the trace macro. Today, I’ll provide a bit more details on macro mechanics.

This is going to involve repeating some of the stuff mentioned last time, but I think it’s beneficial to understand how things work and how the final AST is built. If you grasp this, you can reason about your macro code with more confidence. This becomes important, since more involved macros will consist of many combined quote/unquote constructs which can at first seem intimidating.

Calling a macro

The most important thing to be aware of is the expansion phase. This is where the compiler calls various macros (and other code-generating constructs) to produce the final AST.

For example, a typical usage of the trace macro will look like this:

defmodule MyModule do
  require Tracer
  ...
  def some_fun(...) do
    Tracer.trace(...)
  end
end

As previously explained, the compiler starts with an AST that resembles this code. This AST is then expanded to produce the final code. Consequently, in the snippet above, the call to Tracer.trace/1 will take place in the expansion phase.

Our macro receives the input AST and must produce the output AST. The compiler will then simply replace the macro call with the AST returned from that macro. This process is incremental - a macro can return AST that will invoke some other macro (or even itself). The compiler will simply re-expand until there’s nothing left to expand.
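
As a minimal sketch of that incremental expansion (with hypothetical module and macro names), consider a macro that emits a call to another macro:

defmodule Expansion do
  defmacro outer do
    # the returned AST still contains a macro call (inner),
    # so the compiler will expand it again in the next pass
    quote do
      Expansion.inner()
    end
  end

  defmacro inner do
    quote do
      :fully_expanded
    end
  end
end

A call site that has required Expansion and invokes Expansion.outer() will thus end up with just the :fully_expanded atom in the final AST.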

A macro call is thus our opportunity to change the meaning of the code. A typical macro will take the input AST and somehow decorate it, adding some additional code around the input.

That’s exactly what we did in the trace macro. We took a quoted expression (e.g. 1+2) and spit out something like:

result = 1 + 2
Tracer.print("1 + 2", result)
result

To call the trace macro from any part of the code (including the shell), you must invoke either require Tracer or import Tracer. Why is this? There are two seemingly contradictory properties of macros:

  • A macro is Elixir code
  • A macro runs at expansion time, before the final bytecode is produced

How can Elixir code run before it is produced? It can’t. To call a macro, the container module (the module where the macro is defined) must already be compiled.

Consequently, to run macros defined in the Tracer module, we must ensure that it is already compiled. In other words, we must provide some hints to the compiler about module ordering. When we require a module, we instruct Elixir to hold the compilation of the current module until the required module is compiled and loaded into the compiler run-time (the Erlang VM instance where the compiler is running). We can only call the trace macro once the Tracer module is fully compiled and available to the compiler.

Using import has the same effect but it additionally lexically imports all exported functions and macros, making it possible to write trace instead of Tracer.trace.

Since macros are functions and Elixir doesn’t require parentheses in function calls, we can use this syntax:

Tracer.trace 1+2

This is quite possibly the most important reason why Elixir doesn’t require parentheses in function calls. Remember that most language constructs are actually macros. If parentheses were obligatory, the code we’d have to write would be noisier:

defmodule(MyModule, do:
  def(function_1, do: ...)
  def(function_2, do: ...)
)

Hygiene

As hinted in the last article, macros are by default hygienic. This means that variables introduced by a macro are its own private affair that won’t interfere with the rest of the code. This is why we can safely introduce the result variable in our trace macro:

quote do
  result = unquote(expression_ast)  # result is private to this macro
  ...
end

This variable won’t interfere with the code that is calling the macro. In the place where you call the trace macro, you can freely declare your own result variable, and it won’t be shadowed by the result from the trace macro.
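
Here’s a minimal sketch (hypothetical module) that demonstrates this:

defmodule HygieneDemo do
  defmacro set_result do
    # this result is private to the macro's quoted code
    quote do
      result = :from_macro
    end
  end
end

# In the calling code:
#
#   require HygieneDemo
#   result = :from_caller
#   HygieneDemo.set_result
#   result
#   # => :from_caller, the macro's result didn't overwrite ours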

Most of the time hygiene is exactly what you want, but there are exceptions. Sometimes, you may need to create a variable that is available to the code calling the macro. Instead of devising some contrived example, let’s take a look at a real use case from the Plug library. This is how we can specify routes with the Plug router:

get "/resource1" do
  send_resp(conn, 200, ...)
end

post "/resource2" do
  send_resp(conn, 200, ...)
end

Notice how in both snippets we use a conn variable that doesn’t exist. This is possible because the get macro binds this variable in the generated code. You can imagine that the resulting code is something like:

defp do_match("GET", "/resource1", conn) do
  ...
end

defp do_match("POST", "/resource2", conn) do
  ...
end

Note: the real code produced by Plug is somewhat different; this is just a simplification.

This is an example of a macro introducing a variable that must not be hygienic. The variable conn is introduced by the get macro, but must be visible to the code where the macro is called.

Another example is the situation I had with ExActor. Take a look at the following example:

defmodule MyServer do
  ...
  defcall my_request(...), do: reply(result)
  ...
end

If you’re familiar with GenServer then you know that the result of a call must be in the form {:reply, response, state}. However, in the snippet above, the state is not even mentioned. So how can we return the non-mentioned state? This is possible because the defcall macro generates a hidden state variable, which is then implicitly used by the reply macro.

In both cases, a macro must create a variable that is not hygienic and must be visible beyond the macro’s quoted code. For such purposes you can use the var! construct. Here’s what a simple version of Plug’s get macro could look like:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Notice how we use var!(conn). By doing this, we’re specifying that conn is a variable that must be visible to the caller.
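
A stripped-down sketch (hypothetical module) of the same mechanism:

defmodule UnhygienicDemo do
  defmacro put_conn do
    # var!(conn) makes the binding visible in the caller's scope
    quote do
      var!(conn) = :fake_connection
    end
  end
end

# In the calling code:
#
#   require UnhygienicDemo
#   UnhygienicDemo.put_conn
#   conn
#   # => :fake_connection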

In the get sketch above, it’s not explained how the body is injected. Before getting to that, you must understand a bit about the arguments that macros receive.

Macro arguments

You should always keep in mind that macros are essentially Elixir functions that are invoked in the expansion phase, while the final AST is being produced. The specific thing about macros is that the arguments being passed are always quoted. This is why we can call:

def my_fun do
  ...
end

Which is the same as:

def(my_fun, do: (...))

Notice how we’re calling the def macro, passing my_fun even though this variable doesn’t exist. This is completely fine, since we’re actually passing the result of quote(do: my_fun), and quoting doesn’t require that the variable exists. Internally, the def macro will receive the quoted representation, which will, among other things, contain :my_fun. The def macro will use this information to generate a function with the corresponding name.

Another thing I sort of skimmed over is the do…end block. Whenever you pass a do…end block to a macro, it is the same as passing a keyword list with a :do key.

So the call

my_macro arg1, arg2 do ... end

is the same as

my_macro(arg1, arg2, do: ...)

This is just special syntactic sugar in Elixir. The parser transforms do…end into {:do, …}.

Now, I’ve just mentioned that arguments are quoted. However, for many constants (atoms, numbers, strings), the quoted representation is exactly the same as the input value. In addition, two-element tuples and lists will retain their structure when quoted. This means that quote(do: {a,b}) will give a two-element tuple, with both values of course being quoted.

Let’s illustrate this in a shell:

iex(1)> quote do :an_atom end
:an_atom

iex(2)> quote do "a string" end
"a string"

iex(3)> quote do 3.14 end
3.14

iex(4)> quote do {1,2} end
{1, 2}

iex(5)> quote do [1,2,3,4,5] end
[1, 2, 3, 4, 5]

In contrast, a quoted three-element tuple doesn’t retain its shape:

iex(6)> quote do {1,2,3} end
{:{}, [], [1, 2, 3]}

Since lists and two-element tuples retain their structure when quoted, the same holds for a keyword list:

iex(7)> quote do [a: 1, b: 2] end
[a: 1, b: 2]

iex(8)> quote do [a: x, b: y] end
[a: {:x, [], Elixir}, b: {:y, [], Elixir}]

In the first example, you can see that the input keyword list is completely intact. The second example proves that complex members (such as references to x and y) are quoted. But the list still retains its shape. It is still a keyword list with the keys :a and :b.

Putting it together

Why is all this important? Because in the macro code, you can easily retrieve the options from the keyword list, without analyzing some convoluted AST. Let’s see this in action on our oversimplified take on the get macro. Earlier, we left off with this sketch:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Remember that do…end is the same as do: … so when we call get route do … end, we’re effectively calling get(route, do: …). Keeping in mind that macro arguments are quoted, but also knowing that quoted keyword lists keep their shape, it’s possible to retrieve the quoted body in the macro using body[:do]:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      unquote(body[:do])
    end
  end
end

So we simply inject the quoted input body into the body of the do_match clause we’re generating.

As already mentioned, this is the purpose of a macro. It receives some AST fragments, and combines them together with the boilerplate code, to generate the final result. Ideally, when we do this, we don’t care about the contents of the input AST. In our example, we simply inject the body in the generated function, without caring what is actually in that body.

It is reasonably simple to test that this macro works. Here’s a bare minimum of the required code:

defmodule Plug.Router do
  # get macro removes the boilerplate from the client and ensures that
  # generated code conforms to some standard required by the generic logic
  defmacro get(route, body) do
    quote do
      defp do_match("GET", unquote(route), var!(conn)) do
        unquote(body[:do])
      end
    end
  end
end

Now we can implement a client module:

defmodule MyRouter do
  import Plug.Router

  # Generic code that relies on the multi-clause dispatch
  def match(type, route) do
    do_match(type, route, :dummy_connection)
  end

  # Using macro to minimize boilerplate
  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

And test it:

MyRouter.match("GET", "/hello") |> IO.inspect
# {:dummy_connection, "Hi!"}

MyRouter.match("GET", "/goodbye") |> IO.inspect
# {:dummy_connection, "Bye!"}

The important thing to notice here is the code of match/2. This is the generic code that relies on the existence of the implementation of do_match/3.

Using modules

Looking at the code above, you can see that the glue code of match/2 is developed in the client module. That’s definitely far from perfect, since each client must provide correct implementation of this function, and be aware of how do_match function must be invoked.

It would be better if Plug.Router abstraction could provide this implementation for us. For that purpose we can reach for the use macro, a rough equivalent of mixins in other languages.

The general idea is as follows:

defmodule ClientCode do
  # invokes the mixin
  use GenericCode, option_1: value_1, option_2: value_2, ...
end

defmodule GenericCode do
  # called when the module is used
  defmacro __using__(options) do
    # generates an AST that will be inserted in place of the use
    quote do
      ...
    end
  end
end

So the use mechanism allows us to inject some piece of code into the caller’s context. This is just a replacement for something like:

defmodule ClientCode do
  require GenericCode
  GenericCode.__using__(...)
end

Which can be proven by looking at the Elixir source code. This proves another point - that of incremental expansion. The use macro generates code which will call another macro. Or to put it in fancier terms, use generates code that generates code. As mentioned earlier, the compiler will simply re-expand this until there’s nothing left to be expanded.

Armed with this knowledge, we can move the implementation of the match function to the generic Plug.Router module:

defmodule Plug.Router do
  defmacro __using__(_options) do
    quote do
      import Plug.Router

      def match(type, route) do
        do_match(type, route, :dummy_connection)
      end
    end
  end

  defmacro get(route, body) do
    ... # This code remains the same
  end
end

This now keeps the client code very lean:

defmodule MyRouter do
  use Plug.Router

  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

As mentioned, the AST generated by the __using__ macro will simply be injected in place of the use Plug.Router call. Take special note how we do import Plug.Router from the __using__ macro. This is not strictly needed, but it allows the client to call get instead of Plug.Router.get.

So what have we gained? The various boilerplate is now confined to a single place (Plug.Router). Not only does this simplify the client code, it also keeps the abstraction properly closed. The Plug.Router module ensures that whatever is generated by the get macro fits properly with the generic code of match. As clients, we simply use the module and call into the provided macros to assemble our router.

This concludes today’s session. Many details are not covered, but hopefully you have a better understanding of how macros integrate with the Elixir compiler. In the next part I’ll dive deeper and start exploring how we can tear apart the input AST.

]]>
http://theerlangelist.com//article/macros_2
<![CDATA[Understanding Elixir Macros, Part 1 - Basics]]> Fri, 6 Jun 14 00:00:00 +0000 Understanding Elixir Macros, Part 1 - Basics

2014-06-06

This is the first article in the miniseries that deals with macros. I originally planned on treating this topic in my upcoming Elixir in Action book, but decided against it because the subject somehow doesn’t fit into the main theme of the book, which is more focused on the underlying VM and crucial parts of OTP.

Instead, I decided to provide a treatment of macros here. Personally, I find the subject of macros very interesting, and in this miniseries I’ll try to explain how they work, providing some basic techniques and advice on how to write them. While I’m convinced that writing macros is not very hard, it certainly requires a higher level of attention compared to plain Elixir code. Thus, I think it’s very helpful to understand some inner details of the Elixir compiler. Knowing how things tick behind the scenes makes it easier to reason about the meta-programming code.

This will be a medium-level difficulty text. If you’re familiar with Elixir and Erlang, but are still somewhat confused about macros, then you’re in the right place. If you’re new to Elixir and Erlang, it’s probably better to start with something else, for example the Getting started guide, or one of the available books.

Meta-programming

Chances are you’re already somewhat familiar with meta-programming in Elixir. The essential idea is that we have code that generates code based on some input.

Owing to macros we can write constructs like this one from Plug:

get "/hello" do
  send_resp(conn, 200, "world")
end

match _ do
  send_resp(conn, 404, "oops")
end

or this from ExActor:

defmodule SumServer do
  use ExActor.GenServer

  defcall sum(x, y), do: reply(x+y)
end

In both cases, we are running some custom macro in compile time that will transform the original code to something else. Calls to Plug’s get and match will create a function, while ExActor’s defcall will generate two functions and the code that properly propagates arguments from the client process to the server.

Elixir itself is heavily powered by macros. Many constructs, such as defmodule, def, if, unless, and even defmacro are actually macros. This keeps the language core minimal, and simplifies further extensions to the language.

Related, but somewhat less known is the possibility to generate functions on the fly:

defmodule Fsm do
  fsm = [
    running: {:pause, :paused},
    running: {:stop, :stopped},
    paused: {:resume, :running}
  ]

  for {state, {action, next_state}} <- fsm do
    def unquote(action)(unquote(state)), do: unquote(next_state)
  end
  def initial, do: :running
end

Fsm.initial
# :running

Fsm.initial |> Fsm.pause
# :paused

Fsm.initial |> Fsm.pause |> Fsm.pause
# ** (FunctionClauseError) no function clause matching in Fsm.pause/1

Here, we have a declarative specification of an FSM that is (again in compile time) transformed into corresponding multi-clause functions.

A similar technique is, for example, employed by Elixir to generate the String.Unicode module. Essentially, this module is generated by reading the UnicodeData.txt and SpecialCasing.txt files, where codepoints are described. Based on the data from these files, various functions (e.g. upcase, downcase) are generated.

In either case (macros or in-place code generation), we are performing some transformation of the abstract syntax tree structure in the middle of the compilation. To understand how this works, you need to learn a bit about the compilation process and the AST.

Compilation process

Roughly speaking, the compilation of Elixir code happens in three phases:

[Diagram: the three phases of the compilation process - parsing, expansion, and code generation]

The input source code is parsed, and a corresponding abstract syntax tree (AST) is produced. The AST represents your code in the form of nested Elixir terms. Then the expansion phase kicks off. It is in this phase that various built-in and custom macros are called to transform the input AST into the final version. Once this transformation is done, Elixir can produce the final bytecode - a binary representation of your source program.

This is just an approximation of the process. For example, the Elixir compiler actually generates Erlang AST and relies on Erlang functions to transform it into bytecode, but it’s not important to know the exact details. However, I find this general picture helpful when reasoning about meta-programming code.

The main point to understand is that meta-programming magic happens in the expansion phase. The compiler initially starts with an AST that closely resembles your original Elixir code, and then expands it to the final version.

Another important takeaway from this diagram is that in Elixir, meta-programming stops after binaries are produced. Except for code upgrades or some dynamic code loading trickery (which is beyond the scope of this article), you can be sure that your code is not redefined. While meta-programming always introduces an invisible (or not so obvious) layer to the code, in Elixir this at least happens only in compile-time, and is thus independent of various execution paths of a program.

Given that code transformation happens in compile time, it is relatively easy to reason about the final product, and meta-programming doesn’t interfere with static analysis tools, such as dialyzer. Compile-time meta-programming also means that we get no performance penalty. Once we get to run-time, the code is already shaped, and no meta-programming construct is running.

Creating AST fragments

So what is an Elixir AST? It is an Elixir term, a deeply nested hierarchy that represents syntactically correct Elixir code. To make things clearer, let’s see some examples. To generate the AST of some code, you can use the quote special form:

iex(1)> quoted = quote do 1 + 2 end
{:+, [context: Elixir, import: Kernel], [1, 2]}

Quote takes an arbitrarily complex Elixir expression and returns the corresponding AST fragment that describes that input code.

In our case, the result is an AST fragment describing a simple sum operation (1+2). This is often called a quoted expression.

Most of the time you don’t need to understand the exact details of the quoted structure, but let’s take a look at this simple example. In this case our AST fragment is a triplet that consists of:

  • An atom identifying the operation that will be invoked (:+)
  • A context of the expression (e.g. imports and aliases). Most of the time you don’t need to understand this data
  • The arguments (operands) of the operation

The main point is that quoted expression is an Elixir term that describes the code. The compiler will use this to eventually generate the final bytecode.

Though not very common, it is possible to evaluate a quoted expression:

iex(2)> Code.eval_quoted(quoted)
{3, []}

The result tuple contains the result of the expression, and the list of variable bindings that are made in that expression.

However, before the AST is somehow evaluated (which is usually done by the compiler), the quoted expression is not semantically verified. For example, when we write the following expression:

iex(3)> a + b
** (RuntimeError) undefined function: a/0

We get an error, since there’s no variable (or function) called a.

In contrast, if we quote the expression:

iex(3)> quote do a + b end
{:+, [context: Elixir, import: Kernel], [{:a, [], Elixir}, {:b, [], Elixir}]}

There’s no error. We have a quoted representation of a+b, which means we generated the term that describes the expression a+b, regardless of whether these variables exist or not. The final code is not yet emitted, so there’s no error.

If we insert that representation into some part of the AST where a and b are valid identifiers, this code will be correct.

Let’s try this out. First, we’ll quote the sum expression:

iex(4)> sum_expr = quote do a + b end

Then we’ll make a quoted binding expression:

iex(5)> bind_expr = quote do
          a=1
          b=2
        end

Again, keep in mind that these are just quoted expressions. They are simply the data that describes the code, but nothing is yet evaluated. In particular, variables a and b don’t exist in the current shell session.

To make these fragments work together, we must connect them:

iex(6)> final_expr = quote do
          unquote(bind_expr)
          unquote(sum_expr)
        end

Here we generate a new quoted expression that consists of whatever is in bind_expr, followed by whatever is in sum_expr. Essentially, we produced a new AST fragment that combines both expressions. Don’t worry about the unquote part - I’ll explain this in a bit.

In the meantime, we can evaluate this final AST fragment:

iex(7)> Code.eval_quoted(final_expr)
{3, [{{:a, Elixir}, 1}, {{:b, Elixir}, 2}]}

Again, the result consists of the result of the expression (3) and a bindings list, where we can see that our expression bound the two variables a and b to the respective values of 1 and 2.

This is the core of the meta-programming approach in Elixir. When meta-programming, we essentially compose various AST fragments to generate some alternate AST that represents the code we want to produce. In doing so, we’re most often not interested in the exact contents or structure of the input AST fragments (the ones we combine). Instead, we use quote to generate and combine input fragments and generate some decorated code.

Unquoting

This is where unquote comes into play. Notice that whatever is inside the quote block is, well, quoted - turned into an AST fragment. This means we can’t normally inject the contents of some variable that exists outside of our quote. Looking at the example above, this wouldn’t work:

quote do
  bind_expr
  sum_expr
end

In this snippet, quote simply generates quoted references to bind_expr and sum_expr variables that must exist in the context where this AST will be interpreted. However, this is not what we want in our case. What we need is a way of directly injecting contents of bind_expr and sum_expr to corresponding places in the AST fragment we’re generating.

That’s the purpose of unquote(…) - the expression inside parentheses is immediately evaluated, and inserted in place of the unquote call. This in turn means that the result of unquote must also be a valid AST fragment.

Another way of looking at unquote is to treat it as an analogue to string interpolation (#{}). With strings you can do this:

"... #{some_expression} ... "

Similarly, when quoting you can do this:

quote do
  ...
  unquote(some_expression)
  ...
end

In both cases, you evaluate an expression that must be valid in the current context, and inject the result in the expression you’re building (either string, or an AST fragment).
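
The parallel becomes clearer when the two are placed side by side (the AST metadata may differ slightly in your shell):

name = "world"
"Hello, #{name}!"
# => "Hello, world!"

expr = quote do 1 + 2 end
quote do
  unquote(expr) * 10
end
# => {:*, [context: Elixir, import: Kernel],
#      [{:+, [context: Elixir, import: Kernel], [1, 2]}, 10]}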

It’s important to understand this, because unquote is not the reverse of quote. While quote takes a code fragment and turns it into a quoted expression, unquote doesn’t do the opposite. If you want to turn a quoted expression into a string, you can use Macro.to_string/1.
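
For example, converting a quoted expression back into source-like text:

Macro.to_string(quote do 1 + 2 end)
# => "1 + 2"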

Example: tracing expressions

Let’s combine this theory into a simple example. We’ll write a macro that can help us in debugging the code. Here’s how this macro can be used:

iex(1)> Tracer.trace(1 + 2)
Result of 1 + 2: 3
3

Tracer.trace takes the given expression and prints its result to the screen. Then the result of the expression is returned.

It’s important to realize that this is a macro, which means that the input expression (1 + 2) will be transformed into something more elaborate - code that prints the result and returns it. This transformation will happen at expansion time, and the resulting bytecode will contain some decorated version of the input code.

Before looking at the implementation, it might be helpful to imagine the final result. When we call Tracer.trace(1+2), the resulting bytecode will correspond to something like this:

mangled_result = 1+2
Tracer.print("1+2", mangled_result)
mangled_result

The name mangled_result indicates that the Elixir compiler will somehow mangle all temporary variables we’re introducing in our macro. This is also known as macro hygiene, and we’ll discuss it later in this series (though not in this article).

Given this template, here’s how the macro can be implemented:

defmodule Tracer do
  defmacro trace(expression_ast) do
    string_representation = Macro.to_string(expression_ast)

    quote do
      result = unquote(expression_ast)
      Tracer.print(unquote(string_representation), result)
      result
    end
  end

  def print(string_representation, result) do
    IO.puts "Result of #{string_representation}: #{inspect result}"
  end
end

Let’s analyze this code one step at a time.

First, we define the macro using defmacro. A macro is essentially a special kind of function. Its name will be mangled, and this function is meant to be invoked only in the expansion phase (though you could theoretically still call it at run-time).

Our macro receives a quoted expression. This is very important to keep in mind - whichever arguments you send to a macro, they will already be quoted. So when we call Tracer.trace(1+2), our macro (which is a function) won’t receive 3. Instead, the contents of expression_ast will be the result of quote(do: 1+2).

In line 3, we use Macro.to_string/1 to compute the string representation of the received AST fragment. This is the kind of thing you can’t do with a plain function that is called at runtime. While it’s possible to call Macro.to_string/1 at runtime, the problem is that by then we no longer have access to the AST, and therefore don’t know the string representation of the expression.

Once we have a string representation, we can generate and return the resulting AST, which is done from the quote do … end construct. The result of this is the quoted expression that will substitute the original Tracer.trace(…) call.

Let’s look at this part closer:

quote do
  result = unquote(expression_ast)
  Tracer.print(unquote(string_representation), result)
  result
end

If you understood the explanation of unquote then this is reasonably simple. We essentially inject the expression_ast (quoted 1+2) into the fragment we’re generating, taking the result of the operation into the result variable. Then we print this together with the stringified expression (obtained via Macro.to_string/1), and finally return the result.

Expanding an AST

It is easy to observe how this connects in the shell. Start the iex shell and copy-paste the definition of the Tracer module above:

iex(1)> defmodule Tracer do
          ...
        end

Then, you must require the Tracer module:

iex(2)> require Tracer

Next, let’s quote a call to trace macro:

iex(3)> quoted = quote do Tracer.trace(1+2) end
{{:., [], [{:__aliases__, [alias: false], [:Tracer]}, :trace]}, [],
 [{:+, [context: Elixir, import: Kernel], [1, 2]}]}

Now, this output looks a bit scary, and you usually don’t have to understand it. But if you look closely enough, somewhere in this structure you can see a mention of Tracer and trace, which proves that this AST fragment corresponds to our original code, and is not yet expanded.

Now, we can turn this AST into an expanded version, using Macro.expand/2:

iex(4)> expanded = Macro.expand(quoted, __ENV__)
{:__block__, [],
 [{:=, [],
   [{:result, [counter: 5], Tracer},
    {:+, [context: Elixir, import: Kernel], [1, 2]}]},
  {{:., [], [{:__aliases__, [alias: false, counter: 5], [:Tracer]}, :print]},
   [], ["1 + 2", {:result, [counter: 5], Tracer}]},
  {:result, [counter: 5], Tracer}]}

This is now the fully expanded version of our code, and somewhere inside it you can see mentions of result (the temporary variable introduced by the macro), and the call to Tracer.print/2. You can even turn this expression into a string:

iex(5)> Macro.to_string(expanded) |> IO.puts
(
  result = 1 + 2
  Tracer.print("1 + 2", result)
  result
)

The point of all this is to demonstrate that your macro call is really expanded into something else. This is how macros work. Though we only tried it from the shell, the same thing happens when we’re building our projects with mix or elixirc.

I guess this is enough for the first session. You’ve learned a bit about the compiler process and the AST, and seen a fairly simple example of a macro. In the next installment, I’ll dive a bit deeper, discussing some mechanical aspects of macros.

]]>
http://theerlangelist.com//article/macros_1
<![CDATA[Why Elixir]]> Tue, 21 Jan 14 00:00:00 +0000 Why Elixir

2014-01-21

It’s been about a year since I started using Elixir. Originally, I intended to use the language only for blogging purposes, thinking it could help me better illustrate the benefits of the Erlang Virtual Machine (EVM). However, I was immediately fascinated with what the language brings to the table, and very quickly introduced it to the Erlang-based production system I was developing at the time. Today, I consider Elixir a better alternative for the development of EVM-powered systems, and in this post I’ll try to highlight some of its benefits, and also dispel some misconceptions about it.

The problems of Erlang the language

EVM has many benefits that make it easier to build highly-available, scalable, fault-tolerant, distributed systems. There are various testimonials on the Internet, and I’ve blogged a bit about some advantages of Erlang here and here, and chapter 1 of my upcoming book Elixir in Action presents the benefits of both Erlang and Elixir.

Long story short, Erlang provides excellent abstractions for managing highly-scalable, fault-tolerant systems, which is particularly useful in concurrent systems, where many independent or loosely-dependent tasks must be performed. I’ve been using Erlang in production for more than three years, to build a long-polling based HTTP push server that in peak time serves over 2000 reqs/sec (non-cached). Never before have I written anything of this scale, nor have I ever developed something this stable. The service just runs happily, without me thinking about it. This was actually my first Erlang code, bloated with anti-patterns and bad approaches. And still, EVM proved to be very resilient, and ran the code as best it could. Most importantly, it was fairly straightforward for me to work on the complex problem, mostly owing to the Erlang concurrency mechanism.

However, despite some great properties, I never was (and I’m still not) quite comfortable programming in Erlang. The coding experience somehow never felt very fluent, and the resulting code was always burdened with excessive boilerplate and duplication. The problem was not the language syntax. I did a little Prolog back in my student days, and I liked the language a lot. By extension, I also like Erlang syntax, and actually think it is in many ways nicer and more elegant than Elixir. And this is coming from an OO developer who spent most of his coding time in languages such as Ruby, JavaScript, C# and C++.

The problem I have with Erlang is that the language is somehow too simple, making it very hard to eliminate boilerplate and structural duplication. Consequently, the resulting code gets a bit messy, being harder to write, analyze, and modify. After coding in Erlang for some time, I thought that functional programming was inferior to OO when it comes to efficient code organization.

What Elixir is (not)

This is where Elixir changed my opinion. After I had spent enough time with the language, I was finally able to see the benefits and elegance of functional programming more clearly. Now I can’t say anymore that I prefer OO to FP. I find the coding experience in Elixir much more pleasant, and I’m able to concentrate on the problem I’m solving, instead of dealing with the language’s shortcomings.

Before discussing some benefits of Elixir, there is an important thing I’d like to stress: Elixir is not Ruby for Erlang. It is also not CoffeeScript, Clojure, C++ or something else for Erlang. The relationship between Elixir and Erlang is unique, with Elixir often being semantically very close to Erlang, but in addition bringing many ideas from different languages. The end result may on the surface look like Ruby, but I find it much closer to Erlang, with both languages completely sharing the type system, and taking the same functional route.

So what is Elixir? To me, it is an Erlang-like language with improved code organization capabilities. This definition differs from what you’ll see on the official page, but I think it captures the essence of Elixir, when compared to Erlang.

Let me elaborate on this. In my opinion, a programming language has a couple of roles:

  • It serves as an interface that allows programmers to control something, e.g. a piece of hardware, a virtual machine, a running application, UI layout, …
  • It shapes the way developers think about the world they’re modeling. An OO language will make us look for entities with state and behavior, while in an FP language we’ll think about data and transformations. A declarative programming language will force us to think about rules, while in an imperative language we’ll think more about sequences of actions.
  • It provides tools to organize the code, remove duplication, boilerplate, and noise, and hopefully model the problem as closely as possible to the way we understand it.

Erlang and Elixir are completely identical in the first two roles - they target the same “thing” (EVM), and they both take a functional approach. It is in the third role where Elixir improves on Erlang, and gives us additional tools to organize our code, and hopefully be more efficient in writing production-ready, maintainable code.

Ingredients

Much has been said about Elixir on the Internet, but I especially like two articles from Devin Torres which you can find here and here. Devin is an experienced Erlang developer, who among other things wrote the popular poolboy library, so it’s worth reading what he thinks about Elixir.

I’ll try not to repeat much, and avoid going into many mechanical details. Instead, let’s do a brief tour of the main tools that can be used for better code organization.

Metaprogramming

Metaprogramming in Elixir comes in a couple of flavors, but the essence is the same. It allows us to write concise constructs that seem as if they’re a part of the language. These constructs are then transformed at compile time into proper code. On a mechanical level, this helps us remove structural duplication - a case where two pieces of code share the same abstract pattern, but differ in many mechanical details.

For example, the following snippet presents a sketch of a module that models a User record:

defmodule User do
  #initializer
  def new(data) do ... end

  # getters
  def name(user) do ... end
  def age(user) do ... end

  # setters
  def name(value, user) do ... end
  def age(value, user) do ... end
end

Some other type of record will follow this same pattern, but contain different fields. Instead of copy-pasting this pattern, we can use the Elixir defrecord macro:

defrecord User, name: nil, age: 0

Based on the given definition, defrecord generates a dedicated module that contains utility functions for manipulating our User record. Thus, the common pattern is stated in only one place (the code of the defrecord macro), while the particular logic is relieved of mechanical implementation details.

Elixir macros are nothing like C/C++ macros. Instead of working on strings, they are something like compile-time Elixir functions that are called during compilation and work on the abstract syntax tree (AST), which is the code represented as an Elixir data structure. A macro can take an AST and return an alternative AST that represents the generated code. Since macros are executed at compile time, runtime performance is not affected, and there are no surprise situations where some piece of code changes the definition of a module at runtime (which is possible, for example, in JavaScript or Ruby).
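
As a minimal sketch, here is a hypothetical unless-like macro (not the one from the standard library). It receives the AST of its arguments and uses quote/unquote to return the AST of an equivalent if expression:

defmodule MyMacros do
  # Expanded at compile time: `condition` and `block` arrive as AST fragments,
  # and the quote block returns the AST that replaces the macro call.
  defmacro my_unless(condition, do: block) do
    quote do
      if !unquote(condition), do: unquote(block)
    end
  end
end

# Usage (the caller must `require` the module before invoking its macros):
require MyMacros
MyMacros.my_unless 1 > 2, do: IO.puts("this will be printed")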

Owing to macros, most of Elixir is actually implemented in Elixir, including constructs such as if, unless, or the unit testing support. Unicode support works by reading the UnicodeData.txt file and generating the corresponding implementation of Unicode-aware string functions such as downcase or upcase. This in turn makes it easier for developers to contribute to Elixir.

Macros also allow 3rd party library authors to provide internal DSLs that fit naturally into the language. The Ecto project, which provides embedded, integrated queries - something like LINQ for Elixir - is my personal favorite, and it really showcases the power of macros.
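
To give a rough idea of what such a DSL looks like, here is an Ecto-style query sketch. The User schema and its fields are hypothetical, and the exact syntax may vary between Ecto versions, but the shape is representative:

import Ecto.Query

# Reads almost like SQL, yet it is expanded at compile time by macros.
query = from u in User,
        where: u.age > 18,
        select: u.name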

I’ve seen people sometimes dismiss Elixir, stating they don’t need metaprogramming capabilities. While extremely useful, metaprogramming can also become a very dangerous tool, so it is advisable to consider its usage carefully. That said, many features are powered by metaprogramming, and even if you don’t write macros yourself, you’ll still probably enjoy many of these features, such as the aforementioned records, Unicode support, or the integrated query language.

Pipeline operator

This seemingly simple operator is so useful that I “invented” its Erlang equivalent even before I was aware it existed in Elixir (or other languages, for that matter).

Let’s see the problem first. In Erlang, there is no pipeline operator, and furthermore, we can’t rebind variables. Therefore, typical Erlang code will often be written with the following pattern:

State1 = trans_1(State),
State2 = trans_2(State1),
State3 = trans_3(State2),
...

This is very clumsy code that relies on intermediate variables and on correctly passing the last result to the next call. I actually had a nasty bug because I accidentally used State6 in one place instead of State7.

Of course, we can work around this by inlining the function calls:

trans_3(
  trans_2(
    trans_1(State)
  )
)

As you can see, this code can quickly get ugly, and the problem is aggravated when transformation functions receive additional arguments and the number of transformations increases.

The pipeline operator makes it possible to combine various operations without using intermediate variables:

state
|> trans_1
|> trans_2
|> trans_3

The code reads like prose, from top to bottom, and highlights one of the strengths of FP, where we treat functions as data transformers that are combined in various ways to achieve the desired result.
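
Mechanically, the operator simply injects the left-hand value as the first argument of the call on its right, so the pipeline above compiles into the same nested calls shown earlier:

state |> trans_1 |> trans_2 |> trans_3   # same as trans_3(trans_2(trans_1(state)))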

For example, the following code computes the sum of squares of all positive numbers in a list:

list
|> Enum.filter(&(&1 > 0))       # take positive numbers
|> Enum.map(&(&1 * &1))         # square each one
|> Enum.reduce(0, &(&1 + &2))   # calculate sum

The pipeline operator works extremely well because APIs in Elixir libraries follow the “subject (noun) as the first argument” convention. Unlike Erlang, Elixir takes the stance that all functions should take the thing they operate on as the first argument. So String module functions take a string as the first argument, while Enum module functions take an enumerable as the first argument.
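
For instance, since both String and Enum functions take their subject first, calls from different modules chain naturally:

"elixir rocks"
|> String.upcase            # the string is the first argument of String.upcase/1
|> String.split(" ")        # ...and of String.split/2
|> Enum.take(1)             # the enumerable is the first argument of Enum functions; result: ["ELIXIR"]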

Polymorphism via protocols

Protocols are the Elixir way of providing something roughly similar to OO interfaces. Initially, I wasn’t much impressed with them, but as time progressed, I started seeing the many benefits they bring. Protocols allow developers to create generic logic that can be used with any type of data, assuming that some contract is implemented for the given data.

An excellent example is the Enum module, which provides many useful functions for manipulating anything that is enumerable. For example, this is how we iterate over an enumerable:

Enum.each(enumerable, fn element -> ... end)

Enum.each works with different types, such as lists or key-value dictionaries, and of course we can add support for our own types by implementing the corresponding protocol. This is reminiscent of OO interfaces, with an additional twist: it’s possible to implement a protocol for a type even if you don’t own its source code.
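
As a minimal sketch, here is how a hypothetical Blank protocol could be defined and then implemented for two existing types, neither of which we “own”:

defprotocol Blank do
  @doc "Returns true if the data is considered blank/empty"
  def blank?(data)
end

# An empty list is blank
defimpl Blank, for: List do
  def blank?([]), do: true
  def blank?(_), do: false
end

# nil is blank
defimpl Blank, for: Atom do
  def blank?(nil), do: true
  def blank?(_), do: false
end

Blank.blank?([])    # => true
Blank.blank?(:foo)  # => false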

One of the best examples of protocol usefulness is the Stream module, which implements a lazy, composable, enumerable abstraction. A stream makes it possible to compose various enumerable transformations and then generate the result only when needed, by feeding the stream to some function from the Enum module. For example, here’s the code that computes the sum of squares of all positive numbers in a list in a single pass:

list
|> Stream.filter(&(&1 > 0))
|> Stream.map(&(&1 * &1))
|> Enum.reduce(0, &(&1 + &2))   # Entire iteration happens here in a single pass

In lines 2 and 3, operations are composed, but not yet executed. The result is a specification descriptor that implements the Enumerable protocol. Once we feed this descriptor to some Enum function (line 4), it starts producing values. Other than supporting the protocol mechanism, there is no special laziness support in the Elixir compiler.
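
The same laziness also makes it possible to describe conceptually infinite sequences and take from them only what is needed; a small illustration:

1
|> Stream.iterate(&(&1 * 2))   # describes 1, 2, 4, 8, ... without computing anything yet
|> Enum.take(5)                # values are produced only here: [1, 2, 4, 8, 16]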

The mix tool

The final important piece of the puzzle is the tool that helps us manage projects. Elixir comes bundled with the mix tool, which does exactly that. This is again done in an impressively simple manner. When you create a new project, only 7 files (including .gitignore and README.md) are created on disk. And this is all it takes to create a proper OTP application. It’s an excellent example of how far things can be simplified by hiding the necessary boilerplate and bureaucracy behind a generic abstraction.

The mix tool supports various other tasks, such as dependency management. The tool is also extensible, so you can create your own specific tasks as needed.
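
A custom task is just a module in the project; here is a minimal sketch (the task name and output are of course made up):

defmodule Mix.Tasks.Hello do
  use Mix.Task

  @shortdoc "Prints a greeting"
  def run(_args) do
    IO.puts "Hello from a custom mix task!"
  end
end

# Once compiled as part of the project, it can be invoked with: mix hello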

Syntactical changes

The list doesn’t stop here; there are many other benefits Elixir gives us. Many of these involve syntactical changes from Erlang, such as support for variable rebinding, optional parentheses, implicit statement endings, nullability, short-circuit operators, …

Admittedly, some ambiguity is introduced due to optional parentheses, as illustrated in this example:

abs -1 + 5    # same as abs(-1 + 5)

However, since I use parentheses (except for macros and zero-arg functions), I can’t remember ever experiencing this problem in practice.

In general, I like many of the decisions made in this department. It’s nice to be able to write if without an obligatory else. It’s also nice that I don’t have to consciously think about which character I must use to end a statement.

Even optional parentheses are good, as they support DSL-ish usage of macros, making the code less noisy. Without them, we would have to add parentheses when invoking macros:

defrecord User, name: nil, age: 0       # without parentheses

defrecord(User, [name: nil, age: 0])    # with parentheses

Still, I don’t find these enhancements to be of crucial importance. They are nice finishing touches, but if this was all Elixir had to offer, I’d probably still use pure Erlang.

Wrapping up

Much has been said in this article, and yet I feel that the magic of Elixir is far from completely captured. Language preference is admittedly subjective, but I feel that Elixir really improves on the Erlang foundations. With more than three years of production-level coding in Erlang, and about a year of using Elixir, I simply find the Elixir experience to be much more pleasant. The resulting code seems more compact, and I can focus more on the problem I’m solving, instead of wrestling with excessive noise and boilerplate.

It is for similar reasons that I like the EVM. The underlying concurrency mechanisms make it radically easier for me to tackle the complexity of a highly loaded server-side system that must constantly provide service and perform many simultaneous tasks.

Both Elixir and the EVM raise the abstraction bar and help me tackle complex problems with greater ease. This is why I would always put my money behind the Elixir/EVM combination as the tools of choice for building a server-side system. YMMV, of course.

]]>
http://theerlangelist.com//article/why_elixir