Hi, I am Saša Jurić, a software developer with 10+ years of professional experience in programming web and desktop applications using Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the upcoming Elixir in Action book. In this blog you can read about Erlang and other programming related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Beyond Task.async

In this post I'll talk about less typical patterns of parallelization with tasks. Arguably, the most common use case for tasks is to start some jobs concurrently with Task.async and then collect the results with Task.await. By doing this we might run separate jobs in parallel, and thus perform the total work more efficiently. This can be done very elegantly with async/await without much overhead in the code.

However, the async/await pattern has some properties which may not be suitable in some cases, so you might need a different approach. That is the topic of this post, but first, let's quickly recap the basic async/await pattern.

Parallelizing with async/await

Async/await makes sense when we need to perform multiple independent computations and aggregate their results into the total output. If computations take some time, we might benefit by running them concurrently, possibly reducing the total execution time from sum(computation_times) to max(computation_times).

The computation can be any activity, such as a database query, a call to a 3rd party service, or some CPU-bound calculation. In this post, I'll just use a contrived stub:

defmodule Computation do
  def run(x) when x > 0 do
    :timer.sleep(x)  # simulates a long-running operation
    x
  end
end

This "computation" takes a positive integer x, sleeps for x milliseconds, and returns the number back. It's just a simulation of a possibly long running operation.

Now, let's say that we need to aggregate the results of multiple computations. Again, I'll introduce a simple stub:

defmodule Aggregator do
  def new, do: 0
  def value(aggregator), do: aggregator

  def add_result(aggregator, result) do
    :timer.sleep(50)  # simulates the cost of aggregating a single result
    aggregator + result
  end
end

This is just a simple wrapper which sums input numbers. In real life, this might be a more involved aggregator that somehow combines results of multiple queries into a single "thing".

Assuming that different computations are independent, there is potential to run them concurrently, and this is where tasks come in handy. For example, let's say we need to run this computation for ten different numbers:

defmodule AsyncAwait do
  def run do
    :random.seed(:os.timestamp)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) end)
    |> Enum.map(&Task.async(fn -> Computation.run(&1) end))
    |> Enum.map(&Task.await/1)
    |> Enum.reduce(Aggregator.new, &Aggregator.add_result(&2, &1))
    |> Aggregator.value
  end
end

This is a fairly simple technique. First, we generate some random input and start a task to handle each element. Then, we await the result of each task and reduce the responses into the final value. This allows us to improve the running time, since computations might run in parallel. The total time should be the time of the longest-running computation plus a fixed penalty of 500 ms (10 * 50 ms) to include each result in the total output. In this example it shouldn't take longer than about 1500 ms to get the final result.
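
If you want to check this claim on your machine, you can time the call from the shell. This is just a quick sanity check I'm adding here, not part of the original example; exact numbers will vary:

{time_us, result} = :timer.tc(&AsyncAwait.run/0)
IO.puts "result #{result} computed in #{div(time_us, 1000)} ms"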

Properties of async/await

Async/await is very elegant and brings some nice benefits, but it also has some limitations.

The first problem is that we await the results in the order we started the tasks. In some cases, this might not be optimal. For example, imagine that the first task takes 500 ms, while all others take 1 ms. This means we'll process the results of the short-running tasks only after we handle the slow one. The total execution time in this example will be about 1 second. From a performance point of view, it would be better to take results as they arrive. This would allow us to aggregate most of the results while the slowest task is still running, reducing the execution time to about 550 ms.

Another issue is that it's not easy to enforce a global timeout. You can't easily say, "I want to give up if all the results don't arrive within 500 ms". You can provide a timeout to Task.await (it's five seconds by default), but this applies only to a single await operation. Hence, a five-second timeout actually means we might end up waiting 50 seconds for ten tasks to time out.
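
For illustration, here's a hypothetical work-around (not from the original example, and assuming tasks is the list of started tasks): shrink each await's timeout so all the awaits share a single 500 ms budget. It's clumsy, and the first task that exceeds the remaining budget still crashes the caller with a timeout exit:

start = :os.timestamp

Enum.map(tasks, fn task ->
  # how much of the 500 ms budget is left for this particular await
  elapsed_ms = div(:timer.now_diff(:os.timestamp, start), 1000)
  Task.await(task, max(500 - elapsed_ms, 0))
end)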

Finally, you should be aware that the async/await pattern takes an all-or-nothing approach. If any task or the master process crashes, all involved processes will be taken down (unless they're trapping exits). This happens because Task.async links the caller and the spawned task process.

In most situations, these issues won't really matter, and the async/await combo will be perfectly fine. However, sometimes you might want to change the default behaviour.

Eliminating await

Let's start by making the "master" process handle results in the order of arrival. This is fairly simple if we rely on the fact that Task.async reports the result back to the caller process via a message. We can therefore receive a message and check whether it comes from one of our tasks. If so, we can add the result to the aggregator.

To do this, we can rely on Task.find/2, which takes the list of tasks and the message, and returns either {result, task} if the message corresponds to a task in the list, or nil if it doesn't:

defmodule AsyncFind do
  def run do
    :random.seed(:os.timestamp)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) end)
    |> Enum.map(&Task.async(fn -> Computation.run(&1) end))
    |> collect_results
  end

  defp collect_results(tasks, aggregator \\ Aggregator.new)

  defp collect_results([], aggregator), do: Aggregator.value(aggregator)
  defp collect_results(tasks, aggregator) do
    receive do
      msg ->
        case Task.find(tasks, msg) do
          {result, task} ->
            collect_results(
              List.delete(tasks, task),
              Aggregator.add_result(aggregator, result)
            )

          nil ->
            collect_results(tasks, aggregator)
        end
    end
  end
end

Most of the action happens in collect_results. Here, we loop recursively, waiting for a message to arrive. Then we invoke Task.find/2 to determine whether the message comes from one of our tasks. If so, we delete the task from the list of pending tasks, aggregate the response and resume the loop. The loop stops when there are no more pending tasks in the list. Then, we simply return the aggregated value.

In this example I'm using an explicit receive, but in production you should be careful about it. If the master process is a server, such as a GenServer or Phoenix.Channel, you should let the underlying behaviour receive messages, and invoke Task.find/2 from the handle_info callback. For the sake of brevity, I didn't take that approach here, but as an exercise you could try to implement it yourself.
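
For illustration, here's a rough sketch of how the handle_info variant might look in a GenServer. The state shape is my assumption, not code from this post; the pending tasks and the aggregator are simply kept in the server state:

def handle_info(msg, %{tasks: tasks, aggregator: aggregator} = state) do
  case Task.find(tasks, msg) do
    {result, task} ->
      # a result from one of our tasks - remove it from pending and aggregate
      {:noreply, %{state |
        tasks: List.delete(tasks, task),
        aggregator: Aggregator.add_result(aggregator, result)}}

    nil ->
      # not a task message - handle it elsewhere or ignore it
      {:noreply, state}
  end
end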

One final note: by receiving results as they arrive, we lose the ordering. In this case, where we simply sum numbers, this doesn't matter. If you must preserve the ordering, you'll need to include additional ordering information, and then sort the results after they are collected.
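
For example, one way to preserve the ordering (a sketch, assuming inputs is the list of generated numbers and results is the collected list) is to have each task return its index along with the result, and sort once everything is collected:

inputs
|> Enum.with_index
|> Enum.map(fn {arg, index} ->
     Task.async(fn -> {index, Computation.run(arg)} end)
   end)

# ... collect the {index, result} pairs as before, then:
results
|> Enum.sort_by(fn {index, _result} -> index end)
|> Enum.map(fn {_index, result} -> result end)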

Handling timeouts

Once we move away from Task.await, the master process becomes more flexible. For example, we can now easily introduce a global timeout. The idea is simple: after the tasks are started, we can use Process.send_after/3 to send a message to the master process after some time:

defmodule Timeout do
  def run do
    # exactly the same as before
  end

  defp collect_results(tasks) do
    timeout_ref = make_ref
    timer = Process.send_after(self, {:timeout, timeout_ref}, 900)
    try do
      collect_results(tasks, Aggregator.new, timeout_ref)
    after
      :erlang.cancel_timer(timer)
      receive do
        {:timeout, ^timeout_ref} -> :ok
      after
        0 -> :ok
      end
    end
  end

  # ...
end

Here, we create a reference which will be a part of the timeout message, and a timer which enqueues the timeout message to be sent to the master process after 900 ms. Including the reference in the message ensures that the timeout message is unique for this run, and won't interfere with any other message.

Finally, we start the receive loop and return its result.

Take special note of the after block, where we cancel the timer to avoid sending the timeout message if all the results arrive on time. However, since the timer works concurrently with the master process, it is still possible that the message was sent just before we cancelled the timer, but after all the results were already collected. Thus, we do a receive with a zero timeout to flush the message if it's already in the queue.

With this setup in place, we now need to handle the timeout message:

defp collect_results([], aggregator, _), do: {:ok, Aggregator.value(aggregator)}
defp collect_results(tasks, aggregator, timeout_ref) do
  receive do
    {:timeout, ^timeout_ref} ->
      {:timeout, Aggregator.value(aggregator)}

    msg ->
      case Task.find(tasks, msg) do
        {result, task} ->
          collect_results(
            List.delete(tasks, task),
            Aggregator.add_result(aggregator, result),
            timeout_ref
          )

        nil -> collect_results(tasks, aggregator, timeout_ref)
      end
  end
end

The core change is the new clause that matches the timeout message, where we explicitly deal with the timeout. In this example, we just return what we currently have. Depending on the particular use case, you may want to do something different, for example raise an error.

Explicitly handling errors

The next thing we'll tackle is error handling. Task.async is built in such a way that if something fails, everything fails. When you start the task via async, the process is linked to the caller. This holds even if you use Task.Supervisor.async. As a result, if some task crashes, the master process will crash as well, taking down all other tasks.

If this is not a problem, then Task.async is a perfectly valid solution. However, sometimes you may want to explicitly deal with errors. For example, you might want to just ignore failing tasks, reporting back whatever succeeded. Or you may want to keep the tasks running even if the master process crashes.

There are two basic ways you can go about it: catch errors in the task, or use Task.Supervisor with start_child.

Catching errors

The simplest approach is to wrap the task code in a try/catch block:

Task.async(fn ->
  try do
    {:ok, Computation.run(...)}
  catch _, _ ->
    :error
  end
end)


Then, when you receive results, you can explicitly handle each case, ignoring :error results. The implementation is mostly mechanical and left to you as an exercise.
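
As a starting point, the matching in the receive loop might change along these lines (a sketch under the assumption that tasks are wrapped as shown above):

case Task.find(tasks, msg) do
  {{:ok, result}, task} ->
    collect_results(
      List.delete(tasks, task),
      Aggregator.add_result(aggregator, result)
    )

  {:error, task} ->
    # the task failed - stop waiting for it, but keep the rest going
    collect_results(List.delete(tasks, task), aggregator)

  nil ->
    collect_results(tasks, aggregator)
end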

I've occasionally seen some concerns that catching is not the Erlang/Elixir way, so I'd like to touch on this. If you can do something meaningful with an error, catching is a reasonable approach. In this case, we want to collect all the successful responses, so ignoring failed ones is completely fine.

So catching is definitely a simple way of explicitly dealing with errors, but it's not without shortcomings. The main issue is that catch doesn't handle exit signals. Thus, if the task links to some other process, and that other process terminates, the task process will crash as well. Since the task is linked to the master process, this will cause the master process to crash, and in turn crash all other tasks. The link between the caller and the task also means that if the master process crashes, for example while aggregating, all tasks will be terminated.

To overcome this, we can either make all processes trap exits, or remove the link between processes. Trapping exits might introduce some subtle issues (see here for some information), so I'll take the second approach.

Replacing async

The whole issue arises because async links the caller and the task process, which ensures the "all-or-nothing" property. This is a perfectly reasonable decision, but it's not necessarily suitable for all cases. I wonder whether linking should be made optional, but I don't have a strong opinion at the moment.

As it is, Task.async currently establishes a link, and if we want to avoid this, we need to reimplement async ourselves. Here's what we'll do:
  • Start a Task.Supervisor and use Task.Supervisor.start_child to start tasks.
  • Manually implement sending of the return message from the task to the caller.
  • Have the master process monitor tasks so it can be notified about potential crashes. Explicitly handle such messages by removing the crashed task from the list of tasks we await on.
The first point allows us to run tasks in a different part of the supervision tree from the master. Tasks and the master process are no longer linked, so a failure of one process doesn't cause the failure of the others.

However, since we're not using async anymore, we need to manually send the return message to the caller process.

Finally, using monitors ensures that the master process will be notified if some task crashes, so it can stop awaiting that task's result.

This requires more work, but it provides stronger guarantees. We can now be certain that:
  • A failing task won't crash anyone else.
  • The master process will be informed about the task crash and can do something about it.
  • Even a failure of the master process won't cause the tasks to crash.
If the third property doesn't suit your purposes, you can simply place the master process and the tasks supervisor under the same common supervisor, with one_for_all or rest_for_one strategy.
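
For illustration, such an arrangement might look roughly like this (SupervisedTask.Master is a hypothetical worker module wrapping the master process; this is not code from the post):

import Supervisor.Spec

children = [
  supervisor(Task.Supervisor, [[name: :task_supervisor]]),
  worker(SupervisedTask.Master, [])
]

# one_for_all: if either the master or the tasks supervisor crashes,
# the other one is taken down as well and both are restarted together
Supervisor.start_link(children, strategy: :one_for_all)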

This is what I like about the Erlang fault-tolerance approach. There are various options with strong guarantees. You can isolate crashes, but you can also connect failures when needed. Some scenarios may require more work, but the implementation is still straightforward. Supporting these scenarios without process isolation and crash propagation would be harder, and you might end up reinventing parts of Erlang.

Let's implement this. The top-level run/0 function is now changed a bit:

defmodule SupervisedTask do
  def run do
    :random.seed(:os.timestamp)
    Task.Supervisor.start_link(name: :task_supervisor)

    work_ref = make_ref

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) - 500 end)
    |> Enum.map(&start_computation(work_ref, &1))
    |> collect_results(work_ref)
  end

  # ...
end

First, a named supervisor is started. This is a quick hack to keep the example short. In production, this supervisor should of course reside somewhere in the supervision hierarchy.

Then, a work reference is created, which will be included in task response messages. Finally, we generate some random numbers and start our computations. Notice the :random.uniform(1000) - 500. This ensures that some numbers will be negative, which will cause some tasks to crash.

Tasks now have to be started under the supervisor:

defp start_computation(work_ref, arg) do
  caller = self

  # Start the task under the named supervisor
  {:ok, pid} = Task.Supervisor.start_child(
    :task_supervisor,
    fn ->
      result = Computation.run(arg)

      # Send the result back to the caller
      send(caller, {work_ref, self, result})
    end
  )

  # Monitor the started task
  Process.monitor(pid)
  pid
end

Finally, we need to expand the receive loop to handle :DOWN messages, which we'll receive when the task terminates:

defp collect_results(tasks, work_ref) do
  timeout_ref = make_ref
  timer = Process.send_after(self, {:timeout, timeout_ref}, 400)
  try do
    collect_results(tasks, work_ref, Aggregator.new, timeout_ref)
  after
    :erlang.cancel_timer(timer)
    receive do
      {:timeout, ^timeout_ref} -> :ok
    after
      0 -> :ok
    end
  end
end

defp collect_results([], _, aggregator, _), do: {:ok, Aggregator.value(aggregator)}
defp collect_results(tasks, work_ref, aggregator, timeout_ref) do
  receive do
    {:timeout, ^timeout_ref} ->
      {:timeout, Aggregator.value(aggregator)}

    {^work_ref, task, result} ->
      collect_results(
        List.delete(tasks, task),
        work_ref,
        Aggregator.add_result(aggregator, result),
        timeout_ref
      )

    {:DOWN, _, _, pid, _} ->
      if Enum.member?(tasks, pid) do
        # Handling task termination. In this case, we simply delete the
        # task from the list of tasks, and wait for other tasks to finish.
        collect_results(List.delete(tasks, pid), work_ref, aggregator, timeout_ref)
      else
        collect_results(tasks, work_ref, aggregator, timeout_ref)
      end
  end
end

This is mostly straightforward, with the major change being the new clause that handles :DOWN messages. It's worth mentioning that we'll receive a :DOWN message even if the task doesn't crash. However, this message will arrive after the response message has been sent back, so the master process will first handle the response message. Since we remove the task from the list, the subsequent :DOWN message from that task will be ignored. This is not super efficient, and we could improve it by doing some extra bookkeeping and demonitoring the task after it returns, but I refrained from this for the sake of brevity.
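
For completeness, here's roughly what that bookkeeping could look like (an illustration, not the post's code): keep the monitor reference together with the pid, and demonitor with the :flush option once the result arrives, so the later :DOWN message never reaches the mailbox:

# when starting the task:
monitor_ref = Process.monitor(pid)
# keep {pid, monitor_ref} instead of just pid in the list of pending tasks

# when the {^work_ref, pid, result} message is received:
Process.demonitor(monitor_ref, [:flush])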

In any case, we can now test it. If I start SupervisedTask.run, I'll see some errors logged (courtesy of Logger), but I'll still get whatever is collected. You can also try it yourself. The code is available here.

Reducing the boilerplate

As we moved to more complex patterns, our master process became much more involved. The plain async/await version has only 12 lines of code, while the final implementation has 66. The master process is burdened with a lot of mechanics, such as keeping references, starting the timer, and handling received messages. There's a lot of potential to extract some of that boilerplate, so we can keep the master process more focused.

There are different approaches to extracting the boilerplate. If a process has to behave in a special way, you can consider creating a generic OTP-like behaviour that powers the process. The concrete implementation then just has to fill in the blanks by providing necessary callback functions.

However, in this particular case, I don't think creating a behaviour is a good option. The thing is that the master process might already be powered by a behaviour, such as GenServer or Phoenix.Channel. If we implement our generic code as a behaviour, we can't really combine it with another behaviour. Thus, we'd always need one more process that starts all these tasks and collects their results. This may result in excessive message passing and have an impact on performance.

An alternative is to implement a helper module that can be used to start tasks and process task related messages. For example, we could have the following interface for starting tasks:

runner = TaskRunner.run(
  [
    {:supervisor1, {SomeModule, :some_function, args}},
    {:supervisor2, {AnotherModule, :some_function, other_args}},
    {:supervisor3, fn -> ... end},
    # ...
  ],
  timeout
)

Under the hood, TaskRunner would start tasks under given supervisors, setup work and timer references, and send the timeout message to the caller process. By allowing different tasks to run under different supervisors, we have more flexibility. In particular, this allows us to start different tasks on different nodes.

The responsibility of receiving messages now lies with the caller process. It has to receive a message either via receive, or for example in the handle_info callback. When the process gets a message, it has to first pass it to TaskRunner.handle_message, which will return one of the following:
  • nil - a message is not task runner specific, feel free to handle it yourself
  • {{:ok, result}, runner} - a result arrived from a task
  • {{:task_error, reason}, runner} - a task has crashed
  • {:timeout, runner} - timeout has occurred
Finally, we'll introduce a TaskRunner.done?/1 function, which can be used to determine whether all tasks have finished.

This is all we need to make various decisions in the client process. The previous example can now be rewritten as:

defmodule TaskRunnerClient do
  def run do
    :random.seed(:os.timestamp)
    Task.Supervisor.start_link(name: :task_supervisor)

    1..10
    |> Enum.map(fn(_) -> :random.uniform(1000) - 500 end)
    |> Enum.map(&{:task_supervisor, {Computation, :run, [&1]}})
    |> TaskRunner.run(400)
    |> handle_messages(Aggregator.new)
  end


  defp handle_messages(runner, aggregator) do
    if TaskRunner.done?(runner) do
      {:ok, Aggregator.value(aggregator)}
    else
      receive do
        msg ->
          case TaskRunner.handle_message(runner, msg) do
            nil -> handle_messages(runner, aggregator)

            {{:ok, result}, runner} ->
              handle_messages(runner, Aggregator.add_result(aggregator, result))

            {{:task_error, _reason}, runner} ->
              handle_messages(runner, aggregator)

            {:timeout, _runner} ->
              {:timeout, Aggregator.value(aggregator)}
          end
      end
    end
  end
end

This is less verbose than the previous version, and the receive loop is now focused only on handling success, error, and timeout, without worrying about how these situations are detected.

The code is still more involved than the simple async/await pattern, but it offers more flexibility. You can support various scenarios, such as stopping on the first success, or reporting the timeout back to the user while letting the tasks finish their jobs. If this flexibility is not important for your particular scenarios, then this approach is overkill, and async/await should do just fine.

I will not describe the implementation of TaskRunner as it is mostly a refactoring of the code from SupervisedTask. You're advised to try and implement it yourself as an exercise. A basic (definitely not complete or tested) take can be found here.
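
If you want a head start on the exercise, the runner could be a plain struct that remembers the references it needs to interpret later messages. The following is purely illustrative (only mfa-style jobs are handled, and handle_message/2 is omitted); the linked implementation may differ:

defmodule TaskRunner do
  defstruct pending: [], work_ref: nil, timeout_ref: nil

  def run(task_specs, timeout) do
    work_ref = make_ref
    timeout_ref = make_ref

    # global timeout message for the caller process
    Process.send_after(self, {:timeout, timeout_ref}, timeout)

    pending = Enum.map(task_specs, &start_task(&1, work_ref))

    %TaskRunner{pending: pending, work_ref: work_ref, timeout_ref: timeout_ref}
  end

  def done?(%TaskRunner{pending: pending}), do: pending == []

  defp start_task({supervisor, {module, fun, args}}, work_ref) do
    caller = self

    {:ok, pid} = Task.Supervisor.start_child(supervisor, fn ->
      send(caller, {work_ref, self, apply(module, fun, args)})
    end)

    Process.monitor(pid)
    pid
  end
end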

Parting words

While this article focuses on tasks, in a sense they serve more as an example to illustrate concurrent thinking in Erlang.

Stepping away from Task.await and receiving messages manually allowed the master process to be more flexible. Avoiding links between the master and the tasks decoupled their lifecycles and gave us better error isolation. Using monitors made it possible to detect failures and perform some special handling. Pushing everything to a helper module, without implementing a dedicated behaviour, gave us generic code that can be used in different types of processes.

These are, in my opinion, the more important takeaways of this article. In the future, the Elixir team may introduce additional support for tasks which will make most of these techniques unnecessary. But the underlying reasoning should be applicable in many other situations, not necessarily task-related.

Speaking at ElixirConf EU

I'm very excited that my talk on high-availability got accepted for the first European Elixir conference.

For the past three years, I have been evangelizing Erlang and Elixir here in Croatia, at our local annual WebCampZg event. In addition, I invested a lot of effort writing the book on Elixir which is now in its final stages.

All this work has been motivated by my own positive experience using Erlang and Elixir in production. Some four years ago, I started using Erlang almost by chance, and it helped me immensely in building a long-polling based HTTP push server together with the supporting data provider system. The more I worked with Erlang, the more I got fascinated with how useful it is when it comes to building server-side systems. Ultimately, it became my tool of choice for development of complex backend systems that need to provide reliable service.

I reached for Elixir two years ago, when I started this blog, hoping it would help me showcase the magic behind Erlang to OO developers. I was really surprised by the level of maturity and integration with Erlang, even at that early stage. Pretty soon, I started introducing Elixir in production, and discovered that it further boosts my productivity.

Two years later, Elixir 1.0 is out, the ecosystem is growing, and we have great libraries such as Phoenix and Ecto, leveraging Elixir to further improve developer productivity.

Moreover, there's a lot of learning material available. In addition to the excellent online getting-started guides and reference documentation, there are three published books, with Elixir in Action almost finished, and two more in the making. The present looks great, and the future is even more promising. These are exciting times, and a great chance to jump aboard and start using Elixir and Erlang.

So if you happen to be interested, grab a ticket for ElixirConfEU while it's available. Hope I'll see you there!

While I'm in the announcement mode, I'll also mention that we're starting a local FP group here in Zagreb, Croatia. Details about the introductory drinkup can be found here, so if you happen to live nearby, come and visit us for some functional chat.

Conway's Game of Life in Elixir

About a month ago, the Elixir Quiz site ran a Conway's Game of Life challenge. While I didn't find the time to participate in the challenge, I played with the problem recently, and found it very interesting.

So in this post, I'm going to break down my solution to the problem. If you're not familiar with Game of Life rules, you can take a quick look here.

My solution is simplified in that I deal only with square grids. It's not very hard to extend it to work for any rectangle, but I wanted to keep things simple.

Functional abstraction

The whole game revolves around the grid of cells which are in some state (dead or alive), and there are clear rules that determine the next state of each cell based on the current state of its neighbours. Thus, I've implemented the Conway.Grid module that models the grid. Let's see how the module will be used.

The initial grid can be created with Conway.Grid.new/1:

# Creates 5x5 grid with random values
grid = Conway.Grid.new(5)

# Creates grid from the given cell data
grid = Conway.Grid.new([
  [1, 1, 0],
  [0, 1, 0],
  [1, 1, 0]
])

As can be deduced from the second example, a cell state can be either zero (not alive) or one (alive).

Once the grid is instantiated, we can move it a step forward with Conway.Grid.next/1:

grid = Conway.Grid.next(grid)

Finally, we can query the grid's size, and the value of each cell:

Conway.Grid.size(grid)

# Returns 0 or 1 for the cell at the given location
Conway.Grid.cell_status(grid, x, y)

This is all we need to manipulate the grid and somehow display it.

This is a simple decoupling technique. The game logic is contained in a single module, but the "driving" part of the game, i.e. the code that repeatedly moves the game forward, is left out.

This allows us to use the core game module in different contexts. In my example, I'm using Conway.Grid from a simplistic terminal client, but it's easy to use the module from a GenServer, for example to push updates to various connected clients, or from unit tests to verify that state transitions work properly.

Another nice benefit of this approach is that we can use :erlang.term_to_binary/1 to serialize the structure and persist the grid state, and then later deserialize it and resume playing the grid.
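
For example (the file name is just an illustration):

# persist the current state
File.write!("conway_state.bin", :erlang.term_to_binary(grid))

# ...and later resume from it
grid = :erlang.binary_to_term(File.read!("conway_state.bin"))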

This is what I like to call a functional abstraction. Notice in the previous examples how we use Conway.Grid without knowing its internal representation. The module abstracts away its internal details. In particular, as clients, we don't care what data type is used by the module. All we know is that creator and updater functions return a "grid", and all functions from Conway.Grid know how to work with that grid.

The module thus abstracts some concept, and does so relying on a pure functional (immutable) data structure. Hence, a functional abstraction.

Note: Frequently, the term type is used for this. I'm not a particular fan of this terminology. To me, the only true Elixir types are the ones supported by BEAM. All others, such as HashDict, HashSet, Range, Erlang's :gb_trees, and even structs, are somehow composed from those basic types.

Choosing the data representation

Update: As Greg and leikind pointed out in comments, the approach I'm taking here is neither efficient nor flexible, because I'm keeping and processing all cells, instead of dealing only with live ones. You can find the alternative version, where only live cells are kept in a HashSet here. The nice thing is that the change was simple, due to abstraction of the Conway.Grid. The module interface remained the same.

In any case, let's start implementing Conway.Grid. The most important decision is how to represent the grid data. Given the game rules, we have the following needs:
  • random access to cells (their states)
  • incremental building of the grid
We need the first property to access neighbour cells when determining the next state of each cell. The second property is needed since in each step we fully rebuild the grid based on the current state of each cell.

In BEAM, tuples are a good fit for random access (an O(1) operation), but they are poor for incremental building. Modifying a tuple (almost always) results in a (shallow) copy of all tuple elements. This can hurt performance and increase memory usage.
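
To make this concrete, here's a tiny example of the copying behaviour (just an illustration):

row = {1, 1, 0}
put_elem(row, 2, 1)
# => {1, 1, 1} - a brand new tuple; the original row stays unchanged,
# so every such update copies the whole row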

In contrast, lists are crappy for random access, but they are efficient for incremental building, if we're either prepending new elements to the head, or building the list in a body-recursive way.

However, we can use different approaches in different situations. In particular, we can:
  • Maintain a 2D grid as a tuple of tuples. This gives us an O(1) random access complexity.
  • Build a new grid as a list of lists. Once the new grid is built, convert it to a tuple of tuples via List.to_tuple/1.
List.to_tuple/1 will be efficient (though still O(n)), since it is implemented in C, and does its job by preallocating the tuple and populating it from the list. Thus, we avoid extra copying of tuples.

Performance wise, this is probably not the optimal implementation, but I think it's a reasonable first attempt that still keeps the code simple and clear.

So to recap, our grid will be implemented as a tuple of tuples:

{
  {1, 1, 0},
  {0, 1, 0},
  {1, 1, 0}
}

This is all the data we need, since we can efficiently derive the grid size from the data via Kernel.tuple_size/1. It's still worth making our Conway.Grid a struct, so we can gain pattern matching, possible polymorphism, and easier extensibility.

Hence, the skeleton of the module will look like:

defmodule Conway.Grid do
  defstruct data: nil

  ...
end

Now we can start implementing the module.

Constructing the grid

Recall from usage examples that our "constructor" function is overloaded. It either takes a grid dimension and creates the randomly populated grid, or it takes a list of lists with prepopulated data.

Let's solve the latter case first:

def new(data) when is_list(data) do
  %Conway.Grid{data: list_to_data(data)}
end

defp list_to_data(data) do
  data
  |> Enum.map(&List.to_tuple/1)     # convert every inner list
  |> List.to_tuple                  # convert the outer list
end

Now we can handle the random population. We'll first implement a generic helper function for creating the grid data:

defp new_data(size, producer_fun) do
  for y <- 0..(size - 1) do
    for x <- 0..(size - 1) do
      producer_fun.(x, y)
    end
  end
  |> list_to_data
end

Here, we take the desired size, and produce a square list of lists, calling the producer_fun lambda for each element. Then, we just pass it to list_to_data/1 to convert to a tuple of tuples. This genericity of new_data/2 will allow us to reuse the code when moving the grid to the next state.

With this in place, we can implement the second clause of new/1:

def new(size) when is_integer(size) and size > 0 do
  %Conway.Grid{
    data: new_data(size, fn(_, _) -> :random.uniform(2) - 1 end)
  }
end


Next, let's implement two getter functions for retrieving the grid size and the state of each cell:

def size(%Conway.Grid{data: data}), do: tuple_size(data)

def cell_status(grid, x, y) do
  grid.data
  |> elem(y)
  |> elem(x)
end


Shifting the state

The only thing remaining is to move the grid to the next state. Let's start with the interface function:

def next(grid) do
  %Conway.Grid{grid |
    data: new_data(size(grid), &next_cell_status(grid, &1, &2))
  }
end

As mentioned earlier, we reuse the existing new_data/2 function. We just provide a different lambda which will generate new cell states based on the current grid state.

Implementation of next_cell_status/3 embeds the game rules:

def next_cell_status(grid, x, y) do
  case {cell_status(grid, x, y), alive_neighbours(grid, x, y)} do
    {1, 2} -> 1
    {1, 3} -> 1
    {0, 3} -> 1
    {_, _} -> 0
  end
end

Here I've resorted to a case branch, because I think it's the most readable approach in this situation. I experimented with moving this branching to a separate multiclause function, but then it was less clear what was being pattern-matched.

Counting alive neighbours

Now we move to the most complex part of the code: calculating the number of alive neighbours. For this, we have to get the state of each surrounding cell, and count the number of those which are alive.

In this example, I've decided to use the for comprehension, because it has nice support for multiple generators and rich filters.

However, for emits results to a collectable, and we need a single integer (the count of alive neighbours). Therefore, I've implemented a simple Sum collectable. It allows us to collect an enumerable of numbers into an integer containing their sum.

The idea is then to use for to filter all alive neighbours, emit value 1 for each such neighbour, and collect those 1s into a Sum instance:

defp alive_neighbours(grid, cell_x, cell_y) do
  # 1. Iterate all x,y in -1..+1 area
  for x <- (cell_x - 1)..(cell_x + 1),
      y <- (cell_y - 1)..(cell_y + 1),
      (
        # take only valid coordinates
        x in 0..(size(grid) - 1) and
        y in 0..(size(grid) - 1) and

        # don't include the current cell
        (x != cell_x or y != cell_y) and

        # take only alive cells
        cell_status(grid, x, y) == 1
      ),
      # collect to Sum
      into: %Sum{}
  do
    1   # add 1 for every alive neighbour
  end
  |> Sum.value    # get the sum value
end

I did the initial implementation of this with nested Enum.reduce/3, and I wasn't as pleased with it. This solution actually takes more LOC, but I find it easier to understand. There are many other ways of implementing this counting, but to me this approach seems pretty readable. YMMV of course.

Update: Tallak Tveide rightfully asked why not just pipe the result of for into Enum.sum/1 (note that Enum.count/1 also works). This will work, and quite possibly perform just fine. However, when I was first writing this particular function, I asked myself why I would want to create an intermediate enumerable just to count its size. This is why I made the Sum collectable. It's probably over-engineering / micro-optimizing for this case, but I found it an interesting exercise. As an added benefit, I have a generic Sum collectable which I can use in any of my code whenever I need to count the number of filtered items.
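
A minimal version of such a collectable might look like this (a sketch; the actual implementation linked from the post may differ):

defmodule Sum do
  defstruct value: 0

  def value(%Sum{value: value}), do: value

  defimpl Collectable do
    def into(original) do
      {original,
       fn
         # each collected element is added to the running sum
         (sum, {:cont, x}) -> %Sum{sum | value: sum.value + x}
         (sum, :done) -> sum
         (_, :halt) -> :ok
       end}
    end
  end
end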

In any case, we're done. The simple implementation of Conway's Game of Life is finished. We have a nice functional abstraction and a basic terminal client. Give it a try on your machine. Just paste the complete code into the iex shell, or run it with elixir conway.ex.

Understanding Elixir macros, Part 6 - In-place code generation

Today's post is the last one in the macro series. Before starting, I'd like to extend kudos to Björn Rochel who already improved on deftraceable macro in his Apex library. Björn discovered that the blog version of deftraceable doesn't handle default args (arg \\ def_value) properly, and implemented a fix.

In the meantime, let's wrap up this macro saga. In today's post, probably the most involved one in the entire series, I'm going to discuss some aspects of in-place code generation, and the consequences it may have on our macros.

Generating code in the module

As I mentioned way back in part 1, macros are not the only meta-programming mechanism in Elixir. It is also possible to generate the code directly in the module. To refresh your memory, let's see the example:

defmodule Fsm do
  fsm = [
    running: {:pause, :paused},
    running: {:stop, :stopped},
    paused: {:resume, :running}
  ]

  # Dynamically generating functions directly in the module
  for {state, {action, next_state}} <- fsm do
    def unquote(action)(unquote(state)), do: unquote(next_state)
  end
  def initial, do: :running
end

Fsm.initial
# :running

Fsm.initial |> Fsm.pause
# :paused

Fsm.initial |> Fsm.pause |> Fsm.pause
# ** (FunctionClauseError) no function clause matching in Fsm.pause/1

Here, we're dynamically generating function clauses directly in the module. This allows us to metaprogram against some input (in this case a keyword list), and generate the code without writing a dedicated macro.

Notice in the code above how we use unquote to inject variables into the function clause definition. This is perfectly in sync with how macros work. Keep in mind that def is also a macro, and a macro always receives its arguments quoted. Consequently, if you want a macro argument to receive the value of some variable, you must use unquote when passing that variable. It doesn't suffice to simply call def action, because the def macro receives a quoted reference to action, rather than the value of the variable action.

You can of course call your own macros in such a dynamic way, and the same principle will hold. There is an unexpected twist though - the order of evaluation is not what you might expect.

Order of expansion

As you'd expect, the module-level code (the code that isn't a part of any function) is evaluated in the expansion phase. Somewhat surprisingly, this will happen after all macros (save for def) have been expanded. It's easy to prove this:

iex(1)> defmodule MyMacro do
          defmacro my_macro do
            IO.puts "my_macro called"
            nil
          end
        end

iex(2)> defmodule Test do
          import MyMacro

          IO.puts "module-level expression"
          my_macro
        end

# Output:
my_macro called
module-level expression

See from the output how my_macro is called before IO.puts, even though the corresponding IO.puts call precedes the macro call. This proves that the compiler first resolves all "standard" macros. Then the module generation starts, and it is in this phase that module-level code, together with calls to def, is evaluated.

Module-level friendly macros

This has some important consequences for our own macros. For example, our deftraceable macro could also be invoked dynamically. However, this currently won't work:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          fsm = [
            running: {:pause, :paused},
            running: {:stop, :stopped},
            paused: {:resume, :running}
          ]

          for {state, {action, next_state}} <- fsm do
            # Using deftraceable dynamically
            deftraceable unquote(action)(unquote(state)), do: unquote(next_state)
          end
          deftraceable initial, do: :running
        end

** (MatchError) no match of right hand side value: :error
    expanding macro: Tracer.deftraceable/2
    iex:13: Test (module)

This fails with a somewhat cryptic and not very helpful error. So what went wrong? As mentioned in the previous section, macros are expanded before in-place module evaluation starts. For us this means that deftraceable is called before the outer for comprehension is even evaluated.

Consequently, even though it is invoked from a comprehension, deftraceable will be invoked exactly once. Moreover, since the comprehension is not yet evaluated, the inner variables state, action, and next_state are not present when our macro is called.

How can this even work? Essentially, our macro will be called with quoted unquotes - head and body will contain the ASTs that represent unquote(action)(unquote(state)) and unquote(next_state) respectively.

Now, recall that in the current version of deftraceable, we make some assumptions about input in our macro. Here's a sketch:

defmacro deftraceable(head, body) do
  # Here, we are assuming how the input head looks like, and perform some
  # AST transformations based on those assumptions.

  quote do
    ...
  end
end

And that's our problem. If we call deftraceable dynamically, while generating the code in-place, then such assumptions no longer hold.

Deferring code generation

When it comes to macro execution, it's important to distinguish between the macro context and the caller's context:

defmacro my_macro do
  # Macro context: the code here is a normal part of the macro, and runs when
  # the macro is invoked.

  quote do
    # Caller's context: generated code that runs in place where the macro is
    # invoked.
  end
end

This is where things get a bit tricky. If we want to support module-level dynamic calls of our macros, we shouldn't assume anything in the macro context. Instead, we should defer the code generation to the caller's context.

To say it in code:

defmacro deftraceable(head, body) do
  # Macro context: we shouldn't assume anything about the input AST here

  quote do
    # Caller's context: we should transfer input AST here, and then make our
    # assumptions here.
  end
end

Why can we make assumptions in the caller's context? Because this code will run after all macros have been expanded. For example, remember that even though our macro is invoked from inside a comprehension, it will be called only once. However, the code generated by our macro will run in the comprehension - once for each element.

So this approach amounts to deferring the final code generation. Instead of immediately generating the target code, we generate intermediate module-level statements that will generate the final code. These intermediate statements will run at the latest possible moment of expansion, after all other macros have been resolved:

defmodule Test do
  ...

  for {state, {action, next_state}} <- fsm do
    # After deftraceable is expanded, here we'll get a plain code that
    # generates target function. This code will be invoked once for
    # every step of the for comprehension. At this point, we're in the
    # caller's context, and have an access to state, action, and next_state
    # variables and can properly generate corresponding function.
  end

  ...
end

Before implementing the solution, it's important to note that this is not a universal pattern, and you should consider whether you really need this approach.

If your macro is not meant to be used at the module level, you should probably avoid this technique. If your macro is called from inside a function definition and you move the generation to the caller's context, you'll essentially move the code execution from compile time to run time, which can affect performance.

Moreover, even if your macro is running on a module-level, this technique won't be necessary as long as you don't make any assumptions about the input. For example, in part 2, we made a simulation of Plug's get macro:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      unquote(body[:do])
    end
  end
end

Even though this macro works at the module level, it doesn't assume anything about the format of the AST, simply injecting input fragments into the caller's context, sprinkling some boilerplate around. Of course, we're expecting here that body will have a :do option, but we're not assuming anything about the specific shape and format of the body[:do] AST.

To recap, if your macro is meant to be called on a module-level, this could be the general pattern:

defmacro ...
  # Macro context:
  # Feel free to do any preparations here, as long as you don't assume anything
  # about the shape of the input AST

  quote do
    # Caller's context:
    # If you're analyzing and/or transforming input AST you should do it here.
  end
end

Since the caller's context is the module level, this deferred transformation will still take place at compilation time, so there will be no runtime performance penalties.

The solution

Given this discussion, the solution is relatively simple, but explaining it is fairly involved. So I'm going to start by showing you the end result (pay attention to comments):

defmodule Tracer do
  defmacro deftraceable(head, body) do
    # This is the most important change that allows us to correctly pass
    # input AST to the caller's context. I'll explain how this works a
    # bit later.
    quote bind_quoted: [
      head: Macro.escape(head, unquote: true),
      body: Macro.escape(body, unquote: true)
    ] do
      # Caller's context: we'll be generating the code from here

      # Since the code generation is deferred to the caller context,
      # we can now make our assumptions about the input AST.

      # This code is mostly identical to the previous version
      #
      # Notice that these variables are now created in the caller's context.
      {fun_name, args_ast} = Tracer.name_and_args(head)
      {arg_names, decorated_args} = Tracer.decorate_args(args_ast)

      # Completely identical to the previous version.
      head = Macro.postwalk(head,
        fn
          ({fun_ast, context, old_args}) when (
            fun_ast == fun_name and old_args == args_ast
          ) ->
            {fun_ast, context, decorated_args}
          (other) -> other
      end)

      # This code is completely identical to the previous version
      # Note: however, notice that the code is executed in the same context
      # as previous three expressions.
      #
      # Hence, the unquote(head) here references the head variable that is
      # computed in this context, instead of macro context. The same holds for
      # other unquotes that are occurring in the function body.
      #
      # This is the point of deferred code generation. Our macro generates
      # this code, which then in turn generates the final code.
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  # Identical to the previous version, but functions are exported since they
  # must be called from the caller's context.
  def name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  def name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end

  def decorate_args([]), do: {[],[]}
  def decorate_args(args_ast) do
    for {arg_ast, index} <- Enum.with_index(args_ast) do
      arg_name = Macro.var(:"arg#{index}", __MODULE__)

      full_arg = quote do
        unquote(arg_ast) = unquote(arg_name)
      end

      {arg_name, full_arg}
    end
    |> List.unzip
    |> List.to_tuple
  end
end

Let's try the macro:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          fsm = [
            running: {:pause, :paused},
            running: {:stop, :stopped},
            paused: {:resume, :running}
          ]

          for {state, {action, next_state}} <- fsm do
            deftraceable unquote(action)(unquote(state)), do: unquote(next_state)
          end
          deftraceable initial, do: :running
        end

iex(3)> Test.initial |> Test.pause |> Test.resume |> Test.stop

iex(line 15) Elixir.Test.initial() = :running
iex(line 13) Elixir.Test.pause(:running) = :paused
iex(line 13) Elixir.Test.resume(:paused) = :running
iex(line 13) Elixir.Test.stop(:running) = :stopped

As you can see, the change is not very complicated. We managed to keep most of our code intact, though we had to do some trickery with quote's bind_quoted option and Macro.escape:

quote bind_quoted: [
  head: Macro.escape(head, unquote: true),
  body: Macro.escape(body, unquote: true)
] do
  ...
end

Let's take a closer look at what this means.

bind_quoted

Remember that our macro is generating code that will generate the final code. Somewhere in the first-level generated code (the one returned by our macro), we need to place the following expression:

def unquote(head) do ... end

This expression will be invoked in the caller's context (the client module), and its task is to generate the function. As mentioned in comments, it's important to understand that unquote(head) here references the head variable that exists in the caller's context. We're not injecting a variable from the macro context, but the one that exists in the caller's context.

However, we can't generate such expression with plain quote:

quote do
  def unquote(head) do ... end
end

Remember how unquote works. It injects the AST that is in the head variable in place of the unquote call. This is not what we want here. What we want is to generate the AST representing the call to unquote which will then be executed later, in the caller's context, and reference the caller's head variable.

This can be done by providing unquote: false option:

quote unquote: false do
  def unquote(head) do ... end
end

Here, we will generate the code that represents unquote call. If this code is injected in proper place, where variable head exists, we'll end up calling the def macro, passing whatever is in the head variable.

So it seems that unquote: false is what we need, but there is a downside that we can't access any variable from the macro context:

foo = :bar
quote unquote: false do
  unquote(foo)    # <- won't work because of unquote: false
end

Using unquote: false effectively blocks immediate AST injection, and treats unquote as any other function call. Consequently, we can't inject something into the target AST. And here's where bind_quoted comes in handy. By providing bind_quoted: bindings we can disable immediate unquoting, while still binding whatever data we want to transfer to the caller's context:

quote bind_quoted: [
  foo: ...,
  bar: ...
] do
  unquote(whatever)  # <- works like with unquote: false

  foo  # <- accessible due to bind_quoted
  bar  # <- accessible due to bind_quoted
end

Injecting the code vs transferring data

Another problem we're facing is that the contents we're passing from the macro to the caller's context is by default injected, rather than transferred. So, whenever you do unquote(some_ast), you're injecting one AST fragment into another one you're building with a quote expression.

Occasionally, we want to _transfer_ the data, instead of injecting it. Let's see an example. Say we have some triplet that we want to transfer to the caller's context:

iex(1)> data = {1, 2, 3}
{1, 2, 3}

Now, let's try to transfer it using typical unquote:

iex(2)> ast = quote do IO.inspect(unquote(data)) end
{{:., [], [{:__aliases__, [alias: false], [:IO]}, :inspect]}, [], [{1, 2, 3}]}

This seems to work. Let's try and eval the resulting ast:

iex(3)> Code.eval_quoted(ast)
** (CompileError) nofile: invalid quoted expression: {1, 2, 3}

So what happened here? The thing is that we didn't really transfer our {1,2,3} triplet. Instead, we injected it into the target AST. Injection means that {1,2,3} is itself treated as an AST fragment, which is obviously wrong.

What we really want in this case is data transfer. In the code generation context, we have some data that we want to transfer to the caller's context. And this is where Macro.escape helps. By escaping a term, we can make sure that it is transferred rather than injected. When we call unquote(Macro.escape(term)), we'll inject an AST that describes the data in term.

Let's try this out:

iex(3)> ast = quote do IO.inspect(unquote(Macro.escape(data))) end
{{:., [], [{:__aliases__, [alias: false], [:IO]}, :inspect]}, [],
 [{:{}, [], [1, 2, 3]}]}

iex(4)> Code.eval_quoted(ast)
{1, 2, 3}

As you can see, we were able to transfer the data untouched.

Going back to our deferred code generation, this is exactly what we need. Instead of injecting into the target AST, we want to transfer the input AST, completely preserving its shape:

defmacro deftraceable(head, body) do
  # Here we have head and body AST
  quote do
    # We need that same head and body AST here, so we can generate
    # the final code.
  end
end

By using Macro.escape/1 we can ensure that input AST is transferred untouched back to the caller's context where we'll generate the final code.

As discussed in the previous section, we're using bind_quoted, but the same principle holds:

quote bind_quoted: [
  head: Macro.escape(head, unquote: true),
  body: Macro.escape(body, unquote: true)
] do
  # Here we have exact data copies of head and body from
  # the macro context.
end


Escaping and unquote: true

Notice the deceptively simple unquote: true option that we pass to Macro.escape. This is the hardest thing to explain here. To understand it, you must be confident about how the AST is passed to the macro, and returned back to the caller's context.

First, remember how we call our macro:

deftraceable unquote(action)(unquote(state)) do ... end

Now, since a macro actually receives its arguments quoted, the head argument will be equivalent to the following:

# This is what the head argument in the macro context actually contains
quote unquote: false do
  unquote(action)(unquote(state))
end

Remember that Macro.escape preserves data, so when you transfer a variable into some other AST, the contents remain unchanged. Given the shape of head above, this is the situation we'll end up with after our macro is expanded:

# Caller's context
for {state, {action, next_state}} <- fsm do
  # Here is our code that generates function. Due to bind_quoted, here
  # we have head and body variables available.

  # Variable head is equivalent to
  #   quote unquote: false do
  #     unquote(action)(unquote(state))
  #   end

  # What we really need is for head to be equivalent to:
  #   quote do
  #     unquote(action)(unquote(state))
  #   end
end

Why do we need the second form of quoted head? Because this AST is now shaped in the caller's context, where we have action and state variables available. And the second expression will use the contents of these variables.

And this is where unquote: true option helps. When we call Macro.escape(input_ast, unquote: true), we'll still (mostly) preserve the shape of the transferred data, but the unquote fragments (e.g. unquote(action)) in the input AST will be resolved in the caller's context.

So to recap, a proper transport of the input AST to the caller's context looks like this:

defmacro deftraceable(head, body) do
  quote bind_quoted: [
    head: Macro.escape(head, unquote: true),
    body: Macro.escape(body, unquote: true)
  ] do
    # Generate the code here
  end
  ...
end

This wasn't so hard, but it takes some time to grok what exactly happens here. Try to make sure you're not just blindly doing escapes (and/or unquote: true) without understanding whether that is what you really want. After all, there's a reason this is not the default behaviour.

When writing a macro, think about whether you want to inject some AST, or transport the data unchanged. In the latter case, you need Macro.escape. If the data being transferred is an AST that might contain unquote fragments, then you probably need to use Macro.escape with unquote: true.

Recap

This concludes the series on Elixir macros. I hope you found these articles interesting and educating, and that you have gained more confidence and understanding of how macros work.

Always remember - macros amount to plain composition of AST fragments during the expansion phase. If you understand the caller's context and the macro inputs, it shouldn't be very hard to perform the transformations you want, either directly, or by deferring them when necessary.

This series has by no means covered all possible aspects and nuances. If you want to learn more, a good place to start is the documentation for the quote/2 special form. You'll also find some useful helpers in the Macro and Code modules.

Happy meta-programming!