The Erlangelist

(not only) Erlang related musings
Hi, I'm Saša Jurić, a software developer with many years of professional experience in programming of web and desktop applications using various languages, such as Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the Elixir in Action book. In this blog you can read about Elixir, Erlang, and other programming related topics.

Understanding Elixir Macros, Part 2 - Micro Theory

2014-06-11

This is the second part of the mini-series on Elixir macros. Last time I discussed compilation phases and Elixir AST, finishing with a basic example of the trace macro. Today, I’ll provide a bit more detail on macro mechanics.

This involves repeating some of what was mentioned last time, but I think it’s beneficial to understand how things work and how the final AST is built. If you grasp this, you can reason about your macro code with more confidence. This matters because more involved macros often combine many quote/unquote constructs, which can seem intimidating at first.

Calling a macro

The most important thing to be aware of is the expansion phase. This is where the compiler calls various macros (and other code-generating constructs) to produce the final AST.

For example, a typical usage of the trace macro will look like this:

defmodule MyModule do
  require Tracer
  ...
  def some_fun(...) do
    Tracer.trace(...)
  end
end

As previously explained, the compiler starts with an AST that resembles this code. This AST is then expanded to produce the final code. Consequently, in the snippet above, the call to Tracer.trace/1 will take place in the expansion phase.

Our macro receives the input AST and must produce the output AST. The compiler will then simply replace the macro call with the AST returned from that macro. This process is incremental - a macro can return AST that will invoke some other macro (or even itself). The compiler will simply re-expand until there’s nothing left to expand.
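
As a small illustration of incremental expansion, here is a sketch with two made-up macros, Expansion.outer/1 and Expansion.inner/1 (both hypothetical, not part of the original series). The outer macro returns AST that calls the inner one. Macro.expand_once/2 performs a single expansion step, while Macro.expand/2 keeps re-expanding until the root of the expression is no longer a macro call. This is similar in spirit to what the compiler does, although the compiler also expands nested calls, whereas Macro.expand/2 only works on the root node.

```elixir
defmodule Expansion do
  # outer/1 expands into a call to another macro, inner/1
  defmacro outer(expr) do
    quote do: Expansion.inner(unquote(expr))
  end

  # inner/1 expands into plain code with no macro call at its root
  defmacro inner(expr) do
    quote do: unquote(expr) * 2
  end
end

defmodule ExpansionDemo do
  require Expansion

  # Returns the code after one expansion step and after full expansion
  def show do
    ast = quote do: Expansion.outer(21)
    once = Macro.expand_once(ast, __ENV__)
    full = Macro.expand(ast, __ENV__)
    {Macro.to_string(once), Macro.to_string(full)}
  end

  # When compiling this call, the compiler re-expands until nothing is left
  def run, do: Expansion.outer(21)
end

IO.inspect(ExpansionDemo.show())
# => {"Expansion.inner(21)", "21 * 2"}
IO.inspect(ExpansionDemo.run())
# => 42
```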

A macro call is thus our opportunity to change the meaning of the code. A typical macro will take the input AST and somehow decorate it, adding some additional code around the input.

That’s exactly what we did in the trace macro. We took a quoted expression (e.g. 1+2) and spit out something like:

result = 1 + 2
Tracer.print("1 + 2", result)
result

To call the trace macro from any part of the code (including the shell), you must invoke either require Tracer or import Tracer. Why is this? There are two seemingly contradictory properties of macros:

  • A macro is Elixir code
  • A macro runs at expansion time, before the final bytecode is produced

How can Elixir code run before it is produced? It can’t. To call a macro, the container module (the module where the macro is defined) must already be compiled.

Consequently, to run macros defined in the Tracer module, we must ensure that it is already compiled. In other words, we must provide some hints to the compiler about module ordering. When we require a module, we instruct Elixir to hold off compiling the current module until the required module is compiled and loaded into the compiler run-time (the Erlang VM instance where the compiler is running). We can only call the trace macro when the Tracer module is fully compiled and available to the compiler.

Using import has the same effect but it additionally lexically imports all exported functions and macros, making it possible to write trace instead of Tracer.trace.
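
To make the require/import distinction concrete, here is a minimal sketch of a Tracer along the lines of the one from the previous part (the exact code there may differ, and the output format of Tracer.print/2 is an assumption), plus two hypothetical client modules, RequiringClient and ImportingClient:

```elixir
defmodule Tracer do
  # Runs at expansion time: receives the quoted expression, returns new AST
  defmacro trace(expression_ast) do
    string_representation = Macro.to_string(expression_ast)

    quote do
      result = unquote(expression_ast)
      Tracer.print(unquote(string_representation), result)
      result
    end
  end

  # Plain function, called at runtime by the generated code
  def print(string_representation, result) do
    IO.puts("Result of #{string_representation}: #{inspect(result)}")
  end
end

defmodule RequiringClient do
  # Tracer must be compiled before macros in this module are expanded
  require Tracer

  def run, do: Tracer.trace(1 + 2)
end

defmodule ImportingClient do
  # import also requires Tracer, and additionally allows the unqualified call
  import Tracer

  def run, do: trace(1 + 2)
end

RequiringClient.run()
ImportingClient.run()
```

Both calls print "Result of 1 + 2: 3" and return 3.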

Since macros are functions and Elixir doesn’t require parentheses in function calls, we can use this syntax:

Tracer.trace 1+2

This is quite possibly the most important reason why Elixir doesn’t require parentheses in function calls. Remember that most language constructs are actually macros. If parentheses were obligatory, the code we’d have to write would be noisier:

defmodule(MyModule, do: (
  def(function_1, do: ...)
  def(function_2, do: ...)
))

Hygiene

As hinted in the last article, macros are by default hygienic. This means that variables introduced by a macro are its own private affair that won’t interfere with the rest of the code. This is why we can safely introduce the result variable in our trace macro:

quote do
  result = unquote(expression_ast)  # result is private to this macro
  ...
end

This variable won’t interfere with the code that is calling the macro. Where you call the trace macro, you can freely declare your own result variable, and it won’t be shadowed by the result variable from the trace macro.
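
A quick way to convince yourself of hygiene is a sketch like the following, where Hygienic.compute/1 is a hypothetical macro that binds its own result variable:

```elixir
defmodule Hygienic do
  # `result` inside the quote belongs to the macro's context
  defmacro compute(expression_ast) do
    quote do
      result = unquote(expression_ast)
      result * 10
    end
  end
end

defmodule HygieneDemo do
  require Hygienic

  def run do
    result = :callers_own_value
    computed = Hygienic.compute(1 + 2)
    # The caller's `result` is untouched by the macro's `result`
    {result, computed}
  end
end

IO.inspect(HygieneDemo.run())
# => {:callers_own_value, 30}
```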

Most of the time hygiene is exactly what you want, but there are exceptions. Sometimes, you may need to create a variable that is available to the code calling the macro. Instead of devising some contrived example, let’s take a look at a real use case from the Plug library. This is how we can specify routes with the Plug router:

get "/resource1" do
  send_resp(conn, 200, ...)
end

post "/resource2" do
  send_resp(conn, 200, ...)
end

Notice how in both snippets we use a conn variable that isn’t defined anywhere. This is possible because the get macro binds this variable in the generated code. You can imagine that the resulting code is something like:

defp do_match("GET", "/resource1", conn) do
  ...
end

defp do_match("POST", "/resource2", conn) do
  ...
end

Note: the real code produced by Plug is somewhat different, this is just a simplification.

This is an example of a macro introducing a variable that must not be hygienic. The variable conn is introduced by the get macro, but must be visible to the code where the macro is called.

Another example is the situation I had with ExActor. Take a look at the following example:

defmodule MyServer do
  ...
  defcall my_request(...), do: reply(result)
  ...
end

If you’re familiar with GenServer, then you know that a reply to a call must be returned in the form {:reply, response, state}. However, in the snippet above, the state is not even mentioned. So how can we return the unmentioned state? This is possible because the defcall macro generates a hidden state variable, which is then implicitly used by the reply macro.
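
Here is a rough sketch of how such a pair of macros might work. MiniActor, PingServer and the :ping message are hypothetical, the code is far simpler than ExActor's actual implementation, and it relies on var!, which is explained below, to share a hidden state variable between defcall and reply:

```elixir
defmodule MiniActor do
  # Hypothetical simplification: defines a handle_call/3 clause for `message`
  # and binds a hidden, non-hygienic `state` variable in the clause head
  defmacro defcall(message, body) do
    quote do
      def handle_call(unquote(message), _from, var!(state)) do
        unquote(body[:do])
      end
    end
  end

  # Builds the {:reply, response, state} tuple using the hidden `state`
  defmacro reply(response) do
    quote do
      {:reply, unquote(response), var!(state)}
    end
  end
end

defmodule PingServer do
  use GenServer
  import MiniActor

  def init(initial_state), do: {:ok, initial_state}

  # The state is never mentioned here, yet the reply tuple includes it
  defcall :ping, do: reply(:pong)
end

{:ok, pid} = GenServer.start_link(PingServer, :some_state)
IO.inspect(GenServer.call(pid, :ping))
# => :pong
IO.inspect(:sys.get_state(pid))
# => :some_state
```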

In both cases, a macro must create a variable that is not hygienic and must be visible beyond the macro’s quoted code. For such purposes you can use the var! construct. Here’s what a simple version of Plug’s get macro could look like:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Notice how we use var!(conn). By doing this, we’re specifying that conn is a variable that must be visible to the caller.

The snippet above doesn’t yet show how the body is injected. Before getting to that, you need to understand a bit about the arguments that macros receive.

Macro arguments

You should always keep in mind that macros are essentially Elixir functions that are invoked in the expansion phase, while the final AST is being produced. What sets macros apart is that the arguments passed to them are always quoted. This is why we can call:

def my_fun do
  ...
end

Which is the same as:

def(my_fun, do: (...))

Notice how we’re calling the def macro, passing my_fun even though no such variable exists. This is completely fine, since we’re actually passing the result of quote(do: my_fun), and quoting doesn’t require that the variable exists. Internally, the def macro will receive the quoted representation which will, among other things, contain :my_fun. The def macro will use this information to generate the function with the corresponding name.
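
The same principle can be seen in a tiny custom macro. In this hypothetical sketch, Definer.define_constant/2 receives a bare identifier, which arrives as quoted AST, and uses its name to generate a zero-arity function:

```elixir
defmodule Definer do
  # The first argument arrives quoted, e.g. {:greet, meta, nil}, so the
  # identifier never has to exist as a variable in the caller
  defmacro define_constant({name, _meta, _context}, value) do
    quote do
      def unquote(name)(), do: unquote(value)
    end
  end
end

defmodule DefinerDemo do
  require Definer

  # `greet` is not a variable here; only its quoted form reaches the macro
  Definer.define_constant greet, "hello"
end

IO.inspect(DefinerDemo.greet())
# => "hello"
```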

Another thing I sort of skimmed over is the do…end block. Whenever you pass a do…end block to a macro, it is the same as passing a keyword list with a :do key.

So the call

my_macro arg1, arg2 do ... end

is the same as

my_macro(arg1, arg2, do: ...)

This is just syntactic sugar. The parser transforms do…end into [{:do, …}], which is the same as [do: …].
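
To check that both forms produce the same arguments, here is a sketch with a hypothetical ShowArgs.show_args/2 macro that simply returns (escaped) whatever quoted arguments it received:

```elixir
defmodule ShowArgs do
  # Returns the quoted arguments it received, escaped so they come back as data
  defmacro show_args(arg, options) do
    Macro.escape({arg, options})
  end
end

defmodule DoBlockDemo do
  require ShowArgs

  def with_do_block do
    ShowArgs.show_args :some_arg do
      :some_body
    end
  end

  def with_keyword_list do
    ShowArgs.show_args(:some_arg, do: :some_body)
  end
end

IO.inspect(DoBlockDemo.with_do_block())
IO.inspect(DoBlockDemo.with_keyword_list())
# Both print {:some_arg, [do: :some_body]}
```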

Now, I’ve just mentioned that arguments are quoted. However, for many constants (atoms, numbers, strings), the quoted representation is exactly the same as the input value. In addition, two-element tuples and lists retain their structure when quoted. This means that quote(do: {a,b}) will give a two-element tuple, with both elements quoted.

Let’s illustrate this in a shell:

iex(1)> quote do :an_atom end
:an_atom

iex(2)> quote do "a string" end
"a string"

iex(3)> quote do 3.14 end
3.14

iex(4)> quote do {1,2} end
{1, 2}

iex(5)> quote do [1,2,3,4,5] end
[1, 2, 3, 4, 5]

In contrast, a quoted three-element tuple doesn’t retain its shape:

iex(6)> quote do {1,2,3} end
{:{}, [], [1, 2, 3]}

Since lists and two-element tuples retain their structure when quoted, the same holds for a keyword list:

iex(7)> quote do [a: 1, b: 2] end
[a: 1, b: 2]

iex(8)> quote do [a: x, b: y] end
[a: {:x, [], Elixir}, b: {:y, [], Elixir}]

In the first example, you can see that the input keyword list is completely intact. The second example proves that complex members (such as references to x and y) are quoted. But the list still retains its shape. It is still a keyword list with keys :a and :b.
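
Because the quoted keyword list keeps its shape, ordinary Keyword and Access functions work on it directly. A small sketch:

```elixir
# A quoted keyword list keeps its shape, so Keyword functions work on it
quoted = quote do: [a: x, b: y]

IO.inspect(Keyword.keys(quoted))
# => [:a, :b]

# The values are still quoted references to the variables x and y
IO.inspect(quoted[:a])
```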

Putting it together

Why is all this important? Because in the macro code, you can easily retrieve the options from the keyword list, without analyzing some convoluted AST. Let’s see this in action on our oversimplified take on the get macro. Earlier, we ended up with this sketch:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Remember that do…end is the same as do: …, so when we call get route do … end, we’re effectively calling get(route, do: …). Keeping in mind that macro arguments are quoted, but also knowing that quoted keyword lists keep their shape, it’s possible to retrieve the quoted body in the macro using body[:do]:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      unquote(body[:do])
    end
  end
end

So we simply inject the quoted input body into the body of the do_match clause we’re generating.

As already mentioned, this is the purpose of a macro. It receives some AST fragments, and combines them together with the boilerplate code, to generate the final result. Ideally, when we do this, we don’t care about the contents of the input AST. In our example, we simply inject the body in the generated function, without caring what is actually in that body.

It is reasonably simple to test that this macro works. Here’s a bare minimum of the required code:

defmodule Plug.Router do
  # get macro removes the boilerplate from the client and ensures that
  # generated code conforms to some standard required by the generic logic
  defmacro get(route, body) do
    quote do
      defp do_match("GET", unquote(route), var!(conn)) do
        unquote(body[:do])
      end
    end
  end
end

Now we can implement a client module:

defmodule MyRouter do
  import Plug.Router

  # Generic code that relies on the multi-clause dispatch
  def match(type, route) do
    do_match(type, route, :dummy_connection)
  end

  # Using macro to minimize boilerplate
  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

And test it:

MyRouter.match("GET", "/hello") |> IO.inspect
# {:dummy_connection, "Hi!"}

MyRouter.match("GET", "/goodbye") |> IO.inspect
# {:dummy_connection, "Bye!"}

The important thing to notice here is the code of match/2. This is the generic code that relies on the existence of the implementation of do_match/3.

Using modules

Looking at the code above, you can see that the glue code of match/2 lives in the client module. That’s far from perfect, since each client must provide a correct implementation of this function and be aware of how the do_match function must be invoked.

It would be better if the Plug.Router abstraction could provide this implementation for us. For that purpose we can reach for the use macro, a rough equivalent of mixins in other languages.

The general idea is as follows:

defmodule ClientCode do
  # invokes the mixin
  use GenericCode, option_1: value_1, option_2: value_2, ...
end

defmodule GenericCode do
  # called when the module is used
  defmacro __using__(options) do
    # generates an AST that will be inserted in place of the use
    quote do
      ...
    end
  end
end

So the use mechanism allows us to inject some piece of code into the caller’s context. This is just a replacement for something like:

defmodule ClientCode do
  require GenericCode
  GenericCode.__using__(...)
end

This can be confirmed by looking at the Elixir source code. It also illustrates incremental expansion: the use macro generates code that calls another macro. To put it more fancily, use generates code that generates code. As mentioned earlier, the compiler will simply re-expand this until there’s nothing left to expand.
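
You can also verify the equivalence directly. In this sketch, Greeting is a hypothetical module whose __using__/1 macro defines a greet/0 function; one client invokes it via use, the other by requiring the module and calling __using__/1 by hand:

```elixir
defmodule Greeting do
  # Options arrive as a (quoted, but still literal) keyword list
  defmacro __using__(options) do
    name = Keyword.get(options, :name, "world")

    quote do
      def greet, do: "Hello, " <> unquote(name)
    end
  end
end

defmodule ViaUse do
  use Greeting, name: "use"
end

defmodule ViaManualCall do
  # Roughly what `use Greeting, name: "manual"` expands into
  require Greeting
  Greeting.__using__(name: "manual")
end

IO.inspect(ViaUse.greet())
# => "Hello, use"
IO.inspect(ViaManualCall.greet())
# => "Hello, manual"
```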

Armed with this knowledge, we can move the implementation of the match function to the generic Plug.Router module:

defmodule Plug.Router do
  defmacro __using__(_options) do
    quote do
      import Plug.Router

      def match(type, route) do
        do_match(type, route, :dummy_connection)
      end
    end
  end

  defmacro get(route, body) do
    ... # This code remains the same
  end
end

This now keeps the client code very lean:

defmodule MyRouter do
  use Plug.Router

  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

As mentioned, the AST generated by the __using__ macro will simply be injected in place of the use Plug.Router call. Take special note of how we import Plug.Router from the __using__ macro. This is not strictly needed, but it allows the client to call get instead of Plug.Router.get.
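
Putting the pieces together, here is a complete, runnable sketch of the use-based version. The module is named MiniRouter here (a hypothetical stand-in for the article's simplified Plug.Router) to avoid clashing with the real Plug library, whose actual implementation is quite different:

```elixir
defmodule MiniRouter do
  # Injected into the client module by `use MiniRouter`
  defmacro __using__(_options) do
    quote do
      import MiniRouter

      def match(type, route) do
        do_match(type, route, :dummy_connection)
      end
    end
  end

  # Generates one do_match/3 clause per route; `conn` is visible to the body
  defmacro get(route, body) do
    quote do
      defp do_match("GET", unquote(route), var!(conn)) do
        unquote(body[:do])
      end
    end
  end
end

defmodule MyRouter do
  use MiniRouter

  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

MyRouter.match("GET", "/hello") |> IO.inspect()
# => {:dummy_connection, "Hi!"}
```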

So what have we gained? All the boilerplate is now confined to a single place (Plug.Router). Not only does this simplify the client code, it also keeps the abstraction properly closed. The module Plug.Router ensures that whatever is generated by the get macro fits properly with the generic code of match. As clients, we simply use the module and call into the provided macros to assemble our router.

This concludes today’s session. Many details are not covered, but hopefully you have a better understanding of how macros integrate with the Elixir compiler. In the next part I’ll dive deeper and start exploring how we can tear apart the input AST.
