Hi, I am Saša Jurić, a software developer with 10+ years of professional experience in programming of web and desktop applications using Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the upcoming Elixir in Action book. In this blog you can read about Erlang and other programming related topics.

Actors in Erlang/Elixir

Updated Apr. 06, 2013. - ExActor library note

Introduction

The topic of today's post is an introduction to the actor model in Erlang. Actors are a more formal way of looking at Erlang concurrency, and probably the main reason why doing concurrency in Erlang is very easy, even when running many parallel units of execution.

Actors are a fairly large topic, and to present them in code, I would have to provide some introduction to Erlang syntax, which is very different from that of most modern mainstream OO languages. To avoid dealing with the "weirdness" of the Erlang language, I will instead use the Elixir language, which is kind of a more elegant/concise wrapper around Erlang. This will make the presented code extremely simple, so you should be able to follow it even without previous knowledge of Elixir or Erlang.

Elixir is a very young language, built on top of the Erlang platform. I regard it as a sort of Ruby-flavored version of Erlang. The language is very flexible and hides away much of the unnecessary noise which often occurs in typical Erlang source. At the same time, it is semantically aligned with Erlang, and the underlying principles map 1:1 to the corresponding Erlang representation. After all, Elixir code is compiled to Erlang byte code, and can normally run in the Erlang VM, as well as cooperate with other Erlang code.

The code in Elixir will allow us to examine actors from a somewhat higher level, hiding away the mechanical details and tedium of the Erlang language. However, if you find the topic interesting, and plan to investigate it deeper, or possibly even use it in production, I suggest you first read a book on Erlang, thoroughly understand the low level workings, write some pure Erlang code, and only then possibly move to Elixir.

(Im)mutability

As a consequence of being functional languages, both Erlang and Elixir use immutable variables. Once assigned, they can't be modified. Of course, almost every program, more complex than hello world, will have to deal with a state which changes based on some external interactions (user input, tcp/http requests, ...).

The primary (though not the only) way of maintaining mutable state in Erlang is to run a separate Erlang process. One way (again, not the only one) to start a process is to call spawn:

pid = spawn(
  # ...
)

In between parentheses will be a reference to a function, or a body of an inline anonymous function, which will run concurrently. The return value is the id of the created Erlang process, often called pid. We can use that value to send messages to that process:

pid <- message

Messages are arbitrary Elixir/Erlang terms, whatever you can put in a variable (e.g. list, structures, ...), and sending a message means that its value is placed in the mailbox of the receiving process, after which the sender goes on executing its own code. The receiver can obtain the next message by calling the receive statement. Messages are processed in the order they are placed in the mailbox (although this behavior can be altered in code).
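Here is a minimal sketch of one full send/receive round trip. Note that current Elixir releases use the send/2 function instead of the <- operator shown above, and the sketch assumes such a release. The :ping and :pong message names are made up for illustration.

```elixir
# Remember our own pid, so the child knows where to reply.
parent = self()

# spawn/1 starts a new process running the given anonymous function.
child = spawn(fn ->
  receive do
    {:ping, from} -> send(from, {:pong, self()})
  end
end)

# Sending is asynchronous: the message is placed in the child's mailbox,
# and the parent carries on immediately.
send(child, {:ping, parent})

# Block until the reply arrives, giving up after one second.
reply =
  receive do
    {:pong, ^child} -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
```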

When we want to maintain a continuous mutable state, we have to run an endless recursion in a separate Erlang process. The code outline looks like this:

def loop(state) do
  new_state =
    receive do
      # wait for the next message, then compute the new state from it
      message -> f(state, message)
    end

  loop(new_state)
end

The process enters the loop function with the current state, which is an arbitrary Elixir/Erlang term. It then waits for a message from some other process, and, upon receiving it, computes the new state, depending on the message content and the current state. Finally, the loop function is called recursively, effectively setting the new state in place of the old one. The next message will operate on the new state.

In Elixir/Erlang, such recursion will not cause a stack overflow, since both languages have special handling of so-called "tail calls", which are, on the byte code level, transformed into jump/goto instructions. Consequently, this code simply runs an endless loop.
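The effect of tail calls can be observed in isolation, without any processes involved. The following is only a sketch: the module name `TailCall` and the iteration count are made up for illustration. The recursive call is the last thing the function does, so ten million iterations complete without the stack growing.

```elixir
defmodule TailCall do
  # Base case: nothing left to count, return the accumulator.
  def count(0, acc), do: acc

  # The recursive call is in tail position, so it behaves like a loop.
  def count(n, acc), do: count(n - 1, acc + 1)
end

result = TailCall.count(10_000_000, 0)
IO.puts(result)
```

The actor loop from the outline has the same shape: the call to loop(new_state) is the final expression of the function.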

Once we have such a process running, and hold its pid, we can interact with it via messages. For example:

# async send and pray
pid <- {:set, :value, "123"}

# sync call and get response
pid <- {:get, :value}
response = receive # the receiver must send us the response

The {...} is a tuple, which in Elixir/Erlang is a sort of weakly typed struct. The :something represents an atom, which is similar to a Ruby symbol: a kind of named constant.

Of course, for this code to work, handling of such messages must be implemented in the receiving process. In the loop outline shown earlier, that would be the implementation of f(state, message).
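As a sketch of what such a receiving process could look like, here is a small key/value actor in current Elixir syntax (send/2 instead of <-). The module name `KeyValue` and the {:response, value} reply shape are hypothetical. Unlike in the snippet above, the get message carries the caller's pid, because the receiver needs to know where to send the response.

```elixir
defmodule KeyValue do
  # The state is a map from keys to values, initially empty.
  def start, do: spawn(fn -> loop(%{}) end)

  # Asynchronous: send the message and continue.
  def set(pid, key, value), do: send(pid, {:set, key, value})

  # Synchronous: send the message, then wait for the actor's reply.
  def get(pid, key) do
    send(pid, {:get, key, self()})

    receive do
      {:response, value} -> value
    after
      1_000 -> {:error, :timeout}
    end
  end

  # The endless recursion: wait for a message, compute the next state, repeat.
  defp loop(state) do
    receive do
      message -> loop(f(state, message))
    end
  end

  # f(state, message) returns the state to continue with.
  defp f(state, {:set, key, value}), do: Map.put(state, key, value)

  defp f(state, {:get, key, caller}) do
    send(caller, {:response, Map.get(state, key)})
    state
  end
end

pid = KeyValue.start()
KeyValue.set(pid, :value, "123")
IO.inspect(KeyValue.get(pid, :value))
```

Because the mailbox preserves the order of messages from a single sender, the get is handled only after the preceding set.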

To summarize: our process runs concurrently, it encapsulates a state, and we can send that process messages to modify the state, or to retrieve it. We call such a process an actor.

The examples

The principles above outline the workflow of an actor on the lowest level. To do something useful with it, a fair amount of code is required. It gets even more complex if you want to write production level code. Erlang/OTP addresses this issue by offering an abstraction called gen_server (a generic server process), which abstracts typical message passing patterns, but adds more boilerplate.

Elixir simplifies the use of gen_server, and there is an additional wrapper called genx which removes most of the duplication. On top of this, I have utilized Elixir's extensibility and built additional abstractions which allow me to write very dense, OO-ish code, and hide away most of the mechanical details.

The abstractions I wrote are quick hacks, made specifically for the purposes of this blog. I don't advise you to use them in production.
Update (06. April, 2013.): I have since modified the library and have been using it in production for some time.

The presented code will be deceptively simple, which will help us to observe actors from a somewhat higher level. However, I'd like to point out that the underlying implementation relies on the mentioned gen_server, which is in turn powered by the endless recursion and message passing mechanism presented earlier. The recursion will therefore not be coded explicitly, but under the hood it will still execute, approximately as described.

The complete code of the examples, together with build/start instructions can be found here.

Simple calculator

The first example is a simple calculator actor which supports increment/decrement operations. Let's see how we can use it:

calculator = Calculator.actor_start(0)
calculator.inc(10)
calculator.dec(5)
result = calculator.get
IO.puts result

The code is not spectacular, but it illustrates the simplicity of use. The actor is created with an initial value of 0. Then I add the value of 10, and subtract a value of 5. Finally, I retrieve the result and print it. Since calculator is an actor it works concurrently. Specifically, inc/dec operations are asynchronous, while get is obviously synchronous, since it has to return the result.

In the first line, an actor is created. Under the hood, the function actor_start will spawn a process, passing it the value 0 as an argument. The actor will use that value as its initial state, which internally means that we will enter the infinite recursion with the value of 0.

Now we can use the calculator variable to do something with the actor, for example invoke increment/decrement operations. Behind the scenes, these functions will send asynchronous messages {:inc, 10} or {:dec, 5} to the calculator process without waiting for a response (actually, these messages will be decorated by gen_server with a bit more content).

In the fourth line, the actor's state is retrieved by calling the synchronous get operation. Internally, this function sends a get message to the actor, and waits for it to respond.

As I already mentioned, actors normally process messages in the order they are received. Therefore the get message will be processed after inc and dec, although they were issued asynchronously. Consequently, in the final line, we will print the result of 5 (0 + 10 - 5).

The actor's implementation looks like this:

defmodule Calculator do
  use ExActor

  defcast inc(x), state: value do
    new_state(value + x)
  end
  
  defcast dec(x), state: value do 
    new_state(value - x)
  end
  
  defcall get, state: value do
    value
  end
end

First, a module is defined, which is simply a collection of functions. In line 2, there is a use construct which includes additional functions and macros to support easier actor definition and manipulation. For example, defcast and defcall are macros defined in ExActor.

In line 4, the inc operation is defined. The defcast means that the operation will be defined as a cast, which in Erlang terminology means it will be asynchronous. When a client calls inc, a message will be sent, and the caller will continue its execution immediately.

The state: value means that, in the function's body, we will refer to the actor's state (the argument of an infinite recursive function) using the name value.

Between do/end is the implementation of our operation. This is the code that will run in the actor process. More specifically, the code will handle the {:inc, x} messages. In the message handler we have to compute the new value of our state and return it in the special form required by gen_server. The gen_server will reenter the infinite recursion with the new value, which, as explained earlier, changes the state of the actor.
Elixir/Erlang use implicit returns: the result of the function's last statement is its return value. The new_state function forms the response so it can satisfy gen_server requirements.

The operation dec is implemented in the same way as inc, while get has some differences. Since it must return a value to the caller, it is defined via defcall, which means it will be a synchronous operation (aka call). Here the return value is treated differently: it is sent back to the caller, while the state is not modified, which means that recursion will be reentered with the same value.
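For comparison, here is a sketch of the same calculator written by hand against GenServer, the gen_server wrapper as it exists in current Elixir releases. This is an assumption on my side about the environment: the API available when this post was written was different, and this is not necessarily the code that the ExActor macros generate. The module name `PlainCalculator` is hypothetical.

```elixir
defmodule PlainCalculator do
  use GenServer

  # Client functions: these run in the caller's process.
  def start(initial), do: GenServer.start(__MODULE__, initial)
  def inc(pid, x), do: GenServer.cast(pid, {:inc, x})
  def dec(pid, x), do: GenServer.cast(pid, {:dec, x})
  def get(pid), do: GenServer.call(pid, :get)

  # Server callbacks: these run in the actor process.
  @impl true
  def init(initial), do: {:ok, initial}

  # Casts return only the new state; nothing is sent back to the caller.
  @impl true
  def handle_cast({:inc, x}, value), do: {:noreply, value + x}
  def handle_cast({:dec, x}, value), do: {:noreply, value - x}

  # A call replies to the caller and keeps the state unchanged.
  @impl true
  def handle_call(:get, _from, value), do: {:reply, value, value}
end

{:ok, calc} = PlainCalculator.start(0)
PlainCalculator.inc(calc, 10)
PlainCalculator.dec(calc, 5)
IO.puts(PlainCalculator.get(calc))
```

The client functions and the callbacks have to agree on the message shapes ({:inc, x}, :get, ...), which is the kind of duplication that the defcast/defcall macros hide.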


Call/cast responses

To recap, our cast operation can return new_state(something), which will set the new state. If we return a "naked" value (i.e. without wrapping it with new_state), it will be ignored, and the current state will be kept. This is usually helpful when an actor has to interact with external entities (other actors, files, networks, databases) without changing its state.

For a call, the return value is sent to the caller without state modification. If we want to modify the state as well, we can use reply(value, new_state), which means that we respond to the caller with value and at the same time change the state to new_state.

Finally, there are some other special forms of responses. For example, one form of response can be used to stop the actor. When gen_server receives such a response, it simply does not reenter the recursion, and the process will consequently finish.
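To make these response forms more concrete, here is a sketch in terms of plain GenServer return values from current Elixir releases, not in terms of the ExActor helpers used in this post. The module name `Stack` and the :pop and :stop messages are hypothetical. A {:reply, value, new_state} tuple replies and changes the state in one step, while a {:stop, reason, state} tuple ends the process.

```elixir
defmodule Stack do
  use GenServer

  @impl true
  def init(items), do: {:ok, items}

  # Reply with the head and continue with the tail as the new state.
  @impl true
  def handle_call(:pop, _from, [head | tail]), do: {:reply, head, tail}
  def handle_call(:pop, _from, []), do: {:reply, nil, []}

  # Stop the server instead of reentering the loop.
  @impl true
  def handle_cast(:stop, items), do: {:stop, :normal, items}
end

{:ok, pid} = GenServer.start(Stack, [1, 2, 3])
first = GenServer.call(pid, :pop)
second = GenServer.call(pid, :pop)

# Monitor the process so we are notified when it terminates.
ref = Process.monitor(pid)
GenServer.cast(pid, :stop)

stopped =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :still_running
  end

IO.inspect({first, second, stopped})
```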


Actors vs objects

Actors and objects have some properties in common: both encapsulate state while hiding its details from their clients. The clients can manipulate the state via messages (actors) or methods (objects).

Coming from OO, it helps me to think of actors as objects. More specifically, I reason about connections between actors, i.e. how they interact with each other. The composition of actors resembles that of objects. You can make global actors, singletons, or "local" ones which are known only to a limited number of actors. However, be aware that actors are not garbage collected. They are long running processes which will not be terminated just because no "reference" to them (their pid) exists anymore.

Unlike objects, actors are inherently concurrent, and can run in parallel. They are also completely independent and have no data in common. One actor cannot corrupt the state of another nor can a crash in one actor impact the other ones, unless explicitly specified by the programmer.
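Both properties can be checked with two bare processes. This is a sketch in current Elixir syntax, and the names `survivor` and `crasher` are made up for illustration. One process terminates abnormally, while the other, which is not linked to it, keeps running until it is explicitly told to stop.

```elixir
# A long-lived process that simply waits for a :stop message.
survivor = spawn(fn ->
  receive do
    :stop -> :ok
  end
end)

# An unrelated process that terminates abnormally.
crasher = spawn(fn -> exit(:boom) end)

# Wait until the crasher is confirmed to be gone.
ref = Process.monitor(crasher)

receive do
  {:DOWN, ^ref, :process, ^crasher, _reason} -> :ok
after
  1_000 -> :ok
end

IO.inspect(Process.alive?(crasher))
IO.inspect(Process.alive?(survivor))
```

The survivor stays alive for as long as nobody sends it :stop, regardless of whether any variable still holds its pid.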

Consumer/producer

This slightly more complicated example involves two actors. The producer creates random integers and sends them to the consumer. The consumer simply prints the received values to the screen. Since both are actors, they operate concurrently.

The producer exposes one service: produce, which, when called, will produce a number and send it to the consumer. This is the code:

defmodule Producer do
  use ExActor

  defcast produce, state: consumer do
    :timer.sleep(100)
    value = :random.uniform(100)
    consumer.consume(value)
    IO.puts "produced #{value}"
  end
end

The producer sleeps for some time, which is a simulation of a long running operation, then it creates a number and passes it to the consumer.
Notice that the produce operation doesn't end with a new_state(...) statement. This means that the actor's state will not be changed by the operation. That's ok, because producer's state is a pid of the consumer, and we don't want to change that when performing the produce operation.
The weird :timer.sleep and :random.uniform calls are Elixir's way of calling Erlang functions and they demonstrate how we can easily interoperate with Erlang libraries.

This is what the consumer looks like:

defmodule Consumer do
  use ExActor

  defcast consume(value) do
    :timer.sleep(200)
    IO.puts "                consumed #{value}"
  end
end

The consumer's code is even simpler: it sleeps for some time, and then prints the received value to the screen without using the state at all (hence, no state: identifier part in the consume definition).

Notice that both produce and consume operations are defined as casts. This means that the calling processes will continue with the execution immediately after invoking them.

This is the usage example:

consumer = Consumer.actor_start
producer = Producer.actor_start(consumer)

times(5, fn(_) -> producer.produce end)

IO.puts "main process finished\n"
:timer.sleep(2000)

Nothing fancy here: we create both actors, connect them, and invoke the produce operation five times. This is the output:

main process finished

produced 45
produced 73
                consumed 45
produced 95
produced 51
                consumed 73
produced 32
                consumed 95
                consumed 51
                consumed 32

The output illustrates the asynchronous nature of cast operations. The "main" process has finished immediately, and we can also see how the producer was generating values faster than the consumer was able to handle them (since the producer sleeps for 100ms, and the consumer for 200ms). Finally, comparing the produced and consumed values, we can confirm that the messages are processed in the order received.
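The ordering claim can also be checked programmatically. The sketch below rebuilds the pipeline with plain processes in current Elixir syntax instead of ExActor, which is an assumption on my side. The module name `Demo` is hypothetical, the sleep times are shortened, and the values are fixed rather than random, so that the result is deterministic. The consumer reports each value back to the main process, which collects them in arrival order.

```elixir
defmodule Demo do
  # The consumer is slower than the producer, as in the example above.
  def consumer(parent) do
    receive do
      {:consume, value} ->
        Process.sleep(20)
        send(parent, {:consumed, value})
        consumer(parent)
    end
  end

  # The producer forwards each value to the consumer after a short delay.
  def producer(consumer) do
    receive do
      {:produce, value} ->
        Process.sleep(10)
        send(consumer, {:consume, value})
        producer(consumer)
    end
  end
end

parent = self()
consumer = spawn(fn -> Demo.consumer(parent) end)
producer = spawn(fn -> Demo.producer(consumer) end)

values = [45, 73, 95, 51, 32]

# Asynchronous sends: this loop finishes long before the values are consumed.
Enum.each(values, fn v -> send(producer, {:produce, v}) end)
IO.puts("main process finished sending")

# Collect the consumed values in the order they arrive.
consumed =
  for _ <- values do
    receive do
      {:consumed, v} -> v
    after
      2_000 -> :timeout
    end
  end

IO.inspect(consumed)
```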

Chat backend

To provide a more complex example, I made a small sketch of how a basic chat server could be implemented. Since this article is already getting long, I'll only briefly outline the concepts. The full code can be found in the GitHub repository together with the previous two examples.

In a typical Erlang based chat server, each chatroom and each user is represented by an actor. In the most basic implementation, a chatroom actor's state will be the list of its users, while a user's state will be the pid (reference) of the chatroom they are currently in. When a user wants to post a message to the room, the user's process will send the message to the chatroom process, which will in turn loop through all of its users (except the sender) and send them the message. Each user process will then transport the message to the physical user via the network. To keep the example simple, I didn't implement the networking interface, but have instead simply printed the message to the screen.
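The broadcast step can be sketched with plain processes. This is not the code from the repository: the module name `ChatSketch`, the message shapes, and the observer process are all made up for illustration, and the syntax is that of current Elixir releases. For brevity, the main process posts on behalf of a user, and user processes forward delivered messages to the observer instead of a network socket.

```elixir
defmodule ChatSketch do
  # A room's state is the list of user pids.
  def room(users) do
    receive do
      {:join, user} ->
        room([user | users])

      {:post, sender, text} ->
        # Deliver the text to everyone except the sender.
        for user <- users, user != sender, do: send(user, {:message, text})
        room(users)
    end
  end

  # A user process passes each incoming message on to the observer.
  def user(name, observer) do
    receive do
      {:message, text} ->
        send(observer, {:delivered, name, text})
        user(name, observer)
    end
  end
end

observer = self()
room = spawn(fn -> ChatSketch.room([]) end)
alice = spawn(fn -> ChatSketch.user("alice", observer) end)
bob = spawn(fn -> ChatSketch.user("bob", observer) end)

send(room, {:join, alice})
send(room, {:join, bob})
send(room, {:post, alice, "hello"})

delivered =
  receive do
    {:delivered, name, text} -> {name, text}
  after
    1_000 -> :timeout
  end

IO.inspect(delivered)
```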

In such an architecture, everything runs concurrently (since chatrooms and users are actors), and yet the code is fairly simple and straightforward, not burdened with the intricacies of locking and synchronizing typically found in conventional multithreading approaches.

The concurrent property of the system means that we can use available CPU resources, and easily scale up by adding more processor power to address a higher load on the system. Obviously, the scalability will depend on interactions between actors. If, for example, many actors are synchronously calling one specific actor, then that one actor can be a potential bottleneck. However, such a situation is now easier to identify, since we can analyze dependencies between our actors, discover bottlenecks, and work on resolving them.

Recap

This article presented a lot of concepts. The most important thing to remember is that an actor is an Erlang process which encapsulates some state. Clients can communicate with it by sending messages to it, provided they have its pid (process id). There are two forms of communication: cast (asynchronous), and call (synchronous: the client sends a message, and the receiver sends the response back).

The concept of actors allows us to create neatly designed solutions for complex concurrent problems, and also to look at our system from a higher perspective, analyzing the dependencies between actors, or studying each one in isolation.
