Hi, I am Saša Jurić, a software developer with 10+ years of professional experience programming web and desktop applications in Erlang, Ruby, C# and C++. In this blog you can read about Erlang and other programming-related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Why Elixir?

It's been about a year since I started using Elixir. Originally, I intended to use the language only for blogging purposes, thinking it could help me better illustrate the benefits of the Erlang virtual machine (EVM). However, I was immediately fascinated with what the language brings to the table, and very quickly introduced it to the Erlang-based production system I was developing at the time. Today, I consider Elixir a better alternative for the development of EVM-powered systems, and in this post I'll try to highlight some of its benefits and dispel some misconceptions about it.

The problems of Erlang the language

The EVM has many benefits that make it easier to build highly-available, scalable, fault-tolerant, distributed systems. There are various testimonials on the Internet, and I've blogged a bit about some advantages of Erlang here and here. In addition, the benefits of both Erlang and Elixir are presented in chapter 1 of my upcoming book Elixir in Action.

Long story short, Erlang provides excellent abstractions for managing highly-scalable, fault-tolerant systems, which is particularly useful in concurrent systems, where many independent or loosely-dependent tasks are constantly running.

I've been using Erlang in production for more than three years, to build a long-polling based HTTP push server that at peak times serves over 2000 reqs/sec (non-cached). Never before have I written anything of this scale, nor have I ever developed something this stable. The service just runs happily, without me thinking about it. This was actually my first Erlang code, initially burdened with anti-patterns and bad approaches. And still, the EVM proved to be very resilient, and the server ran efficiently from the very beginning. Most importantly, it was fairly straightforward for me to work on such a complex problem, in large part owing to Erlang's concurrency mechanism.

However, despite some great properties, I never was (and I'm still not) quite comfortable programming in Erlang. The coding experience somehow never felt very fluent, and the resulting code was always burdened with excessive boilerplate and duplication. The problem was not the language syntax: I did a little Prolog back in my student days, and I liked the language a lot. By extension, I also like Erlang's syntax, and actually think it is in many ways nicer and more elegant than Elixir's. And this is coming from an OO developer who spent most of his coding life in languages such as Ruby, JavaScript, C# and C++.

The problem I have with Erlang is that the language is somehow too simple, making it very hard to eliminate some frequent cases of boilerplate and structural duplication. Consequently, the resulting code gets a bit messy, being harder to write, analyze, and modify. After coding in Erlang for some time, I thought that functional programming was inferior to OO when it comes to efficient code organization.

What Elixir is (not)

This is where Elixir changed my opinion. After I had spent enough time with the language, I was finally able to see the benefits and elegance of functional programming more clearly. Now I can't say anymore that I prefer OO to FP. I find the coding experience in Elixir much more pleasant, and I'm able to concentrate on the problem I'm solving, instead of dealing with the language's shortcomings.

Before discussing some benefits of Elixir, there is an important thing I'd like to stress: Elixir is not Ruby for Erlang. It is also not CoffeeScript, Clojure, C++ or anything else for Erlang. The relationship between Elixir and Erlang is unique, with Elixir often being semantically very close to Erlang, but extending the original concepts further by borrowing approaches from various other languages. The end result may on the surface look like Ruby, but I find it much closer to Erlang, with both languages completely sharing the type system and taking the same functional route.

So what is Elixir? To me, it is an Erlang-esque language with improved code organization capabilities. This definition differs from what you'll see on the official page, but I think it captures the essence of Elixir, when compared to Erlang.

The ability to organize code and to flush out duplication and boilerplate is in my opinion an important role of every programming language. If developers have to constantly duplicate some code structure (such as this one), without being able to eliminate this duplication reasonably easily, then the language doesn't succeed when it comes to code organization. It is in this department that Elixir improves on Erlang, by providing additional tools to organize code, making developers more efficient in writing production-ready, maintainable code.

Ingredients

Much has been said about Elixir on the Internet, but I especially like two articles from Devin Torres, which you can find here and here. Devin is an experienced Erlang developer who, among other things, wrote the popular poolboy library, so it's worth reading what he thinks about Elixir.

Here, I'll try to focus on some of the most important tools that help us organize the code efficiently.

Metaprogramming

Metaprogramming in Elixir comes in a couple of flavors, but the essence is the same: it allows us to devise elegant and concise constructs that seem as if they're an integral part of the language. At compile time, these constructs are then transformed into proper code.

On a more specific level, metaprogramming helps us remove structural duplication - a situation where two pieces of code share the same abstract pattern, but differ in various details. For example, the following snippet presents a sketch of a module that models a User record:

defmodule User do
  # initializer (one possible sketch: back the record with a keyword list)
  def new(data), do: data

  # getters
  def name(user), do: user[:name]
  def age(user), do: user[:age]

  # setters
  def name(value, user), do: Keyword.put(user, :name, value)
  def age(value, user), do: Keyword.put(user, :age, value)
end

Some other type of record will follow this same pattern, but with different fields. Instead of constantly repeating the pattern, we can use the Elixir defrecord macro:

defrecord User, name: nil, age: 0

Based on the given definition, defrecord generates a dedicated module with utility functions for manipulating our User record. Thus, the common pattern is stated in a single place (the code of the defrecord macro), while the particular code is relieved of implementation mechanics and remains concise and easy to understand.

Elixir macros are nothing like C/C++ macros. Instead of working on strings, they are something like compile-time Elixir functions that are called in the middle of compilation and work on the abstract syntax tree (AST), which is code represented as an Elixir data structure. A macro can transform the AST and spit out an alternative AST that represents the generated code. Since macros are executed at compile time, the performance of a running system is not affected.
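As a small sketch of the idea, here is a made-up my_unless macro (not part of any library) that receives the AST of its arguments and emits an equivalent if with the condition negated:

```elixir
defmodule MyMacros do
  # A hypothetical macro: at compile time it receives the AST of
  # `condition` and `block`, and returns a new AST - an `if` with
  # the condition negated.
  defmacro my_unless(condition, do: block) do
    quote do
      if !unquote(condition), do: unquote(block)
    end
  end
end

defmodule MacroDemo do
  require MyMacros

  def run do
    MyMacros.my_unless(false, do: :executed)
  end
end

MacroDemo.run()  # → :executed
```

The caller sees an ordinary language construct; the rewriting happened entirely during compilation.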

Owing to macros, most of Elixir is actually implemented in Elixir, including constructs such as if, unless, or the unit testing support. Unicode support works by reading the UnicodeData.txt file and generating the corresponding implementations of Unicode-aware string functions such as downcase or upcase. All of this makes it easier for developers to contribute to Elixir, since for most features they don't need to be familiar with another language.

Macros also allow third-party library authors to provide internal DSLs that fit naturally into the language. The Ecto project, which provides language-integrated queries, something like LINQ for Elixir, is my personal favorite that really showcases the power of macros.

I've seen people sometimes dismiss Elixir, stating they don't need metaprogramming capabilities. While extremely useful, metaprogramming can also become a very dangerous tool, and its usage should be considered carefully. That said, many features are powered by metaprogramming, and even if you don't write macros yourself, you can still enjoy many of these features, such as the aforementioned records, Unicode support, or the integrated query language.

Pipeline operator

This seemingly simple operator is so useful that I actually "invented" its Erlang equivalent before I was even aware it exists in Elixir (or other languages, for that matter).

Let me first describe the problem it solves. In Erlang, there is no pipeline operator, and furthermore we can't reassign variables. Therefore, typical Erlang code will often follow this pattern:

State1 = trans_1(State),
State2 = trans_2(State1),
State3 = trans_3(State2),
...

This is very clumsy code that relies on intermediate variables and on correctly passing the last result to the next call. I actually had a nasty bug in production because in one place I accidentally used State6 instead of State7.

Of course, we can work around this by inlining the function calls:

trans_3(
  trans_2(
    trans_1(State)
  )
)

Notice how the code must now be read "inside-out", with the innermost function being called first. This style, known as staircasing, can quickly get ugly, and the problem is often aggravated when functions receive additional arguments and the number of functions in the chain increases.

The pipeline operator makes it possible to combine various operations without using intermediate variables:

state
|> trans_1
|> trans_2
|> trans_3

The code can easily be followed by reading it from top to bottom. We pass the state through various transformations to get the desired result, each transformation returning some modified version of the state. This highlights one of the strengths of FP, where we regard functions as data transformations that are combined in various ways to achieve the desired result.

For example, the following code computes the sum of squares of all positive numbers of a list:

list 
|> Enum.filter(&(&1 > 0))       # take positive numbers
|> Enum.map(&(&1 * &1))         # square each one
|> Enum.reduce(0, &(&1 + &2))   # calculate sum
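With a concrete (hypothetical) input list, the whole chain can be run as a script:

```elixir
list = [1, -2, 3, -4, 5]

result =
  list
  |> Enum.filter(&(&1 > 0))      # [1, 3, 5]
  |> Enum.map(&(&1 * &1))        # [1, 9, 25]
  |> Enum.reduce(0, &(&1 + &2))  # 1 + 9 + 25

IO.inspect(result)  # → 35
```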

The pipeline operator works extremely well because the API in Elixir libraries follows the "subject (noun) as the first argument" convention. Unlike Erlang, Elixir takes the stance that all functions should accept the thing they operate on as the first argument. So String module functions accept a string, while Enum module functions accept an enumerable as the first argument.
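For instance, because String functions take the subject string first, they compose directly in a pipeline (a tiny sketch using two standard String functions):

```elixir
# Each function receives the previous result as its first argument.
"hello world"
|> String.upcase()
|> String.split()
# → ["HELLO", "WORLD"]
```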

Polymorphism via protocols

Protocols are the Elixir way of providing something roughly similar to OO interfaces. Initially, I wasn't much impressed with them, but as time progressed, I began to understand the benefits they bring.

Protocols allow developers to create generic logic that can be used with any type of data, assuming that some contract is implemented for the given data. An excellent example is the Enum module, which provides many useful functions for manipulating anything enumerable. For example, this is how we iterate over an enumerable:

Enum.each(enumerable, fn(x) -> ... end)

Enum.each works with different types, such as lists or key-value dictionaries, and of course we can add support for our own types by implementing the corresponding protocol. This is reminiscent of OO interfaces, with the additional twist that it's possible to implement a protocol for a type even if you don't own the type's source code.
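As a sketch of this idea, here is a made-up Describable protocol (not part of any library), with implementations for two built-in types; note that the implementations could live far away from the types themselves:

```elixir
defprotocol Describable do
  @doc "A hypothetical protocol for short textual descriptions"
  def describe(data)
end

defimpl Describable, for: List do
  def describe(list), do: "a list with #{length(list)} element(s)"
end

defimpl Describable, for: Integer do
  def describe(n), do: "the integer #{n}"
end

Describable.describe([1, 2, 3])  # → "a list with 3 element(s)"
Describable.describe(42)         # → "the integer 42"
```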

One of the best examples of protocol usefulness is the Stream module, which implements a lazy, composable, enumerable abstraction. A stream makes it possible to compose various enumerable transformations and then generate the result only when needed, by feeding the stream to some function from the Enum module. For example, here's the code that computes the sum of squares of all positive numbers of a list in a single pass:

list 
|> Stream.filter(&(&1 > 0))
|> Stream.map(&(&1 * &1))
|> Enum.reduce(0, &(&1 + &2))   # Entire iteration happens here in a single pass

In lines 2 and 3, operations are composed, but not yet executed. The result is a specification descriptor that implements the Enumerable protocol. Once we feed this descriptor to some Enum function (line 4), it starts producing values.

It's worth mentioning that laziness is in no special way supported by the Elixir compiler. The general protocol support and first-class functions are all it takes to implement lazy enumerables.
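The deferred execution can be observed directly; in the sketch below, nothing runs until the descriptor reaches an Enum function:

```elixir
stream =
  [1, -2, 3]
  |> Stream.filter(&(&1 > 0))
  |> Stream.map(&(&1 * &1))

# At this point `stream` is only a descriptor - no element has been
# filtered or squared yet. Iteration happens on demand:
IO.inspect(Enum.to_list(stream))  # → [1, 9]
```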

The mix tool

The final important piece of the puzzle is a tool called mix, which helps us create and build projects and manage their dependencies. The tool does its job in a simple and easy way. With a single command you can create an OTP application skeleton that consists of only 7 files (including .gitignore and README.md). There's not much to say about mix - it does its job well, and makes it easy to quickly dive into proper development of EVM-powered systems.

Other goodies

The list doesn't stop here; there are many other enhancements Elixir provides, such as support for variable rebinding, optional parentheses, implicit statement endings, nullability, short-circuit operators, ...

In general, I like many of the decisions made in this department. It's nice to be able to write an if without an obligatory else. It's also nice that I don't have to consciously think about which character to use to end a statement. Still, I don't find these enhancements to be of crucial importance. They are nice finishing touches, but if this was all Elixir had to offer, I'd probably still use pure Erlang.

Wrapping up

Much has been said in this article, and yet I feel that the magic of Elixir is far from captured. Language preference is admittedly subjective, but I feel that Elixir really improves on the Erlang foundations. With more than three years of production-level coding in Erlang, and about a year of using Elixir, I simply find the coding experience in Elixir to be much more pleasant. The resulting code seems more compact, and I can be more focused on the problem I'm solving, instead of wrestling with excessive noise and boilerplate.

And yet, we get to keep all the benefits of the EVM. The underlying concurrency mechanisms make it significantly easier to tackle the complexity of a highly loaded concurrent system that must constantly provide service and perform many simultaneous tasks.

Personally, I think that both Elixir and the EVM raise the abstraction bar and help me tackle complex problems with greater ease. This is why I would always put my money behind the Elixir/EVM combination as the tool of choice for building any server-side system that is at least moderately complex. YMMV, of course.

Announcing Elixir in Action

It's been a while since I last blogged. There are multiple reasons, but the main one is that I'm writing a book called Elixir in Action. The book is already available for sale in an early access version. At this point there are three chapters (you can read the first one for free!) in a rough, unedited version. A 50% discount code eiaaunch50 is currently available, and it will be valid until January 18, 2014.

The aim of the book is to teach how Elixir, Erlang, and OTP can be used to build highly-available, scalable, fault-tolerant, distributed systems. In that sense, it is less about the language, and more about its practical usage, especially in server side systems.

Readers need not be familiar with Elixir or Erlang/OTP, but they should be otherwise experienced programmers, fluent in mainstream languages such as Ruby, Python, C#, Java, ... Knowledge of functional programming is not a prerequisite, and the book aims to bring OO programmers up to speed with common functional idioms and techniques.

I came to Erlang after 10+ years of classical imperative OO programming. Being self-taught, and lacking proper live mentoring from an expert, I made many mistakes trying to make Erlang programming more OO-ish. With this in mind, one of the aims of this book is to guide readers on how to properly use Elixir and Erlang building blocks to produce clean and efficient code.

Beyond the language, various parts of the Erlang platform and the OTP framework will be covered, such as concurrency, fault-tolerance, distributed systems, and releasing and monitoring running systems.

Writing a book is a very time-consuming task, and it took me a while to pick up a steady tempo, which is the main reason for the lack of recent posts on this blog. Now that I've settled into a sort of writing routine, I plan to blog more regularly, so stay tuned for topics from the Elixir and Erlang world!

WebCamp Zagreb, 2013

Recently, I participated in the WebCamp Zagreb conference (a bit of the atmosphere is captured on the conference Facebook page), so I'd like to share some of my impressions.

Disclaimer: although I was there in the role of a speaker, I was in no way associated with the conference organization. I know some of the members of the organization team from our student days, and I met some of the others last year, but other than that, I'm not personally involved, so this can be considered a fairly unbiased review from an independent participant.

WebCamp is an interesting conference, quite possibly unique in Croatia, since it is not tied to a specific community. Instead, it is a joint venture of several IT communities, such as Python, PHP, Microsoft and others (I haven't mentioned them all, but they're listed on the web site). As such, it has the nice advantage of being able to offer a variety of topics you can't usually find in one place. It's more about people's own experiences and less about marketing, preaching to the choir, and community self-praise.

And it shows in the contents. While most talks deal with mainstream technologies, they definitely cover a broad range of topics, most of them based on people's own professional experience. This diversity even makes it possible for people who use exotic technologies to talk about topics that are not often presented at mainstream conferences/meetups, such as RabbitMq or a polyglot project powered (not only) by Haskell, Scala, and Go. And of course, I was lucky enough to get a chance to talk about Elixir/Erlang. Being one of the few (to my own knowledge, the only) professional Elixir/Erlang developers in Croatia, this is the only conference where I could get the chance to talk about it. So in that sense, I'm very grateful and also quite partial to the WebCamp conference. Without it, I'd never have gotten a chance to present Elixir/Erlang to a broader audience from this region.

Another great thing is the duration of the talks. Abandoning the classical 45 minutes, the organizers decided to make the talks last 25 minutes (including the time for questions). Having been a speaker both last year and this year, I found it very hard to fit everything into that timeframe. However, as a listener, I think the shorter talk duration is great. It gives speakers just enough time to provide a higher-level overview of the subject, without becoming too involved and too detailed. The whole feeling is dynamic, with talks changing at a fast pace. After all, there were 12 talks per track, which is really a lot.

The conference was free of charge, which is in itself a fantastic achievement. Obviously, this wouldn't be possible without sponsors, and yet the selection of talks seems to be free of the usual sponsor-influenced politics. The organizers really deserve praise for pulling everything off. I can only imagine how much effort had to be invested to make it all happen.

According to the organizers, the turnout was more than 600 people. I can't estimate myself, but there were definitely a lot of people, and this made it possible to socialize with many developers from different companies. For me it was a great (and rare) chance to exchange experiences with other IT people. Socializing is one of the important aspects of conferences, and being able to go there for free and hang out with 600+ professionals is a great benefit.

All things considered, I found WebCamp Zagreb to be a very pleasant experience. In Croatia, it's a unique conference regarding topics and format, and it attracts a very big and diverse crowd of IT professionals, making it one of the largest IT events in the country. I am definitely looking forward to coming back next year!

Immutable programming, FP style

In the previous post I presented the OO-like technique of manipulating a complex hierarchy with immutables. Today we will see how to perform the same task using an (almost) pure functional approach, which is the standard way of writing Elixir code.


From OO to FP

We have already gone a long way from typical OO programming, introducing pure immutable structures to manipulate complex data. One OO-ish thing that still remains is the polymorphic dispatch, such as company.add_employee(...), which is at runtime transformed into Company.add_employee(..., company).

In a way, we are already almost using the functional approach, but it is hidden behind the polymorphic dispatch. The primary benefit of the OO style is the elegant use of the dot (.) operator, which allows us to chain multiple statements without using intermediate variables:

Company.new(name: "Initech").
  add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000)).
  add_employee(Employee.new(name: "Michael Bolton", salary: 12000))

To mimic this in pure functional style, we can use the pipeline operator (|>), which feeds the result of the previous function to the next call as the first argument. So the call something |> fun1(a,b) |> fun2(c,d) is at compile time transformed into fun2(fun1(something, a, b), c, d). This makes it easy to chain function calls, much like in the OO approach, but without the need for runtime dispatch.
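The expansion can be verified with two hypothetical helper functions (double and add are made up for the demonstration):

```elixir
defmodule PipeDemo do
  # Hypothetical helpers, used only to demonstrate the expansion
  def double(x), do: x * 2
  def add(x, y), do: x + y
end

# The pipeline below is rewritten at compile time into
# PipeDemo.add(PipeDemo.double(5), 1):
5 |> PipeDemo.double() |> PipeDemo.add(1)  # → 11
```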

Notice that the pipeline operator feeds the previous result as the first argument, while the dot operator feeds "this" as the last argument. This is the reason for the incompatibility between the OO and functional styles in Elixir, which makes it harder to combine the two approaches or to switch from one to the other. Consequently, you should decide upfront which approach to use. If you wish to adhere to Elixir conventions and best practices, you should almost always opt for the functional approach.

One exception to this rule is records' built-in functions, i.e. the accessors/modifiers which are auto-generated when a record is defined. In this particular case, Elixir resorts to OO-like syntax, which significantly simplifies record manipulation but does not combine data with behavior: polymorphic calls are used only to get and set the fields of a record. It is also worth noting that, if some hints are provided, the Elixir compiler can actually resolve calls to standard record operations at compile time.


Manipulating complex data

Due to the similarities between pipeline and OO chaining, it is fairly straightforward to convert OO code to the FP version. Here is the functional equivalent of the "Complex mutations" example from the previous article:

defrecord Employee, id: nil, name: nil, salary: nil

defrecord Company, [
  name: nil, employees: HashDict.new, autoid: 1
] do

  def add_employee(company, employee) do
    company
    |> store_employee(employee.id(company.autoid))
    |> inc_autoid
  end
  
  defp store_employee(company, employee) do
    company.update_employees(HashDict.put(&1, employee.id, employee))
  end
  
  defp inc_autoid(company) do
    company.update_autoid(&1 + 1)
  end

  def get_employee(company, id), do: Dict.get(company.employees, id)
end

c = Company.new(name: "Initech")
    |> Company.add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000))
    |> Company.add_employee(Employee.new(name: "Michael Bolton", salary: 12000))

IO.inspect c
IO.inspect Company.get_employee(c, 1)
IO.inspect Company.get_employee(c, 5)

This is very similar to the OO version presented last time, so I will not discuss all of the details.

One subtle change is the introduction of the function inc_autoid (line 17), which didn't exist in the OO version. The sole purpose of this function is to wrap the OO-styled call company.update_autoid so we can use it in the chain (line 10). Unfortunately, standard record operations are currently not compatible with pipeline chaining (there has been some talk on the mailing list about tackling this issue).

Another subtle but important improvement is that the internal functions store_employee and inc_autoid are now private. This is possible since we are not relying on the runtime OO dispatch mechanism.

The usage of the record has also changed. Whenever we call a Company module function from the outside, we have to explicitly reference the module. In this example, repeated calls are issued to Company.some_fun in lines 24-26 and 29-30. This is the obvious consequence of not using polymorphic dispatch: the code will be a bit polluted with duplicated module references. While I do regard this as a downside (many functional programmers would probably disagree with me), it is not as huge a problem as it might initially seem. Usually, a chain of multiple calls to functions from a single module can be moved to that module as a distinct function. Once inside the module, we can omit the module prefix when calling functions, just as is done in lines 9-10.

By abandoning OO-styled syntax, we also lose the polymorphic nature of our code: the function which gets invoked is now determined at compile time. When you do need polymorphism, i.e. a runtime decision about which function to call, there are two standard ways of achieving it. This is really worth its own post, so I will only briefly mention them without providing a detailed explanation.

The basic Erlang way of doing runtime polymorphism is to use multi-clause functions and pattern matching. Each clause represents a specific implementation, which will be invoked depending on the actual values of the arguments. The problem with this technique is that it is not flexible. If a new type must be supported, the definition of the polymorphic function must be extended with an appropriate clause. This means that all variants must reside in one module. Furthermore, the problem becomes harder to solve if for some reason you can't modify the module. In such situations, you must add another layer of indirection (essentially another multi-clause function) in front of it, which is somewhat clumsy.
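A minimal sketch of the multi-clause technique (the shapes and tags are made up for illustration):

```elixir
defmodule Area do
  # Each clause is one "implementation"; dispatch happens at runtime
  # via pattern matching on the tag in the first tuple element.
  def area({:rectangle, width, height}), do: width * height
  def area({:circle, radius}), do: :math.pi() * radius * radius
end

Area.area({:rectangle, 3, 4})  # → 12
```

Supporting a new shape means adding another clause to this same module, which is exactly the inflexibility described above.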

To overcome this, Elixir introduces protocols, a construct somewhat similar to OO interfaces. A protocol is essentially an abstract module, where functions are declared but not implemented. The generic piece of code uses the protocol, while clients provide implementations for their own types. By implementing the protocol, you get the benefit of reusing the generic code without modifying it. However, compared to OO interfaces, protocols are again based on pure functions and do not rely on data + behavior coupling. The main benefit of protocols is that new implementations can be freely provided and are not required to reside in one place in the code.
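The same idea, sketched with a hypothetical Size protocol; unlike the multi-clause approach, each defimpl could live in a different module or library:

```elixir
defprotocol Size do
  @doc "A made-up protocol: the number of elements in a container"
  def size(data)
end

defimpl Size, for: List do
  def size(list), do: length(list)
end

defimpl Size, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

Size.size([1, 2, 3])  # → 3
Size.size({:a, :b})   # → 2
```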


Conclusion

This post is not very long, since most of the underlying theory has already been covered in the two previous articles. After understanding the basic principles, and experimenting with OO-style complex data manipulation, it is really easy to switch to pure FP style. It boils down to moving the "subject" argument (aka "this") to the first place in the parameter list, and using |> instead of the OO-ish dot (.) operator.

By doing this, we have decoupled data from behavior and are now programming in functional style. The code is divided into small functions which depend only on their input arguments, and not on some global or private instance state. Such functions are highly reusable, composable, and easy to test and debug, which should be a good reason for using this approach. Hopefully, the example has demonstrated that it is not hard.

Immutable programming, OO style

Continuing the topic started in the previous post, today I will go a bit deeper and present some examples of how to manipulate hierarchical data using only immutable data structures. In this article I will use the object-oriented approach, based on polymorphic dispatch and data + code coupling. This style is often frowned upon by functional programmers, it is usually avoided by Erlang developers, and even the creator of Elixir, José Valim, discourages it.

Despite the controversy surrounding it, I will use this approach as a starting point for an easy transition from mainstream OO languages to the immutable world. The pure functional approach will be discussed in the next article.

Since Elixir is often wrongfully labeled as an OO language, I would like to stress that the programming style used in these examples is in no way Elixir-specific. Erlang itself supports polymorphic dispatch via a lesser-known feature called "tuple modules", which allows us to couple code with data. Although somewhat controversial, the feature is still here, and we can exploit it to make the code more readable.

Before starting, let me give you a slight preview of what we will be able to accomplish:

Company.new(name: "Initech").
  add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000)).
  add_employee(Employee.new(name: "Michael Bolton", salary: 12000))

Although the code looks very OO-ish, no in-place mutation of a variable takes place. Instead, each function call creates a new intermediate variable which holds the modified version, which (as explained in my previous post) shares as much data as possible with the old version (assuming appropriate data structures are used).


Basic manipulation

The basic unit of code organization in Elixir is a module, a simple collection of functions. On top of this, drawing on the power of its macro system, Elixir provides records, which are nothing more than modules with some autogenerated functions, namely field readers and modifiers. Records are mainly meant to be used as pure structures which group data together. However, since they are essentially modules, we can extend them with our own functions which, when defined in a particular way, can simulate something akin to OO methods:

defrecord Company, [
  name: nil, employees: []  # record fields
] do
  
  def add_employee(employee, this) do
    this.update_employees(fn(current_employees) ->
      [employee | current_employees]
    end)
  end
end

c0 = Company.new(name: "Initech")
c1 = c0.add_employee("Peter Gibbons")
c2 = c1.add_employee("Michael Bolton")

IO.inspect c2

We start off by defining a record and its fields. Then, in line 5, a function add_employee is defined. The function explicitly states this (an instance of the record) as its last argument. Because of this (pun intended), the function can be used as a "method" (lines 13 and 14).

The code of the function is fairly simple: it invokes update_employees on this to push a new employee onto the top of the list. The function update_employees is generated by the defrecord macro, and it follows the same style as add_employee, i.e. it explicitly expects the instance of the record as its last argument. The lambda we are passing to it takes the current employees collection and must return the modified version. The update_employees function then returns a new instance of the Company record, with its employees field containing whatever we returned from the lambda. This is the immutable way of modifying things: instead of an in-place mutation, we return a modified version.

Lines 12-14 demonstrate how we can use the Company record. First we create a new instance, then we use it to call the add_employee "method", and finally print the result. The explicit intermediate variables are redundant, but I used them to emphasize the workflow.

The OO dispatch works for two reasons. First, the call to Company.new(name: "Initech") creates a tuple in the form of {Company, "Initech", []}. When calling c0.add_employee(...), Erlang will at runtime translate the call into Company.add_employee(..., c0). This runtime dispatch/translation takes place because c0 is a tuple, and its first element identifies the module where the "method" must reside. Due to the dispatch mechanism, c0 will be implicitly passed as the last argument, which is why we have to explicitly state this as the last argument of the function we want to use as a method. Let me state it again: this is purely an Erlang mechanism, with no special enhancements from Elixir.
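To make the mechanism concrete, here is a rough Python sketch of the same idea: a tagged tuple whose first element names the "module" hosting its functions. All names here (MODULES, defmodule, call) are hypothetical, invented purely for illustration.

```python
# A minimal sketch of tuple-based "OO" dispatch, assuming a registry of
# modules keyed by the first tuple element (all names are hypothetical).

MODULES = {}

def defmodule(cls):
    """Register a plain class as a 'module' of functions."""
    MODULES[cls.__name__] = cls
    return cls

@defmodule
class Company:
    @staticmethod
    def new(name):
        # A record is just a tagged tuple: (module, field1, field2)
        return ("Company", name, [])

    @staticmethod
    def add_employee(employee, this):
        module, name, employees = this
        return (module, name, [employee] + employees)

def call(record, method, *args):
    """Runtime dispatch: the first tuple element picks the module."""
    module = MODULES[record[0]]
    # The record itself is implicitly passed as the last argument.
    return getattr(module, method)(*args, record)

c0 = Company.new("Initech")
c1 = call(c0, "add_employee", "Peter Gibbons")
c2 = call(c1, "add_employee", "Michael Bolton")
print(c2)  # ('Company', 'Initech', ['Michael Bolton', 'Peter Gibbons'])
```

The explicit call helper is what Erlang performs for you automatically when the receiver is a tuple.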


Complex mutations

Extending the basic example above, let's introduce a dedicated Employee record which can hold additional employee data (e.g. its salary). When being added to a company, an employee will obtain its own autogenerated id which can later be used for fast retrieval. Here is the complete code:

 1  defrecord Employee, id: nil, name: nil, salary: nil
 2
 3  defrecord Company, [
 4    name: nil, employees: HashDict.new, autoid: 1
 5  ] do
 6
 7    def add_employee(employee, this) do
 8      this.
 9        store_employee(employee.id(this.autoid)).
10        update_autoid(&1 + 1)
11    end
12
13    def store_employee(employee, this) do
14      this.
15        update_employees(HashDict.put(&1, employee.id, employee))
16    end
17
18    def get_employee(id, this), do: Dict.get(this.employees, id)
19  end
20
21  c = Company.new(name: "Initech").
22        add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000)).
23        add_employee(Employee.new(name: "Michael Bolton", salary: 12000))
24
25  IO.inspect c
26  IO.inspect c.get_employee(1)
27  IO.inspect c.get_employee(5)

We start off by defining an Employee record. The record doesn't implement any custom functions, but as I mentioned earlier, defrecord will autogenerate some accessors and modifiers for us.

The Company record is then defined, this time using HashDict (key-value dictionary) to store employees. It also contains a field called autoid, which will be used to automatically provide incremental ids for new employees.

To add a new employee, we must first set its id to the current value of the autoid field, store it in the employees dictionary (line 9) and finally increment the autoid field of the company record (line 10). The strange looking &1 + 1 construction is shorthand for writing a lambda: fn(x) -> x + 1 end.

Notice in lines 8-10 how calls are chained by the dot (.) operator. The return value of a function ("method") serves as the invocation point of the next call. By doing this we can avoid explicit declaration of intermediate variables.

The storing of the employee (lines 13-16) simply amounts to putting the employee record in the dictionary, using its id as the key. Retrieving an employee is a simple id-based lookup.

The usage is illustrated in lines 21-23. Notice that a single call to add_employee internally results in multiple mutations: one modification of the employee record and two modifications of the company record. This should demonstrate that complex mutations are not only possible, but actually very easy to implement with immutables.
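For readers more at home in a mainstream language, the same shape can be sketched in Python using frozen dataclasses in place of records; every name below is illustrative, not part of the article's API:

```python
from dataclasses import dataclass, field, replace

# A hedged Python analogue of the Company/Employee example: frozen
# dataclasses play the role of records, and every "mutation" returns
# a new instance instead of modifying the old one.

@dataclass(frozen=True)
class Employee:
    id: int = None
    name: str = None
    salary: int = None

@dataclass(frozen=True)
class Company:
    name: str = None
    employees: dict = field(default_factory=dict)
    autoid: int = 1

    def add_employee(self, employee):
        # One call, multiple immutable updates: set the employee's id,
        # store it, then bump the autoid counter.
        employee = replace(employee, id=self.autoid)
        return replace(self.store_employee(employee), autoid=self.autoid + 1)

    def store_employee(self, employee):
        return replace(self, employees={**self.employees, employee.id: employee})

    def get_employee(self, id):
        return self.employees.get(id)

c = (Company(name="Initech")
     .add_employee(Employee(name="Peter Gibbons", salary=10000))
     .add_employee(Employee(name="Michael Bolton", salary=12000)))

print(c.get_employee(1).name)  # Peter Gibbons
print(c.get_employee(5))       # None
```

dataclasses.replace is doing here exactly what the generated record updaters do in Elixir: building a copy with one field changed.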


Modifying children

Extending the example further, let's implement a support for modifications of company employees. There are a couple of ways to do this, and here I will show the generic approach. The Company record is extended with the update_employee method which can be used by clients to arbitrarily modify an existing employee:

 1  defrecord Company
 2    ...
 3
 4    def update_employee(id, update_fun, this) do
 5      this.get_employee(id)
 6      |> this.maybe_update_employee(update_fun)
 7    end
 8
 9    def maybe_update_employee(nil, _, this), do: this
10    def maybe_update_employee(employee, update_fun, this) do
11      update_fun.(employee)
12      |> this.store_employee
13    end
14  end

The update_employee "method" takes an employee's id and the function which will modify it (return the new version). It first retrieves the employee (line 5) and then maybe (if the employee exists) updates it (line 6). The |> operator feeds the result of the previous call as the first argument to the next call.

In case no employee with the given id exists, the company is not modified (line 9). Otherwise, the updater function is called (line 11) and its result is fed (again via the |> operator) to the earlier described store_employee "method" (line 12).

We can now use this function for example to give a 20 percent raise to the specific employee:

c1 = c.update_employee(1, fn(employee) ->
  employee.update_salary(&1 * 1.2)
end)

IO.inspect c1

The presented code illustrates how to deal with updates in a deep hierarchy. To modify a child, we have to go through its parent. If a child is deep in the hierarchy, we will have to go through all of its parents. This conforms to the "Law of Demeter", which states that we should communicate only with our immediate neighbors.

People often complain that an immutable hierarchy is hard to manipulate and modify, especially if we want to change a child somewhere deep in the tree. In Elixir, I simply don't find this to be true. If we organize our code systematically, making sure that each level in the hierarchy is modeled with a corresponding module, and that updates are appropriately delegated through these modules, it should not be complicated to modify any element, regardless of its position in the tree. Of course, we cannot directly update a child deep in the hierarchy, which is a good thing since it forces us to organize our code and structure our updates, making it harder (often impossible) to resort to quick and dirty hacks.
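As a hedged illustration of such delegated updates, here is a Python sketch with one extra level (a department) between the company and its employees; all names are invented for the example:

```python
from dataclasses import dataclass, replace

# Each level of the hierarchy exposes its own update function and passes
# the modifier lambda down to its immediate child (a sketch, not a library).

@dataclass(frozen=True)
class Employee:
    name: str
    salary: float

@dataclass(frozen=True)
class Department:
    employees: dict  # name -> Employee

    def update_employee(self, name, fun):
        employee = self.employees.get(name)
        if employee is None:
            return self  # unknown employee: the department is unchanged
        return replace(self, employees={**self.employees, name: fun(employee)})

@dataclass(frozen=True)
class Company:
    departments: dict  # name -> Department

    def update_employee(self, dept, name, fun):
        # The company talks only to its immediate child, the department.
        department = self.departments.get(dept)
        if department is None:
            return self
        return replace(self,
                       departments={**self.departments,
                                    dept: department.update_employee(name, fun)})

c = Company({"dev": Department({"peter": Employee("Peter Gibbons", 10000)})})
c2 = c.update_employee("dev", "peter",
                       lambda e: replace(e, salary=e.salary * 1.2))
print(c2.departments["dev"].employees["peter"].salary)  # 12000.0
```

Adding a deeper level only means adding one more delegating update function; no level ever reaches past its direct children.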


Iterative updates

So far we have only been doing manual hierarchy modifications. Often we want to perform a series of mutations based on some input. Let's say we need to implement a functionality which uses a list of raw employees data (for example obtained from a database) to construct the corresponding Company instance.

To do this, we must run some kind of loop over employees, adding them to the Company instance, one by one. However, when working with immutables, classical loops, such as while or for, are not possible, since they rely on mutable variables. Instead (as I briefly described in my last article), we have to use either recursion or higher order functions (which are basically wrappers around recursion). The code below uses the latter approach:

 1  input = [
 2    [name: "Peter Gibbons",  salary: 10000],
 3    [name: "Michael Bolton", salary: 12000]
 4  ]
 5
 6  c = Enum.reduce(
 7    input,                          # enumerable
 8    Company.new(name: "Initech"),   # initial accumulator
 9    fn(employee_data, company) ->   # accumulator modifier
10      company.add_employee(Employee.new(employee_data))
11    end
12  )
13
14  IO.inspect c

A list of raw employee data is defined (lines 1-4), then a new company instance is created (line 8), and finally, each employee entry is added to that instance (line 10).

To do this, the reduce function (elsewhere known as inject or fold) is utilised, which can be used to transform anything enumerable into anything else. When calling reduce we must provide the enumerable (line 7), the initial accumulator value (line 8) and the accumulator modifier function (lines 9-11), which will be called by the reduce function for each element of the enumerable. The modifier receives the current element of the enumerable (first argument in line 9) and the current accumulator value (second argument in line 9). Based on these arguments it must return the new version of the accumulator (line 10), which will then be passed to the modifier in the next iteration step. The final accumulator value is returned by the reduce function to its caller. In line 6 we store it to the variable c, which is then printed in line 14.
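The same reduction can be sketched in Python with functools.reduce. Note that Python's modifier function takes the accumulator first, the opposite of Elixir's fn(element, accumulator) order; the company representation below is a deliberately minimal stand-in:

```python
from functools import reduce

# The same idea in Python: fold raw employee data into an immutable
# "company", here represented as a (name, employees_tuple) pair.

def new_company(name):
    return (name, ())

def add_employee(company, employee):
    # Returns a new pair; the original company value is untouched.
    name, employees = company
    return (name, employees + (employee,))

input_data = [
    {"name": "Peter Gibbons",  "salary": 10000},
    {"name": "Michael Bolton", "salary": 12000},
]

c = reduce(
    add_employee,            # accumulator modifier (acc, element)
    input_data,              # enumerable
    new_company("Initech"),  # initial accumulator
)

print(c)
```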


OO dispatch downsides

As I mentioned, OO-like dynamic dispatch is often frowned upon by functional programmers. While some arguments are more philosophical, there are a couple of practical downsides. 

First, since the dispatch is resolved at runtime, the call will be somewhat slower. I did a quick measurement a couple of months ago, and observed a slowdown of a fraction of a microsecond per function call. Whether this is acceptable or not depends on the specific case. If you're doing very heavy processing, managing a large number of concurrent processes, or running your system on limited hardware, you might find this unacceptable. For most uses, though, I don't think it is a significant problem.

For OO dispatch to work, all "methods" must be made public. I think public/private scoping is a generally useful tool to distinguish between the interface and the implementation of a module, and with OO dispatch this benefit is lost. In the previous examples, the internal function store_employee had to be declared public and consequently can be called by anyone.

Another problem is that the OO style doesn't mix well with the pure functional style which is used in Elixir core libraries, and will probably be the preferred style of most Elixir programmers. While the two styles can be combined, it doesn't work very elegantly, requiring explicit declaration of intermediate variables. Moreover, since the styles are not compatible, it will not be easy to convert complex OO-styled code to pure functional style, should you decide to do so sometime in the future.

Therefore, the OO style is usually not the optimal approach. Regardless, for me personally, this approach was the first time immutable programming really clicked, after struggling with the pure functional approach for more than two years. So although burdened with issues, I find the approach a good way to dive deeper into managing changing data with immutables. Once this style is grasped, it should be much easier to understand and use the pure functional approach.


Wrapping up

Hopefully, this article has somewhat dispelled the myth that managing hierarchical data is hard with immutables. The examples should get you started with some basic techniques which you can use in Elixir, but also in most mainstream OO languages.

While the presented OO approach is not very popular in the FP community, I still consider it a good way to start shifting towards a more functional style, which I will discuss next time.

Working with immutable data

| Comment on this post
In the next few articles I will talk about programming with immutable data, a very important property of functional programming languages. Using immutable data might seem scary to a typical OO programmer used to dealing with data which can be modified in place. After many years of C++/C#/Ruby based development, I certainly had difficulties getting used to this concept. However, once accustomed, I found it brings many benefits over the classical mutable data approach: it forces cleaner, better structured code, and makes concurrent programming much easier.

This series especially targets classical OO developers, accustomed to standard mutable variables. You might find the topic interesting even if you don't plan on coding in a functional language such as Elixir or Erlang. Immutable structures can be implemented in other languages (like this one in Ruby) and can bring some benefits which are difficult to gain with mutable data.

The first article of this miniseries will be more theoretical, presenting the basic ideas, benefits and building blocks of immutable programming, while the future ones should contain more practical, code driven examples on how to manipulate complex data. As usual, I will talk from the Elixir/Erlang point of view, but most of the ideas presented can be implemented in most modern languages.


Basics

The general idea of immutable programming is that it is impossible to do an in-place modification of a variable. Instead, we can call functions which transform the current value and return the modified version:

new_data = transform(original_data)

The transform does not in any way change the original data, so after it is executed we have access to both the old and the new data. The transform is a side-effect-free function: the same input will always produce the same output.

A more specific example of how this works is illustrated by this Elixir snippet:

 1  defmodule Company do
 2    def new(name), do: {name, []}
 3
 4    def add_employee({company_name, employees}, new_employee) do
 5      {company_name, [new_employee | employees]}
 6    end
 7  end
 8
 9  company_1 = Company.new("Initech")
10  company_2 = Company.add_employee(company_1, "Peter Gibbons")
11  company_3 = Company.add_employee(company_2, "Michael Bolton")

The Company module defines transformation functions, and lines 9-11 illustrate how we can use them to create and modify the state without in-place mutations. With proper language support we won't need the intermediate variables. Instead, we can chain the calls together, feeding the output of the previous function as the argument of the next one. I plan to discuss this in future articles, but here is a taste of how this is done in Elixir:

company = 
  Company.new("Initech") |>
  Company.add_employee("Peter Gibbons") |>
  Company.add_employee("Michael Bolton")

If appropriate data structures are chosen (with respect to the actual transformations), the two variables (old and new) will share as much memory as possible, with the rest being shallow copied. For example, if we modify the 3rd element of the Elixir list, the new list will hold a shallow copy of the first two elements, the third one will have the new value, and the rest of the list is completely shared. In the example above, when we call add_employee, the new company variable will completely share the data with the previous one, adding additional information (new employee data) to its own structure.

With appropriately selected data structures, the transformations should work reasonably fast (though not as fast as in place mutations, this is what we are trading off for some other benefits). In the example above, add_employee is an O(1) operation due to the characteristics of Erlang lists and the fact that we are pushing the new employee to the top of it.

How is this used in a long running program which must respond to outer interaction and represent the changing state using immutable data? The following pseudocode illustrates the idea:

forever do
  message = wait_for_external_input()
  new_state = transform(current_state, message)
  store_state(new_state)
end

A program waits for an external input such as user interaction, network message, message from another thread or process, change on a file system, etc. Based on that input and the current state, it computes the new state (without modifying the current one) and somehow stores it (so it can be retrieved when the next external input arrives).

What does it mean to store the state? The simplest approach is to assign the value to a global variable. Since in most mutable languages assigning a value doesn't modify the old value, this won't break the immutability principle (closures might complicate the matter, but I won't go into such deep details).
Erlang and Elixir offer more sophisticated solutions, such as infinite tail recursion, where a loop is a function which calls itself, passing the new state as the argument (if you haven't read it, refer to my article on actors which explains this in more detail). Alternatively, an in-memory mutable key-value store called ETS can be used.
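As a minimal, runnable illustration of the loop, here is a Python sketch in which a finite input list stands in for the endless wait on external input:

```python
# A runnable sketch of the state loop: a bounded input list stands in
# for "wait for external input" (the real loop would block forever).

def transform(state, message):
    # Pure: computes a new state, never touches the old one.
    return state + [message]

def run(inputs):
    current_state = []
    for message in inputs:
        new_state = transform(current_state, message)
        current_state = new_state  # "store" the state for the next round
    return current_state

print(run(["click", "keypress"]))  # ['click', 'keypress']
```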


Benefits

The key benefits of this approach are data consistency, improved code quality and easier concurrent programming.

Data consistency

Since transformation does not change the data (it only returns a modified version), it can be considered as an atomic operation. If it fails somewhere in the middle, it will not leave a mess inside the current state. We can capitalize on this by slightly altering the pseudo code presented earlier:

1  forever do
2    message = wait_for_external_input()
3    new_state = transform(current_state, message)
4    store_state(new_state)
5  catch
6    store_state(current_state)
7  end

If an invalid message somehow causes an unhandled exception, we can simply continue the execution using the current state (line 6), which is in no way corrupted by the transformation. Think of this as in-memory atomicity: either the complete transformation will happen, or nothing will change.
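The altered loop can be sketched in Python like this, with a toy transform that rejects a "bad" message standing in for a real one:

```python
# A sketch of "in-memory atomicity": if the pure transformation raises,
# the current state is untouched and the loop simply keeps it.

def transform(state, message):
    if message == "bad":
        raise ValueError("invalid message")
    return state + [message]

def run(inputs):
    current_state = []
    for message in inputs:
        try:
            current_state = transform(current_state, message)
        except ValueError:
            pass  # keep the uncorrupted current state and carry on
    return current_state

print(run(["a", "bad", "b"]))  # ['a', 'b']
```

Because transform only builds a new value, there is no half-applied update to roll back; "recovery" is just not storing anything.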

The transformation may still introduce side effects, if it communicates with external entities e.g. by sending messages to other threads or system processes, storing to database or file system, or issuing network requests. Functions with side effects are also called impure functions. To retain consistency, it is best to completely separate impure operations from the state transformations, first calling the pure state transformation and afterwards executing the side effect tasks:

forever do
  message = wait_for_external_input()
  new_state = transform(current_state, message)
  execute_impure_tasks(new_state)
  store_state(new_state)
catch
  store_state(current_state)
end

If the state transformation breaks, no side effects will be introduced and consistency is kept completely. For the impure tasks, we have to either ensure they execute atomically (e.g. by using database transactions) or live with the fact that they will not always succeed and develop our system accordingly.


Improved code quality

When a transformation is pure and causes no side effects, it is easy to reason about the code, and understand what it does and what it simply cannot do. Consider the following two function calls:

# retrieves employee, doesn't modify anything
employee = company.employee(name: "Peter Gibbons")

# retrieves employee, returns modified company
{employee, new_company} = company.employee(name: "Michael Bolton")

The first call retrieves the employee and produces no side effects. We can be 100% positive nothing else is modified.

The second call, however, additionally returns the company, which is a clear indication it does some additional work (e.g. internally caches the employee so the next retrieval works faster).

Another important benefit is the promotion of well factored code. Working with immutables forces you to divide your code into many small functions. Although not specific to immutability, here it is more or less the only way to develop maintainable code. Since the code will be factored into many small functions with clean inputs and outputs, which produce no other side effects and do not rely on any global data, it will bring improved reusability and the code will be easier to test.

Easier concurrent programming

With the help of immutables we can write concurrent, multi threaded programs almost without any need for classical synchronization techniques such as locks, semaphores and mutexes:

 1  # main thread
 2  forever do
 3    message = wait_for_external_input()
 4    new_state = transform(current_state, message)
 5    notify_parallel_thread(new_state) # requires synchronization
 6    store_state(new_state)
 7  end
 8
 9
10  # parallel thread
11  forever do
12    message = wait_for_message_from_main_thread() # requires synchronization
13    do_something(message)
14  end

The code above presents two threads of execution. The main one again runs the infinite loop which processes external inputs and modifies the state. In line 5, the main thread somehow notifies the parallel thread that it has some new work for it. Upon receiving that notification (line 12) the parallel thread will do something with the data it received.

Consequently, we need some kind of messaging system to communicate between threads (Erlang has it built in, and I have seen libraries available for other languages, or you can always hand code it yourself) and this messaging system will indeed require some synchronization, since both parties, the sender and the receiver, modify the shared communication channel.

However, the data transformations run in parallel, without any need for synchronization. The transformation of the main thread (line 4) can run simultaneously to the computation of the parallel thread (line 13), even if both functions work on exactly the same data residing in the same memory location. This works because neither transform of the main thread nor do_something of the parallel thread can modify the data.

So the data transformations, which we expect to be complex operations (otherwise why parallelize them?), can run completely in parallel, without any need for acquiring a lock, waiting for another thread to finish, or blocking another thread. Not only does this significantly simplify the code and reduce the possibility of deadlocks (they can still happen, but far less often), but it also promotes true parallelism, since the need for synchronization is minimal. I should note that in Erlang, this benefit is not relevant, since the data is never shared between two processes anyway (it is deep copied instead).
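Here is a hedged Python sketch of the two-thread layout: the queue is the only synchronized piece, while the state handed over is treated as immutable by convention (Python itself does not enforce this):

```python
import threading
import queue

# Two threads communicating only through a synchronized queue; the
# tuples handed over are never modified, only replaced.

channel = queue.Queue()
results = []

def parallel_thread():
    while True:
        message = channel.get()      # the only synchronized point
        if message is None:          # sentinel: shut down
            break
        results.append(("seen", message))

worker = threading.Thread(target=parallel_thread)
worker.start()

state = ()
for message in ["a", "b"]:
    state = state + (message,)       # pure transformation of main state
    channel.put(state)               # notify the parallel thread
channel.put(None)                    # shutdown signal
worker.join()

print(results)  # [('seen', ('a',)), ('seen', ('a', 'b'))]
```

Both threads can safely hold references to the same tuple at the same time precisely because neither ever mutates it.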


Building blocks

The basic idea behind immutable data is not very complicated and can be implemented in many mutable-based languages. In addition to primitive types (ints, floats, booleans, ...), we need two complex structures: one to hold together a small, fixed number of elements (something akin to records), and another one to store arbitrarily large, dynamically changing collections. Of course, both of these structures must be implemented as immutables: a modifier method must not modify any existing variable, but instead must create a new instance which represents the modified version of the structure.

To implement fixed-size records, Erlang and Elixir use tuples. Tuples have O(1) read time and O(N) modify time (N being the size of the tuple). When we modify a tuple, a new one is always constructed, with modified values in place of the original ones, and unchanged values shallow copied. Since tuples usually hold a small number of fixed elements, the updates should be fast enough. In an OO language, tuples can be implemented using a class which exposes only public readonly properties (which may return only immutables) and modifier methods, which return a new instance.
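Such a record class might be sketched in Python as follows (a toy Point type; in a real codebase you would likely reach for an existing facility such as frozen dataclasses instead of hand-rolling this):

```python
# The "tuple as a class" idea: read-only properties plus modifier
# methods that build a new instance (a sketch, not a library).

class Point:
    __slots__ = ("_x", "_y")

    def __init__(self, x, y):
        # Bypass our own __setattr__ guard during construction.
        object.__setattr__(self, "_x", x)
        object.__setattr__(self, "_y", y)

    def __setattr__(self, name, value):
        raise AttributeError("Point is immutable")

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def with_x(self, x):
        # Modifier: returns a new instance, the original is untouched.
        return Point(x, self._y)

p1 = Point(1, 2)
p2 = p1.with_x(10)
print((p1.x, p2.x))  # (1, 10)
```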

Arbitrarily sized collections are usually implemented with cons cells, a singly linked list where each element holds a value and a pointer to the rest of the list. When using cons, pushing a new element to the top of the list, and reading the head of the list, are O(1) operations. All other operations, such as random access, updates or deletes, are O(N). If we update the Mth element of an N-sized list, the first M-1 elements will be shallow copied, then comes the modified element, and the rest of the list is completely shared.
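A cons list is easy to sketch in Python with nested (head, tail) pairs; the example below demonstrates the O(N) update behavior and the tail sharing described above:

```python
# A cons list sketch: each cell is a (head, tail) pair, with None as
# the empty list. cons() is the O(1) push; update copies only a prefix.

def cons(head, tail):
    return (head, tail)

def update_at(lst, index, value):
    if lst is None:
        raise IndexError("index out of range")
    head, tail = lst
    if index == 0:
        return cons(value, tail)   # the tail is shared, not copied
    return cons(head, update_at(tail, index - 1, value))

lst = cons(1, cons(2, cons(3, cons(4, None))))
lst2 = update_at(lst, 2, 99)

print(lst2)                           # (1, (2, (99, (4, None))))
print(lst2[1][1][1] is lst[1][1][1])  # True: tail past index 2 is shared
```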

All other complex structures, such as binary trees, hashes, stacks, queues or sets, can be implemented on top of tuples and cons (lists). For example, in Erlang, a balanced binary tree is implemented as a recursive tuple and gives O(log2 N) performance for both reads and writes. The Erlang dictionary, on the other hand, uses both tuples and lists, nesting them to divide the data into buckets.

In a language which deals strictly with immutables, there are no classical loops (since a variable cannot be modified). Instead, such languages have strong support for recursion, transforming tail-recursive calls into pure jump instructions. Often, support for pattern matching exists, allowing us to write functions consisting of multiple clauses, of which only one will execute, depending on the values of the input arguments. Here's an example of recursive list iteration in Elixir:

def each([]), do: :ok # empty list
def each([head | tail]) do
  IO.inspect head
  each(tail)
end

Notice how two clauses of the function exist, one for an empty and another for a non-empty list.

If the recursive code is too verbose, we can use helper functions to make the code seem more imperative. For example, in Elixir, the function Enum.filter can be called to get all even numbers from the list:

Enum.filter(
  [1, 2, 3, 4, 5], 
  fn(x) -> rem(x, 2) == 0 end
)

This is an example of a so-called higher order function, which is a fancy name for a function that takes another function as an argument, or returns a function (or both). In this case Enum.filter takes the list and a filter function which is invoked for each element of the list. The result is another list holding all elements for which the filter function returned true.
Although the fact is hidden, Enum.filter is internally still based on recursion, as there are no alternatives in an immutable-based language.

If you try to emulate immutables in a mutable-based language, you will have the convenience of using typical loops (like for and while) which rely on local mutable variables. I advise avoiding this approach whenever possible. Many modern mutable-based languages, such as Ruby, provide higher order enumerable functions which can reduce the amount of mutables in a program.


Downsides

As I mentioned earlier, immutable updates will generally be slower than mutable ones. After all, we are constructing a new variable instead of modifying an existing memory location, and in addition shallow copying some elements. However, this should usually not present a problem, since the real performance cost most often lies in the computational part (e.g. transforming the data) or in I/O operations.

Since every modification returns a new value, the garbage collector will have to deal with a large number of short-lived variables. If the language is immutable-based, the GC is probably specifically tuned for such situations. For other environments it is best to research and test. I would expect generational collectors to work fine, since they expect many short-lived variables, but I don't have any real experience or data to back this up.

The final downside (actually a benefit) is that you can't resort to fast hacks if you deal with immutables. Forget about quick fixes where a global variable is set in one place and read in another. Data can only be propagated via function arguments or return values, so you are forced to incorporate your special cases into your conceptual model, which is a good thing because it again promotes cleaner, easier to maintain code.

Wrapping up

If you have endured this far, hopefully you are a bit intrigued and ready to explore the world of immutable programming. I tried to reduce the boring theory to a bare minimum we can use to do something more interesting. Next time I will discuss how to deal with more complex immutable data.

On a completely unrelated note for all Erlang programmers:  I'll be attending this year's Erlang User Conference, so if you happen to be there and are interested in a live discussion about Erlang, Elixir or anything else, let me know. I am looking forward to finally meeting Erlang programmers in person :-)

Concurrency in a long polling comet server

| Comment on this post
The last couple of articles were a part of a loose miniseries in which I presented a few concurrency patterns I have used in my Erlang systems. Today, I will wrap up this series with some thoughts on how to deal with many concurrent clients and minimize CPU load in an Erlang based comet (HTTP push) server.

The article is based on my own experience in building a long polling based server which pushes fast changing data to connected clients. The data is dynamically categorized into channels to which clients can subscribe. At any moment, there are no more than 50 channels and 4000 connected clients. All clients receive data using the so-called long polling technique. A client issues an AJAX request asking the server for new data, and receives a response only when something new is available, after which a new request must be issued. This results in a fairly high number of HTTP requests, in peak times about 2000 per second, all of which are handled in Erlang, without any front-level caches. Each request comes from a separate client which is subscribed to its own subset of the available channels, and therefore must be treated individually.

Compared to big success stories, such as WhatsApp, Facebook, Heroku and others, this server might seem almost like a hello world app. Nevertheless, it took some attention and tuning to be able to handle that kind of load, and here I will present some approaches which have helped me accomplish the task.


Basic principles

A typical Erlang based web server handles each request in a separate Erlang process, so at any point in time there are at least as many processes as there are connected users. These processes are usually short lived: they get the response, send it back via the network and terminate. I will refer to such processes as request processes.

Additionally, the server will usually run a smaller number of long running actors, which will be used by the request processes to obtain data. I will refer to these processes as data providers.

The basic strategy of minimizing CPU load is to make data providers do everything that is common for multiple requests, and at the same time move all specific operations to request processes. So, whatever is common, we want to execute exactly once, whereas whatever is specific, we want to execute in parallel thus promoting vertical scalability of the system. This strategy is pure common sense not specific to Erlang, but in Erlang it is especially easy to implement, due to its light weight concurrency, message passing and some other mechanisms available.

It is not always obvious what is common to multiple clients. For example, in each request, the server returns the current time in RFC 1123 format. The creation of such a string might take about 1-2 milliseconds, which is not insignificant if we have 2000 requests/sec. Since this information is common to all processes, it is better to have a "time provider" process which computes the string once (instead of 2000 times) per second.

Although this might seem like a case of premature optimization, it is important to notice that each CPU-bound operation in a request process might be very expensive if there are many simultaneous requests. If 4000 clients perform a 1 millisecond operation, they effectively create 4 seconds of single-core work. The good news is that you can usually get away with doing it wrong, depending on the load and available CPU resources, thanks to the way the Erlang scheduler works (there is a great article about it here). I most certainly did many things wrong (and probably still do), but the Erlang VM was still able to cope with it, and the server was working fine. So I don't recommend long meditation on whether every single statement belongs in the request or in the provider process. Start with some reasonable assumptions, then measure and optimize where needed.


Removing single process reader bottleneck

Data providers are typically implemented as Erlang processes. Since a process handles messages sequentially, requesting data directly from one might cause a bottleneck. Imagine 4000 clients simultaneously asking a single provider process for some information. These clients effectively become "deparallelized", meaning they will get the answer one by one, instead of obtaining it simultaneously. Moreover, the provider process will be overloaded handling fetch requests, instead of doing other meaningful work (e.g. producing new data).

We can get around this problem by using Erlang Term Storage (ETS), an in-memory key-value storage, with keys and values being Erlang terms. ETS supports fast lookups, and truly concurrent read/write operations. You can think of it as a lower level Erlang version of Redis. While not as rich in features, ETS has the advantage of being in-process, which makes it easy to use and requires no serialization/deserialization of the data being stored.

Recall the time provider process mentioned earlier. It is implemented as a separate Erlang process which calculates the time string once per second. Instead of exposing a read interface, the process can store the computed string into an ETS table, which request processes can then read directly, thus making reads truly parallel and at the same time taking the load off the time provider process. This technique is, for example, used in the popular Cowboy web server.

ETS implements only a very basic locking mechanism, so you usually want each type of data modified by a single process. However, it's fine to have multiple processes modifying different types of data. For example, in the comet server, when a client subscribes to channels, a client-specific process stores the subscription information in the subscription ETS table. Thus, many processes simultaneously write to the same ETS table, while many other processes read from it. This works fine because each client process handles only its own subscription information, so no record-level contention occurs.

ETS tables have their own drawbacks: all data is copied during reads and writes, and there is no garbage collection, so you must manually delete data when it's no longer needed. They are not appropriate for all types of problems, but when used properly, they can really boost performance. I often use them to hold server-wide information, meaning anything that will be read simultaneously from multiple processes.


Permanent client processes

As stated earlier, each HTTP request is handled in a separate Erlang process. Consequently, for every client request we have to fetch the client's state again, which might introduce some processing overhead. In the long-polling comet server, many clients return to the server constantly, so some processing time is spent repeatedly gathering client-specific data. To overcome this, we can create permanent client processes which maintain each client's state, and execute all client-related operations in them. When an HTTP request arrives from a client, the request process must simply find the corresponding client process (for example via ETS) and ask it to perform the requested operation. The client process then has all the data readily available.
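As a rough illustration in Elixir, a public ETS table can act as the registry mapping client ids to their permanent processes (all names here are hypothetical, not from the actual server):

```elixir
# Registry table: each client process inserts {client_id, self()} on startup.
:ets.new(:client_registry, [:set, :public, :named_table, read_concurrency: true])

defmodule Requests do
  # A request process routes the operation to the client's own process,
  # where all client state is already available.
  def handle(client_id, request) do
    case :ets.lookup(:client_registry, client_id) do
      [{^client_id, pid}] -> GenServer.call(pid, request)
      [] -> {:error, :unknown_client}
    end
  end
end
```

In a real system you would also monitor the client processes and remove stale registry entries when they terminate.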

Let's see how this works in practice. In the comet server, when a message for a client is generated, it is sent to the client process. Since the server is long-polling based, the real client may not have returned to the server yet, for example if it is still processing the HTTP response of the previous message, or if its HTTP request has not yet reached the server. If the client has not yet returned, the corresponding client process simply stores the message in its internal message queue, doing nothing else (since we don't know whether the client will return at all). When/if the client returns, the client process immediately knows whether there are new messages, and has them ready for serving.
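The queuing behavior might look something like this minimal Elixir sketch (a simplification of the real server; the module and message shapes are illustrative):

```elixir
defmodule ClientProcess do
  use GenServer

  def start_link(client_id), do: GenServer.start_link(__MODULE__, client_id)

  # The dispatcher pushes messages regardless of whether the client has returned.
  def push(pid, message), do: GenServer.cast(pid, {:push, message})

  # A long-polling request asks for pending messages when the client returns.
  def fetch(pid), do: GenServer.call(pid, :fetch)

  @impl true
  def init(client_id), do: {:ok, %{client_id: client_id, inbox: []}}

  @impl true
  def handle_cast({:push, message}, state) do
    # Just store it; we don't know whether the client will return at all.
    {:noreply, %{state | inbox: [message | state.inbox]}}
  end

  @impl true
  def handle_call(:fetch, _from, state) do
    # Messages are immediately available when the client comes back.
    {:reply, Enum.reverse(state.inbox), %{state | inbox: []}}
  end
end
```

A real implementation would additionally hold the request open when the inbox is empty, replying only when a message arrives or a timeout expires.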

Although this effectively doubles the number of Erlang processes being used, due to their low cost it should not present a problem. However, some additional processing must be done to perform client expiry checks and terminate the processes of clients which didn't return. Additionally, since there will always be some number of phantom client processes for clients which didn't return, some extra memory and CPU will be used. As a rule, I would say the technique is appropriate when you expect mostly returning clients which issue regular, frequent requests over a longer period of time.

An alternative approach is to hold client data in an ETS table. The request process can retrieve it directly from the table, which eliminates the need for an extra process per client. I initially used this approach and then moved to explicit client processes. The problem in my case was that there were multiple parallel modifiers of client state: long-polling requests, channel (un)subscribe requests, and the message dispatcher process which modifies the client's inbox. Multiple parallel modifiers of the same ETS data require some manual synchronization. An additional downside of ETS is that data is copied while reading/writing, which can cause a performance penalty depending on the data size. These issues disappear when using pure processes, and taking this approach increased performance in my case. In a different situation, e.g. with small state data and a single data writer, an ETS-based approach might work better.


Throttling dispatches

In the comet server, during peak times, the data changes fast, a couple of times per second. Since long polling is used, a typical client usually returns after some new messages have been generated, and when this happens each request must be processed separately: new messages must be collected, joined, and the combined final message compressed. This again results in extra computations and higher CPU load.

Knowing that many clients listen to the same channels, what I wanted to achieve is to have most clients connected and in the waiting state before the message dispatch takes place. This opens up the possibility of identifying clients which receive the same set of messages, and performing the join/compress computation only once per distinct subscription group (all clients listening to the same channels).

To achieve this, I resorted to a throttling approach. All messages are generated in channel-specific processes, and initially these processes dispatched messages to clients immediately. In the throttled version, I introduced a single dispatcher process which receives messages from channels, aggregates them, and dispatches them to clients every two seconds. The interval was chosen as big enough to accommodate reasonably fast clients, and small enough not to introduce significant delay, because data must be served as fast as possible. As a result, most clients return to the server before new messages are dispatched, so I could now identify which clients listen to the same messages and minimize join/compress computations (see next section). As an added bonus, since this approach reduces the incoming request rate, both CPU load and generated traffic have dropped. Prior to introducing throttling, I noticed a peak load of about 3000 reqs/sec. After it was introduced, the request rate has never gone above 2000 reqs/sec, even though the number of parallel clients has since increased by 33% (from 3000 to 4000).
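A simplified Elixir sketch of such a throttling dispatcher (the names and the two-second flush are modeled on the description above, not on the server's actual code):

```elixir
defmodule Dispatcher do
  use GenServer

  @flush_interval :timer.seconds(2)

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Channel processes hand messages off instead of dispatching immediately.
  def publish(channel, message),
    do: GenServer.cast(__MODULE__, {:publish, channel, message})

  @impl true
  def init(_) do
    schedule_flush()
    {:ok, %{}}
  end

  @impl true
  def handle_cast({:publish, channel, message}, pending) do
    # Accumulate messages per channel between flushes.
    {:noreply, Map.update(pending, channel, [message], &[message | &1])}
  end

  @impl true
  def handle_info(:flush, pending) do
    # Every two seconds, aggregated messages go out per channel, so most
    # clients are already back and waiting when the dispatch happens.
    Enum.each(pending, fn {channel, messages} ->
      dispatch(channel, Enum.reverse(messages))
    end)

    schedule_flush()
    {:noreply, %{}}
  end

  defp schedule_flush, do: Process.send_after(self(), :flush, @flush_interval)

  # Placeholder: deliver to the client processes subscribed to the channel.
  defp dispatch(_channel, _messages), do: :ok
end
```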

Admittedly, fixed-time throttling seems like a "lame" approach, but it was an easy solution to the problem at hand, which helped me take control of the CPU load and paved the way for further optimizations. It might be better to monitor returning clients and dispatch new messages once some desired percentage of them has returned, which would eliminate the arbitrary dispatch delay. Additionally, switching to a true permanent connection, such as WebSockets or server-sent events, might eliminate the problem completely.


Concurrent caching

Although we usually want to have data providers doing all the common processing, sometimes it is easier to identify that commonality from inside client processes. For example, when profiling the comet server I noticed that many requests were spending a considerable amount of time joining new messages and compressing the response. Now, as I have mentioned, each client is its own case, with its own set of subscriptions, and some of them might not support compression. So at first glance, it seemed as if those message joins and compressions were not common between different clients. However, knowing the nature of the data and the usage flow of the application, I was sure that most of the clients were actually listening to the same subset of channels. On top of this, owing to the throttling approach described above, I knew most of the clients were receiving the same messages most of the time.

To take advantage of this, I introduced logic in the client processes which can reuse the join/compress operation of another process. The algorithm is a bit involved, but the general idea goes like this. When a client process receives the list of messages it is interested in, it verifies whether a combined message has already been computed by another client process. It does so via a lookup in an ETS table, using the messages' unique ids as the key. If a cached version exists, it is served. Otherwise, the client process joins the messages, compresses the result and stores it in ETS.

The key thing here is that, while one process is joining messages, all others interested in the same set of messages wait for it to finish so they can reuse its result. However, two processes trying to join different sets of messages will not block each other. Consequently, the join/compress computation is performed only once per distinct set of messages, but in parallel with the join/compress for other sets of messages. There are a couple of ways to accomplish this in Erlang, and after some trial and error, I settled on a custom locking implementation. The generic Elixir library which I used for this task can be found here.
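To illustrate the compute-once idea, here is a simplified Elixir sketch using `:ets.insert_new/2` as a lock; this is not the author's actual library, which uses proper blocking rather than the naive polling shown here:

```elixir
defmodule JoinCache do
  # Compute-once cache: for each distinct key, exactly one process computes,
  # while all others interested in the same key wait and reuse its result.

  def setup do
    :ets.new(:join_cache, [:set, :public, :named_table, read_concurrency: true])
  end

  def fetch(key, compute_fun) do
    case :ets.lookup(:join_cache, key) do
      [{^key, {:done, result}}] ->
        # Another process already computed it; just reuse the result.
        result

      [{^key, :computing}] ->
        wait_for(key)

      [] ->
        # insert_new succeeds for exactly one process: that one computes.
        if :ets.insert_new(:join_cache, {key, :computing}) do
          result = compute_fun.()
          :ets.insert(:join_cache, {key, {:done, result}})
          result
        else
          wait_for(key)
        end
    end
  end

  defp wait_for(key) do
    case :ets.lookup(:join_cache, key) do
      [{^key, {:done, result}}] ->
        result

      _ ->
        Process.sleep(10)
        wait_for(key)
    end
  end
end
```

Processes waiting on different keys never block each other, which preserves the parallelism between distinct message sets.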

As a result, messages are joined and compressed much less often. A glance at the production log tells me that in a single minute, out of about 80,000 join/compress operations needed, only 1,300 are actually computed! For all the others, the pre-generated result is fetched from ETS.

This is generally a more complicated approach. Not only do we have to worry about custom inter-process synchronization, but there is also an expiry issue: the cached data is stored in the ETS table, so we must manually delete it at some point. Therefore, I generally find it better to prepare data in the provider process, if possible, because this should result in simpler and better-performing code.

A careful reader might also wonder whether the message dispatcher process could determine which clients receive the same messages, since all messages are sent from that process. This is possible in principle, but it has its drawbacks. First, the dispatcher doesn't know which clients have returned to the server. A client might return only after the next dispatch, so whatever the dispatcher had joined and compressed would then be useless for such a client (since additional new messages have arrived), making the computation completely redundant. Additionally, shifting the join/compress responsibility to the dispatcher effectively removes the parallelization of independent computations, turning the dispatcher into a sequential bottleneck.


Some final thoughts

The most important point of this post is that, in Erlang, it is fairly easy to minimize the number of redundant computations without sacrificing scalability. I described a couple of techniques which I find interesting, and which (hopefully) illustrate a higher-level view of concurrency in Erlang systems. The motivation to use these approaches came after observing the system from a higher-level perspective, thinking about the nature of the data, how it is transported through the system, and how the clients use it. Of course, profiling, benchmarking and performance monitoring are always useful tools which can help identify the more problematic parts of the system.