It's Actors All The Way Down

June 2010

Actors in Clojure – Why Not?

In his article about state management in Clojure, Rich Hickey discusses his reasons for choosing not to use the Erlang-style actor model. While Erlang has made some implementation choices that can lead to problems, the problems are not intrinsic to the Actor Model. As the actor implementation with the longest history of success, Erlang naturally represents many people’s understanding of the nature of actors (just as Smalltalk represents many people’s understanding of objects). Patterns for actor-based problem solving are still emerging, but my experience with programming actors-in-the-small (i.e., fine-grained concurrency in a shared-memory multicore context) leads me to believe that there is great potential for this largely misunderstood model. So with that in mind, let’s break down Rich’s reasons and address them one at a time.

It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock.

Erlang’s nested (blocking) receive is not part of Hewitt’s original actor model [1] or Agha’s elaboration of it [2]. By introducing such a mechanism, a kind of deadlock can occur in Erlang. Of course, Erlang provides additional mechanisms, such as time-outs and supervision trees, for handling these failures. In the context of fault-tolerant components and distributed systems these mechanisms are very useful for creating reliable systems, but they are not required for shared-memory multiprocessing.

The actor model does require two messages to “read” data from an object/actor, a request message and its corresponding reply. This is actually what allows you to avoid blocking concurrent requests. The messages are asynchronous, so nothing really needs to be blocked. If the requestor is unable to proceed without the data from the reply, then the requestor may be logically blocked, but that is not a result of using actors, it’s a result of the pattern of interaction used in a particular design.

In most cases, there is much more potential concurrency to exploit in a particular system. Results may not even be “returned” to the requestor. Instead, results can be directed to the object/actor that needs the data. This leads to more of a flow-based approach to decomposing the system. Data flows asynchronously and concurrently to where it is needed. The actors in the system simply react to the arrival of new information in the form of messages representing work to do.
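
To make the "customer" idea concrete, here is a minimal single-threaded Python sketch (not Humus, and not Erlang). The names `mailbox`, `send`, `run`, `squarer`, `collector`, and `requestor` are invented for this illustration. The request carries the customer that should get the result, so the reply flows to where it is needed rather than back to the requestor.

```python
from collections import deque

mailbox = deque()  # messages in transit: (target actor, message) pairs

def send(actor, message):
    """Asynchronous send: enqueue the message and return immediately."""
    mailbox.append((actor, message))

def run():
    """Deliver pending messages one at a time until none remain."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor(message)

received = []  # everything the downstream actor has seen

def collector(message):
    """Downstream actor that needs the data; it never asked for it."""
    received.append(message)

def squarer(message):
    """Service actor: the request names the customer that gets the result."""
    customer, n = message
    send(customer, n * n)

def requestor(message):
    """Starts the work, names collector as the customer, and is finished."""
    for n in message:
        send(squarer, (collector, n))

send(requestor, (1, 2, 3))
run()
print(received)  # [1, 4, 9]
```

The requestor never waits for anything. The FIFO delivery order seen here is an artifact of using one queue in this sketch; the actor model itself does not promise it.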

Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.

The key idea here is to focus on the protocol of messages. Think of “protocol” as a replacement for “interface” in designing loosely-coupled components. Components that can speak the same protocol can be used interchangeably and even safely upgraded or hot-swapped. Having appropriate strategies and mechanisms for handling distributed failure modes makes it possible to build extremely reliable and resilient systems. Erlang provides many valuable patterns for addressing these issues. However, these mechanisms are not required for communication within the same address space and are not intrinsic to the actor model.

The bifurcation encouraged by actor-based programming is between values and actors. Values remain constant over time. Actors may change their behavior based on messages (values) they receive, so they represent the changeable state of the system. Clojure encourages just the same bifurcation. Most of the language deals with values and functions on values. The “identity” concept is used to represent the changeable state of the system.

It doesn’t let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying.

The actor model does not force copying of data. Passing messages between address spaces is what forces copying. Actor model messages are always pure immutable data values, and thus can be safely shared within an address space. An efficient actor implementation will fully leverage the ability to share large immutable values (data structures) among multiple actors. When copying must occur (e.g., between machines) then it happens safely and transparently, since neither the original nor the copy is allowed to change.
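
As a rough illustration of sharing without copying, the Python sketch below sends one large immutable tuple to two actors inside the same address space. The helper names (`mailbox`, `send`, `run`, `make_reader`) are assumptions made for this example, not part of any real actor library. Both actors end up holding a reference to the very same object.

```python
from collections import deque

mailbox = deque()  # messages in transit: (target actor, message) pairs

def send(actor, message):
    """Asynchronous send: only a reference to the message is enqueued."""
    mailbox.append((actor, message))

def run():
    """Deliver pending messages until none remain."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor(message)

seen = {}  # reader name -> the value that reader received

def make_reader(name):
    """Create an actor that simply remembers the value it was sent."""
    def reader(message):
        seen[name] = message
    return reader

big_value = tuple(range(100_000))  # a large immutable value
a, b = make_reader("a"), make_reader("b")
send(a, big_value)
send(b, big_value)
run()

# Identity, not just equality: no copy was made for either actor.
print(seen["a"] is big_value and seen["b"] is big_value)  # True
```

Because the tuple cannot be modified, neither actor can observe or disturb the other through it.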

Reads and writes get serialized and block each other, etc.

Actors implement a “shared nothing” data model. If you create an actor that has stateful behavior (such as a “storage cell”) then, and only then, you must define a protocol for access. Since messages are asynchronous, a sender never really blocks, not even to wait for the message to be received. If a response is generated, it is sent as a separate asynchronous message to whatever customer is specified in the request (which may not be the requestor). If there is a problem with “blocking” then either the protocol is poorly designed or the problem inherently requires synchronization. If synchronization is really needed, there are several good protocol patterns available. You’re not limited to the intrinsic synchronization assumed by sequential processing and call-return procedural protocols.
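
A "storage cell" and its access protocol might look something like the following Python sketch. The message shapes `("read", customer)` and `("write", value)`, and all of the function names, are hypothetical choices for this example. Notice that all three sends complete before anything is delivered, so the sender never blocks.

```python
from collections import deque

mailbox = deque()  # messages in transit: (target actor, message) pairs

def send(actor, message):
    """Asynchronous send: enqueue and return without waiting for delivery."""
    mailbox.append((actor, message))

def run():
    """Deliver pending messages until none remain."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor(message)

def make_cell(value):
    """Stateful actor; its read/write protocol is the only way to reach the state."""
    def cell(message):
        nonlocal value
        if message[0] == "read":
            customer = message[1]
            send(customer, value)   # the reply is a separate asynchronous message
        elif message[0] == "write":
            value = message[1]
    return cell

log = []

def logger(message):
    """Customer for the read requests."""
    log.append(message)

cell = make_cell(0)
send(cell, ("read", logger))
send(cell, ("write", 42))
send(cell, ("read", logger))
print(len(mailbox))  # 3
run()
print(log)  # [0, 42]
```

The second read sees 42 only because this sketch happens to deliver messages in FIFO order. A protocol that truly needs that ordering would have to establish it explicitly, for example by having the write acknowledge to a customer.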

It reduces your flexibility in modeling – this is a world in which everyone sits in a windowless room and communicates only by mail.

On the contrary! The actor model is flexible enough to model the mechanisms of practically any other model of computation, including functional, logical, procedural, and object-oriented. The basic mechanisms of the actor model (asynchronous communication of pure values among concurrent components, and dynamic reconfiguration of state) provide a reliable and well-defined semantic foundation.

Thinking differently about the structure of your programs is required for scalable concurrent programming. Fortunately, we have examples all around us. The real world is concurrent. Change requires interaction. State is only observable through behavior. The actor model gives us the tools to represent this directly in our designs.

Programs are decomposed as piles of blocking switch statements.

This is specific to Erlang, which implements actors as tail-recursive functions that block on “receive”. But that is not the only possible implementation. Hewitt/Agha-style actors have no explicit “receive”. Instead, they are activated by the reception of a message. The behavior they execute on activation is finite, and they cannot block. In fact, there are really no “threads” at all, only reactive components that maintain their (passive) state between invocations (messages). All pending work in the system is represented by messages-in-transit.
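
Here is one way to picture an actor with no explicit "receive", as a small Python sketch. The `Actor` class, its `become` method, and the `once` and `sink` behaviors are all invented for this illustration. The actor is just a passive holder of its current behavior. A dispatcher activates that behavior once per message, and the behavior may replace itself for the next message.

```python
from collections import deque

class Actor:
    """Passive between messages: it only holds its current behavior."""
    def __init__(self, behavior):
        self.behavior = behavior

    def become(self, behavior):
        """Choose the behavior that will handle the next message."""
        self.behavior = behavior

mailbox = deque()  # all pending work is represented here

def send(actor, message):
    """Asynchronous send: enqueue and return immediately."""
    mailbox.append((actor, message))

def run():
    """Activate each target with its message; no behavior ever waits."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor.behavior(actor, message)

delivered = []

def sink(self, message):
    """Behavior that ignores every message."""

def once(self, message):
    """Behavior that accepts the first message, then becomes a sink."""
    delivered.append(message)
    self.become(sink)

gate = Actor(once)
for word in ("first", "second", "third"):
    send(gate, word)
run()
print(delivered)  # ['first']
```

There is no loop inside the actor and no call that waits for input. Each activation runs to completion and returns to the dispatcher.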

You can only handle messages you anticipated receiving.

And objects (in a traditional object-oriented language) can only handle messages they anticipated receiving. But both objects and actors can be designed to delegate “unanticipated” messages to another handler. Are all functions in Clojure “total”, or are they undefined for some “unanticipated” input values? In Humus, actors can choose to ignore, modify, redirect, or throw an exception when they receive a message they don’t want to handle directly.
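
Delegation of unanticipated messages can be sketched in a few lines of Python. This is not how Humus does it; the `("add", x, y)` message shape and the names `make_adder`, `fallback`, and `collector` are assumptions made for the example. Anything the adder does not recognize is redirected to a delegate instead of causing a failure.

```python
from collections import deque

mailbox = deque()  # messages in transit: (target actor, message) pairs

def send(actor, message):
    """Asynchronous send: enqueue and return immediately."""
    mailbox.append((actor, message))

def run():
    """Deliver pending messages until none remain."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor(message)

results = []
unhandled = []

def collector(message):
    """Customer for successful additions."""
    results.append(message)

def fallback(message):
    """Delegate that receives whatever the adder did not anticipate."""
    unhandled.append(message)

def make_adder(customer, delegate):
    """Actor that understands ("add", x, y) and redirects everything else."""
    def adder(message):
        if isinstance(message, tuple) and len(message) == 3 and message[0] == "add":
            _, x, y = message
            send(customer, x + y)
        else:
            send(delegate, message)
    return adder

adder = make_adder(collector, fallback)
send(adder, ("add", 2, 3))
send(adder, ("mul", 2, 3))
send(adder, "hello")
run()
print(results)    # [5]
print(unhandled)  # [('mul', 2, 3), 'hello']
```

The delegate could just as easily ignore the message, rewrite it, or report an error to some supervisor actor.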

Coordinating activities involving multiple actors is very difficult.

Programming with actors does require a different mental model, just like programming with functions, logic, procedures, or objects. That’s what makes it a model of computation, not just a new set of tools and patterns we can capture in a library. You should expect that a shift to actor-based thinking will be as much of a challenge as shifting to any new computational model.

You can’t observe anything without its cooperation/coordination – making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.

Two powerful mechanisms are available to address this issue. First, actors can be easily hidden behind proxies, adapters, or even a façade. Since you can only interact with an actor through its message protocol, you can interpose all kinds of reporting and analysis actors without the knowledge or consent of either the customers or the target actor. All kinds of aspects, monitoring, instrumentation, verification, and adaptation can be implemented this way.
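
The interposition idea can be sketched as follows in Python. The names `make_tap`, `doubler`, and `collector` are hypothetical. Clients are handed the tap in place of the real service, and neither the client nor the service contains any code that knows about the monitoring.

```python
from collections import deque

mailbox = deque()  # messages in transit: (target actor, message) pairs

def send(actor, message):
    """Asynchronous send: enqueue and return immediately."""
    mailbox.append((actor, message))

def run():
    """Deliver pending messages until none remain."""
    while mailbox:
        actor, message = mailbox.popleft()
        actor(message)

def make_tap(target, trace):
    """Proxy actor: record each message, then forward it unchanged."""
    def tap(message):
        trace.append(message)
        send(target, message)
    return tap

results = []

def collector(message):
    """Customer that receives the doubled value."""
    results.append(message)

def doubler(message):
    """The real service; it has no idea it is being observed."""
    customer, n = message
    send(customer, 2 * n)

trace = []
service = make_tap(doubler, trace)  # clients only ever see this reference
send(service, (collector, 21))
run()
print(results)     # [42]
print(len(trace))  # 1
```

The same wrapper could be placed around the customer as well, to observe replies in addition to requests.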

Second, actors can be hosted in a heavily-instrumented meta-configuration which records the full history of all messages and the provenance of all actors in the configuration. The resulting event-trees can be combined with references to the actors’ behaviors for a full picture of any given execution. You can’t get more observable than that.
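
A very small approximation of such an instrumented configuration is sketched below in Python. The `TracingConfig` class and the `fork` and `leaf` actors are invented for this example, and a real meta-configuration would record far more. Every delivery is logged together with the event that caused it, which is enough to rebuild the event tree afterwards.

```python
from collections import deque

class TracingConfig:
    """Hosts actors and records every delivery with the event that caused it."""
    def __init__(self):
        self.mailbox = deque()  # (parent event id, actor, message)
        self.history = []       # (event id, parent event id, actor name, message)
        self.current = None     # id of the event being processed, if any

    def send(self, actor, message):
        """Asynchronous send, tagged with the event that is doing the sending."""
        self.mailbox.append((self.current, actor, message))

    def run(self):
        """Deliver messages, logging each delivery before activating the actor."""
        while self.mailbox:
            parent, actor, message = self.mailbox.popleft()
            event_id = len(self.history)
            self.history.append((event_id, parent, actor.__name__, message))
            self.current = event_id
            actor(self, message)
        self.current = None

def leaf(config, message):
    """Actor that does nothing further with its message."""

def fork(config, message):
    """Actor that splits one job into two messages."""
    config.send(leaf, message + "-left")
    config.send(leaf, message + "-right")

config = TracingConfig()
config.send(fork, "job")
config.run()
for event in config.history:
    print(event)
# (0, None, 'fork', 'job')
# (1, 0, 'leaf', 'job-left')
# (2, 0, 'leaf', 'job-right')
```

The actors themselves contain no logging code. The instrumentation lives entirely in the configuration that hosts them.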

It is often the case that taking something that works well locally and transparently distributing it doesn’t work out – the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn’t transparent and the code has to change anyway.

Properly modularized actor configurations can be distributed, and often replicated, without changing their fundamental operation. This does not make distribution “transparent”, partly for the reasons quoted. However, distributed programming is not the only application for actors. Safe concurrent applications, even on multiple processor cores sharing memory, can be created with actors. And extremely efficient actor implementations do exist.

Conclusion

I have nothing against Clojure. In fact, I think there are a lot of interesting ideas there. Focusing mostly on pure functions and providing explicit mechanisms for handling mutable state is a good idea. In a future article, I intend to explore the implementation of Software Transactional Memory, another interesting idea. I also respect the choice to not support actors. However, I do object to some of the reasons given for making that design decision. This rebuttal is intended to provide a counterpoint to Rich Hickey’s rationale and hopefully dispel some of the misconceptions relating to actor implementations.

References

[1]: C. Hewitt. Viewing Control Structures as Patterns of Passing Messages. Journal of Artificial Intelligence, 8(3):323-364, 1977.
[2]: G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, Mass., 1986.

Tags: actor, blocking, Clojure, data-flow, deadlock, debugging, distribution, Erlang, functional, protocol, scalability, value

This entry was posted on Thursday, June 17th, 2010 at 7:09 pm and is filed under Uncategorized.

14 Responses to “Actors in Clojure – Why Not?”

Alain O'Dea June 18th, 2010 at 7:09 pm
This is a clear and concise article. Thank you.

The Actor Model is a complex topic and yet you have managed effectively to boil it down to some essential strengths. This article will be a useful resource when explaining Actors to my colleagues.
Phil June 18th, 2010 at 11:49 pm
I don’t think Rich has anything against the Actor model. It’s just something that’s easy to implement in libraries and doesn’t need to be part of the core of the language.
ishmal June 19th, 2010 at 6:23 am
The limitations of which he speaks are, of course, an implementation thing, and not spec’d that way by the Actor model itself. Two of the quotes, specifically:

> Reads and writes get serialized and block each other, etc.

…and…

> You can only handle messages you anticipated receiving.

…are both because of implementation. I have been wondering if maybe a select()-type model would be better, where incoming messages -of any type- and from any source are received in some order (first come first served, priority queue, etc) and are parsed and dispatched. So whether you have an input stream to read, a socket receive, a pipe, an Actor message, WHATEVER, they all get handled in turn, with none blocking the others.
Tristan June 19th, 2010 at 10:00 am
@Phil

On the contrary. Implementing Actors as a library in another language is automatically restricting by accepting all the assumptions of that language. Fundamentally, Actor language, when implemented true to form, could perform non-Turing computation (think unbounded nondeterminism, hypercomputation, Artificial Recurrent Neural Networks). As soon as one takes another language and builds a library, that key benefit is gone.
Sean June 19th, 2010 at 11:12 am
This is a very insightful post, thanks!

One thing that bears repeating is that Erlang was not designed originally to exploit multiprocessing, it just happened that the Actor model was the most efficient way to design a fault-tolerant system — the isolation provided by message-passing inherently makes components less coupled to each other’s failures. Only recently (within about the last 10 years) has SMP been available for Erlang.
Zak June 19th, 2010 at 6:12 pm
Clojure’s mechanisms for concurrent state are more powerful and flexible than actors. Clojure’s agents actually have a lot in common with actors, but take arbitrary functions instead of predefined messages. From a Clojure point of view, actors are agents with restricted functionality to enable use in a distributed environment.

Adding actors to Clojure might be a good idea, but they should be explicitly for the purpose of distributed systems, and probably built on top of agents.
Gerry June 19th, 2010 at 8:30 pm
Very interesting and helpful.
You wrote
“Erlang’s nested (blocking) receive is not part of Hewitt’s original actor model [1] or Agha’s elaboration of it [2]. By introducing such a mechanism, a kind of deadlock can occur in Erlang. Of course, Erlang provides additional mechanisms, such as time-outs and supervision trees, for handling these failures”
What kind of deadlocks, are they restricted to a distributed or local environment?
How do Clojure or Scala effectively deal with the equivalent of Erlang OTP, and distributed apps in general?
admin June 20th, 2010 at 6:23 pm
@Phil – As Tristan noted, even though a library implementation may be possible, there are advantages to a direct implementation. So far I’ve been able to avoid any blocking operations. Everything is interrupt/event/message driven at the lowest level.

@ishmal – That’s essentially what my prototype actor-based environment does. Message delivery events are interleaved to provide very fine-grained concurrency.

@Sean – You have an excellent point. Since many people’s understanding of actors comes from Erlang, I’m trying to point out some of the differences between Erlang-style actors and the original Actor Model.

@Zak – I’m not arguing for inclusion of actors in Clojure. That’s not my call. I’m simply attempting to dispel some misconceptions that were represented by the original article. In any case, relative power and flexibility are somewhat subjective. I think you may be surprised at how powerful and flexible actors actually are. Hopefully, I can illustrate this further in future articles.

@Gerry – Deadlocks can occur whenever an Erlang-style actor is waiting in a nested receive and thus is unresponsive to other messages (local or remote). OTP is much too big a subject to address here :-)
Jeff Rose June 20th, 2010 at 6:55 pm
While an interesting analysis, this post feels more like partisan politics than an honest investigation of the advantages and disadvantages of using actors to structure concurrent programs. Oh, and we are talking about real programs with state, not pure computations that just heat up the CPU, because I think it would be very difficult to argue for actors over regular pure functions if that were the case, however useless it might be.

First, you say, “if the requestor is unable to proceed without the data from the reply, then the requestor may be logically blocked, but that is not a result of using actors, it’s a result of the pattern of interaction used in a particular design.” This doesn’t address the basic point that I think Hickey intended, which is that actors that have state must be coded to respond to messages which are querying for the state, and vice versa, the requesting actor has to wait for the response message. Sure, it could be asynchronous, but if this is a request from another processor we could be either spinning or paying for context switching when in reality we could have just read a value and saved both time and coding overhead. This isn’t about being logically blocked, it’s about jumping through hoops to satisfy a given programming model when in reality a simple read is sufficient, unless you need to go over the network.

In response to Hickey’s claim that the actor model results in a branching of the programming model, between messaging and function calls, you state that, “The bifurcation encouraged by actor-based programming is between values and actors. Values remain constant over time. Actors may change their behavior based on messages (values) they receive, so they represent the changeable state of the system. Clojure encourages just the same bifurcation.” This is side-stepping his criticism. I ran into this exact problem with Erlang a number of times, so I know first hand what he means. While sitting there breaking up a problem you have to make the call over and over again, “do I send a message and implement this next computation in an actor, or do I just write a function and make a function call?” I think this is the bifurcation he is discussing. Values remain constant? I don’t see how that is relevant. We are talking about performing computation to create new data structures or to generate new values, and the split is in deciding where this computation should occur and how it should be triggered.

Hickey said, “it doesn’t let you fully leverage the efficiencies of being in the same process,” and you responded with discussion of efficient implementations not needing to copy. While copying is potentially an issue, the larger point is that reading a value requires a conversation. To read a value from an in-process actor, even in a zero-copy dream implementation, you have to make a request, handle the request and send a response, and then handle the response, while in equivalent Clojure you just read the value. That means less coding, less debugging, and less processing time. To read the current value of an agent that could be currently executing on a local CPU in Clojure you type @foo. I don’t think an actor equivalent could compare, but I’d like to see a counter example.

Hickey pointed out that an actor can only handle messages which it has been programmed to respond to, and you respond by asking, “are all functions in Clojure “total”, or are they undefined for some “unanticipated” input values?” Again, this misses the point. If an actor is representing something more complicated than a single value, say a data structure, it has to be programmed to respond to all possible queries about that data structure if it is going to be as useful as just getting at the data structure directly, but responding with a whole data structure isn’t a great idea if you imagine that it could one day become a distributed system rather than just a concurrent program (let alone if you have to pay for a copy). Of course functions in Clojure expect specific input arguments, but this is more about getting at state than it is about a protocol. The state in a Clojure program isn’t being guarded by an actor that has to be programmed to respond to requests for that state, it can just be accessed directly, so you don’t have to foresee what questions might be asked of it.

You say that “programming with actors does require a different mental model, just like programming with functions, logic, procedures or objects,” but I think you misunderstood what Hickey was referring to when he said that coordinating multiple actors can be difficult. At least in Clojure land coordination typically comes up when speaking about references (refs), which are the special variables that can only be modified inside a transaction. This type of coordination is necessary, for example, if you want to have multiple threads operating on shared data structures. In Clojure the STM protects you from many of the most difficult issues related to deadlock when updating state from multiple threads. With actors you can introduce many of the same problems that you run into with standard multi-threaded programming, hence the need for supervision trees, etc. With Erlang it is typical to hear people talk about just killing off processes when they misbehave, which is actually a refreshingly pragmatic take on the deadlock issue. If things get funky, just kill them and retry. Clearly this does work, but it is also sub-optimal. With software transactional memory I think we get closer to just making things work correctly instead.

All of that said, I actually like the actor model for distributed computing, and I think Erlang and OTP have many interesting ideas that I’d love to see people work with in Clojure. My guess though is that actors are not the future of concurrent computation exactly because of the issues Hickey brings up.
Dan Creswell June 21st, 2010 at 6:13 am
“Actor model messages are always pure immutable data values”

Did you mean the messages themselves or the messages and all the things they reference?

Or put another way: Are you stating that these messages can’t contain references to mutable state?
admin June 22nd, 2010 at 8:03 am
@Jeff – I appreciate the diversity of opinion represented by your comments. Thanks for adding to the conversation. In my opinion, one of the key mental shifts involved in effectively applying the actor model is to focus on behavior rather than state. The awkwardness of accessing state from outside the actor is a constant reminder to “tell, not ask”. That is, tell the actor what you would like it to do for you, not ask for some state to manipulate yourself. This was the original intent of the object-oriented model, and it is still largely unrealized. For pure data manipulation, I prefer to use the functional model. Humus combines the data manipulation power of pure functions with the use of actors to manage concurrent access to mutable state. I find this to be a valuable design separation. YMMV.

@Dan – Messages may contain actor “identities”, which are immutable, just like all values. You must send a message to the actor in order to affect its state.