[ANN] celluloid 0.0.3: a concurrent object framework for Ruby

Tony Arcieri

[Note: parts of this message were removed to make it a legal post.]

Celluloid is a concurrent object framework for Ruby inspired by Erlang
and the Actor Model:

* Github: http://github.com/tarcieri/celluloid
* RDoc: http://celluloid.github.com/

Celluloid provides thread-backed objects that run concurrently,
allowing the familiarity of plain old Ruby objects for the
most common use cases, but also the ability to call methods
asynchronously. Asynchronous method calls allow the receiver
to do things in the background while the caller carries on with its
business.
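To make the synchronous/asynchronous distinction concrete, here is a minimal sketch of a thread-backed object in plain Ruby. It is not Celluloid's implementation. `MiniActor` and `Counter` are hypothetical names. Each wrapped object gets one thread draining a `Queue` mailbox, ordinary calls wait for the reply, and calls ending in `!` return immediately (the bang convention that comes up later in this thread).

```ruby
# Hypothetical sketch, not Celluloid's implementation: one thread per
# wrapped object, draining a FIFO mailbox one message at a time.
class MiniActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        meth, args, reply = @mailbox.pop
        begin
          result = @target.public_send(meth, *args)
          reply << [:ok, result] if reply
        rescue StandardError => e
          reply << [:error, e] if reply # async errors are simply dropped here
        end
      end
    end
  end

  # foo! -> enqueue :foo and return nil immediately (asynchronous)
  # foo  -> enqueue :foo and wait for the result (synchronous)
  def method_missing(name, *args)
    if name.to_s.end_with?("!")
      @mailbox << [name.to_s.chomp("!").to_sym, args, nil]
      nil
    else
      reply = Queue.new
      @mailbox << [name, args, reply]
      status, value = reply.pop
      raise value if status == :error
      value
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name.to_s.chomp("!").to_sym, include_private) || super
  end
end

class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end

  def count
    @count
  end
end

counter = MiniActor.new(Counter.new)
3.times { counter.increment! } # returns right away
puts counter.count             # FIFO mailbox, so this prints 3
```

Because the mailbox is FIFO and drained by a single thread, the synchronous `count` call is only answered after the three queued increments have run.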

If you're looking for a longer introduction, please check out this
post on my blog:

http://www.unlimitednovelty.com/2011/05/introducing-celluloid-concurrent-object.html

Also view the screencast I did for EMRubyConf here:

 

Eric Wong

Tony Arcieri said:
> Celluloid provides thread-backed objects that run concurrently,

Cool!

I assume this interacts transparently with existing apps that already
use threads?

For instance, Rainbows! already offers ThreadPool/ThreadSpawn options
for hosting Rack applications, can the Rack applications themselves use
Celluloid without any changes to Rainbows!?

Or would we have to add explicit support for Celluloid in Rainbows!
to support folks that want to write Rack apps using Celluloid?


Rainbows! of course also supports Revactor, Cool.io, EventMachine,
NeverBlock, XEpollThread*, etc... Adding explicit support for Celluloid
isn't out of the question, just a matter of developer time.
 

David Masover

> Celluloid is a concurrent object framework for Ruby inspired by Erlang
> and the Actor Model:
>
> * Github: http://github.com/tarcieri/celluloid
> * RDoc: http://celluloid.github.com/
>
> Celluloid provides thread-backed objects that run concurrently,
> allowing the familiarity of plain old Ruby objects for the
> most common use cases, but also the ability to call methods
> asynchronously. Asynchronous method calls allow the receiver
> to do things in the background while the caller carries on with its
> business.

That's pretty awesome. I was working on something like this, but ended up
abandoning it after I had it deadlocking for a while, and I couldn't get the
semantics right.

Speaking of semantics, my biggest problems were:

- Is there a way to handle exceptions in a Ruby-esque way?

It looks like I have to explicitly trap actor exceptions. But this is a place
I have to be aware that this is an actor and not just a Ruby object. Your
parallel map is a perfect example of what I'd actually want here: If an
exception is raised, re-raise that exception when I try to call methods on the
actor, rather than DeadActorException. Is there a reason not to do that?

- How do you handle cycles?

An actor can only process one method at a time, which makes sense. One thing I
wanted to do was give two actors references to each other, so they can send
messages back and forth. Futures seem like a good solution to avoid a lot of
annoying asynchronous callbacks. Two problems:

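class Foo
  include Celluloid::Actor
  attr_reader :bar, :value

  def initialize
    @bar = Bar.spawn self  # 'self' is the plain Foo object, not an actor reference
    @value = 3
  end

  def first
    bar.second!            # asynchronous call; Bar#second calls back into Foo#value
    bar.result * 2         # synchronous call made while Foo is still busy in #first
  end
end

class Bar
  include Celluloid::Actor
  attr_reader :foo

  def initialize(parent)
    @foo = parent
  end

  def second
    @result = foo.value + 3
  end

  def result
    @result
  end
end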


This doesn't actually run -- it seems to deadlock on 'new'. But there are two
other problems: First, 'self' wouldn't be an actor reference, it'd be the
object itself, right? But more importantly, what happens when I call 'first'?
It looks like we deadlock again, but it seems reasonable that since Foo is
waiting for Bar, Bar could now call another method on Foo, acting as though
the two actors were just plain objects and we're just building a call stack.

That's the part I could never get working.


Couple other annoyances:

- Why spawn instead of new? It seems like if I've decided to make something
an actor, it's going to expect to be an actor most of the time -- it's hard to
imagine a case where I want the original 'new' instead.

- I really don't like the registry -- one flat namespace of actors? Ew. But
I'm not really sure how to solve this -- some sort of super-reference, which
points to the currently-alive actor from a given supervisor? But then I might
send something which asynchronously kills the actor, and I'll get a fresh
actor for the next line, which seems like a bad thing. There needs to be some
clean semantics for "Give me a reference to the currently-alive version of
this actor" which doesn't rely on a global, flat registry.

And a thought: I just had every method return a future. If people wanted
something to run asynchronously, all they had to do is ignore the future. The
downside is that this makes it hard to force things to be synchronous. I
actually thought of this as a good thing -- if I make the call up at the top
of a method, and don't use the result till the bottom, that's some surprise
parallelism right there. The biggest problem is that if there's an exception,
you don't know about it until the future is resolved.
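Here is a rough sketch of the every-call-returns-a-future design, assuming one worker thread per object. `Future`, `FutureActor` and `Calculator` are hypothetical names, not part of Celluloid. The sketch also reproduces the drawback mentioned above: the `ZeroDivisionError` from `divide` stays invisible until someone asks for the value, and is never seen if the future is ignored.

```ruby
# Hypothetical sketch of the "every call returns a future" design.
class Future
  def initialize
    @mutex = Mutex.new
    @ready = ConditionVariable.new
    @done = false
  end

  def fulfill(value, error = nil)
    @mutex.synchronize do
      @value, @error, @done = value, error, true
      @ready.broadcast
    end
  end

  # Blocks until the call has finished; raises its exception, if any.
  def value
    @mutex.synchronize { @ready.wait(@mutex) until @done }
    raise @error if @error
    @value
  end
end

class FutureActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    Thread.new do
      loop do
        meth, args, future = @mailbox.pop
        begin
          future.fulfill(@target.public_send(meth, *args))
        rescue StandardError => e
          future.fulfill(nil, e)
        end
      end
    end
  end

  # Every call is queued and returns a Future at once; ignoring the
  # future makes the call effectively asynchronous.
  def method_missing(meth, *args)
    future = Future.new
    @mailbox << [meth, args, future]
    future
  end

  def respond_to_missing?(meth, include_private = false)
    @target.respond_to?(meth, include_private) || super
  end
end

class Calculator
  def square(x)
    x * x
  end

  def divide(a, b)
    a / b
  end
end

calc = FutureActor.new(Calculator.new)
squared = calc.square(7)    # starts now; the caller keeps going
broken = calc.divide(1, 0)  # no error is visible yet
puts squared.value          # 49
begin
  broken.value              # the ZeroDivisionError only surfaces here
rescue ZeroDivisionError => e
  puts "late failure: #{e.message}"
end
```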
 

Tony Arcieri

> Cool!
>
> I assume this interacts transparently with existing apps that already
> use threads?

Yep. Check out the screencast for Celluloid being used in conjunction with
Sinatra.

> For instance, Rainbows! already offers ThreadPool/ThreadSpawn options
> for hosting Rack applications, can the Rack applications themselves use
> Celluloid without any changes to Rainbows!?

Indeed, Celluloid should work just fine with ThreadPool/ThreadSpawn.
 

Tony Arcieri

> Speaking of semantics, my biggest problems were:
>
> - Is there a way to handle exceptions in a Ruby-esque way?
>
> It looks like I have to explicitly trap actor exceptions. But this is a
> place I have to be aware that this is an actor and not just a Ruby object.

When making synchronous calls, exceptions which occur in the context of the
receiver are automatically reraised in the caller just like any other Ruby
object, regardless of whether you're using any actor-specific features like
linking or trapping exits. The exception will also crash the receiver.
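A minimal sketch of these semantics, assuming a single thread per object: a failing synchronous call re-raises the original exception in the caller and stops the receiver, and later calls fail with a dead-actor error. `CrashingActor`, `Divider` and `DeadActorError` are hypothetical names (the thread calls the latter DeadActorException; Celluloid's actual class may differ).

```ruby
# Hypothetical sketch; DeadActorError is an illustrative name.
class DeadActorError < StandardError; end

class CrashingActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    Thread.new { run }
  end

  # Synchronous call: returns the value or re-raises the receiver's exception.
  def call(meth, *args)
    reply = Queue.new
    begin
      @mailbox << [meth, args, reply]
    rescue ClosedQueueError
      raise DeadActorError, "actor is dead"
    end
    status, value = reply.pop
    raise value if status == :error
    value
  end

  private

  def run
    loop do
      meth, args, reply = @mailbox.pop
      begin
        reply << [:ok, @target.public_send(meth, *args)]
      rescue StandardError => e
        reply << [:error, e] # the caller gets the original exception...
        break                # ...and the receiver stops processing (crashes)
      end
    end
    @mailbox.close           # new calls now fail fast
    until @mailbox.empty?    # calls queued before the close fail too
      _meth, _args, reply = @mailbox.pop
      reply << [:error, DeadActorError.new("actor is dead")]
    end
  end
end

class Divider
  def divide(a, b)
    a / b
  end
end

actor = CrashingActor.new(Divider.new)
puts actor.call(:divide, 10, 2) # 5
begin
  actor.call(:divide, 1, 0)
rescue ZeroDivisionError => e
  puts "caller saw: #{e.class}"
end
begin
  actor.call(:divide, 4, 2)
rescue DeadActorError => e
  puts "next call: #{e.message}"
end
```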

> Your parallel map is a perfect example of what I'd actually want here: If
> an exception is raised, re-raise that exception when I try to call methods
> on the actor, rather than DeadActorException. Is there a reason not to do
> that?

I think reraising the original exception in the caller context gives the
caller appropriate context to bail out of whatever they're doing and avoid
making subsequent calls at all. Other threads may be trying to make calls,
and if an exception entirely unrelated to the calls they're making is raised
because the actor is dead, I think that'd be rather confusing.

> - How do you handle cycles?

I don't, but they can be detected if you don't mind a bit of a performance
penalty. For that I need to track chains of synchronous calls and detect if
the receiver of a given method exists earlier in the call chain. If so,
Celluloid can raise an exception in the caller context indicating that a
deadlock would occur. This is a bit of a glaring deficiency right now.
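The detection described here might look roughly like this, assuming every synchronous call carries the list of actors already blocked further up the chain. `ChainActor` and `DeadlockError` are hypothetical names, and `Foo`/`Bar` are plain-object versions of David's example. Copying the chain on every call is the kind of performance penalty mentioned above.

```ruby
# Hypothetical sketch of cycle detection by passing the chain of waiting
# actors along with every synchronous call.
class DeadlockError < StandardError; end

class ChainActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    Thread.new do
      Thread.current[:current_actor] = self
      loop do
        meth, args, chain, reply = @mailbox.pop
        Thread.current[:call_chain] = chain # actors blocked waiting on this call
        begin
          reply << [:ok, @target.public_send(meth, *args)]
        rescue StandardError => e
          reply << [:error, e]
        end
      end
    end
  end

  def call(meth, *args)
    me = Thread.current[:current_actor]
    chain = (Thread.current[:call_chain] || []) + (me ? [me] : [])
    if chain.include?(self)
      raise DeadlockError, "calling #{meth} would deadlock: receiver is already waiting in this call chain"
    end
    reply = Queue.new
    @mailbox << [meth, args, chain, reply]
    status, value = reply.pop
    raise value if status == :error
    value
  end
end

class Foo
  attr_accessor :bar

  def value
    3
  end

  def first
    bar.call(:second) * 2
  end
end

class Bar
  attr_accessor :foo

  def second
    foo.call(:value) + 3
  end
end

foo_obj = Foo.new
bar_obj = Bar.new
foo = ChainActor.new(foo_obj)
bar = ChainActor.new(bar_obj)
foo_obj.bar = bar
bar_obj.foo = foo

puts foo.call(:value)   # 3, no cycle
begin
  foo.call(:first)      # Foo -> Bar -> Foo
rescue DeadlockError => e
  puts e.message
end
```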

> First, 'self' wouldn't be an actor reference, it'd be the object itself,

Yes. I provide Celluloid.current_actor to use in lieu of self. This feels a
bit ugly, but I don't know of any way to redefine self (nor do I think
that'd be a particularly good idea either).

> Couple other annoyances:
>
> - Why spawn instead of new? It seems like if I've decided to make something
> an actor, it's going to expect to be an actor most of the time -- it's hard
> to imagine a case where I want the original 'new' instead.

This is a good point. I could easily redefine new to have the same behavior
as spawn.

> I really don't like the registry -- one flat namespace of actors? Ew. But
> I'm not really sure how to solve this -- some sort of super-reference,
> which points to the currently-alive actor from a given supervisor? But
> then I might send something which asynchronously kills the actor, and
> I'll get a fresh actor for the next line, which seems like a bad thing.
> There needs to be some clean semantics for "Give me a reference to the
> currently-alive version of this actor" which doesn't rely on a global,
> flat registry.

I don't know of a better solution. This is the same approach Erlang uses.
The only evolution it's seen in recent history is systems like Ulf Wiger's
gproc.

> And a thought: I just had every method return a future. If people wanted
> something to run asynchronously, all they had to do is ignore the future.
> The downside is that this makes it hard to force things to be synchronous.
> I actually thought of this as a good thing -- if I make the call up at the
> top of a method, and don't use the result till the bottom, that's some
> surprise parallelism right there. The biggest problem is that if there's
> an exception, you don't know about it until the future is resolved.

That's an interesting approach, but a bit different than the one I'm
shooting for in Celluloid, where I want concurrent objects to quack like
normal Ruby objects as much as possible.
 

David Masover

> When making synchronous calls, exceptions which occur in the context of the
> receiver are automatically reraised in the caller just like any other Ruby
> object, regardless of whether you're using any actor-specific features like
> linking or trapping exits. The exception will also crash the receiver.

That makes sense.

But when making asynchronous calls:
> Your parallel map is a perfect example of what I'd actually want here: If an

> I think reraising the original exception in the caller context gives the
> caller appropriate context to bail out of whatever they're doing and avoid
> making subsequent calls at all. Other threads may be trying to make calls,
> and if an exception entirely unrelated to the calls they're making is
> raised because the actor is dead, I think that'd be rather confusing.

That makes a lot of sense.

Still, I shouldn't have to create an entire new actor, link it to your actor,
and have it trap errors in order to find the actual exception I caused which
led to the actor's death. Maybe it's appropriate for bang methods to return
some object which can be used to retrieve an exception?

> - How do you handle cycles?

> I don't, but they can be detected if you don't mind a bit of a performance
> penalty. For that I need to track chains of synchronous calls and detect if
> the receiver of a given method exists earlier in the call chain. If so,
> Celluloid can raise an exception in the caller context indicating that a
> deadlock would occur. This is a bit of a glaring deficiency right now.

That's what I was trying to do, except I wasn't planning to deadlock. I was
planning to allow the call... somehow. Basically, if you had any sort of
pattern where two objects call methods on each other, it should work the way
it does synchronously.

I think this makes sense, semantically. After all, if an actor calls a method
on itself, we don't get any sort of deadlock. If an actor calls a method on
another object running in the same thread, which then calls a method on the
actor, at least with my implementation, this also doesn't deadlock -- and in
yours, if I pass 'self' around, we get the same result. Why should it be
different if I call a method on another _actor_ which then calls a method on
me?

Still, it's tricky to come up with an efficient way to do this, and I never
managed to get anything to work, no matter how inefficient.

> First, 'self' wouldn't be an actor reference, it'd be the object itself,

> Yes. I provide Celluloid.current_actor to use in lieu of self. This feels a
> bit ugly, but I don't know of any way to redefine self (nor do I think
> that'd be a particularly good idea either).

So, there is a way, but you probably won't like it...

One experiment I did here was:

- Grab all methods, stuff them in a hash, and undef them.
- When a method is called, intercept it like a proxy, and do whatever I need
to do to get it to the right thread.
- To actually call the method, grab the method object, bind it to self, and
apply.

It's not really redefining self, but it accomplishes what's needed here.

However, I suspect it breaks all kinds of inheritance, unless I also absorb
that kind of functionality -- that is, whenever something inherits from this
class, give it a clone of the hash to start with.

One advantage to this approach is that I could very easily allow some methods
to require the actor thread, and some methods to run in the calling thread --
by default, they run in the actor thread. The obvious application is when a
method really doesn't need to involve the actor:

class Sheen
  include Suit

  # define a new threadsafe method
  threadsafe :status do
    :winning
  end
end

But maybe you want to anyway:

class Sheen
  include Suit

  attr_reader :status, :sober

  def initialize
    @status = :winning
    @sober = true
  end

  def fall_off_wagon!
    @status = :WINNING
    @sober = false
  end

  def is_off_wagon?
    !sober && status == :WINNING
  end

  threadsafe :hello do
    if is_off_wagon?
      puts 'WINNING!!!'
    else
      puts 'Hi.'
    end
  end
end

It makes sense that fall_off_wagon! and is_off_wagon? should run on the actor
thread. It makes sense that the 'hello' method doesn't really need to run on
the actor thread, and maybe it's a performance improvement that the Sheen
thread doesn't actually have to talk, or ever wait for output, etc. I'm really
reaching here, because I don't actually have a real application for this, but
I don't think it's entirely unreasonable -- kind of like the Java
'synchronized' keyword, except message-passing behavior is the default.

But notice that the 'threadsafe' call doesn't have to call 'self' at all. In
fact, that syntax is actually syntactic sugar for:

def hello
  ...
end
threadsafe :hello

I'm still just writing normal methods, but every method call, whether it's to
'self' or not, is still going through the same logic to determine whether or
not it needs to run on the Sheen thread.
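One way the interception scheme could work is sketched below. It is a sketch under several assumptions, not David's code. `Suit` and `threadsafe` are borrowed from his examples, while `suit_up!` is an invented name. The original methods are stashed in a hash and redefined (rather than undefined) as wrappers that bind them to `self`, and a reentrant `Monitor` stands in for moving the call onto the actor's thread. `hello` returns its string instead of printing it so the result is easy to check.

```ruby
require "monitor"

# Hypothetical sketch: "Suit" and "threadsafe" come from the examples
# above; "suit_up!" is invented. A reentrant Monitor stands in for
# "get the call onto the actor's thread".
module Suit
  INIT_LOCK = Mutex.new

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # threadsafe :name do ... end  defines the method and exempts it
    # threadsafe :name             exempts an existing method
    def threadsafe(name, &body)
      define_method(name, &body) if body
      threadsafe_methods << name
    end

    def threadsafe_methods
      @threadsafe_methods ||= []
    end

    # Stash the other public methods in a hash and replace each one with a
    # wrapper that binds the original to self and calls it under the lock.
    def suit_up!
      wrapped = public_instance_methods(false) - threadsafe_methods
      originals = wrapped.to_h { |name| [name, instance_method(name)] }
      originals.each do |name, unbound|
        define_method(name) do |*args, &blk|
          suit_lock.synchronize { unbound.bind(self).call(*args, &blk) }
        end
      end
    end
  end

  def suit_lock
    INIT_LOCK.synchronize { @suit_lock ||= Monitor.new }
  end
end

class Sheen
  include Suit

  attr_reader :status, :sober

  def initialize
    @status = :winning
    @sober = true
  end

  def fall_off_wagon!
    @status = :WINNING
    @sober = false
  end

  def is_off_wagon?
    !sober && status == :WINNING
  end

  threadsafe :hello do
    is_off_wagon? ? "WINNING!!!" : "Hi."
  end

  suit_up!
end

charlie = Sheen.new
puts charlie.hello   # Hi.
charlie.fall_off_wagon!
puts charlie.hello   # WINNING!!!
```

Because the Monitor is reentrant, `is_off_wagon?` can call the wrapped `sober` and `status` on `self` without deadlocking. Methods defined after `suit_up!` are not wrapped, which is one form of the inheritance problem raised above.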

I was much more interested in getting the semantics right, to show that it can
be done, rather than making it performant and immediately useful. Like you, I
wanted to use this to sort of prototype those semantics, with the hope that
they would get into something like Reia eventually. (I started this before I
heard of Reia, and probably before Reia was in any way practical, so I wasn't
deliberately reinventing the wheel.)

> I don't know of a better solution. This is the same approach Erlang uses.
> The only evolution it's seen in recent history is systems like Ulf Wiger's
> gproc.

Looking again, maybe the supervisor already does this?

supervisor = Sheen.supervise "Charlie Sheen"
charlie = supervisor.actor

This would solve both problems, right? (Assuming the supervisor is itself
threadsafe.) It could use some sugar, but I'm not entirely sure how.

> That's an interesting approach, but a bit different than the one I'm
> shooting for in Celluloid, where I want concurrent objects to quack like
> normal Ruby objects as much as possible.

And this does quack like a normal Ruby object, unless something goes wrong and
an exception is raised. But I was never quite satisfied with how exceptions
were dealt with. For one thing, it's not OK that someone might ignore a future
and never see the exception.
 

Tony Arcieri

> Still, I shouldn't have to create an entire new actor, link it to your
> actor, and have it trap errors in order to find the actual exception I
> caused which led to the actor's death. Maybe it's appropriate for bang
> methods to return some object which can be used to retrieve an exception?

If you want that sort of behavior, you can use the built-in
Celluloid::Future functionality. It does exactly what you describe, calling
a block asynchronously, then letting you retrieve the exception (or value)
later. If an exception was raised in the block given to the future
originally, it will be re-raised when the value is requested every single
time.
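The behaviour described here can be approximated with a plain `Thread`. `BlockFuture` is a hypothetical stand-in, not the source of `Celluloid::Future`. The block's exception is captured rather than killing the thread, and `value` re-raises it each time it is called.

```ruby
# Hypothetical approximation of the behaviour described above,
# not Celluloid::Future's source.
class BlockFuture
  def initialize(&block)
    @thread = Thread.new do
      begin
        [:ok, block.call]
      rescue StandardError => e
        [:error, e] # capture instead of letting the thread die
      end
    end
  end

  # Waits for the block; re-raises its exception on every call.
  def value
    status, result = @thread.value
    raise result if status == :error
    result
  end
end

answer = BlockFuture.new { 6 * 7 }
failure = BlockFuture.new { Integer("not a number") }
puts answer.value # 42
2.times do
  begin
    failure.value
  rescue ArgumentError => e
    puts "raised again: #{e.class}"
  end
end
```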

If something goes wrong in an async call, you can either handle the error
within that method directly, or rely on the supervisor to restart the object
in a clean state. Really I think supervisors are going to be the de facto
way to handle errors in asynchronous calls. I don't think there are a lot of
good use cases for having callers handle errors in asynchronous calls that
aren't already covered by Celluloid::Future.

> That's what I was trying to do, except I wasn't planning to deadlock. I
> was planning to allow the call... somehow. Basically, if you had any sort
> of pattern where two objects call methods on each other, it should work
> the way it does synchronously.
>
> I think this makes sense, semantically. After all, if an actor calls a
> method on itself, we don't get any sort of deadlock. If an actor calls a
> method on another object running in the same thread, which then calls a
> method on the actor, at least with my implementation, this also doesn't
> deadlock -- and in yours, if I pass 'self' around, we get the same
> result. Why should it be different if I call a method on another _actor_
> which then calls a method on me?
>
> Still, it's tricky to come up with an efficient way to do this, and I
> never managed to get anything to work, no matter how inefficient.

Hmmmmmmmmmmm!

I think the best approach would be to wrap the dispatching of incoming calls
in a fiber. Whenever that fiber makes an outgoing call to another actor, it
defers back to the central receive loop which processes the mailbox. This
would let an actor continue processing incoming calls while waiting for a
response to a call.
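A heavily simplified sketch of fiber-based dispatch, to show why it lets a circular chain finish. Assumptions: everything runs on one thread with one shared run queue standing in for the actors' mailboxes, there is no error handling, and `Scheduler` and `ActorRef` are hypothetical names. Each incoming call runs in its own `Fiber`. An outgoing synchronous call queues the request and `Fiber.yield`s back to the loop, so Foo can still answer Bar's `value` call while `first` is waiting. `Foo` and `Bar` follow David's example (3, plus 3, times 2).

```ruby
# Hypothetical single-threaded sketch of fiber-based dispatch. All actors
# share one run queue here, so only the fiber part of the idea is shown.
class Scheduler
  attr_reader :current

  def initialize
    @run_queue = []
  end

  def schedule(&task)
    @run_queue << task
  end

  # Resume a fiber and remember it as the one currently executing.
  def resume(fiber, value = nil)
    @current = fiber
    fiber.resume(value)
  end

  def run
    @run_queue.shift.call until @run_queue.empty?
  end
end

class ActorRef
  def initialize(scheduler, target)
    @scheduler = scheduler
    @target = target
  end

  # Synchronous call from inside a scheduled fiber: queue the request,
  # then suspend until the reply resumes us with the result.
  def call(meth, *args)
    waiting = @scheduler.current
    @scheduler.schedule { dispatch(meth, args, waiting) }
    Fiber.yield
  end

  private

  # Each incoming call runs in its own fiber, so while it is suspended
  # on an outgoing call this actor can still serve other calls.
  def dispatch(meth, args, reply_to)
    fiber = Fiber.new do
      result = @target.public_send(meth, *args)
      @scheduler.schedule { @scheduler.resume(reply_to, result) }
    end
    @scheduler.resume(fiber)
  end
end

class Foo
  attr_accessor :bar

  def value
    3
  end

  def first
    bar.call(:second) * 2 # Foo -> Bar
  end
end

class Bar
  attr_accessor :foo

  def second
    foo.call(:value) + 3  # Bar -> Foo while Foo#first is still waiting
  end
end

scheduler = Scheduler.new
foo_obj = Foo.new
bar_obj = Bar.new
foo = ActorRef.new(scheduler, foo_obj)
bar = ActorRef.new(scheduler, bar_obj)
foo_obj.bar = bar
bar_obj.foo = foo

result = nil
main = Fiber.new { result = foo.call(:first) }
scheduler.schedule { scheduler.resume(main) }
scheduler.run
puts result # 12
```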

You're actually the second person I've talked to who's proposed this in
regard to handling circular call chains; the other was Steven Parkes, who
created the Dramatis actor framework. At the time I had my head in
Reia/Erlang, where gen_server state is pure functional and immutable, so
there would really be no way to implement this sort of approach. In a
language like Ruby, though, it's possible, and would actually be quite
similar to what you could do with plain old Ruby objects.

> So, there is a way, but you probably won't like it...

You're right, I don't like that at all :)

> I was much more interested in getting the semantics right, to show that
> it can be done, rather than making it performant and immediately useful.
> Like you, I wanted to use this to sort of prototype those semantics, with
> the hope that they would get into something like Reia eventually. (I
> started this before I heard of Reia, and probably before Reia was in any
> way practical, so I wasn't deliberately reinventing the wheel.)

Well, now you definitely have me thinking. If I do allow an actor to process
multiple calls using fibers, I've definitely left the realm of what could be
done in a language like Reia. That sort of approach relies directly on
concurrent objects having mutable hidden state.

While this approach couldn't apply to Reia, I really like its semantics,
and I think it solves the long-standing problem of circular calls. My answer
to this question for the past two years has been "circular calls are an
error", when really there should be a way to make them work.

> Looking again, maybe the supervisor already does this?
>
> supervisor = Sheen.supervise "Charlie Sheen"
> charlie = supervisor.actor
>
> This would solve both problems, right? (Assuming the supervisor is itself
> threadsafe.) It could use some sugar, but I'm not entirely sure how.

The easiest way to add some sugar would be to have the supervisor create a
thread-safe proxy object that always refers to the latest version of a given
actor. That way you could just use that object directly rather than always
having to call supervisor.actor to get to it.
 

David Masover

> Still, I shouldn't have to create an entire new actor, link it to your
> actor, and have it trap errors in order to find the actual exception I
> caused which led to the actor's death. Maybe it's appropriate for bang
> methods to return some object which can be used to retrieve an exception?
> [...]
> I don't think there are a lot of good use cases for having callers handle
> errors in asynchronous calls that aren't already covered by
> Celluloid::Future.

Maybe not, other than that Future applies to a block, where I want the result
of a method call. Maybe it's not a good use case, but this still seems cool:

actors.map(&:some_calculation).reduce{|a,b| ...}

I guess the bigger annoyance, though I didn't really have a good solution, is
that adopting bang to mean "asynchronous" means that these don't quite quack
like Ruby objects anymore -- they can't have bang methods of their own that
mean something, and every method gets a bang whether it makes sense or not.

> You're actually the second person I've talked to who's proposed this in
> regard to handling circular call chains; the other was Steven Parkes, who
> created the Dramatis actor framework. At the time I had my head in
> Reia/Erlang, where gen_server state is pure functional and immutable, so
> there would really be no way to implement this sort of approach. In a
> language like Ruby, though, it's possible, and would actually be quite
> similar to what you could do with plain old Ruby objects.

So, it's been a while since I looked at Erlang, but I don't actually see an
obstacle to this in Erlang itself or in the VM. Maybe in gen_server.

But there's really nothing preventing me from creating the effect of mutable
state in a generic Erlang process, right?

> You're right, I don't like that at all :)

I don't like it either, and I avoided it as much as I could. One thing I
thought of was trying to filter the reference any way that it would get out of
the object, since I was already wrapping things in futures and the like
anyway. The problem is, there's no guarantee that a bare 'self' will cross any
filter I set up. I mean, it'd be almost trivial to catch this:

def get_self
  self
end

But what if they stuff it deep in some data structure? What if it's in a call
to some other object?

The other option was to make the blankslate-like proxy class a child class of
the original, so calling method 'foo' would look like:

original.instance_method(:foo).bind(self).call(*args, &block)

That's a minor win in that it might be somewhat more tolerant of the parent
classes being redefined. But it's not much of a win, because I have to watch
the parent classes anyway to remove methods from the child -- in fact, the
only sane way I could find to do that was to watch every single method
created. So this doesn't really buy me much.

A way to make that significantly better would be to bind those methods to a
BasicObject proxy instead, but you can't do that, because binding methods is
one case where Ruby is _not_ duck-typed at all -- you can only bind a method
to an object which is actually an instance of that class, or something which
inherits from or includes it.
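The restriction is easy to check in plain Ruby. The class names below are hypothetical, echoing the `original.instance_method(:foo)` line above. `UnboundMethod#bind` accepts an instance of the method's class or a subclass, and raises `TypeError` for a `BasicObject` proxy, however well it quacks.

```ruby
# Illustrates the binding restriction; class names are hypothetical.
class Original
  def foo(x)
    "foo(#{x}) on #{self.class}"
  end
end

class ChildProxy < Original; end

unbound = Original.instance_method(:foo)

# A subclass instance is acceptable...
puts unbound.bind(ChildProxy.new).call(1) # foo(1) on ChildProxy

# ...but a BasicObject-based proxy is not.
begin
  unbound.bind(BasicObject.new)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```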

In the end, while the approach I went with is pretty ridiculous, I still like
it for the simple reason that if I forget to call Celluloid.current_actor
instead of self, I've completely broken the concurrency model by doing the
normal Ruby thing. With my approach, aside from the fact that my attempt at
cycles currently deadlocks, I can still more or less pretend that an actor is
a normal object.

> The easiest way to add some sugar would be to have the supervisor create a
> thread-safe proxy object that always refers to the latest version of a
> given actor. That way you could just use that object directly rather than
> always having to call supervisor.actor to get to it.

Except in this case, I'm thinking of the call to 'supervisor.actor' as being
something like starting a transaction. That is, let's say someone actually
convinces (or court-orders) Sheen to go to rehab before we let him back on the
road. So we might have a series of calls like:

charlie.rehab!
charlie.give license

But maybe the withdrawal kills him. If 'charlie' is a thread-safe proxy which
always refers to the latest version, we end up with a situation where rehab
kills him, we get a new version who hasn't been to rehab, and we give the new
version a license. This is clearly an error, and worse, it's almost silent.

By contrast, if we force people to call something like supervisor.actor to
start something like this, we end up with the best of both worlds -- we're
guaranteed he's alive before we send him to rehab, and we either fail (because
we have a dead actor) or ensure that he's actually recovered before we give
him the license.
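The difference between the two kinds of reference can be sketched with plain objects, assuming a supervisor that replaces an instance whenever a call on it raises and bumps a generation counter. `Supervisor`, `LatestProxy`, `PinnedRef`, `StaleActorError` and `Charlie` are hypothetical names, not Celluloid API. The latest-version proxy silently hands the license to a fresh Charlie, while the pinned reference refuses once the generation has changed.

```ruby
# Hypothetical sketch; none of these names are Celluloid API.
class StaleActorError < StandardError; end

class Supervisor
  def initialize(klass)
    @klass = klass
    @lock = Mutex.new
    @actor = klass.new
    @generation = 0
  end

  # One consistent snapshot of the live instance and its generation.
  def current
    @lock.synchronize { [@actor, @generation] }
  end

  # Forward a call; if it raises, replace the instance, then re-raise.
  def deliver(actor, generation, name, *args)
    actor.public_send(name, *args)
  rescue StandardError => e
    @lock.synchronize do
      if generation == @generation # only the first failure restarts it
        @actor = @klass.new
        @generation += 1
      end
    end
    raise e
  end
end

# Always talks to whichever instance is alive right now.
class LatestProxy
  def initialize(supervisor)
    @supervisor = supervisor
  end

  def method_missing(name, *args)
    @supervisor.deliver(*@supervisor.current, name, *args)
  end
end

# Pinned to the instance that was alive when it was created.
class PinnedRef
  def initialize(supervisor)
    @supervisor = supervisor
    @actor, @generation = supervisor.current
  end

  def method_missing(name, *args)
    _, live_generation = @supervisor.current
    raise StaleActorError, "actor was restarted" unless live_generation == @generation
    @supervisor.deliver(@actor, @generation, name, *args)
  end
end

class Charlie
  attr_reader :license

  def rehab!
    raise "withdrawal" # in this sketch, rehab always kills him
  end

  def give(license)
    @license = license
  end
end

latest = LatestProxy.new(Supervisor.new(Charlie))
(latest.rehab! rescue nil)
latest.give(:license) # silently lands on a fresh Charlie

pinned = PinnedRef.new(Supervisor.new(Charlie))
(pinned.rehab! rescue nil)
begin
  pinned.give(:license)
rescue StaleActorError => e
  puts "refused: #{e.message}"
end
```

The pinned check is itself check-then-act, so a restart that happens between the generation check and the call is not caught; that is the same race as the `out_of_control?` example.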

Or, in other words, any time we're sending more than one message to an actor
and depending on those messages being processed in order, we need to know, at
a _minimum_, that we're talking to the same actor. On the other hand, in a
situation like this, we also have to think about what other calls might happen
in between -- for example:

charlie.rehab! if charlie.out_of_control?

There's potentially a race condition between receiving the out_of_control?
value and sending him to rehab. Still, if someone else kills charlie, he's
just as dead and I still don't want to give the new version a license until he
goes through rehab again.

Also: There has got to be a better metaphor.
 

Tony Arcieri

> Hmmmmmmmmmmm!
>
> I think the best approach would be to wrap the dispatching of incoming
> calls in a fiber. Whenever that fiber makes an outgoing call to another
> actor, it defers back to the central receive loop which processes the
> mailbox. This would let an actor continue processing incoming calls while
> waiting for a response to a call.
>
> You're actually the second person I've talked to who's proposed this in
> regard to handling circular call chains; the other was Steven Parkes, who
> created the Dramatis actor framework. At the time I had my head in
> Reia/Erlang, where gen_server state is pure functional and immutable, so
> there would really be no way to implement this sort of approach. In a
> language like Ruby, though, it's possible, and would actually be quite
> similar to what you could do with plain old Ruby objects.

If you check HEAD on Github, Celluloid now supports circular call graphs
by using fibers to dispatch methods:

https://github.com/tarcieri/celluloid
 

Tony Arcieri

> If you check HEAD on Github, Celluloid now supports circular call graphs
> by using fibers to dispatch methods

And to clarify this a little bit, where before A -> B -> A synchronous call
chains would deadlock your program, now it works!

This brings Celluloid actors one step closer to behaving as much like
sequential Ruby objects as possible.
 
