[ANN] celluloid 0.0.3: a concurrent object framework for Ruby

Discussion in 'Ruby' started by Tony Arcieri, Jun 17, 2011.

  1. Tony Arcieri

    Tony Arcieri Guest

    [Note: parts of this message were removed to make it a legal post.]

    Celluloid is a concurrent object framework for Ruby inspired by Erlang
    and the Actor Model:

    * Github: http://github.com/tarcieri/celluloid
    * RDoc: http://celluloid.github.com/

    Celluloid provides thread-backed objects that run concurrently,
    allowing the familiarity of plain old Ruby objects for the
    most common use cases, but also the ability to call methods
    asynchronously. Asynchronous method calls allow the receiver
    to do things in the background while the caller carries on with its
    business.
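
    As a rough illustration of the idea (plain Ruby, not Celluloid's own
    code), a thread-backed object can be sketched as a thread draining a
    mailbox. The names TinyActor, call and async are hypothetical, and
    unlike Celluloid this sketch does not crash the receiver on error.

```ruby
# Hypothetical stand-in for a thread-backed object; NOT Celluloid's implementation.
class TinyActor
  def initialize(target)
    @target  = target
    @mailbox = Queue.new
    @thread  = Thread.new { run }
  end

  # Synchronous call: enqueue the request, then block until the reply arrives.
  def call(name, *args)
    reply = Queue.new
    @mailbox << [name, args, reply]
    status, value = reply.pop
    raise value if status == :error   # re-raise the receiver's exception in the caller
    value
  end

  # Asynchronous call: enqueue the request and return immediately.
  def async(name, *args)
    @mailbox << [name, args, nil]
    nil
  end

  private

  # The actor thread handles one message at a time, in arrival order.
  def run
    loop do
      name, args, reply = @mailbox.pop
      begin
        result = @target.public_send(name, *args)
        reply << [:ok, result] if reply
      rescue StandardError => e
        reply << [:error, e] if reply
      end
    end
  end
end

class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment(by = 1)
    @count += by
  end
end

counter = TinyActor.new(Counter.new)
counter.async(:increment, 5)   # returns at once; the work happens on the actor thread
counter.call(:increment)       # blocks until the actor has processed it
total = counter.call(:count)
puts total
```

    Because the mailbox is a FIFO queue, the asynchronous increment is
    always processed before the two synchronous calls that follow it.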

    If you're looking for a longer introduction, please check out this
    post on my blog:

    http://www.unlimitednovelty.com/2011/05/introducing-celluloid-concurrent-object.html

    Also view the screencast I did for EMRubyConf here:

     
    Tony Arcieri, Jun 17, 2011
    #1

  2. Eric Wong

    Eric Wong Guest

    Cool!

    I assume this interacts transparently with existing apps that already
    use threads?

    For instance, Rainbows! already offers ThreadPool/ThreadSpawn options
    for hosting Rack applications, can the Rack applications themselves use
    Celluloid without any changes to Rainbows!?

    Or would we have to add explicit support for Celluloid in Rainbows!
    to support folks that want to write Rack apps using Celluloid?


    Rainbows! of course also supports Revactor, Cool.io, EventMachine,
    NeverBlock, XEpollThread*, etc... Adding explicit support for Celluloid
    isn't out of the question, just a matter of developer time.
     
    Eric Wong, Jun 17, 2011
    #2

  3. David Masover

    That's pretty awesome. I was working on something like this, but ended up
    abandoning it after I had it deadlocking for a while, and I couldn't get the
    semantics right.



    Speaking of semantics, my biggest problems were:

    - Is there a way to handle exceptions in a Ruby-esque way?

    It looks like I have to explicitly trap actor exceptions. But this is a place
    I have to be aware that this is an actor and not just a Ruby object. Your
    parallel map is a perfect example of what I'd actually want here: If an
    exception is raised, re-raise that exception when I try to call methods on the
    actor, rather than DeadActorException. Is there a reason not to do that?

    - How do you handle cycles?

    An actor can only process one method at a time, which makes sense. One thing I
    wanted to do was give two actors references to each other, so they can send
    messages back and forth. Futures seem like a good solution to avoid a lot of
    annoying asynchronous callbacks. Two problems:

    class Foo
      include Celluloid::Actor
      attr_reader :bar, :value
      def initialize
        @bar = Bar.spawn self
        @value = 3
      end

      def first
        bar.second!
        bar.result * 2
      end
    end

    class Bar
      include Celluloid::Actor
      attr_reader :foo
      def initialize parent
        @foo = parent
      end

      def second
        @result = foo.value + 3
      end

      def result
        @result
      end
    end


    This doesn't actually run -- it seems to deadlock on 'new'. But there are two
    other problems: First, 'self' wouldn't be an actor reference, it'd be the
    object itself, right? But more importantly, what happens when I call 'first'?
    It looks like we deadlock again, but it seems reasonable that since Foo is
    waiting for Bar, that maybe Bar can now call another method on Foo, acting as
    though the two actors were just plain objects and we're just building a call
    stack.

    That's the part I could never get working.


    Couple other annoyances:

    - Why spawn instead of new? It seems like if I've decided to make something
    an actor, it's going to expect to be an actor most of the time -- it's hard to
    imagine a case where I want the original 'new' instead.

    - I really don't like the registry -- one flat namespace of actors? Ew. But
    I'm not really sure how to solve this -- some sort of super-reference, which
    points to the currently-alive actor from a given supervisor? But then I might
    send something which asynchronously kills the actor, and I'll get a fresh
    actor for the next line, which seems like a bad thing. There needs to be some
    clean semantics for "Give me a reference to the currently-alive version of
    this actor" which doesn't rely on a global, flat registry.

    And a thought: I just had every method return a future. If people wanted
    something to run asynchronously, all they had to do is ignore the future. The
    downside is that this makes it hard to force things to be synchronous. I
    actually thought of this as a good thing -- if I make the call up at the top
    of a method, and don't use the result till the bottom, that's some surprise
    parallelism right there. The biggest problem is that if there's an exception,
    you don't know about it until the future is resolved.
     
    David Masover, Jun 17, 2011
    #3
  4. Tony Arcieri

    Tony Arcieri Guest

    Yep. Check out the screencast for Celluloid being used in conjunction with
    Sinatra.


    Indeed, Celluloid should work just fine with ThreadPool/ThreadSpawn.
     
    Tony Arcieri, Jun 17, 2011
    #4
  5. Tony Arcieri

    Tony Arcieri Guest


    When making synchronous calls, exceptions which occur in the context of the
    receiver are automatically reraised in the caller just as with any other Ruby
    object, regardless of whether you're using any actor-specific features like
    linking or trapping exits. The exception will also crash the receiver.

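    > Your parallel map is a perfect example of what I'd actually want here: If an
    > exception is raised, re-raise that exception when I try to call methods on the
    > actor, rather than DeadActorException. Is there a reason not to do that?
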
    I think reraising the original exception in the caller context gives the
    caller appropriate context to bail out of whatever they're doing and avoid
    making subsequent calls at all. Other threads may be trying to make calls,
    and if an exception entirely unrelated to the calls they're making is raised
    because the actor is dead, I think that'd be rather confusing.

    > - How do you handle cycles?

    I don't, but they can be detected if you don't mind a bit of a performance
    penalty. For that I need to track chains of synchronous calls and detect if
    the receiver of a given method exists earlier in the call chain. If so,
    Celluloid can raise an exception in the caller context indicating that a
    deadlock would occur. This is a bit of a glaring deficiency right now.
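
    A minimal sketch of that kind of check, assuming each synchronous call
    carries the ids of the actors already blocked further up the chain. The
    Call struct, CircularCallError and check_for_cycle! are hypothetical
    names for illustration, not part of Celluloid.

```ruby
class CircularCallError < StandardError; end

# Hypothetical message envelope: every synchronous call records which actors
# are already waiting on a reply earlier in the chain.
Call = Struct.new(:receiver_id, :method_name, :chain)

# Raise in the caller's context if the receiver is already waiting in the chain.
def check_for_cycle!(call)
  if call.chain.include?(call.receiver_id)
    path = (call.chain + [call.receiver_id]).join(" -> ")
    raise CircularCallError, "deadlock would occur: #{path}"
  end
  call
end

# A calls B: fine, B is not waiting on anything yet.
check_for_cycle!(Call.new(:b, :second, [:a]))

# B calls back into A while A is still blocked on B: caught before dispatch.
message =
  begin
    check_for_cycle!(Call.new(:a, :value, [:a, :b]))
    "no cycle"
  rescue CircularCallError => e
    e.message
  end
puts message
```

    The performance penalty mentioned above shows up here as the extra chain
    that has to be copied into every message and scanned on every call.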

    > First, 'self' wouldn't be an actor reference, it'd be the object itself, right?

    Yes. I provide Celluloid.current_actor to use in lieu of self. This feels a
    bit ugly, but I don't know of any way to redefine self (nor do I think
    that'd be a particularly good idea).

    Why spawn instead of new? This is a good point. I could easily redefine new
    to have the same behavior as spawn.

    As for the registry, I don't know of a better solution. This is the same
    approach Erlang uses. The only evolution it's seen in recent history is
    systems like Ulf Wiger's gproc.

    Having every method return a future is an interesting approach, but a bit
    different than the one I'm shooting for in Celluloid, where I want
    concurrent objects to quack like normal Ruby objects as much as possible.
     
    Tony Arcieri, Jun 17, 2011
    #5
  6. David Masover

    That makes sense.

    But when making asynchronous calls:
    That makes a lot of sense.

    Still, I shouldn't have to create an entire new actor, link it to your actor,
    and have it trap errors in order to find the actual exception I caused which
    led to the actor's death. Maybe it's appropriate for bang methods to return
    some object which can be used to retrieve an exception?

    That's what I was trying to do, except I wasn't planning to deadlock. I was
    planning to allow the call... somehow. Basically, if you had any sort of
    pattern where two objects call methods on each other, it should work the way
    it does synchronously.

    I think this makes sense, semantically. After all, if an actor calls a method
    on itself, we don't get any sort of deadlock. If an actor calls a method on
    another object running in the same thread, which then calls a method on the
    actor, at least with my implementation, this also doesn't deadlock -- and in
    yours, if I pass 'self' around, we get the same result. Why should it be
    different if I call a method on another _actor_ which then calls a method on
    me?

    Still, it's tricky to come up with an efficient way to do this, and I never
    managed to get anything to work, no matter how inefficient.

    So, there is a way, but you probably won't like it...

    One experiment I did here was:

    - Grab all methods, stuff them in a hash, and undef them.
    - When a method is called, intercept it like a proxy, and do whatever I need
    to do to get it to the right thread.
    - To actually call the method, grab the method object, bind it to self, and
    apply.

    It's not really redefining self, but it accomplishes what's needed here.
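
    The three steps above can be sketched roughly as follows. This is a
    reconstruction from the description, not the original experiment: the
    module name Intercepted and the intercept! hook are made up, and the
    "get it to the right thread" step is reduced to recording the call.

```ruby
# Hypothetical module; a sketch of the stash / undef / rebind idea.
module Intercepted
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Hash of method name => UnboundMethod, one per class.
    def stashed
      @stashed ||= {}
    end

    # Step 1: grab the methods defined so far, stuff them in the hash, undef them.
    def intercept!
      public_instance_methods(false).each do |name|
        stashed[name] = instance_method(name)
        undef_method(name)
      end
    end
  end

  # Step 2: every call now lands here, like a proxy.
  def method_missing(name, *args, &block)
    unbound = self.class.stashed[name]
    return super unless unbound
    (@dispatch_log ||= []) << name   # a real version would hop to the actor thread here
    # Step 3: bind the stashed method back to self and apply it.
    unbound.bind(self).call(*args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    self.class.stashed.key?(name) || super
  end
end

class Greeter
  include Intercepted

  def greet(name)
    "#{prefix}, #{name}"   # this call on self is intercepted too
  end

  def prefix
    "Hello"
  end

  intercept!
end

g = Greeter.new
greeting = g.greet("world")
log = g.instance_variable_get(:@dispatch_log)
puts greeting
puts log.inspect
```

    Note that the hash lives on the class itself, so a subclass starts with
    an empty one.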

    However, I suspect it breaks all kinds of inheritance, unless I also absorb
    that kind of functionality -- that is, whenever something inherits from this
    class, give it a clone of the hash to start with.

    One advantage to this approach is that I could very easily allow some methods
    to require the actor thread, and some methods to run in the calling thread --
    by default, they run in the actor thread. The obvious application is when a
    method really doesn't need to involve the actor:

    class Sheen
      include Suit

      # define a new threadsafe method
      threadsafe :status do
        :winning
      end
    end

    But maybe you want to anyway:

    class Sheen
      include Suit

      attr_reader :status, :sober
      def initialize
        @status = :winning
        @sober = true
      end
      def fall_off_wagon!
        @status = :WINNING
        @sober = false
      end
      def is_off_wagon?
        !sober && status == :WINNING
      end

      threadsafe :hello do
        if is_off_wagon?
          puts 'WINNING!!!'
        else
          puts 'Hi.'
        end
      end
    end

    It makes sense that fall_off_wagon! and is_off_wagon? should run on the actor
    thread. It makes sense that the 'hello' method doesn't really need to run on
    the actor thread, and maybe it's a performance improvement that the Sheen
    thread doesn't actually have to talk, or ever wait for output, etc. I'm really
    reaching here, because I don't actually have a real application for this, but
    I don't think it's entirely unreasonable -- kind of like the Java
    'synchronized' keyword, except message-passing behavior is the default.

    But notice that the 'threadsafe' call doesn't have to call 'self' at all. In
    fact, that syntax is actually syntactic sugar for:

    def hello
      ...
    end
    threadsafe :hello

    I'm still just writing normal methods, but every method call, whether it's to
    'self' or not, is still going through the same logic to determine whether or
    not it needs to run on the Sheen thread.

    I was much more interested in getting the semantics right, to show that it can
    be done, rather than making it performant and immediately useful. Like you, I
    wanted to use this to sort of prototype those semantics, with the hope that
    they would get into something like Reia eventually. (I started this before I
    heard of Reia, and probably before Reia was in any way practical, so I wasn't
    deliberately reinventing the wheel.)

    Looking again, maybe the supervisor already does this?

    supervisor = Sheen.supervise "Charlie Sheen"
    charlie = supervisor.actor

    This would solve both problems, right? (Assuming the supervisor is itself
    threadsafe.) It could use some sugar, but I'm not entirely sure how.

    And this does quack like a normal Ruby object, unless something goes wrong and
    an exception is raised. But I was never quite satisfied with how exceptions
    were dealt with. For one thing, it's not OK that someone might ignore a future
    and never see the exception.
     
    David Masover, Jun 18, 2011
    #6
  7. Tony Arcieri

    Tony Arcieri Guest

    If you want that sort of behavior, you can use the built-in
    Celluloid::Future functionality. It does exactly what you describe, calling
    a block asynchronously, then letting you retrieve the exception (or value)
    later. If an exception was raised in the block originally given to the
    future, it will be re-raised every time the value is requested.
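
    The behavior described here can be sketched in plain Ruby as below.
    TinyFuture is a hypothetical class written for illustration; it is not
    Celluloid::Future and makes no claim about how that class is built.

```ruby
# Hypothetical class: runs a block on another thread and remembers its outcome.
class TinyFuture
  def initialize(&block)
    @thread = Thread.new do
      begin
        [:ok, block.call]
      rescue StandardError => e
        [:error, e]   # capture the exception instead of letting the thread die
      end
    end
  end

  # Blocks until the block has finished, then returns its value
  # or re-raises its exception. Safe to call more than once.
  def value
    status, result = @thread.value
    raise result if status == :error
    result
  end
end

good = TinyFuture.new { 6 * 7 }
bad  = TinyFuture.new { raise ArgumentError, "boom" }

answer = good.value

# The stored exception is raised again on every request for the value.
attempts = 2.times.map do
  begin
    bad.value
  rescue ArgumentError => e
    e.message
  end
end

puts answer
puts attempts.inspect
```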

    If something goes wrong in an async call, you can either handle the error
    within that method directly, or rely on the supervisor to restart the object
    in a clean state. Really I think supervisors are going to be the de facto
    way to handle errors in asynchronous calls. I don't think there are many
    good use cases for having callers handle errors in asynchronous calls that
    aren't already covered by Celluloid::Future.

    > That's what I was trying to do, except I wasn't planning to deadlock. I was
    > planning to allow the call... somehow.

    Hmmmmmmmmmmm!

    I think the best approach would be to wrap the dispatching of incoming calls
    in a fiber. Whenever that fiber makes an outgoing call to another actor, it
    defers back to the central receive loop which processes the mailbox. This
    would let an actor continue processing incoming calls while waiting for a
    response to a call.
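
    A rough sketch of that design in plain Ruby follows. It is only an
    illustration under assumptions: FiberActor, call, await and deliver are
    hypothetical names, error handling is omitted, and it is not the code
    that ended up in Celluloid.

```ruby
require "fiber"   # for Fiber.current on older Rubies

# Hypothetical actor: each incoming call runs in its own fiber, and a fiber
# that calls another actor yields back to the receive loop while it waits.
class FiberActor
  def initialize(target)
    @target  = target
    @mailbox = Queue.new
    @thread  = Thread.new do
      # Thread-local (not fiber-local), so every dispatch fiber can see it.
      Thread.current.thread_variable_set(:actor, self)
      run
    end
  end

  # Synchronous call, usable from a plain thread or from inside another actor.
  def call(name, *args)
    caller_actor = Thread.current.thread_variable_get(:actor)
    if caller_actor
      caller_actor.await(self, name, args)
    else
      reply = Queue.new
      deliver(:request, name, args, ->(value) { reply << value })
      reply.pop
    end
  end

  # Runs inside a dispatch fiber on this actor's thread.
  def await(receiver, name, args)
    fiber = Fiber.current
    receiver.deliver(:request, name, args, ->(value) { deliver(:response, fiber, value) })
    Fiber.yield   # hand control back to the receive loop; resumed with the reply
  end

  def deliver(*message)
    @mailbox << message
  end

  private

  # Central receive loop: starts a fiber per request, resumes fibers on responses.
  def run
    loop do
      kind, *rest = @mailbox.pop
      case kind
      when :request
        name, args, reply = rest
        Fiber.new { reply.call(@target.public_send(name, *args)) }.resume
      when :response
        fiber, value = rest
        fiber.resume(value)
      end
    end
  end
end

class Foo
  attr_accessor :bar
  def value; 3; end
  def first; bar.call(:second) * 2; end   # Foo -> Bar
end

class Bar
  attr_accessor :foo
  def second; foo.call(:value) + 3; end   # Bar -> Foo, while Foo is still waiting
end

foo_obj = Foo.new
bar_obj = Bar.new
foo = FiberActor.new(foo_obj)
bar = FiberActor.new(bar_obj)
foo_obj.bar = bar
bar_obj.foo = foo

result = foo.call(:first)
puts result
```

    While Foo's fiber is parked waiting on Bar, Foo's receive loop is free to
    serve the value request that Bar sends back, so the circular chain
    completes instead of deadlocking.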

    You're actually the second person I've talked to who's proposed this in
    regard to handling circular call chains. The other was Steven Parkes,
    who created the Dramatis actor framework. At the time I had my head in
    Reia/Erlang, where gen_server state is pure functional and immutable and
    there would really be no way to implement this sort of approach. In a
    language like Ruby, though, it's possible, and would actually be quite
    similar to what you could do with plain old Ruby objects.

    > So, there is a way, but you probably won't like it...

    You're right, I don't like that at all :)

    Well, now you definitely have me thinking. If I do allow an actor to process
    multiple calls using fibers, I've definitely left the realm of what could be
    done in a language like Reia. That sort of approach relies directly on
    concurrent objects having mutable hidden state.

    While this approach couldn't apply to Reia, I really like its semantics,
    and I think it solves the long-standing problem of circular calls. My answer
    to this question for the past two years has been "circular calls are an
    error", when really there should be a way to make them work.

    > Looking again, maybe the supervisor already does this?

    The easiest way to add some sugar would be to have the supervisor create a
    thread-safe proxy object that always refers to the latest version of a given
    actor. That way you could just use that object directly rather than always
    having to call supervisor.actor to get to it.
     
    Tony Arcieri, Jun 18, 2011
    #7
  8. David Masover

    Maybe not, other than that Future applies to a block, where I want the result
    of a method call. Maybe it's not a good use case, but this still seems cool:

    actors.map(&:some_calculation).reduce{|a,b| ...}

    I guess the bigger annoyance, though I didn't really have a good solution, is
    that adopting bang to mean "asynchronous" means that these don't quite quack
    like Ruby objects anymore -- they can't have bang methods of their own that
    mean something, and every method gets a bang whether it makes sense or not.

    So, it's been a while since I looked at Erlang, but I don't actually see an
    obstacle to this in Erlang itself or in the VM. Maybe in gen_server.

    But there's really nothing preventing me from creating the effect of mutable
    state in a generic Erlang process, right?

    I don't like it either, and I avoided it as much as I could. One thing I
    thought of was trying to filter the reference any way that it would get out of
    the object, since I was already wrapping things in futures and the like
    anyway. The problem is, there's no guarantee that a bare 'self' will cross any
    filter I set up. I mean, it'd be almost trivial to catch this:

    def get_self
      self
    end

    But what if they stuff it deep in some data structure? What if it's in a call
    to some other object?

    The other option was to make the blankslate-like proxy class a child class of
    the original, so calling method 'foo' would look like:

    original.instance_method(:foo).bind(self).call(*args, &block)

    That's a minor win in that it might be somewhat more tolerant of the parent
    classes being redefined. But it's not much of a win, because I have to watch
    the parent classes anyway to remove methods from the child -- in fact, the
    only sane way I could find to do that was to watch every single method
    created. So this doesn't really buy me much.

    A way to make that significantly better would be to bind those methods to a
    BasicObject proxy instead, but you can't do that, because binding methods is
    one case where Ruby is _not_ duck-typed at all -- you can only bind a method
    to an object which is actually an instance of that class, or something which
    inherits from or includes it.
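
    A small demonstration of that restriction, using made-up class names
    (Original, Child, Unrelated):

```ruby
class Original
  def foo
    "foo called on #{self.class}"
  end
end

class Child < Original; end          # inherits from Original
class Unrelated < BasicObject; end   # a bare proxy-style class, no relation to Original

unbound = Original.instance_method(:foo)

# Binding to an instance of a subclass is allowed.
from_child = unbound.bind(Child.new).call

# Binding to an unrelated object is rejected, however duck-like it may be.
bind_error =
  begin
    unbound.bind(Unrelated.new)
    nil
  rescue TypeError => e
    e.class
  end

puts from_child
puts bind_error
```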

    In the end, while the approach I went with is pretty ridiculous, I still like
    it for the simple reason that if I forget to call Celluloid.current_actor
    instead of self, I've completely broken the concurrency model by doing the
    normal Ruby thing. With my approach, aside from the fact that my attempt at
    cycles currently deadlocks, I can still more or less pretend that an actor is
    a normal object.

    Except in this case, I'm thinking of the call to 'supervisor.actor' as being
    something like starting a transaction. That is, let's say someone actually
    convinces (or court-orders) Sheen to go to rehab before we let him back on the
    road. So we might have a series of calls like:

    charlie.rehab!
    charlie.give license

    But maybe the withdrawal kills him. If 'charlie' is a thread-safe proxy which
    always refers to the latest version, we end up with a situation where rehab
    kills him, we get a new version who hasn't been to rehab, and we give the new
    version a license. This is clearly an error, and worse, it's almost silent.

    By contrast, if we force people to call something like supervisor.actor to
    start something like this, we end up with the best of both worlds -- we're
    guaranteed he's alive before we send him to rehab, and we either fail (because
    we have a dead actor) or ensure that he's actually recovered before we give
    him the license.
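
    The difference between the two styles can be sketched with toy classes.
    Everything here is hypothetical (Driver, Supervisor, LatestProxy,
    DeadActorError), single-threaded, and rehab always kills the actor so
    that the failure case is visible.

```ruby
class DeadActorError < StandardError; end

class Driver
  attr_reader :licensed

  def initialize
    @alive = true
    @licensed = false
  end

  def alive?
    @alive
  end

  # In this sketch rehab always kills the actor.
  def rehab
    @alive = false
  end

  def give_license
    raise DeadActorError, "actor is dead" unless @alive
    @licensed = true
  end
end

class Supervisor
  def initialize(klass)
    @klass = klass
    @current = klass.new
  end

  # Returns the live instance, replacing a dead one with a fresh one.
  def actor
    @current = @klass.new unless @current.alive?
    @current
  end
end

# Forwards every call to whichever instance the supervisor considers current.
class LatestProxy
  def initialize(supervisor)
    @supervisor = supervisor
  end

  def method_missing(name, *args, &block)
    @supervisor.actor.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @supervisor.actor.respond_to?(name, include_private) || super
  end
end

# Always-latest proxy: the license silently goes to a fresh, untreated actor.
supervisor = Supervisor.new(Driver)
charlie = LatestProxy.new(supervisor)
charlie.rehab
charlie.give_license
silently_licensed = supervisor.actor.licensed

# Pinned reference: the second call fails loudly because this actor is dead.
supervisor = Supervisor.new(Driver)
charlie = supervisor.actor
charlie.rehab
pinned_outcome =
  begin
    charlie.give_license
    :licensed
  rescue DeadActorError
    :dead_actor
  end

puts silently_licensed
puts pinned_outcome
```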

    Or, in other words, any time we're sending more than one message to an actor
    and depending on those messages being processed in order, we need to know, at
    a _minimum_, that we're talking to the same actor. On the other hand, in a
    situation like this, we also have to think about what other calls might happen
    in between -- for example:

    charlie.rehab! if charlie.out_of_control?

    There's potentially a race condition between receiving the out_of_control?
    value and sending him to rehab. Still, if someone else kills charlie, he's
    just as dead and I still don't want to give the new version a license until he
    goes through rehab again.

    Also: There has got to be a better metaphor.
     
    David Masover, Jun 19, 2011
    #8
  9. Tony Arcieri

    Tony Arcieri Guest

    If you check HEAD on Github, Celluloid now supports circular call graphs
    by using fibers to dispatch methods:

    https://github.com/tarcieri/celluloid
     
    Tony Arcieri, Jun 20, 2011
    #9
  10. Tony Arcieri

    Tony Arcieri Guest

    And to clarify this a little bit: where before A -> B -> A synchronous call
    chains would deadlock your program, now they work!

    This brings Celluloid actors one step closer to behaving as much like
    sequential Ruby objects as possible.
     
    Tony Arcieri, Jun 20, 2011
    #10