Are all Ruby built-in objects thread safe?

Discussion in 'Ruby' started by Just Another Victim of the Ambient Morality, Dec 23, 2008.

  1. Are all built-in objects thread safe? For example, if I have an array
    and one thread is constantly appending to it while another thread is shifting
    elements off of it and there's no synchronization going on, can the array
    object ever get corrupted? What about a similar scenario for hashes? These
    are surely complicated objects with internal state that must be maintained.
    Are they implemented to be thread safe?
    Thank you...
     
    Just Another Victim of the Ambient Morality, Dec 23, 2008
    #1

  2. On 23.12.2008 10:24, Just Another Victim of the Ambient Morality wrote:
    > Are all built-in objects thread safe? For example, if I have an array
    > and one thread is constantly appending to it while another thread is shifting
    > elements off of it and there's no synchronization going on, can the array
    > object ever get corrupted? What about a similar scenario for hashes? These
    > are surely complicated objects with internal state that must be maintained.
    > Are they implemented to be thread safe?


    The answer will sound a bit odd: they are not built to be thread safe,
    but it may well be that they are. Whether they are thread safe in
    practical terms may depend on the Ruby version you are using. Since the
    classic interpreter uses green threads, i.e. no preemption is happening,
    it may be that some or all operations are atomic and thus thread safe.

    The bottom line is: you should not rely on this and assume they are not
    thread safe!

    In your example, you should be using class Queue, which is thread safe
    and can be loaded by requiring "thread":

    $ irb -r thread
    irb(main):001:0> q = Queue.new
    => #<Queue:0x7ff6afc0>
    irb(main):002:0> t = Thread.new(q) {|qq| while (o = qq.deq) != qq; p o; end}
    => #<Thread:0x7ff61c68 sleep>
    irb(main):003:0> 5.times {|i| q.enq i}; q.enq q; t.join
    0
    1
    2
    3
    4
    => #<Thread:0x7ff61c68 dead>
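
    As a standalone script, the same producer/consumer idea might look
    roughly like the sketch below. The :done sentinel and the variable
    names are made up for this example, and on recent Ruby versions Queue
    is available even without the require.

    ```ruby
    require "thread" # needed on old Rubies; harmless on recent ones

    queue   = Queue.new
    results = [] # only the consumer thread touches this until join returns

    consumer = Thread.new do
      # :done is a hypothetical sentinel value chosen for this sketch
      while (item = queue.pop) != :done
        results << item
      end
    end

    # Queue#push and Queue#pop do their own locking, so no Mutex is needed.
    5.times { |i| queue.push(i) }
    queue.push(:done)
    consumer.join

    p results # prints [0, 1, 2, 3, 4]
    ```

    Queue#pop blocks while the queue is empty, so the consumer does not
    have to poll.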

    Then there are also classes Mutex and Monitor plus module MonitorMixin.
    The difference is reentrancy: a Monitor can be acquired again by the
    thread that already holds it, a Mutex cannot:

    irb(main):009:0> require 'monitor'
    => true
    irb(main):010:0> m=Mutex.new
    => #<Mutex:0x7ff3e100>
    irb(main):011:0> m.synchronize { m.synchronize { 1 } }
    ThreadError: stopping only thread
    note: use sleep to stop forever
    from (irb):11:in `synchronize'
    from (irb):11
    from (irb):11:in `synchronize'
    from (irb):11
    from (null):0
    irb(main):012:0> m=Monitor.new
    => #<Monitor:0x7ff329f4 @mon_count=0, @mon_owner=nil,
    @mon_waiting_queue=[], @mon_entering_queue=[]>
    irb(main):013:0> m.synchronize { m.synchronize { 1 } }
    => 1
    irb(main):014:0>
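
    The same comparison as a script, as a rough sketch. The exact
    ThreadError message differs between Ruby versions, so the example only
    records that an error was raised; the variable names are made up.

    ```ruby
    require "monitor"

    # A Mutex is not reentrant: locking it again from the owning thread
    # raises ThreadError instead of succeeding.
    mutex = Mutex.new
    mutex_result =
      begin
        mutex.synchronize { mutex.synchronize { 1 } }
      rescue ThreadError => e
        "ThreadError: #{e.message}"
      end

    # A Monitor is reentrant: the owning thread may enter it again.
    monitor = Monitor.new
    monitor_result = monitor.synchronize { monitor.synchronize { 1 } }

    puts mutex_result
    puts monitor_result
    ```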

    Kind regards

    robert
     
    Robert Klemme, Dec 23, 2008
    #2

  3. Just Another Victim of the Ambient Morality wrote:
    > Are all built-in objects thread safe? For example, if I have an array
    > and one thread is constantly appending to it while another thread is shifting
    > elements off of it and there's no synchronization going on, can the array
    > object ever get corrupted? What about a similar scenario for hashes? These
    > are surely complicated objects with internal state that must be maintained.
    > Are they implemented to be thread safe?


    This is a *very* interesting question! And it is a question that can
    ultimately *only* be answered by a formal Ruby Specification or more
    specifically a formal Ruby Memory Model.

    Until we have such a specification, the C source code of MRI or YARV
    is considered to be the "specification". However, there is a problem:
    that source code can actually be interpreted several different ways.

    If you look at the implementations of Hash, Array and friends, you
    will see that they are not thread-safe. Ergo: the specification says
    that the user is responsible for locking Arrays and Hashes.

    If, however, you look at the implementation of threads, you will see
    that both MRI and YARV are actually incapable of running more than one
    thread at a time -- even on a 1000-core machine MRI and YARV will only
    ever use one core. So, since two threads can never access an Array at
    the same time, there is no need for locking. Ergo: the specification
    says that the user is *not* responsible for locking Arrays and Hashes.

    There is a conflict here -- on the one hand, Arrays aren't
    thread-safe, on the other hand, MRI's broken threading implementation
    accidentally *makes* them thread-safe. Which do you depend on? As it
    turns out, different people interpret this differently.

    A couple of months ago, this actually became an issue. Originally, the
    JRuby developers had implemented Arrays to be not thread-safe. One of the big
    selling points of JRuby was and still is the promise of true
    concurrency and better scalability. So, naturally, people wanted to
    take advantage of this feature and started running their concurrent
    programs on JRuby. And those programs crashed left and right, because
    they didn't lock their Arrays properly. So, the JRuby team decided to
    implement thread-safe data structures on their end, so that code that
    didn't crash on MRI could be run unmodified on JRuby.

    However, they didn't *have* to do that. They could just as well have
    concluded that those programs were broken and *they* needed to become
    thread-safe. That would have been perfectly acceptable. And there is
    no guarantee that *all* Ruby Implementations will do it that way (and
    there's lots of them, something around 14 or so at the moment). Well,
    *unless* of course, there is a specification which tells them to.

    So, in short: when in doubt, lock.
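
    For the scenario in the original question (one thread appending, one
    thread shifting), locking could look roughly like this. It is only a
    sketch; the names and the count of 1000 are made up for the example.

    ```ruby
    items = []        # the shared Array from the question
    lock  = Mutex.new # every access to items goes through this lock
    taken = []        # touched only by the consumer thread until join

    producer = Thread.new do
      1000.times { |i| lock.synchronize { items << i } }
    end

    consumer = Thread.new do
      while taken.size < 1000
        value = lock.synchronize { items.shift }
        if value.nil?
          Thread.pass # nothing to take yet; let other threads run
        else
          taken << value
        end
      end
    end

    [producer, consumer].each(&:join)
    puts taken.size # prints 1000
    ```

    The consumer has to poll here because a plain Array cannot block; a
    Queue avoids that.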

    jwm
     
    Jörg W Mittag, Dec 23, 2008
    #3
  4. Jörg W Mittag wrote:
    > A couple of months ago, this actually became an issue. Originally, the
    > JRuby developers had implemented Arrays to be not thread-safe. One of the big
    > selling points of JRuby was and still is the promise of true
    > concurrency and better scalability. So, naturally, people wanted to
    > take advantage of this feature and started running their concurrent
    > programs on JRuby. And those programs crashed left and right, because
    > they didn't lock their Arrays properly. So, the JRuby team decided to
    > implement thread-safe data structures on their end, so that code that
    > didn't crash on MRI could be run unmodified on JRuby.


    Actually, we made a minimal attempt to ensure that concurrent operations
    against Array were usually safe across threads but also would raise a
    Ruby-land "ConcurrencyError" when concurrent changes could not be
    reconciled. It's a trade-off; adding locks to all the core collections
    would severely penalize performance for what's generally the rare case
    of concurrent access. And really there should be a separate set of
    classes with guaranteed concurrency that people can use if the
    performance considerations of locking are acceptable for safe concurrent
    access.
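
    Such a class could be sketched as a thin wrapper that takes a lock
    around every operation. LockedArray is a hypothetical name for this
    example; it is not a class shipped with Ruby or JRuby.

    ```ruby
    require "monitor"

    # Hypothetical wrapper: each method holds the lock while it touches
    # the underlying Array, so callers pay for locking only if they opt in.
    class LockedArray
      def initialize
        @items = []
        @lock  = Monitor.new
      end

      def push(value)
        @lock.synchronize { @items.push(value) }
        self
      end

      def shift
        @lock.synchronize { @items.shift }
      end

      def size
        @lock.synchronize { @items.size }
      end
    end

    shared  = LockedArray.new
    writers = 4.times.map do |t|
      Thread.new { 250.times { |i| shared.push([t, i]) } }
    end
    writers.each(&:join)

    puts shared.size # prints 1000
    ```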

    So in short, you're absolutely right; nobody should ever rely on the
    core collections to be thread-safe, even if they happen to be
    thread-safe by accident in the C implementations right now. That won't
    be the case on all implementations, and may not even be the case on
    future versions of the C impls. The safe answer is to ensure you're
    watching your own back and properly synchronizing access to shared data
    structures.

    - Charlie
     
    Charles Oliver Nutter, Dec 23, 2008
    #4
  5. Jörg W Mittag wrote:
    > Just Another Victim of the Ambient Morality wrote:
    >> Are all built-in objects thread safe? For example, if I have an array
    >> and one thread is constantly appending to it while another thread is shifting
    >> elements off of it and there's no synchronization going on, can the array
    >> object ever get corrupted? What about a similar scenario for hashes? These
    >> are surely complicated objects with internal state that must be maintained.
    >> Are they implemented to be thread safe?
    >
    > This is a *very* interesting question! And it is a question that can
    > ultimately *only* be answered by a formal Ruby Specification or more
    > specifically a formal Ruby Memory Model.
    >
    > Until we have such a specification, the C source code of MRI or YARV
    > is considered to be the "specification". However, there is a problem:
    > that source code can actually be interpreted several different ways.
    >
    > If you look at the implementations of Hash, Array and friends, you
    > will see that they are not thread-safe. Ergo: the specification says
    > that the user is responsible for locking Arrays and Hashes.
    >
    > If, however, you look at the implementation of threads, you will see
    > that both MRI and YARV are actually incapable of running more than one
    > thread at a time -- even on a 1000-core machine MRI and YARV will only
    > ever use one core. So, since two threads can never access an Array at
    > the same time, there is no need for locking. Ergo: the specification
    > says that the user is *not* responsible for locking Arrays and Hashes.


    I don't think this is relevant. Concurrency isn't about how many
    processors you use. Multitasking systems existed long before SMP hardware
    existed. Concurrency is about doing tasks concurrently. If you have one
    method running and it may be preempted by another method then they are
    running concurrently. If the two methods share data then they may corrupt
    that data for each other. This is true regardless of how these
    concurrencies, or threads, are implemented. It doesn't matter if they're
    hardware supported system threads or if they're Ruby green threads...


    > So, in short: when in doubt, lock.


    This is the popular wisdom so this is what I will do. Better safe than
    not thread-safe!
     
    Just Another Victim of the Ambient Morality, Dec 24, 2008
    #5
  6. Just Another Victim wrote:
    > I don't think this is relevant. Concurrency isn't about how many
    > processors you use. Multitasking systems existed long before SMP hardware
    > existed. Concurrency is about doing tasks concurrently. If you have one
    > method running and it may be preempted by another method then they are
    > running concurrently. If the two methods share data then they may corrupt
    > that data for each other. This is true regardless of how these
    > concurrencies, or threads, are implemented. It doesn't matter if they're
    > hardware supported system threads or if they're Ruby green threads...


    Not exactly. Ruby green threads aren't fully pre-emptive; they will only
    pre-empt at the boundaries between execution steps, not within an
    execution step.

    Hence the single operations @array.pop and @array.push(...) are safe
    against preemption. You could consider each method call to be wrapped
    implicitly by Thread.critical { ... }
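
    Note that this only covers a single call. A sequence such as "check,
    then push" is more than one step, so another thread can run in between
    the steps. A minimal sketch of guarding such a sequence with a Mutex
    (add_once and the other names are made up for this example):

    ```ruby
    seen = []
    lock = Mutex.new

    # Check-then-act: include? and << are two separate calls, so the lock
    # has to cover both of them together.
    add_once = lambda do |value|
      lock.synchronize do
        seen << value unless seen.include?(value)
      end
    end

    threads = 8.times.map do
      Thread.new { 100.times { |i| add_once.call(i % 10) } }
    end
    threads.each(&:join)

    p seen.sort # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ```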
     
    Brian Candler, Dec 24, 2008
    #6
  7. On Dec 24, 2008, at 1:56 AM, Just Another Victim of the Ambient
    Morality wrote:

    > I don't think this is relevant. Concurrency isn't about how many
    > processors you use. Multitasking systems existed long before SMP hardware
    > existed. Concurrency is about doing tasks concurrently. If you have one
    > method running and it may be preempted by another method then they are
    > running concurrently. If the two methods share data then they may corrupt
    > that data for each other. This is true regardless of how these
    > concurrencies, or threads, are implemented. It doesn't matter if they're
    > hardware supported system threads or if they're Ruby green threads...


    I think the key here is the granularity of Ruby's atomicity. You're
    assuming that preemption can occur on the granularity of machine
    instructions. Were that the case, two simultaneous threads on a single
    core could, potentially, cause problems. I think what Matz was saying
    is that, because of the GIL, simultaneous threads will only preempt at
    a much higher granularity.

    So I have a question for Matz and Charles: Would it be reasonable to
    specify that YARV instructions should be atomic? Charles, how does
    this work with JVM ops? Last I heard, JRuby was still skipping YARV
    and going straight to Java bytecodes, which could make this a
    difficult proposition. My completely uneducated guess, though, is that
    unless we specify that certain implementation provided data structures
    must be thread safe (at the very least Mutex), then there would have
    to be a minimum level at which everything is atomic to be able to
    write implementation independent thread-safe libraries.

    - Josh
     
    Joshua Ballanco, Dec 24, 2008
    #7
  8. Joshua Ballanco wrote:
    > I think the key here is the granularity of Ruby's atomicity. You're
    > assuming that preemption can occur on the granularity of machine
    > instructions. Were that the case, two simultaneous threads on a single
    > core could, potentially, cause problems. I think what Matz was saying is
    > that, because of the GIL, simultaneous threads will only preempt at a
    > much higher granularity.


    And rarely within C-based code, unless that code explicitly yields
    control to the thread scheduler. This also means that calls out to C
    libraries have to be written to use asynchronous calls or they just
    plain block all threads.

    In JRuby, threads may preempt at any time, and indeed can and will run
    "really" in parallel at any given time if the hardware supports it.

    > So I have a question for Matz and Charles: Would it be reasonable to
    > specify that YARV instructions should be atomic? Charles, how does this
    > work with JVM ops? Last I heard, JRuby was still skipping YARV and going
    > straight to Java bytecodes, which could make this a difficult
    > proposition. My completely uneducated guess, though, is that unless we
    > specify that certain implementation provided data structures must be
    > thread safe (at the very least Mutex), then there would have to be a
    > minimum level at which everything is atomic to be able to write
    > implementation independent thread-safe libraries.


    Everything in the thread(.rb) library obviously has thread-safety as
    part of its contract, so you don't have to worry about that. The core
    collections (Array, String, Hash) do not have such guarantees explicitly
    in their contract, and I believe they should stay that way since the
    vast majority of uses are single-threaded. We (JRuby) have additionally
    added locking (as smartly as possible) to ensure that method and
    instance variable tables are thread-safe, since they're crucial to
    Ruby's operation.

    I don't think YARV instructions are good atomic units. In JRuby we can't
    even (and won't even) guarantee individual Java bytecodes are atomic.
    Nothing can be atomic unless you lock around it, and in most cases you
    can't have atomicity and still allow code to execute in parallel. Plus
    YARV instructions cover a wide range of things, some of which are
    obviously not atomic like arbitrary method calls or test-and-set (||=,
    &&=) logic. Atomicity and thread-safety should be specified on a
    per-mutator basis for all the mutable structures in Ruby, rather than as
    a blanket assertion.
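
    To illustrate the ||= case: it is a read followed by a conditional
    write, so two threads can both see nil and both run the initializer.
    A rough sketch of making lazy initialization safe with a Mutex follows;
    the Config class, its methods and the settings hash are all
    hypothetical.

    ```ruby
    # Hypothetical class for this example only.
    class Config
      attr_reader :loads

      def initialize
        @lock     = Mutex.new
        @settings = nil
        @loads    = 0
      end

      # The lock makes the test (is @settings nil?) and the set one unit.
      def settings
        @lock.synchronize { @settings ||= load_settings }
      end

      private

      # Stand-in for an expensive load; counts how often it runs.
      def load_settings
        @loads += 1
        { "mode" => "example" }
      end
    end

    config  = Config.new
    readers = 10.times.map { Thread.new { config.settings } }
    readers.each(&:join)

    puts config.loads # prints 1
    ```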

    - Charlie
     
    Charles Oliver Nutter, Dec 24, 2008
    #8
  9. Just Another Victim of the Ambient Morality wrote:
    > I don't think this is relevant. Concurrency isn't about how many
    > processors you use. Multitasking systems existed long before SMP hardware
    > existed. Concurrency is about doing tasks concurrently. If you have one
    > method running and it may be preempted by another method then they are
    > running concurrently. If the two methods share data then they may corrupt
    > that data for each other. This is true regardless of how these
    > concurrencies, or threads, are implemented. It doesn't matter if they're
    > hardware supported system threads or if they're Ruby green threads...


    Keep in mind though that on smp systems, most instructions are not
    atomic. On a single processor, they are.

     
    Joel VanderWerf, Dec 26, 2008
    #9