Rinda: start_keeper bug? Looking for second opinion

Discussion in 'Ruby' started by Daniel Azuma, Aug 10, 2007.

  1. Daniel Azuma

    Daniel Azuma Guest

    I believe I've found a race condition in Rinda. But it seems a little
    "too easy", so I wonder if I'm missing something. I'd like a second
    opinion from someone more familiar with the implementation before I file
    a bug report.

    The issue is in Rinda::TupleSpace#move (and also #write and #read). Each
    of those methods calls Rinda::TupleSpace#start_keeper to start the
    "keeper" thread that checks periodically for expired tuples.
    Unfortunately, I believe the thread is started too early. As a
    consequence, it can terminate prematurely and leave the TupleSpace with
    expiring tuples and no thread to clear them out. (e.g. calls fail to
    time out properly, etc.)

    Here is the sequence of events, based on ruby 1.8.6p0 (but it seems that
    trunk has not changed since then).

    require 'rinda/tuplespace'

    ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
    # Don't write anything into the TupleSpace before...
    tup = ts.take([:foo], 1) # <- hangs instead of timing out

    Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
    called. It starts the thread (line 568). If the thread is scheduled
    immediately, it will immediately check need_keeper? (line 579) and,
    discovering that the TupleSpace is empty, it promptly terminates. Later,
    the main thread pushes the WaitTemplateEntry (with the 1 second timeout)
    onto @take_waiter (line 454), expecting that the keeper thread will
    collect it after the entry expires. But the keeper thread is already
    gone. Thus, the wait template never dies, and the take call never
    returns, at least until someone else comes in and starts another keeper
    thread.
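
    To make the ordering concrete, here is a minimal toy model of the
    sequence described above. It is only a sketch: ToySpace, take_buggy and
    the instance variables are hypothetical stand-ins, not the code in
    tuplespace.rb, and the join call exists purely to force the unlucky
    schedule in which the keeper runs to completion before the waiter is
    registered.

```ruby
require 'monitor'

# Toy model of the ordering problem; NOT the real Rinda code.
# ToySpace, #take_buggy and the field names are hypothetical.
class ToySpace
  include MonitorMixin

  attr_reader :waiters, :keeper

  def initialize
    super()
    @waiters = []
    @keeper = nil
  end

  # Same order as described above: start the keeper first,
  # register the waiting entry afterwards.
  def take_buggy(entry)
    start_keeper
    @keeper.join # force the unlucky schedule: keeper finishes first
    synchronize { @waiters << entry }
  end

  private

  def need_keeper?
    synchronize { !@waiters.empty? }
  end

  def start_keeper
    return if @keeper&.alive?

    @keeper = Thread.new do
      while need_keeper?
        synchronize { @waiters.clear } # stand-in for expiring entries
        sleep 0.01
      end
    end
  end
end

space = ToySpace.new
space.take_buggy(:wait_for_foo)
puts "keeper alive: #{space.keeper.alive?}"        # false
puts "stranded entries: #{space.waiters.inspect}"  # [:wait_for_foo]
```

    In this model the entry stays in the waiter list with no live thread
    left to expire it, which is the situation the take call above appears
    to end up in.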

    Shouldn't start_keeper be called AFTER pushing the wait template onto
    @take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
    486, and similarly also in #write? Am I missing something?
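
    The reordering I have in mind could look roughly like this. Again a toy
    sketch under assumptions: ToySpace, Entry and the polling keeper are
    invented for illustration and are much simpler than the real
    TupleSpace. The only point is that the waiter is registered before the
    keeper is started, and that both steps happen under the same monitor.

```ruby
require 'monitor'

# Toy model only; class and method names are hypothetical, not Rinda's.
class ToySpace
  include MonitorMixin

  Entry = Struct.new(:deadline, :expired)

  def initialize(period)
    super()
    @period = period
    @waiters = []
    @keeper = nil
    @cond = new_cond
  end

  # Register the waiter FIRST, then make sure a keeper is running,
  # so the keeper's emptiness check cannot miss the new entry.
  def take(timeout)
    entry = Entry.new(Time.now + timeout, false)
    synchronize do
      @waiters << entry
      start_keeper
      @cond.wait_until { entry.expired }
    end
    :timed_out
  end

  private

  # Must be called with the monitor held.
  def start_keeper
    return if @keeper

    @keeper = Thread.new do
      loop do
        sleep @period
        done = synchronize do
          now = Time.now
          @waiters.each { |e| e.expired = true if e.deadline <= now }
          @waiters.reject!(&:expired)
          @cond.broadcast
          # Clear @keeper under the lock so a later caller starts a new one.
          @keeper = nil if @waiters.empty?
          @waiters.empty?
        end
        break if done
      end
    end
  end
end

space = ToySpace.new(0.05)
started = Time.now
result = space.take(0.1)
elapsed = Time.now - started
puts "#{result} after #{elapsed.round(2)}s"
```

    Because the keeper decides to exit while holding the monitor and clears
    @keeper in the same step, a caller arriving just afterwards sees that
    no keeper exists and starts a fresh one.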

    Furthermore, it doesn't seem that start_keeper should care about
    remaining outside the synchronize sections in those methods. In fact, if
    anything, maybe it should be INSIDE the monitor lock to prevent
    interleaved calls to start_keeper (and resultant spawning of multiple
    keeper threads).

    Puzzled,
    Daniel Azuma

    --
    Posted via http://www.ruby-forum.com/.
    Daniel Azuma, Aug 10, 2007
    #1

  2. Eric Hodel

    Eric Hodel Guest

    On Aug 9, 2007, at 23:48, Daniel Azuma wrote:
    > I believe I've found a race condition in Rinda. But it seems a little
    > "too easy", so I wonder if I'm missing something. I'd like a second
    > opinion from someone more familiar with the implementation before I
    > file a bug report.
    >
    > The issue is in Rinda::TupleSpace#move (and also #write and #read).
    > Each of those methods calls Rinda::TupleSpace#start_keeper to start
    > the "keeper" thread that checks periodically for expired tuples.
    > Unfortunately, I believe the thread is started too early. As a
    > consequence, it can terminate prematurely and leave the TupleSpace
    > with expiring tuples and no thread to clear them out. (e.g. calls
    > fail to time out properly, etc.)
    >
    > Here is the sequence of events, based on ruby 1.8.6p0 (but it seems
    > that trunk has not changed since then).
    >
    > ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
    > # Don't write anything into the TupleSpace before...
    > tup = ts.take([:foo], 1) # <- hangs instead of timing out
    >
    > Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
    > called. It starts the thread (line 568). If the thread is scheduled
    > immediately, it will immediately check need_keeper? (line 579) and,
    > discovering that the TupleSpace is empty, it promptly terminates.
    > Later, the main thread pushes the WaitTemplateEntry (with the 1
    > second timeout) onto @take_waiter (line 454), expecting that the
    > keeper thread will collect it after the entry expires. But the keeper
    > thread is already gone. Thus, the wait template never dies, and the
    > take call never returns, at least until someone else comes in and
    > starts another keeper thread.
    >
    > Shouldn't start_keeper be called AFTER pushing the wait template onto
    > @take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
    > 486, and similarly also in #write? Am I missing something?


    I think you are right. I have CC'd Masatoshi SEKI.

    > Furthermore, it doesn't seem that start_keeper should care about
    > remaining outside the synchronize sections in those methods. In
    > fact, if anything, maybe it should be INSIDE the monitor lock to
    > prevent interleaved calls to start_keeper (and resultant spawning of
    > multiple keeper threads).


    At worst, a separate synchronize block could be put in start_keeper.
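
    A rough sketch of that idea, assuming a hypothetical KeeperOwner class
    rather than the real TupleSpace. Ruby's Monitor is reentrant, so a
    synchronize block inside start_keeper should also be safe when the
    caller already holds the lock. The bare sleep is just a placeholder for
    the real keeper loop.

```ruby
require 'monitor'

# Hypothetical stand-in for a class with a lazily started keeper thread.
class KeeperOwner
  include MonitorMixin

  attr_reader :spawned

  def initialize
    super()
    @keeper = nil
    @spawned = 0
  end

  # The liveness check and Thread.new happen under one lock, so two
  # callers cannot both observe "no keeper" and both spawn one.
  def start_keeper
    synchronize do
      return @keeper if @keeper&.alive?

      @spawned += 1
      @keeper = Thread.new { sleep } # placeholder: sleeps until killed
    end
  end
end

owner = KeeperOwner.new
callers = Array.new(20) { Thread.new { owner.start_keeper } }
callers.each(&:join)
puts "keeper threads spawned: #{owner.spawned}" # 1
```

    Without the synchronize block, two callers could both pass the alive?
    check before either assigns @keeper, which is the interleaving Daniel
    describes.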

    --
    Poor workers blame their tools. Good workers build better tools. The
    best workers get their tools to do the work for them. -- Syndicate Wars
    Eric Hodel, Aug 11, 2007
    #2