Rinda: start_keeper bug? Looking for second opinion

Daniel Azuma · Aug 10, 2007

I believe I've found a race condition in Rinda. But it seems a little
"too easy", so I wonder if I'm missing something. I'd like a second
opinion from someone more familiar with the implementation before I file
a bug report.

The issue is in Rinda::TupleSpace#move (and also #write and #read). Each
of those methods calls Rinda::TupleSpace#start_keeper to start the
"keeper" thread that checks periodically for expired tuples.
Unfortunately, I believe the thread is started too early. As a
consequence, it can terminate prematurely and leave the TupleSpace with
expiring tuples and no thread to clear them out. (e.g. calls fail to
time out properly, etc.)

Here is the sequence of events, based on ruby 1.8.6p0 (but it seems that
trunk has not changed since then).

ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
# Don't write anything into the TupleSpace before...
tup = ts.take([:foo], 1) # <- hangs instead of timing out

Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
called. It starts the thread (line 568). If the thread is scheduled
immediately, it will immediately check need_keeper? (line 579) and,
discovering that the TupleSpace is empty, it promptly terminates. Later,
the main thread pushes the WaitTemplateEntry (with the 1 second timeout)
onto @take_waiter (line 454), expecting that the keeper thread will
collect it after the entry expires. But the keeper thread is already
gone. Thus, the wait template never dies, and the take call never
returns-- at least until someone else comes in and starts another keeper
thread.

Shouldn't start_keeper be called AFTER pushing the wait template onto
@take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
486, and similarly also in #write? Am I missing something?

Furthermore, it doesn't seem that start_keeper should care about
remaining outside the synchronize sections in those methods. In fact, if
anything, maybe it should be INSIDE the monitor lock to prevent
interleaved calls to start_keeper (and resultant spawning of multiple
keeper threads).

Puzzled,
Daniel Azuma
(e-mail address removed)

Eric Hodel · Aug 11, 2007

I believe I've found a race condition in Rinda. But it seems a little
"too easy", so I wonder if I'm missing something. I'd like a second
opinion from someone more familiar with the implementation before I
file
a bug report.

The issue is in Rinda::TupleSpace#move (and also #write and #read).
Each
of those methods calls Rinda::TupleSpace#start_keeper to start the
"keeper" thread that checks periodically for expired tuples.
Unfortunately, I believe the thread is started too early. As a
consequence, it can terminate prematurely and leave the TupleSpace
with
expiring tuples and no thread to clear them out. (e.g. calls fail to
time out properly, etc.)

Here is the sequence of events, based on ruby 1.8.6p0 (but it seems
that
trunk has not changed since then).

ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
# Don't write anything into the TupleSpace before...
tup = ts.take([:foo], 1) # <- hangs instead of timing out

Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
called. It starts the thread (line 568). If the thread is scheduled
immediately, it will immediately check need_keeper? (line 579) and,
discovering that the TupleSpace is empty, it promptly terminates.
Later,
the main thread pushes the WaitTemplateEntry (with the 1 second
timeout)
onto @take_waiter (line 454), expecting that the keeper thread will
collect it after the entry expires. But the keeper thread is already
gone. Thus, the wait template never dies, and the take call never
returns-- at least until someone else comes in and starts another
keeper
thread.

Shouldn't start_keeper be called AFTER pushing the wait template onto
@take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
486, and similarly also in #write? Am I missing something?

I think you are right. I have CC'd Masatoshi SEKI.

Furthermore, it doesn't seem that start_keeper should care about
remaining outside the synchronize sections in those methods. In
fact, if
anything, maybe it should be INSIDE the monitor lock to prevent
interleaved calls to start_keeper (and resultant spawning of multiple
keeper threads).

At worst, a separate synchronize block could be put in start_keeper.

RINDA Probem	2	Aug 20, 2010
Rinda frustration	11	Sep 26, 2005
Four Rinda Questions	1	Nov 20, 2005
Rinda and notifications example?	1	Oct 26, 2005
Rinda not working through the network - what's wrong?	0	Mar 12, 2006
Distributed testing with Test::Unit and Rinda	0	Oct 9, 2006
I have to finish this code for my assignment but I cant figure out how to solve it	1	Jun 27, 2023
sys.setcheckinterval() never gets updated second time. Bug?	0	Apr 17, 2009

Rinda: start_keeper bug? Looking for second opinion

Daniel Azuma

Eric Hodel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads