Rinda: start_keeper bug? Looking for second opinion

Daniel Azuma

I believe I've found a race condition in Rinda. But it seems a little
"too easy", so I wonder if I'm missing something. I'd like a second
opinion from someone more familiar with the implementation before I file
a bug report.

The issue is in Rinda::TupleSpace#move (which #take delegates to), and
also in #write and #read. Each of those methods calls
Rinda::TupleSpace#start_keeper to start the "keeper" thread that checks
periodically for expired tuples. Unfortunately, I believe the thread is
started too early. As a consequence, it can terminate prematurely and
leave the TupleSpace with expiring tuples and no thread to clear them
out. For example, calls with a timeout can fail to time out.

Here is the sequence of events, based on Ruby 1.8.6-p0 (trunk does not
appear to have changed since then).

    require 'rinda/tuplespace'

    ts = Rinda::TupleSpace.new(1) # Check interval of 1 second
    # Don't write anything into the TupleSpace before...
    tup = ts.take([:foo], 1) # <- hangs instead of timing out

Why? In tuplespace.rb, line 442 (TupleSpace#move), start_keeper is
called. It starts the thread (line 568). If the thread is scheduled
immediately, it will immediately check need_keeper? (line 579) and,
discovering that the TupleSpace is empty, it promptly terminates. Later,
the main thread pushes the WaitTemplateEntry (with the 1 second timeout)
onto @take_waiter (line 454), expecting that the keeper thread will
collect it after the entry expires. But the keeper thread is already
gone. Thus, the wait template never dies, and the take call never
returns, at least until someone else comes in and starts another keeper
thread.
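
To make the interleaving concrete, here is a minimal sketch of the same
ordering in a toy class. Everything in it (ToySpace,
register_waiter_buggy, pending) is a hypothetical stand-in and not
Rinda's actual code; the join call is only there to force the unlucky
schedule in which the keeper runs before the waiter is registered.

```ruby
require 'monitor'

# Hypothetical toy model of the ordering described above; not Rinda's code.
class ToySpace
  include MonitorMixin

  attr_reader :keeper

  def initialize
    super() # required so MonitorMixin can set up its lock
    @waiters = [] # deadlines of pending waiters
    @keeper = nil
  end

  # The keeper loops only while there is something left to expire.
  def start_keeper
    return if @keeper && @keeper.alive?
    @keeper = Thread.new do
      while need_keeper?
        sleep 0.01
        synchronize { @waiters.reject! { |deadline| deadline <= Time.now } }
      end
    end
  end

  def need_keeper?
    synchronize { !@waiters.empty? }
  end

  # Keeper first, waiter second: the order under discussion.
  def register_waiter_buggy(timeout)
    start_keeper
    @keeper.join # assumption: simulate the keeper being scheduled immediately
    synchronize { @waiters << Time.now + timeout }
  end

  def pending
    synchronize { @waiters.size }
  end
end

space = ToySpace.new
space.register_waiter_buggy(0.05)
sleep 0.2 # well past the waiter's deadline
puts "keeper alive: #{space.keeper.alive?}"
puts "pending waiters: #{space.pending}"
```

In this toy the keeper has already exited by the time the waiter is
pushed, so the script reports a dead keeper and one waiter that is never
expired.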

Shouldn't start_keeper be called AFTER pushing the wait template onto
@take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
486, and similarly also in #write? Am I missing something?
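
The reordering I have in mind can be sketched with a toy class along
these lines. The names (ReorderedSpace, register_waiter, join_keeper)
are hypothetical and this is not a patch against tuplespace.rb; it only
shows that registering the waiter before starting the keeper guarantees
the keeper sees work on its first check.

```ruby
require 'monitor'

# Hypothetical toy model of the proposed ordering; not Rinda's actual code.
class ReorderedSpace
  include MonitorMixin

  def initialize(period)
    super() # required so MonitorMixin can set up its lock
    @period = period
    @waiters = [] # deadlines of pending waiters
    @keeper = nil
  end

  # Register the waiter first, then make sure a keeper is running.
  def register_waiter(timeout)
    synchronize { @waiters << Time.now + timeout }
    start_keeper
  end

  def pending
    synchronize { @waiters.size }
  end

  # Helper for the demo: wait until the keeper decides it can exit.
  def join_keeper
    @keeper.join if @keeper
  end

  private

  def start_keeper
    return if @keeper && @keeper.alive?
    @keeper = Thread.new do
      while need_keeper?
        sleep @period
        synchronize { @waiters.reject! { |deadline| deadline <= Time.now } }
      end
    end
  end

  def need_keeper?
    synchronize { !@waiters.empty? }
  end
end

space = ReorderedSpace.new(0.01)
space.register_waiter(0.05)
space.join_keeper # returns once the waiter has expired
puts "pending waiters: #{space.pending}"
```

Here the keeper cannot exit before the waiter exists, so it stays alive
until the deadline passes and then removes the entry.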

Furthermore, I see no reason why start_keeper needs to stay outside the
synchronize sections in those methods. If anything, it should be INSIDE
the monitor lock to prevent interleaved calls to start_keeper (and the
resulting spawning of multiple keeper threads).

Puzzled,
Daniel Azuma
 
Eric Hodel

Daniel Azuma wrote:
> Shouldn't start_keeper be called AFTER pushing the wait template onto
> @take_waiter? Similarly, in #read, shouldn't line 479 come AFTER line
> 486, and similarly also in #write? Am I missing something?

I think you are right. I have CC'd Masatoshi SEKI.

> Furthermore, it doesn't seem that start_keeper should care about
> remaining outside the synchronize sections in those methods. In
> fact, if anything, maybe it should be INSIDE the monitor lock to
> prevent interleaved calls to start_keeper (and resultant spawning of
> multiple keeper threads).

At worst, a separate synchronize block could be put in start_keeper.
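
A rough sketch of that idea, using a toy class rather than Rinda
itself: GuardedKeeper, spawned, and stop are hypothetical names, and the
keeper body is a placeholder loop. The point is only that the
check-and-spawn happens under the monitor, so concurrent callers cannot
each start a thread. Ruby's Monitor is reentrant, so a method written
this way could also be called from inside an existing synchronize
section.

```ruby
require 'monitor'

# Hypothetical toy model; not Rinda's actual code.
class GuardedKeeper
  include MonitorMixin

  attr_reader :spawned # how many keeper threads were ever created

  def initialize
    super() # required so MonitorMixin can set up its lock
    @keeper = nil
    @spawned = 0
    @stop = false
  end

  # The alive? check and Thread.new happen under one lock, so two
  # callers cannot both observe "no keeper" and both spawn one.
  def start_keeper
    synchronize do
      return if @keeper && @keeper.alive?
      @spawned += 1
      @keeper = Thread.new { sleep 0.01 until stopped? } # placeholder body
    end
  end

  # Helper for the demo: let the placeholder keeper finish.
  def stop
    synchronize { @stop = true }
    @keeper.join if @keeper
  end

  private

  def stopped?
    synchronize { @stop }
  end
end

guard = GuardedKeeper.new
20.times.map { Thread.new { guard.start_keeper } }.each(&:join)
puts "keeper threads spawned: #{guard.spawned}"
guard.stop
```

With the lock in place, twenty concurrent callers result in a single
keeper thread in this toy.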
 
