deadlock in ThreadPool using backtick

J

Justin Johnson

Hello,

I'm using the ThreadPool class as defined in the Ruby Cookbook, recipe
20.7 and running on Windows XP with the One Click installer of Ruby
1.8.6 patch level 0. Normally it works fine. However, whenever I call
a command using backticks I get a deadlock. The script I'm running is
below, followed by the output. To see it succeed, comment out the
backtick line and uncomment out the system line.

Does anyone know what the problem is?

Thanks,
Justin

##### BEGIN FILE ######
require 'thread'

class ThreadPool
def initialize(max_size)
@pool = []
@max_size = max_size
@pool_mutex = Mutex.new
@pool_cv = ConditionVariable.new
end
#---
def dispatch(*args)
Thread.new do
# Wait for space in the pool.
@pool_mutex.synchronize do
while @pool.size >= @max_size
print "Pool is full; waiting to run #{args.join(',')}...\n" if
$DEBUG
# Sleep until some other thread calls @pool_cv.signal.
@pool_cv.wait(@pool_mutex)
end
end
#---
@pool << Thread.current
begin
yield(*args)
rescue => e
exception(self, e, *args)
ensure
@pool_mutex.synchronize do
# Remove the thread from the pool.
@pool.delete(Thread.current)
# Signal the next waiting thread that there's a space in the
pool.
@pool_cv.signal
end
end
end
end

def shutdown
@pool_mutex.synchronize { @pool_cv.wait(@pool_mutex) until
@pool.empty? }
end

def exception(thread, exception, *original_args)
# Subclass this method to handle an exception within a thread.
puts "Exception in thread #{thread}: #{exception}"
end
end
#---



pool = ThreadPool.new 2
begin
1.upto(200) do |i|
pool.dispatch do
puts "Job #{i} started."
`dir c:\\winnt > NUL`
#system("dir c:\\winnt > NUL")
puts "Job #{i} stopped."
end
end
ensure
pool.shutdown
end
##### END FILE ######


##### BEGIN OUTPUT ######
C:\temp>ruby thread_test.rb
Job 1 started.
Job 2 started.
Job 1 stopped.
Job 2 stopped.
Job 3 started.
Job 6 started.
Job 5 started.
Job 4 started.
Job 3 stopped.
Job 6 stopped.
Job 5 stopped.
Job 4 stopped.
deadlock 0x2b45008: sleep:- - thread_test.rb:18
deadlock 0x2b450d0: sleep:- - thread_test.rb:18
deadlock 0x2b45198: sleep:- - thread_test.rb:18
deadlock 0x2b45260: sleep:- - thread_test.rb:18
... Skip many similar lines ...
deadlock 0x2b5627c: sleep:- - thread_test.rb:28
deadlock 0x2b5636c: sleep:- - thread_test.rb:28
deadlock 0x27ac748: sleep:- (main) - thread_test.rb:39
thread_test.rb:39:in `wait': Thread(0x27ac748): deadlock (fatal)
from thread_test.rb:39:in `shutdown'
from thread_test.rb:39:in `synchronize'
from thread_test.rb:39:in `shutdown'
from thread_test.rb:62

C:\temp>
##### END OUTPUT ######
 
R

Roger Pack

Hmm. I'm not a thread pro, but I'll give it a shot.
I don't think it's necessarily a problem with backtick, but with the
code itself :)

first off

Justin said:
Hello,

I'm using the ThreadPool class as defined in the Ruby Cookbook, recipe
20.7 and running on Windows XP with the One Click installer of Ruby
1.8.6 patch level 0. Normally it works fine. However, whenever I call
a command using backticks I get a deadlock. The script I'm running is
below, followed by the output. To see it succeed, comment out the
backtick line and uncomment out the system line.

Does anyone know what the problem is?

Thanks,
Justin

##### BEGIN FILE ######
require 'thread'

class ThreadPool
def initialize(max_size)
@pool = []
@max_size = max_size
@pool_mutex = Mutex.new
@pool_cv = ConditionVariable.new
end
#---
def dispatch(*args)
Thread.new do
# Wait for space in the pool.
@pool_mutex.synchronize do
while @pool.size >= @max_size
print "Pool is full; waiting to run #{args.join(',')}...\n" if
$DEBUG
# Sleep until some other thread calls @pool_cv.signal.
@pool_cv.wait(@pool_mutex)
end
end
#---

@pool << Thread.current

The above line should be within the synchronize (above), or it might
spawn more than @max_size threads (I think).
Replace with
@pool_mutex.synchronize do
while @pool.size >= @max_size
print "Pool is full; waiting to run #{args.join(',')}...\n" if
$DEBUG
# Sleep until some other thread calls @pool_cv.signal.
@pool_cv.wait(@pool_mutex)
end
@pool << Thread.current
end
begin
yield(*args)
rescue => e
exception(self, e, *args)
ensure
@pool_mutex.synchronize do

In reality you don't need to synchronize here, as the @pool size/add to
is synchronized elsewhere. Taking this synch out may introduce new
problems, but...I'm not sure what they are :)
My question is if the wait within the shutdown function will 'suck'
signals away from processes, decreasing the number that can go at a
time, if all are at the 'wait' phase. Perhaps rethinking this code
would be nice :)

-Roger

# Remove the thread from the pool.
@pool.delete(Thread.current)
# Signal the next waiting thread that there's a space in the
pool.
@pool_cv.signal
end
end
end
end

def shutdown
@pool_mutex.synchronize { @pool_cv.wait(@pool_mutex) until
@pool.empty? }
end

def exception(thread, exception, *original_args)
# Subclass this method to handle an exception within a thread.
puts "Exception in thread #{thread}: #{exception}"
end
end
#---



pool = ThreadPool.new 2
begin
1.upto(200) do |i|
pool.dispatch do
puts "Job #{i} started."
`dir c:\\winnt > NUL`
#system("dir c:\\winnt > NUL")
puts "Job #{i} stopped."
end
end
ensure
pool.shutdown
end
##### END FILE ######


##### BEGIN OUTPUT ######
C:\temp>ruby thread_test.rb
Job 1 started.
Job 2 started.
Job 1 stopped.
Job 2 stopped.
Job 3 started.
Job 6 started.
Job 5 started.
Job 4 started.
Job 3 stopped.
Job 6 stopped.
Job 5 stopped.
Job 4 stopped.
deadlock 0x2b45008: sleep:- - thread_test.rb:18
deadlock 0x2b450d0: sleep:- - thread_test.rb:18
deadlock 0x2b45198: sleep:- - thread_test.rb:18
deadlock 0x2b45260: sleep:- - thread_test.rb:18
... Skip many similar lines ...
deadlock 0x2b5627c: sleep:- - thread_test.rb:28
deadlock 0x2b5636c: sleep:- - thread_test.rb:28
deadlock 0x27ac748: sleep:- (main) - thread_test.rb:39
thread_test.rb:39:in `wait': Thread(0x27ac748): deadlock (fatal)
from thread_test.rb:39:in `shutdown'
from thread_test.rb:39:in `synchronize'
from thread_test.rb:39:in `shutdown'
from thread_test.rb:62

C:\temp>
##### END OUTPUT ######

There may also (I'm not positive about this), be a way that multiple
signals may then occur very quickly, and only one processed. Another
possible problem, too.

GL
-Roger
 
J

Justin Johnson

The above line should be within the synchronize (above), or it might
spawn more than @max_size threads (I think).
Replace with
@pool_mutex.synchronize do
while @pool.size >= @max_size
print "Pool is full; waiting to run #{args.join(',')}...\n" if
$DEBUG
# Sleep until some other thread calls @pool_cv.signal.
@pool_cv.wait(@pool_mutex)
end
@pool << Thread.current
end

I tried your suggestion but still get the error, and only when using
backticks or any other method that reads from stdout or stderr.
In reality you don't need to synchronize here, as the @pool size/add to
is synchronized elsewhere. Taking this synch out may introduce new
problems, but...I'm not sure what they are :)

I'm not sure what you mean. This makes sense to me, as we do not want
more than one thread removing itself from the list at a time.
My question is if the wait within the shutdown function will 'suck'
signals away from processes, decreasing the number that can go at a
time, if all are at the 'wait' phase. Perhaps rethinking this code
would be nice :)

I'm not sure what you mean by "suck signals away from processes". The
shutdown method is just waiting until all threads have ended so we don't
end our program before the threads are done.

It still seems to me that stdout and stderr have something to do with my
problem, since it always occurs when using backticks or even
win32/popen3 methods that read from stdout and stderr, but never with
system. Anyone else have any ideas on this?

Thanks for the reply.

Justin
 
J

Justin Johnson

It still seems to me that stdout and stderr have something to do with my
problem, since it always occurs when using backticks or even
win32/popen3 methods that read from stdout and stderr, but never with
system. Anyone else have any ideas on this?

I should also point out that the exact same script never fails on UNIX.
Maybe this is a Windows bug? :-(
 
R

Roger Pack

It seems (in my opinion--haven't checked it out too closely yet) that
the problem is only caused when the shutdown method is 'in the mix'--you
might try rewriting it to create an array of all threads trying to enter
the pool [i.e. array of size 200] and then joining on each element of
the array
while !allThreadArray.empty?
nextThread = allThreadArray.shift
nextThread.join
end

that type of thing.

Attempting it with the latest patch level of ruby might help.
-Roger
 
C

Clifford Heath

Roger said:
Attempting it with the latest patch level of ruby might help.

I had the same problem, posted here last Friday, and sure enough,
the problem seems fixed with the -p110 build. I wasn't doing any
I/O or using backticks - just two threads synchronizing using Mutex
and ConditionVariable. The -p0 implemention is catastrophically
broken and unusable, and apparently untested as well.

One thread is waiting in the ConditionVariable, another tries to
enter the Mutex so it can change things and signal the condition,
but deadlock is declared, killing the process. How hard can this be
to test for, really? Appalling.

I believe the -p110 version of the One-Click installer is waiting
on some fixes to the ODBC drivers. I hadn't realized what a boon
this installer is until I tried to configure everything manually.
Hint: Get the Windows build (URL below), the rubygems ZIP, and
minimally, ssleay32.dll and libeay32.dll from openssl. Put it all
together and you can install the rest from GEMs.

<http://www.garbagecollect.jp/ruby/mswin32/en/download/release.html>,
<http://jarp.does.notwork.org/win32/>.

.... or offer Curt Hibbs some help getting the latest one-click
installer finished.

Clifford Heath.
 
C

Clifford Heath

Clifford said:
Put it all together and you can install the rest from GEMs.

Oh, whoops, you need zlib as well for gems to work.
Same location as openssl. Sorry.
 
R

Roger Pack

Maybe once you get a working copy you can zip it up and put it online
for lazy folks like me to get (until one-click gets fixed). Or if you
need some space I could host it or whatever.
-Roger
 
C

Clifford Heath

Roger said:
Maybe once you get a working copy you can zip it up and put it online
for lazy folks like me to get (until one-click gets fixed).

I don't think it's worth it. You only have to add rubygems and 3 DLLs d/l'd
from one site to the Windows binary build, then whatever else you need comes
from gems. My bundle includes Gtk and RMagick, which you probably don't want.
 
R

Roger Pack

I'm using the ThreadPool class as defined in the Ruby Cookbook, recipe
20.7 and running on Windows XP with the One Click installer of Ruby
1.8.6 patch level 0. Normally it works fine. However, whenever I call
a command using backticks I get a deadlock. The script I'm running is
below, followed by the output. To see it succeed, comment out the
backtick line and uncomment out the system line.

Turns out that using backticks with OS X can cause threads to lock up,
too, in a multi-threaded environment. Not sure if this is a prob with
backticks or just Ruby I/O in general, or just mac OS X/win32 specific
(doubtful on the last one).
GL
-Roger
 
A

ara.t.howard

Does anyone know what the problem is?

my mind is much too feeble to write robust threaded code that's safe
on all platforms. invert your process flow and it's easy: this works
on *nix and winblows ->


cfp:~ > cat a.rb

require 'thread'

class ThreadPool < ::Array
Done = Object.new.freeze

attr 'q'
attr 'exception_handler'

def initialize size = 42, &exception_handler
@q = Queue.new
@exception_handler = exception_handler || lambda{|e| raise}
Integer(size).times{ push new_consumer }
end

def new_consumer
Thread.new do
Thread.current.abort_on_exception
loop do
object = q.pop
break if object == Done
begin
object.call
rescue Exception => e
exception_handler.call e
end
end
end
end

def dispatch &block
@q.push block
end

def shutdown
size.times{ q.push Done }
each{|t| t.join}
end
end


null = test(?e, "/dev/null") ? "/dev/null" : "NUL"
pool = ThreadPool.new 2

begin
1.upto(200) do |i|
pool.dispatch do
puts "Job #{ i } started."
command = "ruby -e' fill_stdout_pipe = 42.chr * 8193; puts
fill_stdout_pipe '"
` #{ command } `
system "#{ command } > #{ null } 2>&1"
puts "Job #{ i } stopped."
end
end
ensure
pool.shutdown
end




by spooling up all the thread at once you'll save the overhead of
creating all those threads. the @q.pop stuff does all the condition
variable stuff you were trying to do so you can check that code out
if you want to.

ps - when testing stuff that reads output (backtick, popen, or
blatant io) be sure to test will a full pipe on stdout - typically >
8192 bytes.


kind regards.

a @ http://codeforpeople.com/
 
C

Charles Oliver Nutter

ara.t.howard said:
my mind is much too feeble to write robust threaded code that's safe on
all platforms. invert your process flow and it's easy: this works on
*nix and winblows -> ...
by spooling up all the thread at once you'll save the overhead of
creating all those threads. the @q.pop stuff does all the condition
variable stuff you were trying to do so you can check that code out if
you want to.

Isn't this largely just working around a lack of concurrency in Ruby's
threading model (or perhaps a bug in backtick handling)? With concurrent
threads I can implement the same thing in a trivial amount of code and
not have to worry about backticks et al blocking, nor about spinning up
processes. In fact with native threads the original ThreadPool from Ruby
Cookbook ought to work fine for basically anything you want to throw at
it (including backticks).

Perhaps backtick needs to be fixed in MRI so it doesn't block the entire
runtime?

- Charlie
 
A

ara.t.howard

Isn't this largely just working around a lack of concurrency in
Ruby's threading model (or perhaps a bug in backtick handling)?
With concurrent threads I can implement the same thing in a trivial
amount of code and not have to worry about backticks et al
blocking, nor about spinning up processes. In fact with native
threads the original ThreadPool from Ruby Cookbook ought to work
fine for basically anything you want to throw at it (including
backticks).

Perhaps backtick needs to be fixed in MRI so it doesn't block the
entire runtime?

in general you may be correct - but i think the OP's code has a race
condition. if you look at the stacktrace it's not reading stdout
(backticks) that's got everyone's undies in a wedgie - it's the
shutdown method. so basically the original code starts out by firing
off a bunch of threads, limited at the number 2. let's assume it
safely got two running, but that none of have completed, that leaves
every other thread blocked here

@pool_mutex.synchronize do
while @pool.size >= @max_size
# Sleep until some other thread calls @pool_cv.signal.
@pool_cv.wait(@pool_mutex)

so let's assume they've all just went to sleep...

now the main code starts to shutdown, calling

def shutdown
@pool_mutex.synchronize ...

so it has the lock

now the first two finish and try to take them selves out of the @pool
by doing

ensure
@pool_mutex.synchronize do
@pool.delete(Thread.current)

and of course go to sleep since the main thread has that lock

now the main thread goes to sleep via

@pool_cv.wait(@pool_mutex) ...

so now everyone is asleep and no one will be signaled - deadlock.

i think the backticks are a red herring and that the shutdown method
would need to be reworked, at minimum.


threads are just too complicated without Queue IMHO. the classic
race condition you see all over the place is

Thread.new{ thread_working }

something_that_depends_on_thread_working

but that only works sometimes, specifically when the thread_working
gets started before the next line - which is not always true. etc. etc.

i just glanced at the code so maybe i've overlooked something - but i
think the OP's code is racing, not ruby.

cheers.

a @ http://codeforpeople.com/
 
R

Roger Pack

Turns out that the largest problem with the OP was that they were using
1.8.6 patch level 0, which had some buggy buggy thread stuff, so it was
declaring 'deadlock' in error. You'll notice some of the threads
'stuck' on lines that should go forward always.

I would point out, however, that there still seem to be some concurrency
problems in the 1.8.6 MRI.
in general you may be correct - but i think the OP's code has a race
condition. if you look at the stacktrace it's not reading stdout
(backticks) that's got everyone's undies in a wedgie - it's the
shutdown method. so basically the original code starts out by firing

I always questioned the shutdown method. I think overall it might still
work using the current shutdown method 'if the spawned threads happen to
all get to cv.wait before the shutdown thread does' as it probably puts
them in an array and sends 'signal' to the first ones that waited for
the signal, which would be the 'real' threads not the shutdown thread.
Just my $0.02.
 
C

Charles Oliver Nutter

ara.t.howard said:
in general you may be correct - but i think the OP's code has a race
condition. if you look at the stacktrace it's not reading stdout
(backticks) that's got everyone's undies in a wedgie - it's the shutdown
method. so basically the original code starts out by firing off a bunch
of threads, limited at the number 2. let's assume it safely got two
running, but that none of have completed, that leaves every other thread
blocked here

Yes, what you pointed out is completely busted. I hadn't seen that code.
But why would you spin things off into a bunch of processes? Just fixing
shutdown should make it work fine, if backticks aren't the problem.

- Charlie
 
A

ara.t.howard

Yes, what you pointed out is completely busted. I hadn't seen that
code. But why would you spin things off into a bunch of processes?
Just fixing shutdown should make it work fine, if backticks aren't
the problem.

you could just fix shutdown to be sure. but the way i think is that
i'd be right back in the hot seat when it came time to implement
#pause and #start, etc. condition vars are, to me, just code smell -
they're asking for trouble. the other reason is that, to me, the
name 'ThreadPool' suggests a pool of threads, not a stack of ever
changing threads. so, in much the same way as i'd expect something
called 'ConnectionPool' to reuse connections i except a thread pool
to reuse a few threads. that's reading into the code i realize - but
that's my personal through process. as you well know there are
always many ways to skin a cat.

kind regards.

a @ http://codeforpeople.com/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,983
Messages
2,570,188
Members
46,755
Latest member
HudsonP082

Latest Threads

Top