Hard hang, 1.8.7 build 248. May be threading related.

Wayne Conrad

I've got a little piece of code that can hang Ruby 1.8.7 patchlevel
248 about one out of every 500 times it is run. Here is foo.rb:

#!/usr/bin/ruby1.8

# The puts is not necessary to reproduce the problem. It just makes
# it easy to tell when the problem happens.
puts ARGV.first
Thread.new do
  sleep 1
end
system("#")

To reproduce the problem, execute foo.rb in a shell loop:

for i in `seq 10000` ; do ./foo.rb $i ; done

The hang is hard: A TERM (15) signal won't stop it. A KILL (9) signal
will.

If it's going to hang, it will do so within 10,000 iterations. On the
boxes I've tested, the hang usually occurs within the first 500
iterations.
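Because a hung foo.rb ignores TERM, the shell loop simply stalls until someone kills the process by hand. A rough sketch of a driver that notices the stall, sends KILL, and reports the iteration is below. It assumes the driver itself runs under Ruby 1.9 or later (for Process.spawn), not the 1.8.7 under test; the helper names run_with_limit and first_hang and the time limit are hypothetical choices, not part of the original report.

```ruby
require "timeout"

# Runs cmd (an argv array) as a child process.
# Returns :ok if it exits within `limit` seconds; otherwise kills it
# with SIGKILL and returns :hung.
def run_with_limit(cmd, limit)
  pid = Process.spawn(*cmd)
  begin
    Timeout.timeout(limit) { Process.wait(pid) }
    :ok
  rescue Timeout::Error
    Process.kill("KILL", pid) # TERM does not stop the hung process
    Process.wait(pid)         # reap the killed child
    :hung
  end
end

# Runs cmd once per iteration, passing the iteration number as the
# last argument (as the shell loop does). Returns the first iteration
# that hangs, or nil if none do.
def first_hang(cmd, iterations, limit)
  (1..iterations).find do |i|
    run_with_limit(cmd + [i.to_s], limit) == :hung
  end
end
```

Calling first_hang(["./foo.rb"], 10_000, 10) would mirror the shell loop, treating any run longer than 10 seconds as a hang. The limit is an arbitrary assumption; it only needs to be comfortably longer than a normal run of foo.rb.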

I've got five Linux boxes that will show this behavior, and one that
won't:

- There are three fairly fast Intel 4-core boxes that can reproduce
the problem. One is running Debian testing ("squeeze") and two are
running Debian unstable ("sid").

- There is a moderate-speed AMD two-core box that reproduces the
problem, but takes longer. It runs Debian testing.

- There is an old, slow single-core Intel box that does not show the
problem. It runs Debian testing.

Although I found the problem using the Debian ruby/libruby packages, I
confirmed that the problem happens when using a Ruby built without any
of Debian's patches.

Using the git tree at git://git.phusion.nl/ruby.git (branch
v1_8_7_248) I used git bisect and found that the following commit
introduced this problem:

commit d83cd902207920368dfe2de34a4be37fc774e6c8
Author: shyouhei <shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Tue Jul 14 11:31:37 2009 +0000

    merge revision(s) 23202,23268,23305:
    * eval.c (safe_mutex_lock): pthread_cleanup_push() must not be
      inside parens.
    * eval.c (rb_thread_start_timer): guard condition was inverted.
      [ruby-dev:38319]
    * eval.c (get_ts): use readtime clock. [ruby-dev:38354]
    * eval.c (rb_thread_stop_timer): clear thread_init while
      locking.

This is a small commit that doesn't change very many lines, so I was
able to eliminate all but a single changed line as the cause of this
problem:

--- a/eval.c
+++ b/eval.c
@@ -12316,7 +12316,7 @@ rb_thread_start_timer()
     void *args[2];
     static pthread_cond_t start = PTHREAD_COND_INITIALIZER;

-    if (!thread_init) return;
+    if (thread_init) return;
     args[0] = &time_thread;
     args[1] = &start;
     safe_mutex_lock(&time_thread.lock);

I think this is SVN rev 23268, ruby-dev:38319, bug #1402.

I confirmed that this line is the problem by reverse-applying it to
build 248 and seeing that the problem no longer occurs. But rev 23268
seems correct to my untrained eye, so perhaps the real problem is
somewhere else.
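For anyone who wants to repeat the bisection, it could probably be automated with `git bisect run`, which reads the exit status of a test script: 0 marks the revision good, 125 asks git to skip it, and other statuses from 1 to 127 mark it bad. The sketch below only maps an outcome to one of those statuses. The name bisect_status and the idea of passing the hang check in as a callable are illustrative assumptions, not what I actually ran.

```ruby
# Exit statuses understood by `git bisect run`.
GOOD = 0
BAD  = 1
SKIP = 125 # this revision cannot be tested (for example, it fails to build)

# Maps the outcome for one revision to a bisect exit status.
# built: truthy if the revision compiled
# hangs: callable that returns true if the hang was reproduced
def bisect_status(built, hangs)
  return SKIP unless built
  hangs.call ? BAD : GOOD
end
```

A wrapper script would build the tree, run foo.rb in a loop against the freshly built interpreter, and finish with `exit bisect_status(built, check)`. Since the hang is intermittent, a "good" verdict only means it did not show up in that many iterations, so the loop needs to be long enough to be convincing.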

Best Regards,
Wayne Conrad
 
