timeouts with threads and SIGALRM

Eric Schwartz

Following advice in an old ruby-talk thread (can't remember which one,
offhand), I'm trying to implement a timeout with threads. The
canonical example was something like:

Thread.new do
  sleep 5
  Process.kill "ALRM", $$
end
begin
  .... some stuff ...
rescue SignalException => se
  return FAIL_CODE
end

Well, the problem with that is that the thread keeps executing, and if
the process as a whole takes more than 5 seconds to complete, then the
SIGALRM kills the process.

So fine, thinks I, I'll just stop the thread when I'm done with it.
No problem. Only I can't figure out how. I can't call Thread#stop
from outside the thread. What I finally ended up with was something
like:

rv = nil
Thread.new do
  sleep 5
  Process.kill "ALRM", $$ if rv.nil?
end
..... catch SIGALRM if it happens ...
rv = query_some_stuff
....
return rv

I'm not excruciatingly happy about this solution, and I thought I'd
open it up to the Ruby community: What's the best way to do the
equivalent of 'alarm(5)' in C?

-=Eric
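
For reference, a watchdog thread can be stopped from outside with Thread#kill, which avoids the shared `rv` flag. The following is a minimal sketch, not a definitive implementation: `with_alarm` is a hypothetical helper name, and it assumes a Unix-like system where Ruby turns an untrapped SIGALRM into a SignalException in the main thread.

```ruby
# Hypothetical helper: runs the block and returns its value, or returns
# :timed_out if SIGALRM arrives first.
def with_alarm(seconds)
  watchdog = Thread.new do
    sleep seconds
    Process.kill "ALRM", Process.pid
  end
  yield
rescue SignalException => e
  # Only swallow SIGALRM; re-raise any other signal (INT, TERM, ...).
  raise unless e.signm == "SIGALRM"
  :timed_out
ensure
  # Thread#kill may be called from any thread, so a block that finishes
  # early cancels the pending alarm here.
  watchdog.kill if watchdog
end
```

There is still a small window in which the alarm can fire just after the block returns but before the watchdog is killed, so the timeout should be comfortably longer than the expected run time.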
 
Lennon Day-Reynolds

Is this more like what you need?

trap "ALRM" do
  puts "Looks like we've worn out our welcome. Goodbye!"
  exit
end

Thread.new do
  sleep 5
  Process.kill "ALRM", $$
end

puts "Waiting for the fun to start..."
i = 1
loop do
  puts i.to_s
  i += 1
  sleep 1
end

...or do you definitely need to catch the SignalException within your main code?

Lennon
 
Eric Schwartz

Lennon Day-Reynolds said:
Is this more like what you need?

...or do you definitely need to catch the SignalException within your main code?

I really do need to catch it within my main code. I'm querying a
number of remote machines for test status. Sometimes, very rarely,
that query will just go off into the weeds and stay there. I don't
know why yet, so I want to leave that situation in place, mark that
machine as nonresponsive, and move on. AFAIK, catching
SignalException is the only way to do that.

-=Eric
 
Lennon Day-Reynolds

Eric,

I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

Ex:

require 'timeout'

begin
  Timeout.timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
end
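
Applied to the case described earlier in the thread (query several machines, mark the ones that hang, move on), the same pattern might look like the sketch below. `query_status`, `poll`, and the host names are hypothetical stand-ins, not part of any real library; the stub simply sleeps to imitate a query that goes off into the weeds.

```ruby
require 'timeout'

# Hypothetical stand-in for the real remote query; it "hangs" for one host.
def query_status(host)
  sleep 5 if host == "slow-host"
  "ok"
end

# Query each host in turn, recording :nonresponsive for any host whose
# query exceeds `limit` seconds, then carry on with the next one.
def poll(hosts, limit)
  hosts.each_with_object({}) do |host, results|
    results[host] =
      begin
        Timeout.timeout(limit) { query_status(host) }
      rescue Timeout::Error
        :nonresponsive
      end
  end
end
```

Note that Timeout can only interrupt code that returns control to the interpreter; a call stuck inside a C extension may not be interruptible this way.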
 
Joel VanderWerf

Lennon said:
Eric,

I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

Ex:

require 'timeout'

begin
  Timeout.timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
  # (main_thread is assumed to hold the main thread, e.g. Thread.main)
  main_thread.raise WhateverException
end

This addition lets you handle the exception in your main thread.
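
A self-contained sketch of the Thread#raise idea, without signals: a watchdog thread raises an exception directly in the thread that is doing the work. `QueryTimeout` and `run_with_deadline` are hypothetical names chosen for illustration.

```ruby
# Hypothetical exception class used to report the timeout.
class QueryTimeout < StandardError; end

# Runs the block; if it is still running after `seconds`, the watchdog
# raises QueryTimeout in the calling thread (the main thread in a
# single-threaded script).
def run_with_deadline(seconds)
  caller_thread = Thread.current
  watchdog = Thread.new do
    sleep seconds
    caller_thread.raise QueryTimeout, "gave up after #{seconds}s"
  end
  yield
ensure
  # Cancel the watchdog once the block has finished or failed.
  watchdog.kill if watchdog
end
```

The caller rescues QueryTimeout around run_with_deadline just as it would rescue Timeout::Error. As with signals, the exception is only delivered once the interpreter regains control.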
 
Eric Schwartz

Lennon Day-Reynolds said:
I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

I'll see if it works. I'm currently surprised by the fact that
somehow the SIGALRM I'm sending to the main process isn't apparently
being received. I'm currently building an instrumented Ruby
interpreter to validate that it's not Ruby's fault.

require 'timeout'

begin
  Timeout.timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
end

Alas, no love with this example. my_sometimes_too_long_method() just
goes on forever. I guess I'll just have to wait until my instrumented
interpreter finishes building.

-=Eric
 
Eric Schwartz

Eric Schwartz said:
Alas, no love with this example. my_sometimes_too_long_method() just
goes on forever. I guess I'll just have to wait until my instrumented
interpreter finishes building.

Not to follow up on myself or anything, but trying to rebuild the
Debian ruby1.8 package gives me:

$ make
../ext/extmk.rb:27:in `require': unexpected break (LocalJumpError)
from ./ext/extmk.rb:27
make: *** [all] Error 1

I couldn't find anything obvious from poking at google-- if anybody
has advice to share, I'd welcome it.

-=Eric
 
Lennon Day-Reynolds

Eric,

I'm not sure what your problem with the Ruby rebuild is, (though I
might recommend just doing a local build of the 1.8.1 sources, rather
than the Debian package) but I may have an idea about the
SIGALRM/timeout issue you're having.

Is the long-running method calling out into C code? Even something
like a socket operation? If so, that system code may be blocking
signals before they can percolate up to the Ruby layer. I would try
sending signals from outside the Ruby process to see if they can
interrupt it during the long method.

Lennon
 
Eric Schwartz

Lennon Day-Reynolds said:
Is the long-running method calling out into C code? Even something
like a socket operation? If so, that system code may be blocking
signals before they can percolate up to the Ruby layer. I would try
sending signals from outside the Ruby process to see if they can
interrupt it during the long method.

I'm way ahead of you. :) I've tried it with a fork() instead of a new
thread, and I've even sent signals from a completely separate shell
process. No dice. I'm 99% sure it's the Ruby interpreter's fault,
because although I know that multiple SIGALRMs can be condensed into
one, I've never heard of only one taking over 30 seconds to be sent to
the process it's intended for.

-=Eric
 
Eric Schwartz

Lennon Day-Reynolds said:
Is the long-running method calling out into C code? Even something
like a socket operation?

I forgot to mention: yes, this is exactly what's happening. I built a
Ruby extension for the STAF library:

http://sourceforge.net/tracker/?group_id=33142&atid=407383

The STAF library itself is doing all sorts of C++ weirdness I dare not
attempt to decipher, lest I go insane trying. I fear some bizarre
interaction between STAF and Ruby, perhaps.

-=Eric
 
Lennon Day-Reynolds

Eric,

It could be the interpreter, or it could be something inside the STAF
library itself trapping SIGALRM, and not letting the events reach the
interpreter (though a testing library that didn't allow you to use
SIGALRM in the code being tested would be surprising).

However, I really know nothing about STAF, so I couldn't speculate as
to what might be causing the problem. I've never had any problems with
the Kernel.trap method in Ruby before, which is the only reason I keep
leaning towards the bug being elsewhere.

Have the STAF maintainers been able to offer any sense of whether
other language bindings (specifically, I notice they list Python on
the homepage) have had any problems with signal handling?
 
Eric Schwartz

Eric Schwartz said:
I'll see if it works. I'm currently surprised by the fact that
somehow the SIGALRM I'm sending to the main process isn't apparently
being received. I'm currently building an instrumented Ruby
interpreter to validate that it's not Ruby's fault.

Okay, it's Ruby's fault. Or, more probably, my fault for how I am
extending Ruby.

I instrumented signal.c, and what I've found is that sighandler() is
being called for the SIGALRM. In it, rb_trap_immediate is NOT set, so
rb_trap_pending is incremented, and the SIGALRM entry in trap_pending
list is incremented. So far so good-- it appears this is Ruby's way
of deferring handling of signals until it's safe to handle them.

The problem is, this signal is never getting handled. And, well,
kinda the point of a SIGALRM is that it gets sent in a reasonably
timely manner. :) I've noticed this behaviour seems to exist with
every signal, though, except SIGSTOP and SIGKILL (for obvious
reasons).

My code is at
http://sourceforge.net/tracker/?group_id=33142&atid=407383 if anyone
wants to double-check me. My questions are:

* Is there some way to force Ruby to deliver this signal?
* How can I tell why it's not being delivered?

Thanks for any help,

-=Eric
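
If the pending signal is never handled because the extension call never hands control back to the interpreter (an assumption; nothing in this thread confirms the cause), one workaround is to run the risky call in a child process and kill the child on timeout, since SIGKILL is acted on by the kernel rather than by Ruby. A minimal sketch, assuming a platform with fork and a result that Marshal can serialize; `run_in_child` is a hypothetical helper name.

```ruby
require 'timeout'

# Hypothetical helper: runs the block in a forked child and returns its
# result, or :nonresponsive if no result arrives within `limit` seconds.
def run_in_child(limit)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write Marshal.dump(yield)  # send the block's result to the parent
    writer.close
    exit! 0                           # skip at_exit handlers in the child
  end
  writer.close
  begin
    # The parent only blocks in IO#read, which Timeout can interrupt.
    Timeout.timeout(limit) { Marshal.load(reader.read) }
  rescue Timeout::Error
    Process.kill "KILL", pid          # the child cannot defer or ignore this
    :nonresponsive
  ensure
    Process.wait pid                  # reap the child in either case
    reader.close
  end
end
```

This trades the cost of a fork per query for a timeout that does not depend on the hung code cooperating.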
 
