timeouts with threads and SIGALRM

Eric Schwartz · Aug 13, 2004

Following advice in an old ruby-talk thread (can't remember which one,
offhand), I'm trying to implement a timeout with threads. The
canonical example was something like:

Thread.new do
  sleep 5
  Process.kill "ALRM", $$
end
begin
  .... some stuff ...
rescue SignalException => se
  return FAIL_CODE
end

Well, the problem with that is that the thread keeps executing, and if
the process as a whole takes more than 5 seconds to complete, then the
SIGALRM kills the process.

So fine, thinks I, I'll just stop the thread when I'm done with it.
No problem. Only I can't figure out how. I can't call Thread#stop
from outside the thread. What I finally ended up with was something
like:

rv = nil
Thread.new do
  sleep 5
  Process.kill "ALRM", $$ if rv == nil
end
..... catch SIGALRM if it happens ...
rv = query_some_stuff
....
return rv

I'm not excruciatingly happy about this solution, and I thought I'd
open it up to the Ruby community: What's the best way to do the
equivalent of 'alarm(5)' in C?

-=Eric
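
For reference, a thread can be stopped from outside with Thread#kill, so the timer can be cancelled once the work is done. The following is only a minimal sketch, not the code from the old ruby-talk thread: it swaps the SIGALRM for Thread#raise into the calling thread, and the names with_watchdog and QueryTimeout are made up for the example.

```ruby
# Hypothetical exception class used only by this sketch.
class QueryTimeout < StandardError; end

# Hypothetical helper: runs the block, raising QueryTimeout in the
# calling thread if the block has not finished after `seconds`.
def with_watchdog(seconds)
  main = Thread.current
  watchdog = Thread.new do
    sleep seconds
    main.raise QueryTimeout, "no answer after #{seconds}s"
  end
  yield
ensure
  # Cancel the timer from outside the thread once the block is done.
  watchdog.kill if watchdog
end

# A block that finishes in time returns its value as usual.
fast = with_watchdog(1) { :done }

# A block that overruns is interrupted and can be rescued by the caller.
slow = begin
  with_watchdog(0.1) { sleep 1; :done }
rescue QueryTimeout
  :timed_out
end
```

A small window remains in which the watchdog can fire just as the block finishes, so this is an illustration of the idea rather than a hardened implementation.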

Lennon Day-Reynolds · Aug 13, 2004

Is this more like what you need?

trap "ALRM" do
  puts "Looks like we've worn out our welcome. Goodbye!"
  exit
end

Thread.new do
  sleep 5
  Process.kill "ALRM", $$
end

puts "Waiting for the fun to start..."
i = 1
loop do
  puts i.to_s
  i += 1
  sleep 1
end

...or do you definitely need to catch the SignalException within your main code?

Lennon

Eric Schwartz · Aug 13, 2004

Lennon Day-Reynolds said:
Is this more like what you need?

..or do you definitely need to catch the SignalException within your main code?

I really do need to catch it within my main code. I'm querying a
number of remote machines for test status. Sometimes, very rarely,
that query will just go off into the weeds and stay there. I don't
know why yet, so I want to leave that situation in place, mark that
machine as nonresponsive, and move on. AFAIK, catching
SignalException is the only way to do that.

-=Eric

Lennon Day-Reynolds · Aug 13, 2004

Eric,

I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

Ex:

require 'timeout'

begin
  timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
end
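
Applied to the "mark the machine as nonresponsive and move on" case described earlier in the thread, the same idea might look like the sketch below. The host names, the TIME_LIMIT value and query_status are placeholders invented for the example, and it calls the module-level Timeout.timeout rather than the bare timeout method shown above.

```ruby
require 'timeout'

TIME_LIMIT = 0.2 # seconds; placeholder value for the sketch

# Hypothetical stand-in for the real remote query. One host "hangs".
def query_status(machine)
  sleep 1 if machine == "hung-host"
  "ok"
end

results = {}
["host-a", "hung-host", "host-b"].each do |machine|
  begin
    results[machine] = Timeout.timeout(TIME_LIMIT) { query_status(machine) }
  rescue Timeout::Error
    # Record the failure and carry on with the next machine.
    results[machine] = :nonresponsive
  end
end
```

This only helps when the slow call can be interrupted from Ruby, as sleep can here. A call that blocks inside extension code may not return control in time.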

Joel VanderWerf · Aug 13, 2004

Lennon said:
Eric,

I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

Ex:

require 'timeout'

begin
  timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
  main_thread.raise WhateverException
end

This addition lets you handle the exception in your main thread.

Eric Schwartz · Aug 13, 2004

Lennon Day-Reynolds said:
I realized that this probably should have been my first suggestion:
how about using the standard 'timeout' module to accomplish the same
thing?

I'll see if it works. I'm currently surprised by the fact that
somehow the SIGALRM I'm sending to the main process isn't apparently
being received. I'm currently building an instrumented Ruby
interpreter to validate that it's not Ruby's fault.

require 'timeout'

begin
  timeout(TIMELIMIT) do
    my_sometimes_too_long_method()
  end
rescue Timeout::Error
  # Handle timeout here
end

Alas, no love with this example. my_sometimes_too_long_method() just
goes on forever. I guess I'll just have to wait until my instrumented
interpreter finishes building.

-=Eric

Eric Schwartz · Aug 13, 2004

Eric Schwartz said:
Alas, no love with this example. my_sometimes_too_long_method() just
goes on forever. I guess I'll just have to wait until my instrumented
interpreter finishes building.

Not to follow up on myself or anything, but trying to rebuild the
Debian ruby1.8 package gives me:

$ make
../ext/extmk.rb:27:in `require': unexpected break (LocalJumpError)
from ./ext/extmk.rb:27
make: *** [all] Error 1

I couldn't find anything obvious from poking at google-- if anybody
has advice to share, I'd welcome it.

-=Eric

Lennon Day-Reynolds · Aug 13, 2004

Eric,

I'm not sure what your problem with the Ruby rebuild is, (though I
might recommend just doing a local build of the 1.8.1 sources, rather
than the Debian package) but I may have an idea about the
SIGALRM/timeout issue you're having.

Is the long-running method calling out into C code? Even something
like a socket operation? If so, that system code may be blocking
signals before they can percolate up to the Ruby layer. I would try
sending signals from outside the Ruby process to see if they can
interrupt it during the long method.

Lennon
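
A rough way to try the "signal from outside the process" test is sketched below. It assumes a Unix-like system where fork is available, and the child process simply stands in for running kill -ALRM with the script's PID from another shell.

```ruby
received = false

# The handler only records that the signal reached Ruby code.
trap("ALRM") { received = true }

# A separate process plays the role of "kill -ALRM <pid>" in a shell.
sender = fork do
  sleep 0.1
  Process.kill("ALRM", Process.ppid)
end

Process.wait(sender)
sleep 0.1 # give the interpreter a moment to run the handler

puts(received ? "SIGALRM reached the trap handler" : "SIGALRM was not handled")
```

In the real test, the Process.wait and sleep lines would be replaced by the long-running call, to see whether the handler still runs while that call is in progress.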

Eric Schwartz · Aug 14, 2004

Lennon Day-Reynolds said:
Is the long-running method calling out into C code? Even something
like a socket operation? If so, that system code may be blocking
signals before they can percolate up to the Ruby layer. I would try
sending signals from outside the Ruby process to see if they can
interrupt it during the long method.

I'm way ahead of you.

I've tried it with a fork() instead of a new thread, and I've even sent
signals from a completely separate shell process. No dice. I'm 99%
sure it's the Ruby interpreter's fault, because although I know that
multiple SIGALRMs can be condensed into one, I've never heard of only
one taking over 30 seconds to be sent to the process it's intended
for.

-=Eric

Eric Schwartz · Aug 14, 2004

Lennon Day-Reynolds said:
Is the long-running method calling out into C code? Even something
like a socket operation?

I forgot to mention: yes, this is exactly what's happening. I built a
Ruby extension for the STAF library:

http://sourceforge.net/tracker/?group_id=33142&atid=407383

The STAF library itself is doing all sorts of C++ weirdness I dare not
attempt to decipher, lest I go insane trying. I fear some bizarre
interaction between STAF and Ruby, perhaps.

-=Eric

Lennon Day-Reynolds · Aug 14, 2004

Eric,

It could be the interpreter, or it could be something inside the STAF
library itself trapping SIGALRM and not letting the events reach the
interpreter (though a testing library that didn't allow you to use
SIGALRM in the code being tested would be surprising).

However, I really know nothing about STAF, so I couldn't speculate as
to what might be causing the problem. I've never had any problems with
the Kernel.trap method in Ruby before, which is the only reason I keep
leaning towards the bug being elsewhere.

Have the STAF maintainers been able to offer any sense of whether
other language bindings (specifically, I notice they list Python on
the homepage) have had any problems with signal handling?

Eric Schwartz · Aug 20, 2004

Eric Schwartz said:
I'll see if it works. I'm currently surprised by the fact that
somehow the SIGALRM I'm sending to the main process isn't apparently
being received. I'm currently building an instrumented Ruby
interpreter to validate that it's not Ruby's fault.

Okay, it's Ruby's fault. Or, more probably, my fault for how I am
extending Ruby.

I instrumented signal.c, and what I've found is that sighandler() is
being called for the SIGALRM. In it, rb_trap_immediate is NOT set, so
rb_trap_pending is incremented, and the SIGALRM entry in trap_pending
list is incremented. So far so good-- it appears this is Ruby's way
of deferring handling of signals until it's safe to handle them.

The problem is, this signal is never getting handled. And, well,
kinda the point of a SIGALRM is that it gets sent in a reasonably
timely manner.

I've noticed this behaviour seems to exist with every signal, though,
except SIGSTOP and SIGKILL (for obvious reasons).

My code is at
http://sourceforge.net/tracker/?group_id=33142&atid=407383 if anyone
wants to double-check me. My questions are:

* Is there some way to force Ruby to deliver this signal?
* How can I tell why it's not being delivered?

Thanks for any help,

-=Eric
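
The deferral described above (the handler only bumps a pending counter, and the real work happens later at a safe point) can be imitated in plain Ruby. This is a toy model of the pattern, not the interpreter's signal.c, and the names pending and log are invented for the sketch.

```ruby
pending = Hash.new(0) # signal name => number of arrivals not yet handled
log = []

# The handler does nothing except record that the signal arrived.
trap("ALRM") { pending["ALRM"] += 1 }

3.times do |step|
  # Send ourselves one SIGALRM during the first step only.
  Process.kill("ALRM", Process.pid) if step == 0
  sleep 0.05

  # "Safe point": the only place where pending signals are acted on.
  if pending["ALRM"] > 0
    pending["ALRM"] = 0
    log << "handled ALRM at step #{step}"
  end
end

puts log
```

If the loop never returned to its safe point, for instance because it was stuck inside a long blocking call, the counter would be incremented but the signal would never be acted on, which resembles the symptom reported here.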
