timeouts with threads and SIGALRM

Discussion in 'Ruby' started by Eric Schwartz, Aug 13, 2004.

  1. Following advice in an old ruby-talk thread (can't remember which one,
    offhand), I'm trying to implement a timeout with threads. The
    canonical example was something like:

    # watchdog: signal this process after 5 seconds
    Thread.new do
      sleep 5
      Process.kill "ALRM", $$
    end
    begin
      # .... some stuff ....
    rescue SignalException => se
      return FAIL_CODE
    end

    Well, the problem with that is that the thread keeps executing, and if
    the process as a whole takes more than 5 seconds to complete, then the
    SIGALRM kills the process.

    So fine, thinks I, I'll just stop the thread when I'm done with it.
    No problem. Only I can't figure out how. I can't call Thread#stop
    from outside the thread. What I finally ended up with was something like:

    rv = nil
    Thread.new do
      sleep 5
      # only fire the alarm if the query hasn't returned yet
      Process.kill "ALRM", $$ if rv.nil?
    end
    # ..... catch SIGALRM if it happens .....
    rv = query_some_stuff
    return rv

    I'm not excruciatingly happy about this solution, and I thought I'd
    open it up to the Ruby community: What's the best way to do the
    equivalent of 'alarm(5)' in C?
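    For comparison: Thread.stop only acts on the calling thread, but
    Thread#kill can be called on another thread's handle. A minimal sketch of
    the alarm(5) pattern that keeps the watchdog's handle and kills it once the
    work is done, assuming no trap is installed for "ALRM" so that Ruby raises
    SignalException in the main thread when the signal arrives. `with_alarm`
    is a hypothetical helper name, not a library method.

```ruby
# Sketch of an alarm(5)-style watchdog. `with_alarm` is a hypothetical
# helper. With no trap installed for "ALRM", Ruby raises SignalException
# in the main thread when the signal arrives.
def with_alarm(seconds)
  watchdog = Thread.new do
    sleep seconds
    Process.kill("ALRM", Process.pid)
  end
  begin
    yield
  ensure
    # Thread#kill works from outside the thread, so the watchdog is
    # cancelled as soon as the block finishes (or raises).
    watchdog.kill
  end
end

begin
  answer = with_alarm(5) { :answered }
  puts "query returned #{answer}"
rescue SignalException => e
  puts "timed out (#{e.message})"
end
```

    Because the exception lands in the main thread, this is only meant to be
    called from the main thread.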

    Eric Schwartz, Aug 13, 2004

  2. Is this more like what you need?

    trap "ALRM" do
      puts "Looks like we've worn out our welcome. Goodbye!"
      exit
    end

    Thread.new do
      sleep 5
      Process.kill "ALRM", $$
    end

    puts "Waiting for the fun to start..."
    i = 1
    loop do
      puts i.to_s
      i += 1
      sleep 1
    end

    ...or do you definitely need to catch the SignalException within your main code?

    Lennon Day-Reynolds, Aug 13, 2004

  3. I really do need to catch it within my main code. I'm querying a
    number of remote machines for test status. Sometimes, very rarely,
    that query will just go off into the weeds and stay there. I don't
    know why yet, so I want to leave that situation in place, mark that
    machine as nonresponsive, and move on. AFAIK, catching
    SignalException is the only way to do that.

    Eric Schwartz, Aug 13, 2004
  4. Eric,

    I realized that this probably should have been my first suggestion:
    how about using the standard 'timeout' module to accomplish the same
    thing?

    require 'timeout'

    begin
      Timeout.timeout(TIMELIMIT) do
        # ... long-running work here ...
      end
    rescue Timeout::Error
      # Handle timeout here
    end
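
    Applied to the situation described above (query many machines, mark the
    ones that hang as nonresponsive, move on), the timeout approach might look
    like the sketch below. `query_host` and `poll_hosts` are hypothetical
    names, and the hanging machine is simulated with `sleep`. Timeout
    interrupts the block by raising into the waiting thread, so a query
    blocked inside C code may not be interruptible this way.

```ruby
require 'timeout'

# Hypothetical stand-in for the remote status query. The host named
# "stuck" simulates a query that goes off into the weeds.
def query_host(host)
  sleep(host == "stuck" ? 10 : 0.1)
  "#{host}: ok"
end

# Query each host under a per-host time limit; hosts that do not answer
# in time are recorded as nonresponsive and the loop moves on.
def poll_hosts(hosts, limit)
  results = {}
  nonresponsive = []
  hosts.each do |host|
    begin
      results[host] = Timeout.timeout(limit) { query_host(host) }
    rescue Timeout::Error
      nonresponsive << host
    end
  end
  [results, nonresponsive]
end

results, nonresponsive = poll_hosts(%w[alpha stuck beta], 1)
puts "answered: #{results.keys.join(', ')}"
puts "nonresponsive: #{nonresponsive.join(', ')}"
```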
    Lennon Day-Reynolds, Aug 13, 2004
  5. main_thread.raise WhateverException

    This addition lets you handle the exception in your main thread.
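
    Expanded into a self-contained sketch: the watchdog thread remembers the
    thread that started it and uses Thread#raise to deliver an ordinary
    exception there, so no signal is involved. `QueryTimeout` and
    `with_deadline` are hypothetical names.

```ruby
# Hypothetical exception type for the watchdog to deliver.
class QueryTimeout < StandardError; end

# Runs the block; if it is still running after `seconds`, a helper thread
# raises QueryTimeout in the thread that called with_deadline.
def with_deadline(seconds)
  main_thread = Thread.current
  watchdog = Thread.new do
    sleep seconds
    main_thread.raise(QueryTimeout, "no answer after #{seconds}s")
  end
  begin
    yield
  ensure
    watchdog.kill # cancel the watchdog once the block is done
  end
end

begin
  with_deadline(0.5) { sleep 5 }
rescue QueryTimeout => e
  puts "gave up: #{e.message}"
end
```

    This sketch does not handle the race where the deadline expires just as
    the block finishes; a production version would need to.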
    Joel VanderWerf, Aug 13, 2004
  6. I'll see if it works. I'm surprised that the SIGALRM I'm sending to the
    main process apparently isn't being received at all. I'm currently
    building an instrumented Ruby interpreter to confirm that it's not
    Ruby's fault.

    Alas, no love with this example. my_sometimes_too_long_method() just
    goes on forever. I guess I'll just have to wait until my instrumented
    interpreter finishes building.

    Eric Schwartz, Aug 13, 2004
  7. Not to follow up on myself or anything, but trying to rebuild the
    Debian ruby1.8 package gives me:

    $ make
    ../ext/extmk.rb:27:in `require': unexpected break (LocalJumpError)
    from ./ext/extmk.rb:27
    make: *** [all] Error 1

    I couldn't find anything obvious by poking at Google. If anybody
    has advice to share, I'd welcome it.

    Eric Schwartz, Aug 13, 2004
  8. Eric,

    I'm not sure what your problem with the Ruby rebuild is (though I
    might recommend just doing a local build of the 1.8.1 sources rather
    than the Debian package), but I may have an idea about the
    SIGALRM/timeout issue you're having.

    Is the long-running method calling out into C code, even something
    like a socket operation? If so, that system code may be blocking
    signals before they can percolate up to the Ruby layer. I would try
    sending signals from outside the Ruby process to see if they can
    interrupt it during the long method.
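
    One way to run that experiment from Ruby itself, assuming a Unix-like
    platform with fork: a child process sends SIGALRM to the parent while the
    parent runs a stand-in for the long method. Here the stand-in is plain
    Ruby, so the handler should run; swapping in the suspect extension call
    would show whether it blocks delivery.

```ruby
# Sketch: send SIGALRM from a separate (forked) process instead of from
# a thread inside the interpreter. Needs fork(2), so Unix-like systems only.
got_alarm = false
trap("ALRM") { got_alarm = true }

parent = Process.pid
child = fork do
  sleep 1
  Process.kill("ALRM", parent)
end

# Stand-in for the long-running method. Replace this loop with the call
# under suspicion to see whether the trap handler still gets to run.
deadline = Time.now + 5
sleep 0.1 until got_alarm || Time.now > deadline

Process.wait(child)
puts(got_alarm ? "SIGALRM was handled" : "SIGALRM never arrived")
```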

    Lennon Day-Reynolds, Aug 13, 2004
  9. I'm way ahead of you. :) I've tried it with a fork() instead of a new
    thread, and I've even sent signals from a completely separate shell
    process. No dice. I'm 99% sure it's the Ruby interpreter's fault,
    because although I know that multiple SIGALRMs can be condensed into
    one, I've never heard of a single one taking over 30 seconds to be
    delivered to the process it's intended for.

    Eric Schwartz, Aug 14, 2004
  10. I forgot to mention: yes, this is exactly what's happening. I built a
    Ruby extension for the STAF library:


    The STAF library itself is doing all sorts of C++ weirdness I dare not
    attempt to decipher, lest I go insane trying. I fear some bizarre
    interaction between STAF and Ruby, perhaps.

    Eric Schwartz, Aug 14, 2004
  11. Eric,

    It could be the interpreter, or it could be something inside the STAF
    library itself trapping SIGALRM and not letting the events reach the
    interpreter (though a testing library that didn't allow you to use
    SIGALRM in the code being tested would be odd).

    However, I really know nothing about STAF, so I couldn't speculate as
    to what might be causing the problem. I've never had any problems with
    the Kernel.trap method in Ruby before, which is the only reason I keep
    leaning towards the bug being elsewhere.

    Have the STAF maintainers been able to offer any sense of whether
    other language bindings (specifically, I notice they list Python on
    the homepage) have had any problems with signal handling?
    Lennon Day-Reynolds, Aug 14, 2004
  12. Okay, it's Ruby's fault. Or, more probably, my fault for how I am
    extending Ruby.

    I instrumented signal.c, and what I've found is that sighandler() is
    being called for the SIGALRM. In it, rb_trap_immediate is NOT set, so
    rb_trap_pending is incremented, and the SIGALRM entry in the trap_pending
    list is incremented. So far so good: it appears this is Ruby's way
    of deferring handling of signals until it's safe to handle them.

    The problem is, this signal is never getting handled. And, well,
    kinda the point of a SIGALRM is that it gets delivered in a reasonably
    timely manner. :) I've noticed this behaviour seems to exist with
    every signal, though, except SIGSTOP and SIGKILL (for obvious reasons).
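
    The deferral scheme traced in signal.c can be illustrated with a toy
    model in Ruby. This is not the interpreter's actual code: `DeferredSignals`,
    `receive` and `run_pending` are hypothetical stand-ins for sighandler(),
    the pending counters, and whatever point the interpreter treats as safe
    for running handlers. It shows the symptom: a signal that is recorded but
    never reaches a safe point is simply never handled.

```ruby
# Toy model of deferred signal delivery; not the interpreter's real code.
# All names here are hypothetical.
class DeferredSignals
  def initialize
    @pending = Hash.new(0) # signal name => times received
    @handlers = {}
  end

  def on(sig, &handler)
    @handlers[sig] = handler
  end

  # Plays the part of the low-level handler: it only records the
  # signal and returns immediately.
  def receive(sig)
    @pending[sig] += 1
  end

  # Plays the part of the "safe point". Each pending signal type runs its
  # handler once, however many times it arrived. If nothing ever calls
  # this method, recorded signals are never handled.
  def run_pending
    sigs = @pending.keys
    @pending.clear
    sigs.each do |sig|
      handler = @handlers[sig]
      handler.call(sig) if handler
    end
  end
end

alarms = []
signals = DeferredSignals.new
signals.on("ALRM") { |sig| alarms << sig }
signals.receive("ALRM")
signals.receive("ALRM")
puts "before safe point: #{alarms.inspect}"
signals.run_pending
puts "after safe point:  #{alarms.inspect}"
```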

    My code is at
    http://sourceforge.net/tracker/?group_id=33142&atid=407383 if anyone
    wants to double-check me. My questions are:

    * Is there some way to force Ruby to deliver this signal?
    * How can I tell why it's not being delivered?

    Thanks for any help,

    Eric Schwartz, Aug 20, 2004
