having problems with open4 and stuck forked processes

Discussion in 'Ruby' started by Tim Uckun, Sep 22, 2010.

  1. Tim Uckun

    Tim Uckun Guest

    I am running a batch process which uses the wkhtmltoimage-i386 binary
    to make screenshots of urls. Unfortunately this is in beta and it
    frequently hangs up and takes up 100% of one of the CPUs on the
    machine.

    I have the following code to try and detect the hung process and kill
    it but it doesn't always work and I was wondering if anybody has a
    better idea of how to do this. When I run it by testing simple
    commands like sleep it works perfectly. In production with this binary
    it doesn't seem to always work.

    def Util.shell_with_timeout(cmd, seconds = 3600)
      # the default timeout is an hour. That's probably way too long

      Timeout::timeout(seconds) {
        @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)
        ignored, @status = Process::waitpid2 @pid
        if @status.exitstatus != 0
          raise "Exit Status not zero"
        end
      }

      @stdout ? @stdout.read.strip : ''
    rescue Timeout::Error
      Process.detach @pid
      Process.kill 'SIGKILL', @pid
      raise "Process Timed out"
    rescue => e
      msg = @stderr ? @stderr.read.strip : ''
      msg += e.to_s
      raise "Error during execution of command #{cmd}\n #{msg}"
    end
     
    Tim Uckun, Sep 22, 2010
    #1

  2. On Wed, Sep 22, 2010 at 2:31 PM, Tim Uckun <> wrote:
    > I am running a batch process which uses the wkhtmltoimage-i386 binary
    > to make screenshots of urls. Unfortunately this is in beta and it
    > frequently hangs up and takes up 100% of one of the CPUs on the
    > machine.
    >
    > I have the following code to try and detect the hung process and kill
    > it but it doesn't always work and I was wondering if anybody has a
    > better idea of how to do this. When I run it by testing simple
    > commands like sleep it works perfectly. In production with this binary
    > it doesn't seem to always work.


    What do you mean by that? Does the timeout go undetected? Can't you
    kill the process? Are there any unexpected error messages /
    exceptions?

    > def Util.shell_with_timeout(cmd, seconds = 3600)
    >   # the default timeout is an hour. That's probably way too long
    >
    >   Timeout::timeout(seconds) {
    >     @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)
    >     ignored, @status = Process::waitpid2 @pid
    >     if @status.exitstatus != 0
    >       raise "Exit Status not zero"
    >     end
    >   }
    >
    >   @stdout ? @stdout.read.strip : ''
    > rescue Timeout::Error
    >   Process.detach @pid
    >   Process.kill 'SIGKILL', @pid
    >   raise "Process Timed out"
    > rescue => e
    >   msg = @stderr ? @stderr.read.strip : ''
    >   msg += e.to_s
    >   raise "Error during execution of command #{cmd}\n #{msg}"
    > end


    A frequent problem with #popen methods is to not read file descriptors
    which can make the client hang (i.e. if it writes more than fits into
    a pipe). That could be something to check since you are not reading
    any of the streams.
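
The point above can be sketched as follows. The posted code only reads @stdout after waitpid2 returns, so a child that fills a pipe can never exit. This sketch uses the standard-library Open3.popen3 in place of the open4 gem (a substitution made here so it runs without extra gems); the child command and the 200,000-byte payload are made up for illustration.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical child: writes far more than a typical pipe buffer to both streams.
child = [RbConfig.ruby, '-e', '$stdout.write("o" * 200_000); $stderr.write("e" * 200_000)']

def run_and_drain(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Drain each stream on its own thread so neither pipe can fill up
    # while we are busy reading the other one.
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }
    [out_reader.value, err_reader.value, wait_thr.value]
  end
end

out, err, status = run_and_drain(*child)
puts "stdout: #{out.bytesize} bytes, stderr: #{err.bytesize} bytes, exit: #{status.exitstatus}"
```

Waiting for the exit status only after both readers have finished is what keeps the child from stalling on a full pipe.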

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 22, 2010
    #2

  3. Tim Uckun

    Tim Uckun Guest

    > What do you mean by that? Does the timeout go undetected? Can't you
    > kill the process? Are there any unexpected error messages /
    > exceptions?
    >


    Obviously the timeout is not detected. I am not sure about the
    exceptions as it happens when I am not looking but I will ramp up the
    logging and see if I can trap anything.

    >
    > A frequent problem with #popen methods is to not read file descriptors
    > which can make the client hang (i.e. if it writes more than fits into
    > a pipe). That could be something to check since you are not reading
    > any of the streams.


    If you have any pointers to documentation about this I would really
    appreciate it. I know so little about unix processes and pipes and
    such.
     
    Tim Uckun, Sep 23, 2010
    #3
  4. On 23.09.2010 01:59, Tim Uckun wrote:
    >> What do you mean by that? Does the timeout go undetected? Can't you
    >> kill the process? Are there any unexpected error messages /
    >> exceptions?

    >
    > Obviously the timeout is not detected.


    I don't find that obvious at all from your initial description.

    > I am not sure about the
    > exceptions as it happens when I am not looking but I will ramp up the
    > logging and see if I can trap anything.


    You could start by doing

    Thread.abort_on_exception = true

    at the beginning of your script.

    >> A frequent problem with #popen methods is to not read file descriptors
    >> which can make the client hang (i.e. if it writes more than fits into
    >> a pipe). That could be something to check since you are not reading
    >> any of the streams.

    >
    > If you have any pointers to documentation about this I would really
    > appreciate it. I know so little about unix processes and pipes and
    > such.


    I don't have anything handy but I guess Google will help.

    A pipe is basically what it looks like: it's a piece of pipe that you
    write to on one end and read from at the other end. At the read end
    there is a valve. If nobody reads, the valve stays closed and you can't
    fill in more at the write end. If you use blocking IO, your process
    blocks on the system call and won't be active again until somebody
    reads from the other end. (This is a bit simplistic because it leaves
    threads and interpreter implementation out of the picture, but this is
    basically what happens.)

    http://en.wikipedia.org/wiki/Pipe_(Unix)
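
The valve picture can be seen in a few lines of Ruby. This is only a sketch: it writes into a pipe that nobody reads, using write_nonblock so that the script reports a full pipe instead of hanging. The 1024-byte chunk size is arbitrary, and the number of bytes accepted depends on the operating system.

```ruby
reader, writer = IO.pipe
chunk = 'x' * 1024
written = 0

# Write without ever reading. write_nonblock reports a full pipe
# instead of blocking the way a plain write would.
loop do
  result = writer.write_nonblock(chunk, exception: false)
  break if result == :wait_writable
  written += result
end

puts "pipe accepted #{written} bytes; a blocking write would now stall"

# Reading from the other end opens the valve and makes room again.
drained = reader.read_nonblock(written)
puts "read #{drained.bytesize} bytes back"
```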

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 23, 2010
    #4
  5. elise huard

    elise huard Guest

    > Thread.abort_on_exception = true
    >
    > at the beginning of your script.


    Euhm (asking this because I honestly don't know) will this work for
    Processes ? (he's not using Thread)

    Elise
     
    elise huard, Sep 23, 2010
    #5
  6. elise huard wrote:
    > Euhm (asking this because I honestly don't know) will this work for
    > Processes ? (he's not using Thread)


    Timeout::timeout uses a thread internally - and it raises an exception
    asynchronously in the main thread, which makes it unsafe in just about
    any application you can think of for it.

    It would be safer to use select() on the data coming from the child to
    wait for the process to terminate (when you read end-of-file)
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Sep 23, 2010
    #6
  7. On Thu, Sep 23, 2010 at 9:42 AM, elise huard <> wrote:
    >> Thread.abort_on_exception = true
    >>
    >> at the beginning of your script.

    >
    > Euhm (asking this because I honestly don't know) will this work for
    > Processes ? (he's not using Thread)


    But he uses Timeout which AFAIK uses threads internally for monitoring.

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 23, 2010
    #7
  8. Tim Uckun

    Tim Uckun Guest

    > It would be safer to use select() on the data coming from the child to
    > wait for the process to terminate (when you read end-of-file)



    Do you know of any examples on how to do that? I am willing to rewrite
    my code obviously.
     
    Tim Uckun, Sep 24, 2010
    #8
  9. Tim Uckun

    Tim Uckun Guest

    > It would be safer to use select() on the data coming from the child to
    > wait for the process to terminate (when you read end-of-file)


    But what if the process hangs?

    Wouldn't I need to use timeout to check for that anyway?
     
    Tim Uckun, Sep 24, 2010
    #9
  10. On 25.09.2010 00:25, Tim Uckun wrote:
    >> It would be safer to use select() on the data coming from the child to
    >> wait for the process to terminate (when you read end-of-file)

    >
    > But what if the process hangs?
    >
    > Wouldn't I need to use timeout to check for that anyway?


    Select can be called with a timeout, which guarantees that the call
    returns in time regardless of whether there is any data available.
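
A small sketch of that behaviour, using a plain IO.pipe rather than a child process so it stays self-contained. The 0.2 second timeout is arbitrary.

```ruby
reader, writer = IO.pipe

# Nothing has been written, so select gives up after 0.2 seconds and returns nil.
timed_out = IO.select([reader], nil, nil, 0.2)
puts "no data: #{timed_out.inspect}"

# Once data is available, select returns the ready IO objects right away.
writer.write('hello')
ready = IO.select([reader], nil, nil, 0.2)
puts "data ready: #{ready[0].include?(reader)}"

# Closing the write end also makes the reader "readable": the next read hits end-of-file.
reader.read_nonblock(5)
writer.close
at_eof = IO.select([reader], nil, nil, 0.2)
puts "readable at EOF: #{!at_eof.nil?}, eof? #{reader.eof?}"
```

The last case is why a readable descriptor does not always mean there is data: it can also mean the other side has closed the pipe.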

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 25, 2010
    #10
  11. Tim Uckun

    Tim Uckun Guest

    On Sat, Sep 25, 2010 at 9:05 PM, Robert Klemme
    <> wrote:
    > On 25.09.2010 00:25, Tim Uckun wrote:
    >>>
    >>> It would be safer to use select() on the data coming from the child to
    >>> wait for the process to terminate (when you read end-of-file)

    >>
    >> But what if the process hangs?
    >>
    >> Wouldn't I need to use timeout to check for that anyway?

    >
    > Select can be called with a timeout which guarantees that the call returns
    > in time regardless whether there is any data available.
    >



    Hey guys I want to revisit this issue because I can't seem to find any
    documentation on how to do this.

    What I want to do seems simple enough. I want to shell out to a process
    which sometimes gets stuck. It won't return at all. It just sits there
    taking up 100% of the CPU (one of the cores anyway). I just want to
    make sure that if the process does not end in a reasonable amount of
    time, it gets killed.

    So far I have tried wrapping it in a timeout block but that doesn't
    always trigger for some reason. I have plenty of error handling and
    have an ensure block which says to kill the process if it exists but
    nothing I do seems to work. Sooner or later I get a stuck process that
    hangs around forever till I kill it by hand.

    Surely there is a simple way to do this.

    Here is the code I have so far.

    http://gist.github.com/609119
     
    Tim Uckun, Oct 4, 2010
    #11
  12. On 10/3/10, Tim Uckun <> wrote:
    > Hey guys I want to revisit this issue because I can't seem to find any
    > documentation on how to do this.
    >
    > What I want to do seems simple enough. I want to shell out to a process
    > which sometimes gets stuck. It won't return at all. It just sits there
    > taking up 100% of the CPU (one of the cores anyway). I just want to
    > make sure that if the process does not end in a reasonable amount of
    > time I want to kill it.
    >
    > So far I have tried wrapping it in a timeout block but that doesn't
    > always trigger for some reason. I have plenty of error handling and
    > have an ensure block which says to kill the process if it exists but
    > nothing I do seems to work. Sooner or later I get a stuck process that
    > hangs around forever till I kill it by hand.


    Timeout::timeout is kind of a hack. It's probably better to avoid it.
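
One way to do without Timeout::timeout altogether, offered here only as a sketch and not something proposed in this thread, is to poll the child with Process::WNOHANG until a deadline passes. The method name run_with_deadline and the 0.05 second poll interval are made up for illustration.

```ruby
# Run cmd (an array of program and arguments). Returns its Process::Status,
# or nil if the child had to be killed after `seconds`.
def run_with_deadline(cmd, seconds, poll_interval = 0.05)
  pid = Process.spawn(*cmd)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds

  loop do
    # WNOHANG makes waitpid2 return nil immediately while the child is still running.
    reaped = Process.waitpid2(pid, Process::WNOHANG)
    return reaped[1] if reaped

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill('KILL', pid)
      Process.waitpid(pid) # reap the child so it does not linger as a zombie
      return nil
    end
    sleep poll_interval
  end
end

finished = run_with_deadline(['sleep', '0.1'], 2)
puts "finished normally: #{finished.success?}"

killed = run_with_deadline(['sleep', '30'], 0.3)
puts "killed after deadline: #{killed.nil?}"
```

Because no pipes are involved, the child cannot block on a full pipe; its output goes wherever the parent's output goes, so capturing it would need a redirect to a file.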

    > Surely there is a simple way to do this.
    >
    > Here is the code I have so far.
    >
    > http://gist.github.com/609119


    Your problem may be that you're sending signal 0; you should pass
    "TERM" or (if that won't work) "KILL" as the first parameter to
    Process.kill. Signal 0 just queries whether the process can receive
    signals or not...
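
The difference between signal 0 and KILL can be checked with a throwaway child. This is a minimal sketch; the sleep command stands in for the stuck binary.

```ruby
pid = Process.spawn('sleep', '30')

# Signal 0 delivers nothing; it only checks that the process exists
# and that we are allowed to signal it.
probe = Process.kill(0, pid)
puts "signal 0 reached #{probe} process; the child is still running"

# KILL actually terminates the child; waitpid2 then reaps it.
Process.kill('KILL', pid)
_, status = Process.waitpid2(pid)
puts "signaled: #{status.signaled?}, signal: #{Signal.signame(status.termsig)}"

# After reaping, the pid is gone, so even signal 0 raises Errno::ESRCH.
begin
  Process.kill(0, pid)
  gone = false
rescue Errno::ESRCH
  gone = true
end
puts "gone: #{gone}"
```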

    If you want to use select instead of timeout, then instead of this:

    Timeout::timeout(seconds) {
      @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)

      ignored, @status = Process::waitpid2 @pid

      if @status.exitstatus != 0
        raise "Exit Status not zero"
      end
    }

    You should use something like this: (UNTESTED)

    @pid, @stdin, @stdout, @stderr = Open4.popen4(cmd)

    # IO.select returns nil when the timeout expires with nothing to read
    if IO::select([@stdout], nil, nil, seconds).nil?
      Util.kill_process_if_exists? @pid
    elsif !@stdout.eof?
      fail 'unexpected data on stdout'
    end

    ignored, @status = Process::waitpid2 @pid

    if @status.exitstatus != 0
      raise "Exit Status not zero"
    end

    Except, if the external process actually prints something to stdout,
    then you need to call select in a loop until select returns nil, with
    decreasing timeouts depending on how much time has passed.
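
The loop described above might look roughly like this. It is a sketch under several assumptions: it uses the standard-library Open3.popen2 instead of open4, the name capture_with_deadline is made up, and it treats end-of-file on stdout as the sign that the child has finished, which does not hold for a child that closes stdout and keeps running.

```ruby
require 'open3'

# Returns [stdout_text, status]; status is nil when the deadline passed
# and the child was sent KILL.
def capture_with_deadline(cmd, seconds)
  Open3.popen2(*cmd) do |stdin, stdout, wait_thr|
    stdin.close
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
    output = +''

    loop do
      # Each pass waits only for the time that is still left.
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      if remaining <= 0 || IO.select([stdout], nil, nil, remaining).nil?
        begin
          Process.kill('KILL', wait_thr.pid)
        rescue Errno::ESRCH
          # the child exited on its own just before we tried to kill it
        end
        wait_thr.join
        return [output, nil]
      end

      chunk = stdout.read_nonblock(4096, exception: false)
      break if chunk.nil? # end-of-file: the child closed its stdout
      output << chunk unless chunk == :wait_readable
    end

    [output, wait_thr.value]
  end
end

out, status = capture_with_deadline(['echo', 'hello'], 2)
puts "#{out.strip.inspect} success=#{status.success?}"

late_out, late_status = capture_with_deadline(['sleep', '30'], 0.3)
puts "timed out: #{late_status.nil?}"
```

Only stdout is piped here; stderr stays attached to the parent, so it cannot fill up unread.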

    Unfortunately, 'ri Kernel#select' seems to be broken... it just
    refers you back to Kernel#select. I hope somebody fixes that. Check
    what it says in the pickaxe instead. (There's a free version available
    online if you don't own a copy yourself.)
     
    Caleb Clausen, Oct 4, 2010
    #12
  13. Tim Uckun

    Tim Uckun Guest

    >
    > Except, if the external process actually prints something to stdout,
    > then you need to call select in a loop until select returns nil, with
    > decreasing timeouts depending on how much time has passed.
    >



    Well I tried to go a different route and ran into a strange issue.

    I found a shell script on the net and modified it a bit. See this:

    http://gist.github.com/626072

    This shell script works perfectly when I use it from bash, but it
    behaves strangely when I call it with backticks in ruby.

    Basically what happens is that the backticks don't return until the
    timeout has expired, no matter what happens.

    It's the weirdest thing.

    Does anybody have an explanation for that?
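
One possible explanation, assuming the script starts a background watchdog (the gist is not reproduced here, so this is a guess): backticks read until end-of-file on the command's stdout, and a background job that inherits stdout keeps that pipe open even after the script itself has exited. The sketch below reproduces the effect with a one-second background sleep.

```ruby
# Runs the block and returns [its result, elapsed seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  output = yield
  [output, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# The background sleep inherits stdout, so the pipe stays open for the full second.
slow_out, slow_secs = timed { `sh -c 'sleep 1 & echo done'` }

# With the background job's output redirected, EOF arrives as soon as sh exits.
fast_out, fast_secs = timed { `sh -c 'sleep 1 >/dev/null 2>&1 & echo done'` }

puts format('inherited stdout: %.2fs, redirected: %.2fs', slow_secs, fast_secs)
```

If that is what is happening, redirecting the watchdog's output inside the script should let the backticks return as soon as the main command finishes.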
     
    Tim Uckun, Oct 14, 2010
    #13
