sleep/fork/shell/SIGCHLD interaction problem

Discussion in 'Perl Misc' started by Justin Fletcher, Nov 11, 2007.

  1. Hiya,

    I'm having a problem trying to get a simple program to respond the way
    that I expect. The basic premise is thus :

    1. Fork a child.
    2. Sleep for a while.
    3. Do other stuff.

    This seems pretty simple, and I have a SIGCHLD handler which will catch my
    forked process if it exits. I thought everything was fine. Then I found
    that if I press ctrl-Z to suspend the parent whilst I'm running the
    program and then background it, it hangs. I've reduced the problem to the
    simplest I can, as follows :

    ----
    #!/bin/perl

    $SIG{'CHLD'} = sub {
        print "SIGCHLD\n";
        $pid = wait;
        print "leave SIGCHLD for pid $pid\n";
    };

    print "Forking to do some long running task\n";
    unless ($pid = fork) {
        $SIG{'CHLD'} = 'DEFAULT';
        exec "tail -f /dev/null";
        die "failed\n";
    };

    print "Sleeping\n";
    sleep 50;
    print "Waking\n";
    ----

    The problem is that if I press ctrl-Z whilst the program is sleeping, and
    then resume it in the background with 'bg', a SIGCHLD is triggered. The
    handler then does a 'wait' to get the PID and hangs because there isn't a
    child that's exited. We never leave the SIGCHLD handler (unless the long
    running task completes). The use of 'tail -f /dev/null' is purely to
    simulate a task which just keeps running.

    In the shell, the following sequence is seen:

    ----
    justin@buttercup:~/Root/perltest$ perl testsleep.pl
    Forking to do some long running task
    Sleeping

    [1]+ Stopped perl testsleep.pl
    justin@buttercup:~/Root/perltest$ bg
    [1]+ perl testsleep.pl &
    SIGCHLD
    justin@buttercup:~/Root/perltest$
    ----

    I'm running bash 3.1.17, linux kernel 2.6.18, from debian stable, with
    perl 5.8.8.

    I believe this sort of construct to be normal and even recommended from
    the perlipc pages; so... am I doing something wrong ? is bash ? is the
    kernel ? is perl ?

    I'm hoping I'm just misunderstanding how process control should be done.

    --
    Gerph <http://gerph.org/>
    .... And you never see me walking toward you.
    Justin Fletcher, Nov 11, 2007
    #1

  2. On Sun, 11 Nov 2007 15:41:34 +0000, Justin Fletcher wrote:

    > The problem is that if I press ctrl-Z whilst the program is sleeping,
    > and then resume it in the background with 'bg', a SIGCHLD is triggered.
    > The handler then does a 'wait' to get the PID and hangs because there
    > isn't a child that's exited. We never leave the SIGCHLD handler (unless
    > the long running task completes). The use of 'tail -f /dev/null' is
    > purely to simulate a task which just keeps running.



    > I believe this sort of construct to be normal and even recommended from
    > the perlipc pages; so... am I doing something wrong ? is bash ? is the
    > kernel ? is perl ?
    >
    > I'm hoping I'm just misunderstanding how process control should be done.


    It seems you are getting signals for the stop and start of the child, see
    man sigaction and look at the possible CHLD signals.

    This is worrying: your code is quite a normal construct, and there must be
    a lot of production code out there that has this same problem.

    Additionally I could not find out how to get at the si_code for the
    signal.

    The solution, it seems to me, is to use (thanks to perldoc perlipc):

    #!/usr/bin/perl

    use strict;
    use warnings;
    use POSIX ":sys_wait_h";

    sub REAPER {
        print "entering reaper\n";
        my $child;
        # If a second child dies while in the signal handler caused by the
        # first death, we won’t get another signal. So must loop here else
        # we will leave the unreaped child as a zombie. And the next time
        # two children die we get another zombie. And so on.

        # Also, we can get signals on stopping and continuation of children,
        # in which case there is no process to wait on.

        while (($child = waitpid(-1, WNOHANG)) > 0) {
            print "Reaped $child: $?\n";
        }
        $SIG{CHLD} = \&REAPER; # still loathe sysV
        print "Leaving reaper\n";
    }
    $SIG{CHLD} = \&REAPER;

    print "Forking to do some long running task\n";
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    unless ($pid) {
        # child: count once a second until killed
        $SIG{'CHLD'} = 'DEFAULT';
        my $i = 0;
        while (1) {
            print $i++, "\n";
            sleep 1;
        }
    }

    print "pid=$pid\n";
    print "Sleeping\n";
    sleep 20;
    print "Waking\n";
    kill 'INT', $pid;
    sleep 2;
    Martijn Lievaart, Nov 11, 2007
    #2

  3. Quoth Justin Fletcher:
    >
    > The problem is that if I press ctrl-Z whilst the program is sleeping, and
    > then resume it in the background with 'bg', a SIGCHLD is triggered.


    This is expected behaviour if your signal handler is installed with
    sigaction without specifying the SA_NOCLDSTOP flag, which is what perl
    does. See your system's sigaction(2).

    > The handler then does a 'wait' to get the PID and hangs because there
    > isn't a child that's exited.


    You shouldn't simply call wait in a SIGCHLD handler, anyway. You don't
    know how many children have exited before you could handle the signal.
    The usual idiom is something like

    use POSIX qw/:sys_wait_h/;

    $SIG{CHLD} = sub { 1 while 0 < waitpid -1, WNOHANG };

    which will wait for everything that needs waiting for. See perlipc for
    examples which let you get the child pid and exit status, and waitpid(2)
    for how to check for children that have stopped/continued.
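
    As a rough illustration of that last point, here is a minimal sketch of
    telling a stopped child apart from one that has exited or been killed.
    It assumes Perl 5.10 or later for ${^CHILD_ERROR_NATIVE}; the helper
    describe_status is a made-up name for this example, and the child stops
    itself only to make the demonstration repeatable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h :signal_h);

# Hypothetical helper: turn a native wait status into a description.
sub describe_status {
    my ($status) = @_;
    return "stopped by signal " . WSTOPSIG($status)   if WIFSTOPPED($status);
    return "killed by signal " . WTERMSIG($status)    if WIFSIGNALED($status);
    return "exited with code " . WEXITSTATUS($status) if WIFEXITED($status);
    return "unknown status $status";
}

my $pid = fork;
die "fork failed: $!\n" unless defined $pid;
if ($pid == 0) {
    kill 'STOP', $$;    # child suspends itself, much as ctrl-Z would
    sleep 60;           # only reached after SIGCONT
    exit 0;
}

# WUNTRACED makes waitpid report stopped children as well as dead ones.
waitpid($pid, WUNTRACED) == $pid or die "waitpid: $!\n";
my $first = describe_status(${^CHILD_ERROR_NATIVE});
print "child $pid $first\n";

kill 'TERM', $pid;      # stays pending while the child is stopped
kill 'CONT', $pid;      # resume it so the TERM is delivered
waitpid($pid, 0) == $pid or die "waitpid: $!\n";
my $second = describe_status(${^CHILD_ERROR_NATIVE});
print "child $pid $second\n";
```

    Inside a SIGCHLD handler the same checks would normally be combined
    with WNOHANG (WNOHANG | WUNTRACED) so that the handler never blocks.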

    > In the shell, the following sequence is seen:
    >
    > ----
    > justin@buttercup:~/Root/perltest$ perl testsleep.pl
    > Forking to do some long running task
    > Sleeping
    >
    > [1]+ Stopped perl testsleep.pl


    How do you think the shell knew its child had stopped? It relies on
    SIGCHLD being sent when the process's status changes.

    Ben
    Ben Morrow, Nov 12, 2007
    #3
  4. Justin Fletcher wrote:
    > Hiya,
    >
    > I'm having a problem trying to get a simple program to respond the way
    > that I expect. The basic premise is thus :
    >
    > 1. Fork a child.
    > 2. Sleep for a while.
    > 3. Do other stuff.
    >
    > This seems pretty simple, and I have a SIGCHLD handler which will catch
    > my forked process if it exits. I thought everything was fine. Then I
    > found that if I press ctrl-Z to suspend the parent whilst I'm running the
    > program and then background it, it hangs.


    I find that this only occurs if I hit ctrl-Z from the keyboard. If I
    send the process the TSTP signal via some other means, it doesn't happen.
    I know that shells often respond to ctrl-Z, ctrl-C, etc, by sending signals
    to entire process groups, rather than just the main process. I don't know
    exactly how this leads to the observed phenomena, though.

    Also, by using "strace", I see that the process actually is getting a
    SIGCHLD (as opposed to some bug in Perl causing it to think that it did
    when really it didn't).

    <snip. Thank you for providing the sample code. But I don't think I need
    to quote it.>


    > I believe this sort of construct to be normal and even recommended from
    > the perlipc pages; so... am I doing something wrong ? is bash ? is the
    > kernel ? is perl ?


    I see the same or similar behavior under tcsh. So I'm thinking it is the
    kernel. I often find that programs which spawn other programs do not behave
    well when put into the background after the fact, but yours is the only
    simple demonstration of this that I've seen. When using programs that fork
    or spawn others, I've learned to try to start such programs in the
    background with &, and if I forget then I just kill them and restart them
    in the background rather than using ctrl-Z.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Nov 12, 2007
    #4
  5. Quoth :
    > Justin Fletcher <> wrote:
    > >
    > > I'm having a problem trying to get a simple program to respond the way
    > > that I expect. The basic premise is thus :
    > >
    > > 1. Fork a child.
    > > 2. Sleep for a while.
    > > 3. Do other stuff.
    > >
    > > This seems pretty simple, and I have a SIGCHLD handler which will catch
    > > my forked process if it exits. I thought everything was fine. Then I
    > > found that if I press ctrl-Z to suspend the parent whilst I'm running the
    > > program and then background it, it hangs.

    >
    > I find that this only occurs if I hit ctrl-Z from the keyboard. If I
    > send the process the TSTP signal via some other means, it doesn't happen.
    > I know that shells often respond to ctrl-Z, ctrl-C, etc, by sending signals
    > to entire process groups, rather than just the main process. I don't know
    > exactly how this leads to the observed phenomena, though.


    SIGCHLD is sent to the parent whenever a child changes status. So when
    you press ctrl-Z, the whole process group is signalled, the child is
    stopped, and the parent gets a SIGCHLD. When the process group is
    resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
    responded to the first yet (because it was stopped), this is not
    usually apparent.

    If the OP really doesn't want SIGCHLDs when a child stops, he can
    install the signal handler explicitly with sigaction and SA_NOCLDSTOP
    (under systems which support that). Since one must assume that any
    number of children may have exited when handling SIGCHLD anyway,
    including 0 in 'any number' is generally easier.
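
    For the sigaction route, something along these lines should work. It is
    only a sketch, assuming a system that supports SA_NOCLDSTOP and a POSIX
    module whose SigAction objects provide the safe() accessor; the names
    reaper and %exit_code are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);

my %exit_code;    # hypothetical bookkeeping: pid => exit code

sub reaper {
    local ($!, $?);    # don't clobber the main program's $! and $?
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $exit_code{$kid} = $? >> 8;
    }
}

# SA_NOCLDSTOP: no SIGCHLD when a child merely stops or continues.
my $action = POSIX::SigAction->new(\&reaper, POSIX::SigSet->new, SA_NOCLDSTOP);
$action->safe(1);      # ask for perl's deferred ("safe") signal delivery
sigaction(SIGCHLD, $action) or die "sigaction failed: $!\n";

my $pid = fork;
die "fork failed: $!\n" unless defined $pid;
exit 3 if $pid == 0;   # child: exit straight away

# Parent: sleep is cut short when SIGCHLD arrives; loop until reaped.
sleep 1 until exists $exit_code{$pid};
print "child $pid exited with code $exit_code{$pid}\n";
```

    The handler still loops with WNOHANG, so it stays harmless even if a
    SIGCHLD turns up with nothing to reap.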

    Ben
    Ben Morrow, Nov 12, 2007
    #5
  6. Justin Fletcher

    Guest

    Ben Morrow <> wrote:
    > Quoth :
    > > Justin Fletcher <> wrote:
    > > >
    > > > I'm having a problem trying to get a simple program to respond the
    > > > way that I expect. The basic premise is thus :
    > > >
    > > > 1. Fork a child.
    > > > 2. Sleep for a while.
    > > > 3. Do other stuff.
    > > >
    > > > This seems pretty simple, and I have a SIGCHLD handler which will
    > > > catch my forked process if it exits. I thought everything was fine.
    > > > Then I found that if I press ctrl-Z to suspend the parent whilst I'm
    > > > running the program and then background it, it hangs.

    > >
    > > I find that this only occurs if I hit ctrl-Z from the keyboard. If I
    > > send the process the TSTP signal via some other means, it doesn't
    > > happen. I know that shells often respond to ctrl-Z, ctrl-C, etc, by
    > > sending signals to entire process groups, rather than just the main
    > > process. I don't know exactly how this leads to the observed phenomena,
    > > though.

    >
    > SIGCHLD is sent to the parent whenever a child changes status. So when
    > you press ctrl-Z, the whole process group is signalled, the child is
    > stopped, and the parent gets a SIGCHLD. When the process group is
    > resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
    > responded to the first yet (because it was stopped), this is not
    > usually apparent.


    Thanks for the explanation. I did notice sometimes the parent went into
    the $SIG{CHLD} code when ctrl-Z was hit. Presumably the child received its
    TSTP first, and the parent for some reason got the CHLD from that before it
    got the initial TSTP.

    > If the OP really doesn't want SIGCHLDs when a child stops, he can
    > install the signal handler explicitly with sigaction and SA_NOCLDSTOP
    > (under systems which support that).


    Oy. That forces me to know more about the system thing than I wish I had
    to know, at least for such a conceptually simple thing. Not that that is
    surprising--there are limits to how much Perl can do to insulate me. malloc
    and free it does a good job of, but signals I guess are harder.


    > Since one must assume that any
    > number of children may have exited when handling SIGCHLD anyway,


    This is only true if one knows there is more than one child to exit, or one
    is writing code that is only a small part of a larger unknown system. If
    one knows that there is only one child to exit, because only one has been
    started, then one doesn't need to assume that any number greater than one
    may have exited. And if it *is* part of a larger system, then all the
    other parts need to agree on how to go about doing it. If one part does a
    waitpid -1, WNOHANG and comes up with some other part's child, that could
    cause problems. Maybe there should be a way to unwait on a child, which
    would store the pid and exit status away somewhere, then if the localized
    $SIG{CHLD} becomes unlocalized it would fire a fake SIGCHLD and waitpid
    could return the stored away value when it is next called.
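
    One way to sidestep the "reaping somebody else's child" problem, short
    of an unwait, is for each part to poll only the pids it started itself.
    A minimal sketch follows; spawn, reap_mine, %running and %finished are
    all hypothetical names, and polling in a loop stands in for whatever
    event loop or handler the real program would use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my %running;     # pids this component started and has not yet reaped
my %finished;    # pid => exit code, filled in as children are reaped

# Hypothetical helper: fork a child that runs $code and exits with its result.
sub spawn {
    my ($code) = @_;
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    exit $code->() if $pid == 0;
    $running{$pid} = 1;
    return $pid;
}

# Poll only our own pids, never -1, so other components' children are untouched.
sub reap_mine {
    for my $pid (keys %running) {
        my $got = waitpid($pid, WNOHANG);
        next if $got == 0;                     # still running
        $finished{$pid} = $? >> 8 if $got == $pid;
        delete $running{$pid};                 # reaped, or (-1) no longer ours
    }
}

my $pid = spawn(sub { 7 });
while (%running) {
    reap_mine();
    select(undef, undef, undef, 0.1);          # short pause between polls
}
print "child $pid exited with code $finished{$pid}\n"
    if exists $finished{$pid};
```

    This only helps if every part of the program plays along; a single
    waitpid(-1, ...) elsewhere can still collect these children first.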

    > including 0 in 'any number' is generally easier.


    I find it easier to design/work around the need to ever set $SIG{CHLD} (to
    anything other than the default or IGNORE) in the first place. :)

    I'm perhaps fortunate in that I've usually been able to do so. Obviously,
    not all people will be lucky enough to be able to get away with that.

    Xho

    , Nov 12, 2007
    #6
  7. writes:
    > Oy. That forces me to know more about the system thing than I wish I had
    > to know, at least for such a conceptually simple thing. Not that that is
    > surprising--there are limits to how much Perl can do to insulate me. malloc
    > and free it does a good job of, but signals I guess are harder.


    I am reminded of some commercial Unix kernel hackers who were
    responsible for the signal handling code. They had a pole 8 or 10
    feet high with a sign on the top saying, "You must be THIS TALL to use
    signals." As much as possible, they included themselves in this rule.

    -=Eric
    Eric Schwartz, Nov 12, 2007
    #7
  8. On Nov 11, 7:41 am, Justin Fletcher <> wrote:
    > Hiya,
    >
    > I'm having a problem trying to get a simple program to respond the way
    > that I expect. The basic premise is thus :
    >
    > 1. Fork a child.
    > 2. Sleep for a while.
    > 3. Do other stuff.
    >
    > This seems pretty simple, and I have a SIGCHLD handler which will catch my
    > forked process if it exits. I thought everything was fine. Then I found
    > that if I press ctrl-Z to suspend the parent whilst I'm running the
    > program and then background it, it hangs. I've reduced the problem to the
    > simplest I can, as follows :
    >
    > ----
    > #!/bin/perl
    >
    > $SIG{'CHLD'} = sub {
    > print "SIGCHLD\n";
    > $pid = wait;
    > print "leave SIGCHLD for pid $pid\n";
    > };
    >
    > print "Forking to do some long running task\n";
    > unless ($pid = fork) {
    > $SIG{'CHLD'} = 'DEFAULT';
    > exec "tail -f /dev/null";
    > die "failed\n";
    > };
    >
    > print "Sleeping\n";
    > sleep 50;
    > print "Waking\n";
    > ----
    >
    > The problem is that if I press ctrl-Z whilst the program is sleeping, and
    > then resume it in the background with 'bg', a SIGCHLD is triggered. The
    > handler then does a 'wait' to get the PID and hangs because there isn't a
    > child that's exited. We never leave the SIGCHLD handler (unless the long
    > running task completes). The use of 'tail -f /dev/null' is purely to
    > simulate a task which just keeps running.
    >
    > In the shell, the following sequence is seen:
    >
    > ----
    > justin@buttercup:~/Root/perltest$ perl testsleep.pl
    > Forking to do some long running task
    > Sleeping
    >
    > [1]+ Stopped perl testsleep.pl
    > justin@buttercup:~/Root/perltest$ bg
    > [1]+ perl testsleep.pl &
    > SIGCHLD
    > justin@buttercup:~/Root/perltest$
    > ----
    >
    > I'm running bash 3.1.17, linux kernel 2.6.18, from debian stable, with
    > perl 5.8.8.
    >
    > I believe this sort of construct to be normal and even recommended from
    > the perlipc pages; so... am I doing something wrong ? is bash ? is the
    > kernel ? is perl ?
    >

    I think you could lose the SIGCHLD handler, as it's not necessary at all
    here. You're not spawning multiple processes, and SIGTSTP is problematic,
    as you've seen. A simple waitpid on the child should eliminate the
    problems, e.g.,

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    unless ($pid) { # child
        exec "tail -f /dev/null"
            or die "exec failed: $!\n";
    } else { # parent
        sleep 50;
        waitpid $pid, 0;
    }

    --
    Charles DeRykus
    comp.llang.perl.moderated, Nov 13, 2007
    #8
  9. On Mon, 12 Nov 2007, Ben Morrow wrote:

    >
    > Quoth :
    >> Justin Fletcher <> wrote:
    >>>
    >>> I'm having a problem trying to get a simple program to respond the way
    >>> that I expect. The basic premise is thus :
    >>>
    >>> 1. Fork a child.
    >>> 2. Sleep for a while.
    >>> 3. Do other stuff.
    >>>
    >>> This seems pretty simple, and I have a SIGCHLD handler which will catch
    >>> my forked process if it exits. I thought everything was fine. Then I
    >>> found that if I press ctrl-Z to suspend the parent whilst I'm running the
    >>> program and then background it, it hangs.

    >>
    >> I find that this only occurs if I hit ctrl-Z from the keyboard. If I
    >> send the process the TSTP signal via some other means, it doesn't happen.
    >> I know that shells often respond to ctrl-Z, ctrl-C, etc, by sending signals
    >> to entire process groups, rather than just the main process. I don't know
    >> exactly how this leads to the observed phenomena, though.

    >
    > SIGCHLD is sent to the parent whenever a child changes status. So when
    > you press ctrl-Z, the whole process group is signalled, the child is
    > stopped, and the parent gets a SIGCHLD. When the process group is
    > resumed (bg or fg) the parent gets another SIGCHLD: since it hasn't
    > responded to the first yet (because it was stopped), this is not
    > usually apparent.
    >
    > If the OP really doesn't want SIGCHLDs when a child stops, he can
    > install the signal handler explicitly with sigaction and SA_NOCLDSTOP
    > (under systems which support that). Since one must assume that any
    > number of children may have exited when handling SIGCHLD anyway,
    > including 0 in 'any number' is generally easier.


    Thanks for your help (everyone on this group)! I hadn't appreciated that
    SIGCHLD was delivered for every change in a child's status, or that there
    might be multiple children present.

    The explanations given have helped me resolve the odd hangs I've been
    getting. Yay :)

    --
    Gerph <http://gerph.org/>
    .... Over the hills and far away there's a place that's heaven.
    Justin Fletcher, Nov 15, 2007
    #9