waitpid interrupted problem

Discussion in 'Perl Misc' started by Jason Godfrey, Mar 3, 2004.

  1. Hello.

    I'm using perl 5.8 on a Linux system. I have a script that fires off multiple
    copies of a program, then goes through a loop doing a waitpid on each child
    process. While the waitpid is going on I have an alarm going off every two
    seconds to handle some other tasks.
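    A minimal sketch of the setup described above. It assumes two children with made-up run times (3 s and 1 s, so the second exits first) and a placeholder tick counter in place of the real periodic tasks; none of these names come from the original script. Because alarm() is one-shot, the handler re-arms itself to get a tick every two seconds. On a perl without the problem, each waitpid() returns the pid it was given.

```perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';    # make sure exited children are not auto-reaped

my $ticks = 0;             # hypothetical stand-in for "some other tasks"
$SIG{ALRM} = sub {
    $ticks++;
    alarm 2;               # alarm() is one-shot, so re-arm the next tick
};

my @kids;
for my $secs (3, 1) {      # the second child exits first, as in the report
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) { sleep $secs; exit 0 }
    push @kids, $pid;
}

alarm 2;
my %returned;              # requested pid => pid that waitpid() actually returned
for my $pid (@kids) {
    $returned{$pid} = waitpid($pid, 0);
    printf "waitpid(%d) returned %d\n", $pid, $returned{$pid};
}
alarm 0;                   # stop the periodic alarm once all children are reaped
```

    On an affected build, the first line would show the second child's pid, and the second waitpid would return -1.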

    I hit some unexpected behavior that could be a bug.

    Running strace, it appears that perl calls wait4 with the pid I am waiting for.
    After the call is interrupted by the signal, wait4 is called again, but this time
    with -1 as the pid. My second child exits first, so the waitpid for the first
    child returns with the exit of the second child. (The return value of waitpid is
    the pid of the second child.) When I then do the waitpid for the second child, it
    fails because the second child no longer exists.
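    Since waitpid() returns the pid it actually reaped, one way to keep waiting on specific pids is to compare that return value with the pid that was requested. This sketch assumes a hypothetical helper, wait_for_child(), that stashes any unexpected child's status and hands it out when that pid is asked for later. The usage part uses two stand-in children rather than the real diagnostics.

```perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';

# Hypothetical helper state: statuses of children that waitpid() handed back
# while we were waiting for a *different* pid.
my %early_status;

# Hypothetical wrapper: wait for one specific child, tolerating a waitpid()
# that returns some other child's pid instead.
sub wait_for_child {
    my ($pid) = @_;
    # An earlier, mismatched call may already have collected this child.
    return delete $early_status{$pid} if exists $early_status{$pid};
    while (1) {
        my $got = waitpid($pid, 0);
        return if $got == -1;            # no such child (ECHILD)
        return $? if $got == $pid;       # the child we asked for
        $early_status{$got} = $?;        # someone else's exit: keep it for later
    }
}

# Usage: two children, where the first one outlives the second.
my @kids;
for my $job ([1, 3], [0, 4]) {           # [seconds to sleep, exit code]
    my ($secs, $code) = @$job;
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) { sleep $secs; exit $code }
    push @kids, $pid;
}
my @codes = map { wait_for_child($_) >> 8 } @kids;
print "@codes\n";                        # prints "3 4"
```

    With a correct perl the stash stays empty; on a perl that restarts the wait with -1, the second child's status sits in %early_status until that pid is requested.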

    An excerpt of the strace is below.

    I can work around the problem by reworking the code to not wait for specific
    pids. It does strike me as undesirable behavior, though. Has anyone else run
    into this, or does anyone have comments?
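    The workaround mentioned here, reaping with waitpid(-1, 0) instead of waiting for particular pids, might look roughly like this sketch. The job labels and exit codes are hypothetical, and each child simply exits where the real script would run its program. Because every call accepts any child, it does not matter which pid an interrupted wait ends up returning.

```perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';

# Hypothetical job list: [label, exit code the stand-in child will use].
my @jobs = (['diag1', 0], ['diag2', 1], ['diag3', 2]);

my %running;                             # pid => label, for children not yet reaped
for my $job (@jobs) {
    my ($label, $code) = @$job;
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) { exit $code }        # stand-in for exec'ing the real program
    $running{$pid} = $label;
}

my %status;                              # label => exit code
while (%running) {
    my $pid = waitpid(-1, 0);            # reap whichever child finishes next
    last if $pid == -1;                  # no children left at all (ECHILD)
    my $label = delete $running{$pid};
    next unless defined $label;          # a child we did not start here
    $status{$label} = $? >> 8;
    print "$label (pid $pid) exited with $status{$label}\n";
}
```

    The order of the printed lines depends on which child the kernel reports first, so only the final %status contents are predictable.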

    Thanks
    - Jason

    clone2(child_stack=0, stack_size=0,
    flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    child_tidptr=0x200000000002ede0) = 32201
    write(1, "[oldisk.2] /tmp/diags/oldisk -fi"..., 67[oldisk.2] /tmp/diags/oldisk -filename /dev/shm/oldiskTestFile3219
    ) = 67
    clone2(child_stack=0, stack_size=0,
    flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    child_tidptr=0x200000000002ede0) = 32202
    rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0},
    it_value={0, 0}}) = 0
    wait4(32201, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
    rt_sigreturn() = ? (mask now [])
    write(1, "|", 1|) = 1
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0},
    it_value={0, 0}}) = 0
    wait4(-1, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
    rt_sigreturn() = ? (mask now [])
    write(1, "\10", 1) = 1
    write(1, "/", 1/) = 1

    (and so on till)

    wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 32202
    write(1, "Diag 1 (pid 32201 or 32202) exit"..., 42Diag 1 (pid 32201 or 32202) exited with 0
    ) = 42
    write(1, "Diag 1 (pid 32201) exited with 0"..., 33Diag 1 (pid 32201) exited with 0
    ) = 33
    write(1, "\10", 1) = 1
    write(1, "\33[30m\33[42mPASS\33[0m(oldisk.1)\n", 29PASS(oldisk.1)
    ) = 29
    lstat("/tmp/diagTestOutput.1", {st_mode=S_IFREG|0644, st_size=460, ...}) = 0
    unlink("/tmp/diagTestOutput.1") = 0
    wait4(32202, 0x60000fffffff9da0, 0, NULL) = -1 ECHILD (No child processes)
    write(1, "Diag 2 (pid 32202 or -1) exited "..., 40Diag 2 (pid 32202 or -1) exited with -1
    ) = 40
     
    Jason Godfrey, Mar 3, 2004

  2. Ben Morrow

    Jason Godfrey wrote:
    >
    > Doing an strace it appears that perl calls wait4 with the pid I am waiting
    > for. After it is interrupted due to the signal wait4 is called again, but
    > this time it is called with -1 as the pid. My second child exits first, so
    > the waitpid for the first child returns with the exit of the second
    > child. (The return value for waitpid is the pid of the second child.) When I
    > then do the waitpid for the second child it fails because the second child no
    > longer exists.


    I *don't* see this behaviour with 5.8.2 i686-linux-thread-multi: command-line
    and strace follow. Do you get the same results with this command?

    Command:
    strace perl -le'$SIG{ALRM} = "IGNORE";
    $x = fork; defined $x or die "fork: $!";
    $x or do { sleep 5; exit 1};
    $y = fork; defined $y or die "fork: $!";
    $y or do { sleep 6; exit 1};
    alarm 2;
    print waitpid $y, 0;
    print waitpid $x, 0;'

    Output:
    22485
    22484

    Strace:
    ....
    fork() = 22484
    fork() = 22485
    alarm(2) = 0
    wait4(22485, 0xbffff3e8, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    wait4(22485, 0xbffff3e8, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    wait4(22485, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 22485
    --- SIGCHLD (Child exited) @ 0 (0) ---
    brk(0) = 0x817c000
    brk(0x817d000) = 0x817d000
    wait4(22484, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 22484
    ....

    Ben

    --
    For the last month, a large number of PSNs in the Arpa[Inter-]net have been
    reporting symptoms of congestion ... These reports have been accompanied by an
    increasing number of user complaints ... As of June,... the Arpanet contained
    47 nodes and 63 links. [ftp://rtfm.mit.edu/pub/arpaprob.txt]
     
    Ben Morrow, Mar 3, 2004

  3. Ben Morrow wrote:

    > I *don't* see this behaviour with 5.8.2 i686-linux-thread-multi: command-line
    > and strace follow. Do you get the same results with this command?
    >
    > Command:
    > strace perl -le'$SIG{ALRM} = "IGNORE";
    > $x = fork; defined $x or die "fork: $!";
    > $x or do { sleep 5; exit 1};
    > $y = fork; defined $y or die "fork: $!";
    > $y or do { sleep 6; exit 1};
    > alarm 2;
    > print waitpid $y, 0;
    > print waitpid $x, 0;'
    >
    > Output:
    > 22485
    > 22484


    No, with your script I don't see the behaviour either. However, with a slight
    modification I can see it:

    strace perl -le'sub sigAlarm{ $i++; }
    $SIG{'ALRM'} = 'sigAlarm';
    $x = fork; defined $x or die "fork: $!";
    $x or do { sleep 5; exit 1};
    $y = fork; defined $y or die "fork: $!";
    $y or do { sleep 6; exit 1};
    alarm 2;
    print waitpid $y, 0;
    print waitpid $x, 0;'

    Output:
    2008
    -1

    Strace:
    rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
    clone2(child_stack=0, stack_size=0,
    flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    child_tidptr=0x200000000002ede0) = 2008
    clone2(child_stack=0, stack_size=0,
    flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    child_tidptr=0x200000000002ede0) = 2009
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0},
    it_value={0, 0}}) = 0
    wait4(2009, 0x60000fffffff9510, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
    rt_sigreturn() = ? (mask now [])
    wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 2008
    write(1, "2008\n", 5) = 5
    wait4(2008, 0x60000fffffff9510, 0, NULL) = -1 ECHILD (No child processes)
    write(1, "-1\n", 3) = 3

    Thanks
    - Jason

    Jason Godfrey, Mar 3, 2004
  4. Ben Morrow

    Jason Godfrey wrote:
    >
    > No, with your script I don't see the behaviour either. However, with a slight
    > modification I can see it:
    >
    > strace perl -le'sub sigAlarm{ $i++; }
    > $SIG{'ALRM'} = 'sigAlarm';
    > $x = fork; defined $x or die "fork: $!";
    > $x or do { sleep 5; exit 1};
    > $y = fork; defined $y or die "fork: $!";
    > $y or do { sleep 6; exit 1};
    > alarm 2;
    > print waitpid $y, 0;
    > print waitpid $x, 0;'
    >
    > Output:
    > 2008
    > -1
    >
    > Strace:
    > rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
    > clone2(child_stack=0, stack_size=0,
    > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    > child_tidptr=0x200000000002ede0) = 2008
    > clone2(child_stack=0, stack_size=0,
    > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    > child_tidptr=0x200000000002ede0) = 2009
    > setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0},
    > it_value={0, 0}}) = 0
    > wait4(2009, 0x60000fffffff9510, 0, NULL) = ? ERESTARTSYS (To be restarted)
    > --- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
    > rt_sigreturn() = ? (mask now [])
    > wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 2008
    > write(1, "2008\n", 5) = 5
    > wait4(2008, 0x60000fffffff9510, 0, NULL) = -1 ECHILD (No child processes)
    > write(1, "-1\n", 3) = 3


    Interesting... I don't get the error with that, either; but my strace
    looks somewhat different from yours:

    rt_sigaction(SIGALRM, {0x4002ffc0, [], SA_RESTORER, 0x40109218},
    {SIG_DFL}, 8) = 0
    rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
    fork() = 23540
    fork() = 23541
    alarm(2) = 0
    wait4(23541, 0xbffff3a8, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    sigreturn() = ? (mask now [RTMIN])
    rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
    rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
    wait4(23541, 0xbffff3a8, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    wait4(23541, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 23541
    --- SIGCHLD (Child exited) @ 0 (0) ---
    brk(0) = 0x817c000
    brk(0x817d000) = 0x817d000
    write(1, "23541\n", 623541) = 6
    wait4(23540, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 23540
    write(1, "23540\n", 623540) = 6

    Notably, mine uses fork(2) instead of clone2(2) and alarm(2) instead of
    setitimer(2). What are your linux/glibc/perl versions? I have

    linux 2.4.20-xfs_pre6 (with Gentoo patches)
    glibc 2.3.2
    linuxthreads-0.10
    perl 5.8.2 for i686-linux-thread-multi
    with mostly defaults taken for Configure (in particular, perl
    *doesn't* use vfork)
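    A small sketch for gathering the same kind of details on another box, assuming a Linux system with glibc. It uses POSIX::uname and the Config module; $Config{gnulibc_version} reports the glibc perl was built against, not necessarily the one installed now. Calling `getconf GNU_LIBPTHREAD_VERSION` (glibc only) is an assumed way to tell NPTL from linuxthreads, which is one of the differences turned up in this thread.

```perl
use strict;
use warnings;
use Config;
use POSIX ();

my ($sysname, undef, $release, undef, $machine) = POSIX::uname();

# glibc-specific getconf variable; empty if getconf or the variable is missing.
my $pthreads = qx{getconf GNU_LIBPTHREAD_VERSION 2>/dev/null};
$pthreads = '' unless defined $pthreads;
chomp $pthreads;

my %report = (
    kernel   => "$sysname $release ($machine)",
    perl     => sprintf('%vd', $^V),
    archname => $Config{archname},                  # e.g. includes "thread-multi"
    usevfork => defined $Config{usevfork} ? $Config{usevfork} : 'undef',
    glibc    => $Config{gnulibc_version} || 'unknown',  # build-time glibc
    threads  => $pthreads || 'unknown',
);
printf "%-9s %s\n", $_, $report{$_} for sort keys %report;
```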

    Ben

    --
    And if you wanna make sense / Whatcha looking at me for? (Fiona Apple)
     
    Ben Morrow, Mar 3, 2004
  5. Hello.

    Sorry for the delay, I wanted to try on a different box before replying.

    Ben Morrow wrote:

    >
    > notably, mine uses fork(2) instead of clone2(2) and alarm(2) instead of
    > setitimer(2). What are your linux/glibc/perl versions? I have
    >
    > linux 2.4.20-xfs_pre6 (with Gentoo patches)
    > glibc 2.3.2
    > linuxthreads-0.10
    > perl 5.8.2 for i686-linux-thread-multi
    > with mostly defaults taken for Configure (in particular, perl
    > *doesn't* use vfork)
    >


    glibc-2.3.2-95.6
    perl-5.8.0-88.4
    linux 2.4.21-9.EL #1 SMP

    The machine has NPTL backported to the 2.4 kernel.

    In any event, I think I can code around this behavior. I just wanted to see
    if this was a general perl problem. It looks like it is configuration-specific.

    Thanks
    - Jason

    --
    ---------------------------------------------------------
    Jason Godfrey
    Engineering Software & Diagnostics Vnet: 233-3432
    Online Diagnostics (651) 683-3432
     
    Jason Godfrey, Mar 5, 2004

Similar Threads
  1. spawnl and waitpid, Feb 27, 2007, in forum: Python. Replies: 13, Views: 794
  2. Fork + Waitpid, by lasek, May 13, 2005, in forum: C Programming. Replies: 4,
     Views: 6,055; last post by SM Ryan, May 14, 2005
  3. 'waitpid' query, by Mike, Jan 28, 2009, in forum: C Programming. Replies: 10,
     Views: 603; last post by Kenny McCormack, Jan 29, 2009
  4. by Fan. Replies: 1, Views: 382; last post by Christopher Head, Jul 16, 2011
  5. chaining processes, Process.waitpid, by Thomas Hafner, Apr 14, 2007, in forum:
     Ruby. Replies: 0, Views: 122; last post by Thomas Hafner, Apr 14, 2007