waitpid interrupted problem

Jason Godfrey

Hello.

I'm using perl 5.8 on a Linux system. I have a script that fires off multiple
copies of a program, then goes through a loop doing a waitpid on each child
process. While the waitpid is going on I have an alarm going off every two
seconds to handle some other tasks.

I hit some unexpected behavior that could be a bug.

Doing an strace, it appears that perl calls wait4 with the pid I am waiting for.
After it is interrupted by the signal, wait4 is called again, but this time
it is called with -1 as the pid. My second child exits first, so the waitpid
for the first child returns with the exit of the second child. (The return value
of waitpid is the pid of the second child.) When I then do the waitpid for the
second child, it fails because the second child no longer exists.

An excerpt of the strace is below.

I can work around the problem by reworking the code to not wait for specific
pids. It does strike me as undesirable behavior, though. Has anyone else had
experience with this, or any comments?

Thanks
- Jason

clone2(child_stack=0, stack_size=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x200000000002ede0) = 32201
write(1, "[oldisk.2] /tmp/diags/oldisk -fi"..., 67[oldisk.2] /tmp/diags/oldisk -filename /dev/shm/oldiskTestFile3219
) = 67
clone2(child_stack=0, stack_size=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x200000000002ede0) = 32202
rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0}, it_value={0, 0}}) = 0
wait4(32201, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
rt_sigreturn() = ? (mask now [])
write(1, "|", 1|) = 1
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0}, it_value={0, 0}}) = 0
wait4(-1, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
rt_sigreturn() = ? (mask now [])
write(1, "\10", ) = 1
write(1, "/", 1/) = 1

(and so on till)

wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 32202
write(1, "Diag 1 (pid 32201 or 32202) exit"..., 42Diag 1 (pid 32201 or 32202) exited with 0
) = 42
write(1, "Diag 1 (pid 32201) exited with 0"..., 33Diag 1 (pid 32201) exited with 0
) = 33
write(1, "\10", ) = 1
write(1, "\33[30m\33[42mPASS\33[0m(oldisk.1)\n", 29PASS(oldisk.1)
) = 29
lstat("/tmp/diagTestOutput.1", {st_mode=S_IFREG|0644, st_size=460, ...}) = 0
unlink("/tmp/diagTestOutput.1") = 0
wait4(32202, 0x60000fffffff9da0, 0, NULL) = -1 ECHILD (No child processes)
write(1, "Diag 2 (pid 32202 or -1) exited "..., 40Diag 2 (pid 32202 or -1) exited with -1
) = 40
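
The workaround mentioned above (not waiting for specific pids) could look roughly like the sketch below. It keeps a pid-to-label map and reaps children in whatever order they exit, so it does not matter which pid wait4 is given after an interruption. The helper names spawn_children and reap_all, the "Diag N" labels, and the sleeping children standing in for the real diagnostic program are all assumptions made for illustration; this has not been run on the affected machine.

```perl
use strict;
use warnings;
use Errno qw(EINTR);
use POSIX qw(_exit);

# Hypothetical helper: fork $count children and return { pid => label }.
sub spawn_children {
    my ($count) = @_;
    my %label_for;
    for my $n (1 .. $count) {
        my $pid = fork;
        defined $pid or die "fork: $!";
        if ($pid == 0) {
            # Child: stand-in for the real program. Later children exit first.
            sleep($count - $n);
            _exit($n);
        }
        $label_for{$pid} = "Diag $n";
    }
    return \%label_for;
}

# Hypothetical helper: reap children in whatever order they exit.
# Empties the map it is given and returns { pid => exit code }.
sub reap_all {
    my ($label_for) = @_;
    my %exit_for;
    while (%$label_for) {
        my $pid = waitpid(-1, 0);            # -1: wait for any child
        if ($pid == -1) {
            next if $! == EINTR;             # interrupted by a signal: retry
            last;                            # ECHILD: no children left
        }
        next unless exists $label_for->{$pid};   # not one we started
        $exit_for{$pid} = $? >> 8;
        delete $label_for->{$pid};
    }
    return \%exit_for;
}

# Usage: a periodic alarm keeps firing while the parent reaps.
my $ticks = 0;
$SIG{ALRM} = sub { $ticks++; alarm 1 };      # re-arm the timer on each tick
my $labels = spawn_children(2);
my %names  = %$labels;                       # reap_all empties $labels
alarm 1;
my $exits = reap_all($labels);
alarm 0;
print "$names{$_} (pid $_) exited with $exits->{$_}\n"
    for sort { $a <=> $b } keys %$exits;
```

Because the exit status is looked up by the pid that waitpid actually returned, a child that exits "out of order" is still attributed to the right label.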
 
Ben Morrow

Jason Godfrey said:
Doing an strace, it appears that perl calls wait4 with the pid I am waiting
for. After it is interrupted by the signal, wait4 is called again, but
this time it is called with -1 as the pid. My second child exits first, so
the waitpid for the first child returns with the exit of the second
child. (The return value of waitpid is the pid of the second child.) When I
then do the waitpid for the second child, it fails because the second child no
longer exists.

I *don't* see this behaviour with 5.8.2 i686-linux-thread-multi: command-line
and strace follow. Do you get the same results with this command?

Command:
strace perl -le'$SIG{ALRM} = "IGNORE";
$x = fork; defined $x or die "fork: $!";
$x or do { sleep 5; exit 1};
$y = fork; defined $y or die "fork: $!";
$y or do { sleep 6; exit 1};
alarm 2;
print waitpid $y, 0;
print waitpid $x, 0;'

Output:
22485
22484

Strace:
....
fork() = 22484
fork() = 22485
alarm(2) = 0
wait4(22485, 0xbffff3e8, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
wait4(22485, 0xbffff3e8, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(22485, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 22485
--- SIGCHLD (Child exited) @ 0 (0) ---
brk(0) = 0x817c000
brk(0x817d000) = 0x817d000
wait4(22484, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 22484
....

Ben
 
Jason Godfrey

Ben said:
I *don't* see this behaviour with 5.8.2 i686-linux-thread-multi: command-line
and strace follow. Do you get the same results with this command?

No, with your script I don't see the behaviour either. However, with a slight
modification I can see it:

strace perl -le'sub sigAlarm{ $i++; }
$SIG{ALRM} = \&sigAlarm;
$x = fork; defined $x or die "fork: $!";
$x or do { sleep 5; exit 1};
$y = fork; defined $y or die "fork: $!";
$y or do { sleep 6; exit 1};
alarm 2;
print waitpid $y, 0;
print waitpid $x, 0;'

Output:
2008
-1

Strace:
rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
clone2(child_stack=0, stack_size=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x200000000002ede0) = 2008
clone2(child_stack=0, stack_size=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x200000000002ede0) = 2009
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0},
it_value={0, 0}}) = 0
wait4(2009, 0x60000fffffff9510, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
rt_sigreturn() = ? (mask now [])
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 2008
write(1, "2008\n", ) = 5
wait4(2008, 0x60000fffffff9510, 0, NULL) = -1 ECHILD (No child processes)
write(1, "-1\n", 3) = 3

Thanks
- Jason
 
Ben Morrow

Jason Godfrey said:
No, with your script I don't see the behaviour either. However, with a slight
modification I can see it:

Interesting... I don't get the error with that, either; but my strace
looks somewhat different from yours:

rt_sigaction(SIGALRM, {0x4002ffc0, [], SA_RESTORER, 0x40109218},
{SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
fork() = 23540
fork() = 23541
alarm(2) = 0
wait4(23541, 0xbffff3a8, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
wait4(23541, 0xbffff3a8, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(23541, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 23541
--- SIGCHLD (Child exited) @ 0 (0) ---
brk(0) = 0x817c000
brk(0x817d000) = 0x817d000
write(1, "23541\n", 623541) = 6
wait4(23540, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0, NULL) = 23540
write(1, "23540\n", 623540) = 6

Notably, mine uses fork(2) instead of clone2(2) and alarm(2) instead of
setitimer(2). What are your linux/glibc/perl versions? I have:

linux 2.4.20-xfs_pre6 (with Gentoo patches)
glibc 2.3.2
linuxthreads-0.10
perl 5.8.2 for i686-linux-thread-multi
with mostly defaults taken for Configure (in particular, perl
*doesn't* use vfork)

Ben
 
Jason Godfrey

Hello.

Sorry for the delay; I wanted to try this on a different box before replying.

Ben said:
Notably, mine uses fork(2) instead of clone2(2) and alarm(2) instead of
setitimer(2). What are your linux/glibc/perl versions? I have:

linux 2.4.20-xfs_pre6 (with Gentoo patches)
glibc 2.3.2
linuxthreads-0.10
perl 5.8.2 for i686-linux-thread-multi
with mostly defaults taken for Configure (in particular, perl
*doesn't* use vfork)

glibc-2.3.2-95.6
perl-5.8.0-88.4
linux 2.4.21-9.EL #1 SMP

The machine has NPTL backported to the 2.4 kernel.

In any event, I think I can code around this behavior. I just wanted to see
if this was a general perl problem. It looks like it is
configuration-specific.

Thanks
- Jason
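
One possible way to code around this while still waiting for a specific pid is to avoid blocking in waitpid at all: poll with WNOHANG and do the periodic work between polls, so no alarm is needed to interrupt the wait. The sketch below assumes that approach is acceptable for the script in question. The helper name wait_with_ticks, its parameters, and the sleeping child are hypothetical, and it has not been tried on the affected configuration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG _exit);

# Hypothetical helper: wait for one specific pid without blocking in wait4.
# Calls $on_tick roughly every $interval seconds until the child exits.
# Returns the child's exit code, or nothing if there is no such child.
sub wait_with_ticks {
    my ($pid, $interval, $on_tick) = @_;
    while (1) {
        my $got = waitpid($pid, WNOHANG);   # returns 0 while still running
        return $? >> 8 if $got == $pid;     # child finished
        return         if $got == -1;       # no such child (already reaped?)
        $on_tick->();                       # work the alarm handler used to do
        select(undef, undef, undef, $interval);   # sub-second sleep
    }
}

# Usage: one child that sleeps for a second, polled four times a second.
my $child = fork;
defined $child or die "fork: $!";
if ($child == 0) { sleep 1; _exit(1) }

my $spins  = 0;
my $status = wait_with_ticks($child, 0.25, sub { $spins++ });
print "child $child exited with $status after $spins ticks\n";
```

The trade-off is latency: the parent notices the exit up to one interval late, in exchange for never having a wait call interrupted by a signal.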
 
