Jason Godfrey
Hello.
I'm using perl 5.8 on a Linux system. I have a script that fires off multiple
copies of a program, then goes through a loop doing a waitpid on each child
process. While the waitpid is going on I have an alarm going off every two
seconds to handle some other tasks.
I hit some unexpected behavior that could be a bug.
Running strace, it appears that perl first calls wait4 with the pid I am
waiting for. After that call is interrupted by the signal, wait4 is called
again, but this time with -1 as the pid. My second child exits first, so the
waitpid for the first child returns with the exit of the second child. (The
return value of waitpid is the pid of the second child.) When I then do the
waitpid for the second child, it fails because the second child no longer
exists.
An excerpt of the strace is below.
I can work around the problem by reworking the code so that it does not wait
for specific pids. It still strikes me as undesirable behavior, though. Has
anyone else run into this, or does anyone have any comments?
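A minimal sketch of that kind of workaround, assuming the parent keeps its own
table of child pids. The reap_all helper, the %pending hash and the toy
children are hypothetical names for illustration, not taken from the original
script. It waits for any child with waitpid(-1, 0), retries if the call
reports EINTR, and matches the returned pid against the table instead of
relying on the pid passed to waitpid.

```perl
use strict;
use warnings;

# Reap every child listed in %$pending (pid => label), in whatever order
# the children exit. Returns a hash ref of pid => raw wait status ($?).
sub reap_all {
    my ($pending) = @_;
    my %status;
    while (%$pending) {
        my $pid = waitpid(-1, 0);       # wait for any child, not a specific pid
        if ($pid == -1) {
            next if $!{EINTR};          # interrupted by a signal: try again
            last;                       # ECHILD or another error: stop waiting
        }
        next unless exists $pending->{$pid};   # a child we are not tracking
        $status{$pid} = $?;             # record the status of this child
        delete $pending->{$pid};
    }
    return \%status;
}

# Usage: start two toy children; the second one exits before the first.
my %pending;
for my $n (1 .. 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep(2 - $n);                  # child 1 sleeps 1s, child 2 exits at once
        exit($n);
    }
    $pending{$pid} = "Diag $n";
}
my %label  = %pending;                  # keep the labels; reap_all empties %pending
my $status = reap_all(\%pending);
for my $pid (sort { $a <=> $b } keys %$status) {
    printf "%s (pid %d) exited with %d\n",
        $label{$pid}, $pid, $status->{$pid} >> 8;
}
```

Because the exit status is looked up by the pid that waitpid actually returned,
the order in which the children finish no longer matters.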
Thanks
- Jason
clone2(child_stack=0, stack_size=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x200000000002ede0) = 32201
write(1, "[oldisk.2] /tmp/diags/oldisk -fi"..., 67[oldisk.2] /tmp/diags/oldisk -filename /dev/shm/oldiskTestFile3219
) = 67
clone2(child_stack=0, stack_size=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x200000000002ede0) = 32202
rt_sigaction(SIGALRM, {0x20000000006ddce0, [], 0}, {SIG_DFL}, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0}, it_value={0, 0}}) = 0
wait4(32201, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
rt_sigreturn() = ? (mask now [])
write(1, "|", 1|) = 1
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={2, 0}}, {it_interval={0, 0}, it_value={0, 0}}) = 0
wait4(-1, 0x60000fffffff9da0, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 20000000003dcdf1 (0) ---
rt_sigreturn() = ? (mask now [])
write(1, "\10", ) = 1
write(1, "/", 1/) = 1
(and so on till)
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 32202
write(1, "Diag 1 (pid 32201 or 32202) exit"..., 42Diag 1 (pid 32201 or 32202) exited with 0
) = 42
write(1, "Diag 1 (pid 32201) exited with 0"..., 33Diag 1 (pid 32201) exited with 0
) = 33
write(1, "\10", ) = 1
write(1, "\33[30m\33[42mPASS\33[0m(oldisk.1)\n", 29PASS(oldisk.1)
) = 29
lstat("/tmp/diagTestOutput.1", {st_mode=S_IFREG|0644, st_size=460, ...}) = 0
unlink("/tmp/diagTestOutput.1") = 0
wait4(32202, 0x60000fffffff9da0, 0, NULL) = -1 ECHILD (No child processes)
write(1, "Diag 2 (pid 32202 or -1) exited "..., 40Diag 2 (pid 32202 or -1) exited with -1
) = 40