john
select() fails with "Bad file number".
I think that the children are dying before
the output becomes available.
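For reference, a pipe keeps data that a child has already written even after that child exits: the reader first gets the buffered bytes and only then sees EOF (sysread() returning 0). The minimal sketch below assumes a POSIX system and uses lexical filehandles rather than the testcase's generated names. It reaps the child before reading anything and still recovers the child's output.

```perl
#!/usr/bin/perl
# Sketch: data written into a pipe stays in the kernel buffer after the
# writing child exits, so the parent can still read it, followed by EOF.
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";
my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: write two lines and exit immediately.
    close $reader;
    syswrite($writer, "line one\nline two\n");
    exit 0;
}

# Parent: close our copy of the write end, then reap the child first,
# so the child is certainly gone before anything is read.
close $writer;
waitpid($pid, 0);

my $got = '';
while (1) {
    my $n = sysread($reader, my $buf, 1024);
    die "sysread: $!" unless defined $n;
    last if $n == 0;    # 0 bytes means EOF: every writer has closed
    $got .= $buf;
}
close $reader;
print $got;
```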
I'm trying to write a program which forks multiple children and reads
their output asynchronously. The children could take some time to
produce their output and it could arrive in fits and starts, so I
don't want to block on any individual child; instead I want to read
whatever data each child has available (up to a buffer limit) and then
move on to the next child.
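A minimal sketch of the "read whatever is available, up to a limit" step, assuming the read end is switched to non-blocking mode with IO::Handle's blocking(0), the method named in the commented-out line of the testcase. read_available() is a hypothetical helper for illustration only: it returns the bytes read, an empty string when nothing is waiting yet (EAGAIN/EWOULDBLOCK), or undef at EOF.

```perl
#!/usr/bin/perl
# Sketch: with the read end in non-blocking mode, sysread() returns what
# is buffered (up to the limit), or undef with EAGAIN if nothing is there
# yet, instead of waiting for the writer.
use strict;
use warnings;
use IO::Handle;                    # provides the blocking() method
use Errno qw(EAGAIN EWOULDBLOCK);

pipe(my $reader, my $writer) or die "pipe: $!";
$reader->blocking(0);

# Hypothetical helper: '' = nothing yet, undef = EOF, otherwise the data.
sub read_available {
    my ($fh, $limit) = @_;
    my $n = sysread($fh, my $buf, $limit);
    return '' if !defined $n && ($! == EAGAIN || $! == EWOULDBLOCK);
    die "sysread: $!" unless defined $n;
    return undef if $n == 0;       # EOF: the writer has closed
    return $buf;
}

my $first  = read_available($reader, 1024);   # nothing written yet
syswrite($writer, "partial output\n");
my $second = read_available($reader, 1024);   # the buffered line
close $writer;
my $third  = read_available($reader, 1024);   # EOF
```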
To simulate this behaviour, the children in the testcase below each
produce output, sleep, produce more output, sleep then produce final
output before exiting. If the sleep is set to 1 second for all
children, then the testcase sometimes finishes successfully, but if
the sleep is set to longer, or variable lengths of time (as below),
then select() fails with "Bad file number".
Example output:
../testcase
parent: startTask(), task 1, pid=21348
parent: startTask(), task 2, pid=12538
parent: startTask(), task 3, pid=26482
parent: startTask(), task 4, pid=28482
parent: startTask(), task 5, pid=29770
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) first sleep 4...
OUTPUT(task 1): (fileno=3) first sleep 1...
OUTPUT(task 3): (fileno=5) first sleep 3...
OUTPUT(task 2): (fileno=4) first sleep 2...
parent: pollForOutput(), nfound=3
OUTPUT(task 1): (fileno=3) second sleep 1...
OUTPUT(task 1): (fileno=3) finished
OUTPUT(task 2): (fileno=4) second sleep 2...
OUTPUT(task 5): (fileno=7) first sleep 5...
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) second sleep 4...
OUTPUT(task 1): (fileno=3)eof
parent: closing reader for task 1
OUTPUT(task 3): (fileno=5) second sleep 3...
OUTPUT(task 2): (fileno=4) finished
select: Bad file number at testcase line 66.
From the example above, you can see that the program didn't get to
read all of the child output before select() failed. I suspect that the
children are dying before their output can be captured by the parent.
Does anyone have any idea why this is happening and how I can prevent
it? I'm running out of ideas.
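One thing worth checking: EBADF ("Bad file number") from select() means a descriptor in the read set is not open, which points at the parent's bookkeeping rather than at the children. In pollForOutput() the reader is closed before its bit is cleared, and fileno() on a closed handle returns undef, so vec($readBits, fileno($fh), 1) = 0 clears bit 0 (an undef offset is treated as 0) instead of the closed descriptor's bit. That fits the output above, where select() fails on the first poll after task 1's reader is closed. The sketch shows the effect in isolation; the variable names are illustrative. Clearing the bit while fileno() is still valid, and deleting the task from %readers, avoids it.

```perl
#!/usr/bin/perl
# Sketch: after close(), fileno() returns undef, so clearing the select()
# bit *after* closing leaves the old descriptor's bit set.
use strict;
use warnings;
no warnings 'uninitialized';      # vec() with an undef offset warns; silenced for the demo

pipe(my $reader, my $writer) or die "pipe: $!";
my $bits = '';
my $fd = fileno($reader);
vec($bits, $fd, 1) = 1;           # register the reader for select()

# Order used in the testcase: close first, then clear the bit.
close $reader;
my $after_close = fileno($reader);   # undef: the handle is closed
vec($bits, $after_close, 1) = 0;     # undef offset acts as 0, so bit 0 is cleared instead
my $stale = vec($bits, $fd, 1);      # still 1: select() would get a closed fd (EBADF)
printf "fileno after close: %s, bit for fd %d: %d\n",
    defined $after_close ? $after_close : 'undef', $fd, $stale;

# Safer order: clear the bit while fileno() is still valid, then close.
pipe(my $r2, my $w2) or die "pipe: $!";
my $fd2   = fileno($r2);
my $bits2 = '';
vec($bits2, $fd2, 1) = 1;
vec($bits2, fileno($r2), 1) = 0;     # handle still open, so this hits the right bit
close $r2;
my $clean = vec($bits2, $fd2, 1);    # 0
```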
I've reproduced the problem on the following platforms:
AIX 5.3, This is perl, v5.8.2 built for aix-thread-multi
Linux Red Hat 2.4.21-4.ELsmp, This is perl, v5.8.0 built for i386-linux-thread-multi
SunOS 5.8, This is perl, version 5.005_03 built for sun4-solaris
# On Solaris I had to put no strict 'refs'; at the top of the file.
Testcase is as follows:
#!/usr/bin/perl
use strict;
use IO;

my $eofsFound    = 0;
my $eofsExpected = 5;
my $taskNum      = 0;
my $readBits     = '';        # "bitlist" of parent reader filehandles
my($fh) = ('fh0000');         # indirect filehandle names, yuk
my %readers;                  # store parent reader for each task

sub startTask() {
    $taskNum++;
    $readers{$taskNum} = $fh++;   # parent reader for this task
    my $cw = $fh++;               # child write filehandle
    {
        no strict 'refs';
        pipe($readers{$taskNum}, $cw) or die "pr/cw pipe: $!";
    }
    my $pid;
    if ($pid = fork) {
        # Parent
        print "parent: startTask(), task $taskNum, pid=$pid\n";
        close $cw;                # close child writer
        # $readers{$taskNum}->blocking(0); # stop sysread() from blocking
        vec($readBits, fileno($readers{$taskNum}), 1) = 1;
    } elsif (defined $pid) {
        # Child
        close $readers{$taskNum}; # close parent reader
        open(STDOUT, ">&$cw") or die "STDOUT open: $!";
        STDOUT->autoflush(1);
        my $sleep = $taskNum;     # set this to 1 and it'll probably work
        print "first sleep $sleep...\n";
        select(undef, undef, undef, $sleep);
        print "second sleep $sleep...\n";
        select(undef, undef, undef, $sleep);
        print "finished\n";
        close(STDOUT);
        exit(0);
    } else {
        die "fork failed: $!";
    }
}

sub pollForOutput {
    my($rbits, $nfound);
    $nfound = select($rbits = $readBits, undef, undef, 2);
    if ($nfound == -1) {
        die "select: $!";
    }
    print "parent: pollForOutput(), nfound=$nfound\n";
    return if $nfound == 0;
    my @task_list = keys %readers;
    # Work through bitmask to see which filehandles are ready.
    NEXT_FH:
    while ($nfound > 0) {
        my $taskNum = shift @task_list;
        my $fh = $readers{$taskNum};
        if (vec($rbits, fileno($fh), 1) == 0) {
            # if no incoming data from this client
            next NEXT_FH;
        }
        $nfound--;
        # parent's read filehandle
        my $buf;
        my $n = sysread($fh, $buf, 1024);
        die "sysread: $!" unless defined $n;
        if ($n > 0) {
            chomp $buf;
            my @lines = split(/\n/, $buf);
            foreach my $line (@lines) {
                print "OUTPUT(task $taskNum): (fileno=" . fileno($fh) . ") $line\n";
            }
        }
        if ($n == 0) {
            $eofsFound++;
            print "OUTPUT(task $taskNum): (fileno=" . fileno($fh) . ")eof\n";
            print "parent: closing reader for task $taskNum\n";
            close($fh) or die "close failed: $!";
            vec($readBits, fileno($fh), 1) = 0; # select() no longer interested in this fh
        }
    }
}

sub main {
    startTask();
    startTask();
    startTask();
    startTask();
    startTask();
    while ($eofsFound < $eofsExpected) {
        &pollForOutput();
        sleep 2;
    }
    print "Finished\n";
}

main();
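A possible restructuring, sketched under the assumption that IO::Select (bundled with Perl) is acceptable: lexical pipe handles replace the generated 'fh0000' names, IO::Select replaces the hand-maintained bitmask, and each reader is removed from the set before it is closed, since IO::Select also looks handles up by fileno(). The children mirror the testcase, but with three tasks and shortened sleeps; the task count, timings and the %task_of/%output names are illustrative choices, not part of the original.

```perl
#!/usr/bin/perl
# Sketch: fork several children and read their output as it arrives,
# using IO::Select instead of a hand-maintained vec() bitmask.
use strict;
use warnings;
use IO::Select;

$| = 1;                 # unbuffered parent STDOUT, so nothing is pending at fork time

my $sel = IO::Select->new;
my %task_of;            # reader handle (stringified) -> task number
my %output;             # task number -> everything that task wrote
my @pids;

for my $task (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: send STDOUT down the pipe, print in bursts, then exit.
        close $reader;
        open(STDOUT, '>&', $writer) or die "dup STDOUT: $!";
        $| = 1;
        my $sleep = 0.1 * $task;   # short, varied pauses for the sketch
        print "first sleep $sleep...\n";
        select(undef, undef, undef, $sleep);
        print "second sleep $sleep...\n";
        select(undef, undef, undef, $sleep);
        print "finished\n";
        exit 0;
    }
    # Parent: keep only the read end and register it.
    close $writer;
    $sel->add($reader);
    $task_of{$reader} = $task;
    push @pids, $pid;
}

while ($sel->count) {
    # can_read() waits up to 2 seconds and returns handles with data or EOF.
    for my $fh ($sel->can_read(2)) {
        my $task = $task_of{$fh};
        my $n = sysread($fh, my $buf, 1024);
        die "sysread: $!" unless defined $n;
        if ($n > 0) {
            $output{$task} .= $buf;
            print "OUTPUT(task $task): $_\n" for split /\n/, $buf;
        } else {
            # EOF: unregister while fileno() is still valid, then close.
            print "parent: closing reader for task $task\n";
            $sel->remove($fh);
            delete $task_of{$fh};
            close $fh;
        }
    }
}
waitpid($_, 0) for @pids;   # reap the children
print "Finished\n";
```

Output from different children interleaves in whatever order it arrives, and a child that exits early still has its buffered output read before its EOF is seen.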