perl problem with select and non-blocking sysread from multiple pipes

J

john

select() fails with "Bad file number".
I think that the children are dieing before
the output becomes available.

I'm trying to write a program which forks multiple children and reads
their output asynchronously. The children could take some time to
produce their output and it could arrive in fits and spurts, so I
don't want to block on any individual child; instead I want to read
whatever data each child has available (up to a buffer limit) and then
move on to the next child.

To simulate this behaviour, the children in the testcase below each
produce output, sleep, produce more output, sleep then produce final
output before exiting. If the sleep is set to 1 second for all
children, then the testcase sometimes finishes successfully, but if
the sleep is set to longer, or variable lengths of time (as below),
then it select will fail with a "Bad file number".

Example output:

../testcase
parent: startTask(), task 1, pid=21348
parent: startTask(), task 2, pid=12538
parent: startTask(), task 3, pid=26482
parent: startTask(), task 4, pid=28482
parent: startTask(), task 5, pid=29770
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) first sleep 4...
OUTPUT(task 1): (fileno=3) first sleep 1...
OUTPUT(task 3): (fileno=5) first sleep 3...
OUTPUT(task 2): (fileno=4) first sleep 2...
parent: pollForOutput(), nfound=3
OUTPUT(task 1): (fileno=3) second sleep 1...
OUTPUT(task 1): (fileno=3) finished
OUTPUT(task 2): (fileno=4) second sleep 2...
OUTPUT(task 5): (fileno=7) first sleep 5...
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) second sleep 4...
OUTPUT(task 1): (fileno=3)eof
parent: closing reader for task 1
OUTPUT(task 3): (fileno=5) second sleep 3...
OUTPUT(task 2): (fileno=4) finished
select: Bad file number at testcase line 66.

From the example above, you can see that the program didn't get to
read all of the child output before select() fails. I suspect that the
children are dieing before their output can be captured by the parent.

Does anyone have any idea why this is happening and how I can prevent
it ? I'm running out of ideas.

I've reproduced the problem on the following platforms:

AIX 5.3, This is perl, v5.8.2 built for aix-thread-multi
Linux Red Hat 2.4.21-4.ELsmp, This is perl, v5.8.0 built for
i386-linux-thread-multi
SunOS 5.8, This is perl, version 5.005_03 built for sun4-solaris
# Need to put no strict 'refs'; at top of file on Solaris.


Testcase is as follows:

#!/usr/bin/perl

use strict;
use IO;

my $eofsFound = 0;
my $eofsExpected = 5;
my $taskNum = 0;

my $readBits = ''; # "bitlist" of parent reader filehandles
my($fh) = ('fh0000'); # indirect filehandle names, yuk
my %readers; # store parent reader for each task

sub startTask() {

$taskNum++;

$readers{$taskNum} = $fh++; # parent reader for this task
my $cw = $fh++; # child write filehandle
{
no strict 'refs';
pipe($readers{$taskNum}, $cw) or die 'pr/cw pipe';
}

my $pid;
if ($pid = fork) {

# Parent

print "parent: startTask(), task $taskNum, pid=$pid\n";
close $cw; # close child writer
# $readers{$taskNum}->blocking(0); # stop sysread() from blocking
vec($readBits, fileno($readers{$taskNum}), 1) = 1;

} elsif ($pid ne undef) {

# Child

close $readers{$taskNum}; # close parent reader
open(STDOUT, ">&$cw") or die "STDOUT open: $!";
STDOUT->autoflush(1);

my $sleep = $taskNum; # set this to 1 and it'll probably
work

print "first sleep $sleep...\n";
select(undef, undef, undef, $sleep);

print "second sleep $sleep...\n";
select(undef, undef, undef, $sleep);

print "finished\n";
close(STDOUT);
exit(0);

} else {
die 'fork failed: $!';
}
}

sub pollForOutput {

my($rbits, $nfound);

$nfound = select($rbits = $readBits,undef,undef,2);
if ($nfound == -1) {
die "select: $!";
}
print "parent: pollForOutput(), nfound=$nfound\n";
return if $nfound == 0;

my @task_list = keys %readers;

# Work through bitmask to see which filehandles are ready.
NEXT_FH:
while ($nfound > 0) {

my $taskNum = shift @task_list;
my $fh = $readers{$taskNum};

if (vec($rbits, fileno($fh),1) == 0){
# if no incoming data from this client
next NEXT_FH;
}
$nfound--;

# parent's read filehandle

my $buf;
my $n = sysread($fh, $buf, 1024);
if ($n > 0) {
chomp $buf;
my @lines = split(/\n/, $buf);
foreach my $line (@lines) {
print "OUTPUT(task $taskNum): (fileno=" . fileno($fh) . ")
$line\n";
}
}
if ($n == 0) {
$eofsFound++;
print "OUTPUT(task $taskNum): (fileno=" . fileno($fh) .
")eof\n";
print "parent: closing reader for task $taskNum\n";
close($fh) or die "close failed: $!";
vec($readBits, fileno($fh), 1) = 0; # select() no longer
interested in this fh
}
}
}

sub main {

startTask();
startTask();
startTask();
startTask();
startTask();

while ($eofsFound < $eofsExpected) {
&pollForOutput();
sleep 2
}

print "Finished\n";
}

main();
 
B

Brian McCauley

john said:
select() fails with "Bad file number".
I think that the children are dieing before
the output becomes available.

I do not think that should cause this problem.
I'm trying to write a program which forks multiple children and reads
their output asynchronously. The children could take some time to
produce their output and it could arrive in fits and spurts, so I
don't want to block on any individual child; instead I want to read
whatever data each child has available (up to a buffer limit) and then
move on to the next child.

To simulate this behaviour, the children in the testcase below each
produce output, sleep, produce more output, sleep then produce final
output before exiting. If the sleep is set to 1 second for all
children, then the testcase sometimes finishes successfully, but if
the sleep is set to longer, or variable lengths of time (as below),
then it select will fail with a "Bad file number".

Example output:

./testcase
parent: startTask(), task 1, pid=21348
parent: startTask(), task 2, pid=12538
parent: startTask(), task 3, pid=26482
parent: startTask(), task 4, pid=28482
parent: startTask(), task 5, pid=29770
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) first sleep 4...
OUTPUT(task 1): (fileno=3) first sleep 1...
OUTPUT(task 3): (fileno=5) first sleep 3...
OUTPUT(task 2): (fileno=4) first sleep 2...
parent: pollForOutput(), nfound=3
OUTPUT(task 1): (fileno=3) second sleep 1...
OUTPUT(task 1): (fileno=3) finished
OUTPUT(task 2): (fileno=4) second sleep 2...
OUTPUT(task 5): (fileno=7) first sleep 5...
parent: pollForOutput(), nfound=4
OUTPUT(task 4): (fileno=6) second sleep 4...
OUTPUT(task 1): (fileno=3)eof
parent: closing reader for task 1
OUTPUT(task 3): (fileno=5) second sleep 3...
OUTPUT(task 2): (fileno=4) finished
select: Bad file number at testcase line 66.

From the example above, you can see that the program didn't get to
read all of the child output before select() fails. I suspect that the
children are dieing before their output can be captured by the parent.

Does anyone have any idea why this is happening and how I can prevent
it ? I'm running out of ideas.

I've reproduced the problem on the following platforms:

AIX 5.3, This is perl, v5.8.2 built for aix-thread-multi
Linux Red Hat 2.4.21-4.ELsmp, This is perl, v5.8.0 built for
i386-linux-thread-multi
SunOS 5.8, This is perl, version 5.005_03 built for sun4-solaris
# Need to put no strict 'refs'; at top of file on Solaris.


Testcase is as follows:

#!/usr/bin/perl

use strict;

Consider warnings too.
use IO;

my $eofsFound = 0;
my $eofsExpected = 5;
my $taskNum = 0;

my $readBits = ''; # "bitlist" of parent reader filehandles
my($fh) = ('fh0000'); # indirect filehandle names, yuk

Yes, yuk - this is not necessary in 5.6.1 and later - is compatability
with earlier Perl really needed?
my %readers; # store parent reader for each task

sub startTask() {

$taskNum++;

$readers{$taskNum} = $fh++; # parent reader for this task
my $cw = $fh++; # child write filehandle
{
no strict 'refs';
pipe($readers{$taskNum}, $cw) or die 'pr/cw pipe';
}

The above is more simply:

sub startTask() {
pipe($readers{++$taskNum}, my $cw) or die 'pr/cw pipe';
close($fh) or die "close failed: $!";
vec($readBits, fileno($fh), 1) = 0;

What are you expecting fileno() to return for a filehandle that has been
close()d? Write a small script to test this expectation :)
 
J

john

Brian McCauley said:
Consider warnings too.

I didn't know about warnings.
Looks like it could save me a lot of time in the future.
I'll try "use warnings" from now on, and see how I get on.
Thanks for the tip.

Yes, yuk - this is not necessary in 5.6.1 and later - is compatability
with earlier Perl really needed?

If there's a better way then I'll do it.
I don't like using the 'no strict refs' all over the place and
the indirect filehandle stuff is just plain confusing.
Can you give me any pointers ?

What are you expecting fileno() to return for a filehandle that has been
close()d? Write a small script to test this expectation :)

Brian, you're a star! I don't think I'd have spotted that in a million
years.
Wood/trees and all that. If I reverse those last two lines it seems to
work every time. I need to do some more testing with random sleeps and
more concurrent tasks etc.

I have to admit that I don't fully understand why the close() before
the fileno() call was causing that behaviour but I guess that some
things behave in some wierd ways when you call them incorrectly.

Thanks for your help.
 
A

A. Sinan Unur

(e-mail address removed) (john) wrote in
....

If there's a better way then I'll do it.
I don't like using the 'no strict refs' all over the place and
the indirect filehandle stuff is just plain confusing.

Please quote the appropriate amount of context so we can tell what the
discussion is about.
....

I have to admit that I don't fully understand why the close() before
the fileno() call was causing that behaviour but I guess that some
things behave in some wierd ways when you call them incorrectly.

I don't see anything behaving in weird ways:

D:\Home> perldoc -f fileno
fileno FILEHANDLE
Returns the file descriptor for a filehandle, or undefined if
the filehandle is not open.

D:\Home> perldoc -f vec
vec EXPR,OFFSET,BITS
Treats the string in EXPR as a bit vector made up of elements of
width BITS, and returns the value of the element specified by
OFFSET as an unsigned integer.

Sinan
 
J

john

A. Sinan Unur said:
(e-mail address removed) (john) wrote in


Please quote the appropriate amount of context so we can tell what the
discussion is about.

Sorry. More context is below (the last words are mine)...
If there's a better way then I'll do it.
I don't like using the 'no strict refs' all over the place and
the indirect filehandle stuff is just plain confusing.
Can you give me any pointers ?

I'm still interested to hear what the "recommended"
approach is to avoid the yucky indirect filehandle names.
Is it IO::Handle ?
D:\Home> perldoc -f vec
vec EXPR,OFFSET,BITS
Treats the string in EXPR as a bit vector made up of elements of
width BITS, and returns the value of the element specified by
OFFSET as an unsigned integer.

Sinan

I've written that small script...

use strict;
#use warnings;

my $bitstring = "";
vec($bitstring, undef, 1) = 1;

my $bits = unpack("b*", $bitstring);
print "$bits\n";

It prints:

10000000

Basically the undef is treated by vec as offset zero.
With the "use warnings" un-commented, the output is...

Use of uninitialized value in vec at test.pl line 5.
10000000

Now I understand why select() was failing in my testcase.
The big lesson I've learned is to "use warnings" in future.
 
A

A. Sinan Unur

(e-mail address removed) (john) wrote in
Sorry. More context is below (the last words are mine)...


I'm still interested to hear what the "recommended"
approach is to avoid the yucky indirect filehandle names.
Is it IO::Handle ?

I am not sure :) Sorry, but I have never used 'indirect filehandles' and
what your reason is for using them. However, what you are after might be
lexical fiheandles:

E.g.

open my $fh, '<', 'fh0000' or die $!;

Sinan.
 
J

john

A. Sinan Unur said:
(e-mail address removed) (john) wrote in


I am not sure :) Sorry, but I have never used 'indirect filehandles' and
what your reason is for using them. However, what you are after might be
lexical fiheandles:

E.g.

open my $fh, '<', 'fh0000' or die $!;

Sinan.

The reason was that when I tried using IO::Handle I was doing
something like the following:

$readers{$taskNum} = IO::Handle->new();
my $cw = IO::Handle->new();

new IO::pipe($readers{$taskNum}, $cw) or die 'pr/cw pipe';

then later I was doing...

open(STDOUT, ">&$cw") or die "STDOUT open: $!";

This was giving me an "Invalid argument" error. I tried to find a
solution but my brain packed up, and I ended up going with the nasty
no strict refs approach. After seeing what you guys wrote I decided to
have another bash and realised that $cw is a file handle, whereas
open() is expecting a file number.

The following works...

my $fd = fileno($cw);
open(STDOUT, ">&$fd") or die "STDOUT open: $!";

I'm fairly new to perl and I'm still learning.
Thanks Brian/Sinan for helping.
 
B

Brian McCauley

john said:
If there's a better way then I'll do it.

Can you give me any pointers ?

Yes, go back and look at my previous post in this thread where I showed
you how simple it can be in Perl5.6.1 or later.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top