Flushing and multiple pipes

neurino

Dear group,

Happy to post my first message here! So, now to business.

I define two pipes which both print a parsed data stream to the
_same_ PostScript file. The data values are selected through
'commands' according to certain criteria, in the following manner:

# shortened pseudocode
#!/usr/bin/perl

# $psfile and @ifiles are defined earlier in the real script
my $pipe1 = " commands >> $psfile";
my $pipe2 = " commands >> $psfile";

open(H1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
open(H2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";

foreach my $ifile (@ifiles) {
    open(IFILE, '<', $ifile) or die "Could not open $ifile: $!\n";
    while (<IFILE>) {
        # pick the wanted fields out of the whitespace-separated record
        my @fields = split /\s+/, $_;
        my $line = "@fields\n";
        print H1 $line;
        print H2 $line;
    }
    close(IFILE);
}
close(H1);
close(H2);
# end shortened pseudocode

I noticed that, depending on the datastream type, sometimes the
PostScript file is written and closed correctly, sometimes it is not.
I also noticed that the script behaves differently on Ubuntu than on
OS X Leopard. No fancy modules are loaded, just pure Perl. The
'commands' are a pipeline of awk and GMT routines.
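
Each command string is something along these lines (the awk expression
and GMT options here are only placeholders, not the real ones):

my $pipe1 = "awk '{print \$2, \$3}' | psxy -R0/100/0/100 -JX15c -O -K >> $psfile";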

To be on the safe side, I duplicated the foreach loop, and now I open
one pipe at a time, guessing that the problem is how the OS flushes the
pipes' buffers and how the PostScript file gets the values.
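
Roughly, the duplicated loop now looks like this (shortened pseudocode
again):

# workaround: one pipeline at a time
foreach my $pipe ($pipe1, $pipe2) {
    open(my $h, "| $pipe") or die "\n --- Error: Could not plot $pipe: $!\n";
    foreach my $ifile (@ifiles) {
        open(my $in, '<', $ifile) or die "Could not open $ifile: $!\n";
        while (<$in>) {
            my @fields = split /\s+/, $_;
            print {$h} "@fields\n";
        }
        close($in);
    }
    # closing the pipe waits for this pipeline to finish
    # before the next one is started
    close($h);
}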

It isn't by any means the best solution, because I have to parse the
same files twice. Hence the question: is there a way to open concurrent
pipes in a robust way?

Thanks.
 
Rainer Weikusat

> [...]
> I noticed that, depending on the datastream type, sometimes the
> PostScript file is written and closed correctly, sometimes it is not.

I assume 'correctly' means you get two differently processed lines for
each input line in the output file, in the order they were written in
Perl. That's never going to work reliably this way, because not only
does Perl employ internal output buffering, but the commands running as
part of your pipelines do so as well: if they use stdio, their output
will be 'fully buffered' when stdout is not connected to an
interactive device. Also, the processes in both of your pipelines
execute asynchronously with respect to the Perl control process and to
the processes in the other pipeline. This means you may get higher
throughput this way, but the downside is that output reordering may
(and usually will) occur.
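
(For completeness: Perl's own buffering can be switched off per handle,
e.g.

select(H1); $| = 1;   # enable autoflush on the pipe to the first pipeline
select(H2); $| = 1;   # same for the second pipeline
select(STDOUT);       # restore the default output handle

but that does nothing about the stdio buffers inside the awk/GMT
processes, so it won't make the interleaving deterministic on its own.)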

The simple but relatively inefficient solution to that is to create
two new pipelines for each input line and not start the second
before the first has terminated (or the third before the second has
terminated, and so on). Unless you're repeatedly dealing with large
inputs, this is probably good enough, though. If you want to process
the input asynchronously and concurrently, you need to employ a final
'put it back together' filter which reads data from both pipelines
as it becomes available and puts the output back into the proper
order.
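
A sketch of such a filter, assuming each pipeline tags every output
line with the number of the input line it was produced from (for
instance by letting the awk stage prepend NR and a tab), and with
placeholder command strings:

#!/usr/bin/perl
# 'put it back together' filter: reads two tagged output streams
# concurrently and re-emits them in input order, pipeline 1 before
# pipeline 2 for the same input line
use strict;
use warnings;
use IO::Select;

open(my $p1, '-|', 'commands1') or die "cannot start pipeline 1: $!";
open(my $p2, '-|', 'commands2') or die "cannot start pipeline 2: $!";

my $sel = IO::Select->new($p1, $p2);
my %partial = ($p1 => '', $p2 => '');    # incomplete trailing lines
my %lines   = ($p1 => {}, $p2 => {});    # line number => output line
my $next = 1;                            # next line number to emit

while ($sel->count) {
    for my $fh ($sel->can_read) {
        if (sysread($fh, my $chunk, 8192)) {
            $partial{$fh} .= $chunk;
            # split off complete lines, keep the unterminated rest
            while ($partial{$fh} =~ s/\A([^\n]*\n)//) {
                my ($n, $text) = split /\t/, $1, 2;
                $lines{$fh}{$n} = $text;
            }
        } else {
            $sel->remove($fh);           # EOF on this pipeline
        }
    }
    # emit everything for which both pipelines have delivered a line
    while (exists $lines{$p1}{$next} && exists $lines{$p2}{$next}) {
        print delete $lines{$p1}{$next};
        print delete $lines{$p2}{$next};
        $next++;
    }
}
close($p1);
close($p2);

Leftover lines, in case one pipeline produces more output lines than
the other, are simply dropped in this sketch.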
 
neurino

> I assume 'correctly' means you get two differently processed lines for
> each input line in the output file, in the order they were written in
> Perl.
Exactly.

> That's never going to work reliably this way, because [...]

Thanks for the explanation. So, it seems i was on the right track somehow.
> The simple but relatively inefficient solution to that is to create
> two new pipelines for each input line and not start the second
> before the first has terminated (or the third before the second has
> terminated, and so on).

This is the way I implemented it now. I could read the files all at
once, but that isn't an option either. And you are pointing out the
issue: the data records are large enough, 1-2 GB.
> If you want to process the input asynchronously and concurrently, you
> need to employ a final 'put it back together' filter which reads data
> from both pipelines as it becomes available and puts the output back
> into the proper order.

It sounds quite new to me. I found pack/unpack in perlfaq5. Is that right?

Thanks.
 
