Flushing and multiple pipes

neurino

Dear group,

Happy to post my first message here! So, now to business.

I define two pipes which both print a parsed data stream to the
_same_ PostScript file. The data values are selected through
'commands' according to certain criteria, in the following manner:

# shortened pseudocode
#!/usr/bin/perl

# $psfile and @ifiles are defined earlier in the real script
my $pipe1 = " commands >> $psfile";
my $pipe2 = " commands >> $psfile";

open(H1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
open(H2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";

foreach my $ifile (@ifiles) {
    open(IFILE, '<', $ifile) or die "Could not open $ifile: $!\n";
    while (<IFILE>) {
        # pick the wanted fields out of the whitespace-separated record
        my @fields = split /\s+/, $_;
        my $line = "@fields\n";
        print H1 $line;
        print H2 $line;
    }
    close(IFILE);
}
close(H1);
close(H2);
# end shortened pseudocode

I noticed that, depending on the datastream type, sometimes the
PostScript file is written and closed correctly, sometimes it is not.
I also noticed that the script behaves differently on Ubuntu than on
OS X Leopard. No fancy modules are loaded, just pure Perl. The
'commands' are a pipeline of awk and GMT routines.
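
Each command string is something along these lines (the awk expression
and GMT options here are only placeholders, not the real ones):

my $pipe1 = "awk '{print \$2, \$3}' | psxy -R0/100/0/100 -JX15c -O -K >> $psfile";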

To be on the safe side, I duplicated the foreach loop, and now I open
one pipe at a time, guessing that the problem is how the OS flushes the
pipes' buffers and how the PostScript file gets the values.
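
Roughly, the duplicated loop now looks like this (shortened pseudocode
again):

# workaround: one pipeline at a time
foreach my $pipe ($pipe1, $pipe2) {
    open(my $h, "| $pipe") or die "\n --- Error: Could not plot $pipe: $!\n";
    foreach my $ifile (@ifiles) {
        open(my $in, '<', $ifile) or die "Could not open $ifile: $!\n";
        while (<$in>) {
            my @fields = split /\s+/, $_;
            print {$h} "@fields\n";
        }
        close($in);
    }
    # closing the pipe waits for this pipeline to finish
    # before the next one is started
    close($h);
}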

It isn't by any means the best solution, because I have to parse the
same files twice. Hence the question: is there a way to open concurrent
pipes in a robust way?

Thanks.
 
Rainer Weikusat

> [...]
> I noticed that, depending on the datastream type, sometimes the
> PostScript file is written and closed correctly, sometimes it is not.

I assume 'correctly' means you get two differently processed lines for
each input line in the output file, in the order they were written in
Perl. That's never going to work reliably this way, because not only
does Perl employ internal output buffering, but the commands running as
part of your pipelines do so as well: if they use stdio, their output
will be 'fully buffered' when stdout is not connected to an
interactive device. Also, the processes in both of your pipelines
execute asynchronously with respect to the Perl control process and to
the processes in the other pipeline. This means you may get higher
throughput this way, but the downside is that output reordering may
(and usually will) occur.
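
(For completeness: Perl's own buffering can be switched off per handle,
e.g.

select(H1); $| = 1;   # enable autoflush on the pipe to the first pipeline
select(H2); $| = 1;   # same for the second pipeline
select(STDOUT);       # restore the default output handle

but that does nothing about the stdio buffers inside the awk/GMT
processes, so it won't make the interleaving deterministic on its own.)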

The simple but relatively inefficient solution to that is to create
two new pipelines for each input line and not start the second
before the first has terminated (or the third before the second has
terminated, and so on). Unless you're repeatedly dealing with large
inputs, this is probably good enough, though. If you want to process
the input asynchronously and concurrently, you need to employ a final
'put it back together' filter which reads data from both pipelines
as it becomes available and puts the output back into the proper
order.
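
A sketch of such a filter, assuming each pipeline tags every output
line with the number of the input line it was produced from (for
instance by letting the awk stage prepend NR and a tab), and with
placeholder command strings:

#!/usr/bin/perl
# 'put it back together' filter: reads two tagged output streams
# concurrently and re-emits them in input order, pipeline 1 before
# pipeline 2 for the same input line
use strict;
use warnings;
use IO::Select;

open(my $p1, '-|', 'commands1') or die "cannot start pipeline 1: $!";
open(my $p2, '-|', 'commands2') or die "cannot start pipeline 2: $!";

my $sel = IO::Select->new($p1, $p2);
my %partial = ($p1 => '', $p2 => '');    # incomplete trailing lines
my %lines   = ($p1 => {}, $p2 => {});    # line number => output line
my $next = 1;                            # next line number to emit

while ($sel->count) {
    for my $fh ($sel->can_read) {
        if (sysread($fh, my $chunk, 8192)) {
            $partial{$fh} .= $chunk;
            # split off complete lines, keep the unterminated rest
            while ($partial{$fh} =~ s/\A([^\n]*\n)//) {
                my ($n, $text) = split /\t/, $1, 2;
                $lines{$fh}{$n} = $text;
            }
        } else {
            $sel->remove($fh);           # EOF on this pipeline
        }
    }
    # emit everything for which both pipelines have delivered a line
    while (exists $lines{$p1}{$next} && exists $lines{$p2}{$next}) {
        print delete $lines{$p1}{$next};
        print delete $lines{$p2}{$next};
        $next++;
    }
}
close($p1);
close($p2);

Leftover lines, in case one pipeline produces more output lines than
the other, are simply dropped in this sketch.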
 
neurino

> I assume 'correctly' means you get two differently processed lines for
> each input line in the output file, in the order they were written in
> Perl.
Exactly.

> That's never going to work reliably this way, because [...]

Thanks for the explanation. So, it seems i was on the right track somehow.
> The simple but relatively inefficient solution to that is to create
> two new pipelines for each input line and not start the second
> before the first has terminated (or the third before the second has
> terminated, and so on).

This is the way I implemented it now. I could read the files all at
once, but that isn't an option either. And you are pointing out the
issue: the data records are large enough, 1-2 GB.
> If you want to process the input asynchronously and concurrently, you
> need to employ a final 'put it back together' filter which reads data
> from both pipelines as it becomes available and puts the output back
> into the proper order.

It sounds quite new to me. I found pack/unpack in perlfaq5. Is that right?

Thanks.
 
