Flushing and multiple pipes

Discussion in 'Perl Misc' started by neurino, Jul 20, 2012.

  1. neurino

    neurino Guest

    Dear group,

    Happy to post my first message here! So, down to business.

    I define two pipes which both print a parsed datastream to the _same_
    PostScript file. The data values are selected by the 'commands'
    according to certain criteria, in the following manner:

    # shortened pseudocode
    #!/usr/bin/perl
    # ($psfile and @ifiles are set earlier in the real script)

    my $pipe1 = " commands >> $psfile";
    my $pipe2 = " commands >> $psfile";

    # one writer handle per external pipeline, both appending to $psfile
    open(H1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
    open(H2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";

    foreach my $ifile (@ifiles) {
        open(IFILE, '<', $ifile) or die "\n --- Error: Could not open $ifile: $!\n";
        while (<IFILE>) {
            # split the record into whitespace-separated fields and print them
            my @fields = split /\s+/, $_;
            print H1 "@fields\n";
            print H2 "@fields\n";
        }
        close(IFILE);
    }
    close(H1);
    close(H2);
    # end shortened pseudocode

    I noticed that, depending on the datastream type, sometimes the
    PostScript file is written and closed correctly and sometimes it is
    not. I also noticed that the script behaves differently on Ubuntu than
    on OS X Leopard. No fancy modules are loaded, just pure Perl. The
    'commands' are a pipeline of awk and gmt routines.

    To be on the safe side, I duplicated the foreach loop and now open one
    pipe at a time, guessing that the problem lies in how the OS flushes
    the pipes' buffers and how the PostScript file receives the values.
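
    In outline, the workaround looks roughly like this (same shortened
    pseudocode conventions as above):

    # workaround sketch: run one pipeline at a time
    open(H1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
    foreach my $ifile (@ifiles) {
        open(IFILE, '<', $ifile) or die "\n --- Error: Could not open $ifile: $!\n";
        while (<IFILE>) {
            my @fields = split /\s+/, $_;
            print H1 "@fields\n";
        }
        close(IFILE);
    }
    close(H1);    # the first pipeline is finished here ...

    open(H2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";
    foreach my $ifile (@ifiles) {
        open(IFILE, '<', $ifile) or die "\n --- Error: Could not open $ifile: $!\n";
        while (<IFILE>) {
            my @fields = split /\s+/, $_;
            print H2 "@fields\n";
        }
        close(IFILE);
    }
    close(H2);    # ... before the second one starts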

    This is by no means the best solution, because I have to parse the
    same files twice. Hence the question: is there a way to open
    concurrent pipes in a robust way?

    Thanks.
     
    neurino, Jul 20, 2012
    #1

  2. neurino <> writes:

    [...]

    > #!/usr/bin/perl
    > # ($psfile and @ifiles are set earlier in the real script)
    >
    > my $pipe1 = " commands >> $psfile";
    > my $pipe2 = " commands >> $psfile";
    >
    > # one writer handle per external pipeline, both appending to $psfile
    > open(H1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
    > open(H2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";
    >
    > foreach my $ifile (@ifiles) {
    >     open(IFILE, '<', $ifile) or die "\n --- Error: Could not open $ifile: $!\n";
    >     while (<IFILE>) {
    >         my @fields = split /\s+/, $_;
    >         print H1 "@fields\n";
    >         print H2 "@fields\n";
    >     }
    >     close(IFILE);
    > }
    > close(H1);
    > close(H2);
    > # end shortened pseudocode
    >
    > I noticed that, depending on the datastream type, sometimes the
    > PostScript file is written and closed correctly and sometimes it is not.


    I assume 'correctly' means you get two differently processed lines for
    each input line in the output file, in the order they were written in
    perl. That's never going to work reliably in this way because not only
    does perl employ internal output buffering but the commands running as
    part of your pipeline do this as well: If they use stdio, their output
    will be 'fully buffered' when stdout is not connected to an
    interactive device. Also, the processes in both of your pipelines
    execute asynchronously with respect to the Perl control process and
    the processes in the other pipeline. This means you may get higher
    throughput in this way but the downside is that output reordering may
    (and usually will) occur.
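
    Perl's own side of this can be switched off with autoflush, roughly as
    in the sketch below (lexical handles used here just for the autoflush
    call), but that does nothing about the stdio buffering inside the
    external commands or about the pipelines running asynchronously:

    # sketch: disable perl's output buffering on the pipe handles
    # (this alone does not make the output ordering reliable)
    use IO::Handle;

    open(my $h1, '|-', $pipe1) or die "Could not start $pipe1: $!";
    open(my $h2, '|-', $pipe2) or die "Could not start $pipe2: $!";
    $h1->autoflush(1);    # flush after every print to the first pipeline
    $h2->autoflush(1);    # flush after every print to the second pipeline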

    The simple but relatively inefficient solution to that is to create
    two new pipelines for each input line and don't start the second
    before the first has terminated (or the third before the second has
    terminated and so on). Unless you're repeatedly dealing with large
    inputs, this is probably good enough, though. If you want to process
    the input asynchronously and concurrently, you need to employ a final
    'put it back together' filter which reads data from both pipelines
    as it becomes available and puts the output back into the proper
    order.
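
    A rough sketch of the first variant, reusing $pipe1, $pipe2 and
    @ifiles from your code, might look like this:

    # sketch: run the pipelines one after the other for each input line;
    # close() on a pipe handle waits for the command to terminate, so the
    # first pipeline's output is complete before the second one starts
    foreach my $ifile (@ifiles) {
        open(my $in, '<', $ifile) or die "Could not open $ifile: $!";
        while (my $line = <$in>) {
            for my $cmd ($pipe1, $pipe2) {
                open(my $out, '|-', $cmd) or die "Could not start $cmd: $!";
                print $out $line;
                close($out) or warn "'$cmd' exited with status $?\n";
            }
        }
        close($in);
    }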
     
    Rainer Weikusat, Jul 20, 2012
    #2

  3. neurino

    neurino Guest

    On 2012-07-20 15:08:14 +0200, Rainer Weikusat said:

    > I assume 'correctly' means you get two differently processed lines for
    > each input line in the output file, in the order they were written in
    > perl.


    Exactly.

    > That's never going to work reliably in this way because [...]


    Thanks for the explanation. So it seems I was somehow on the right track.

    > The simple but relatively inefficient solution to that is to create
    > two new pipelines for each input line and don't start the second
    > before the first has terminated (or the third before the second has
    > terminated and so on).


    This is the way I have implemented it now. Reading the files into
    memory all at once is not an option either, and you are pointing out
    exactly the issue: the data records are fairly large, 1-2 GB.

    > If you want to process the input asynchronously and concurrently, you
    > need to employ a final 'put it back together' filter which reads data
    > from both pipelines as it becomes available and puts the output back
    > into the proper order.


    It sounds quite new to me. I found pack/unpack in perlfaq5. Is that right?

    Thanks.
     
    neurino, Jul 20, 2012
    #3
