How to prevent hanging when writing lots of text to a pipe?

Discussion in 'Perl Misc' started by jl_post@hotmail.com, Oct 23, 2009.

  1. Guest

    Hi,

    I'm trying to write text to a piped writeHandle (created with the
    pipe() function) so that I can later read the text by extracting it
    from a readHandle. However, I discovered that if I write a lot of
    text to the pipe, my script just hangs. Here is an example program:

    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Enter a number: ";
    my $number = <STDIN>;
    chomp($number);

    my @lines = do
    {
        pipe(my $readHandle, my $writeHandle);

        # Autoflush $writeHandle:
        my $oldHandle = select($writeHandle);
        $| = 1;
        select($oldHandle);

        print $writeHandle "$_\n" foreach 1 .. $number;
        close($writeHandle);

        # Extract the output, line-by-line:
        <$readHandle>
    };

    print "Extracted output lines:\n @lines";

    __END__

    When I run this program, I notice that it runs perfectly for small
    values of $number (like 10). But on high values (like ten thousand),
    the program hangs.

    From testing, I discovered that the limit on the Windows platform
    I'm using is 155, and the limit on the Linux platform I'm using is
    1040. Any higher number causes the program to hang.

    As for why this is happening, my best guess is that a program can
    only stuff so much output into a piped writeHandle before it gets
    full. Therefore, deadlock occurs, as the reading won't happen until
    the writing is finished.

    However, I'm not fully convinced this is the case, because I
    replaced the lines:

    print $writeHandle "$_\n" foreach 1 .. $number;
    close($writeHandle);

    with:

    if (fork() == 0)
    {
        # Only the child process gets here:
        print $writeHandle "$_\n" foreach 1 .. $number;
        close($writeHandle);
        exit(0);
    }

    and now the Perl script hangs on both Windows and Linux platforms,
    even with low values of $number (such as 5). My intent was to make
    the child process solely responsible for stuffing the output into the
    pipe, while the parent process read from the $readHandle as data
    became available. That way we would avoid the pipe getting stuffed to
    capacity.

    But as I've said, that fork()ing code change doesn't even work for
    any values, so I must be doing something wrong somewhere.

    So my question is: How do I prevent my script from hanging when I
    have a lot of text to send through the pipe?

    Thanks in advance for any help.

    -- Jean-Luc
     
    , Oct 23, 2009
    #1

  2. ilovelinux Guest

    On 23 Oct, 17:47, "" <> wrote:

    >    I'm trying to write text to a piped writeHandle (created with the
    > pipe() function) so that I can later read the text by extracting it
    > from a readHandle.  However, I discovered that if I write a lot of
    > text to the pipe, my script just hangs.  Here is an example program:

    [snip]
    >    As for why this is happening, my best guess is that a program can
    > only stuff so much output into a piped writeHandle before it gets
    > full.  Therefore, deadlock occurs, as the reading won't happen until
    > the writing is finished.


    That's right. Pipes have a limited capacity. See http://linux.die.net/man/7/pipe.
    POSIX prescribes a minimum capacity of 512 bytes, which is exactly
    what your Windows machine implements (155 numbered lines come to
    exactly 512 bytes):

    $ perl -we 'print "$_\n" for 1..155'| wc -c
    512
    $

    Your Linux machine has a capacity of 4096 bytes: 1040 lines come to
    4093 bytes, so the 1041st line (5 more bytes) would no longer fit:

    $ perl -we 'print "$_\n" for 1..1040'| wc -c
    4093
    $

    In modern kernels the capacity is 2^16 (65536) bytes.

    As for your fork()ing test program: you should close both the write
    descriptor of the pipe in the reading process and the read descriptor
    in the writing process.
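
    For example, the corrected program might look something like this
    (an untested sketch; note the defined() check, since fork() returns
    undef on failure, which your "fork() == 0" test would mistake for
    the child):

    #!/usr/bin/perl
    use strict;
    use warnings;

    pipe(my $readHandle, my $writeHandle) or die "pipe() failed: $!";

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;

    if ($pid == 0)
    {
        # Child: writes only, so close the read end.
        close($readHandle);
        print $writeHandle "$_\n" foreach 1 .. 10_000;
        close($writeHandle);
        exit(0);
    }

    # Parent: reads only, so close the write end. Until every write
    # descriptor is closed, <$readHandle> will never report EOF.
    close($writeHandle);
    my @lines = <$readHandle>;
    close($readHandle);
    waitpid($pid, 0);
    print scalar(@lines), " lines read\n";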
     
    ilovelinux, Oct 23, 2009
    #2

  3. "" <> wrote:
    > I'm trying to write text to a piped writeHandle (created with the
    >pipe() function) so that I can later read the text by extracting it
    >from a readHandle. However, I discovered that if I write a lot of
    >text to the pipe, my script just hangs. Here is an example program:
    >

    [...]
    > When I run this program, I notice that it runs perfectly for small
    >values of $number (like 10). But on high values (like ten thousand),
    >the program hangs.
    >[...]
    >Therefore, deadlock occurs, as the reading won't happen until
    >the writing is finished.


    And that is your problem. Pipes are an IPC mechanism; they are not
    meant, and not designed, to store data. You are abusing the pipe
    buffer as long-term storage. Of course that is not going to work.

    You need a different design, e.g. using a file as the storage
    medium. There are other options, too, like the Storable module,
    which may or may not be of help.
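
    A rough sketch of the file-based design (using File::Temp here,
    which is my choice of module, not the only option):

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Write the data to an anonymous temporary file instead of a pipe.
    # A file has no fixed capacity, so there is nothing to deadlock on.
    my ($fh, $filename) = tempfile(UNLINK => 1);
    print $fh "$_\n" foreach 1 .. 10_000;
    close($fh) or die "close failed: $!";

    # Read it back whenever convenient.
    open(my $in, '<', $filename) or die "Cannot open $filename: $!";
    my @lines = <$in>;
    close($in);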

    jue
     
    Jürgen Exner, Oct 23, 2009
    #3
  4. Guest

    Thank you all for your excellent (and rapid) responses. They were
    all helpful and clear.

    On Oct 23, 11:01 am, Ben Morrow <> wrote:
    >
    > The solution is to add
    >
    > close($writeHandle);
    >
    > as the first statement executed in the parent after the child is forked;



    That worked! Thanks! Although, I'm a little puzzled: Why should
    the parent close the $writeHandle? I would think that it's enough for
    the child to close it. After all, the child process (and not the
    parent) is the one that's doing the writing.


    > Note: I have no idea if this will work on Win32. Win32 perl's fork
    > emulation can sometimes be a little peculiar.


    I agree with your statement (as I've encountered Windows' fork()ing
    peculiarities myself). However, I tested the code with your
    modifications on Windows (on Vista running Strawberry Perl) and it
    works just fine.

    So here is my final script:

    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Enter a number: ";
    my $number = <STDIN>;
    chomp($number);

    my @lines = do
    {
        pipe(my $readHandle, my $writeHandle) or die "pipe() failed: $!";

        my $pid = fork();
        die "fork() failed: $!" unless defined $pid; # fork() returns undef on failure

        if ($pid == 0)
        {
            # Only the child process gets here:
            print $writeHandle "$_\n" foreach 1 .. $number;
            exit(0);
        }

        close($writeHandle);

        # Extract the output, line-by-line:
        local $/ = "\n";
        <$readHandle>
    };

    print "Extracted output lines:\n @lines";

    __END__

    Note that no fileHandles are explicitly closed except for
    $writeHandle in the parent process. Perhaps I should close
    $writeHandle in the child process and $readHandle in the parent
    process, but I figured that since I declared both fileHandles
    lexically, they'll automatically be closed at the end of their own
    scopes.

    (Of course, that won't be true if the fileHandles aren't lexically
    declared, so that's something to keep in mind when not using lexical
    fileHandles.)

    However, ilovelinux's and Jürgen Exner's responses got me thinking
    about a non-pipe() way of doing what I wanted, so I looked at "perldoc
    -f open" and read up on writing to a scalar. I was able to re-write
    my code so that it still wrote output to a fileHandle, and that output
    ended up in a scalar (and eventually into an array).

    Here is that code:

    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Enter a number: ";
    my $number = <STDIN>;
    chomp($number);

    my @lines = do
    {
        use 5.008; # for writing-to-scalar support

        my $output;
        open(my $writeHandle, '>', \$output)
            or die "Cannot write to \$output scalar: $!\n";

        print $writeHandle "$_\n" foreach 1 .. $number;
        close($writeHandle);

        # Split after all newlines, except for the
        # one at the very end of the $output string:
        split m/(?<=\n)(?!\z)/, $output;
    };

    print "Extracted output lines:\n @lines";

    __END__

    This works great, and without the need to fork a child process.
    However, according to the documentation, this capability (writing to
    a scalar) is only available in Perl 5.8 or later.

    I've checked, and the target machine I'm writing code for has Perl
    5.8.8 (even though I personally use Perl 5.10). However, I want to
    write my code so that it runs on most machines. What "most machines"
    means is rather tricky, but most machines I've run "perl -v" on
    (that might need to run my script) are running Perl 5.8, and I
    haven't found one yet running an older version.

    So I have a question for all of you: Which of the two above
    approaches should I use? The fork()ing approach in the first script
    (that runs on Unix and on at least some Windows platforms), or the
    open() approach that only runs with Perl 5.8 or later?

    I suppose I can combine the two. I can check if the Perl version
    is 5.8 or later, and if it is, use the approach that uses open(). If
    not, fall back to fork()ing and hope for the best. And unless someone
    suggests a simpler way, I can do it like so:

    eval { require 5.008 };
    if ($@)
    {
        # use the fork() approach
    }
    else
    {
        # use the open() approach
    }
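
    Fleshed out, that might look something like the following sketch,
    where write_report() is a hypothetical stand-in for my real routine
    (which prints to whatever filehandle it's given):

    sub write_report
    {
        my $fh = shift;
        print $fh "$_\n" foreach 1 .. 100;
    }

    my @lines;
    if (eval { require 5.008; 1 })
    {
        # Perl 5.8 or later: write straight into a scalar.
        my $output = '';
        open(my $fh, '>', \$output) or die "Cannot write to scalar: $!";
        write_report($fh);
        close($fh);
        @lines = split m/(?<=\n)(?!\z)/, $output;
    }
    else
    {
        # Older perls: fall back to pipe() and fork().
        pipe(my $readHandle, my $writeHandle) or die "pipe() failed: $!";
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;
        if ($pid == 0)
        {
            close($readHandle);
            write_report($writeHandle);
            exit(0);
        }
        close($writeHandle);
        @lines = <$readHandle>;
        waitpid($pid, 0);
    }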

    So which should I go for? Should I go with the fork(), open(), or
    both? Any thoughts are appreciated.

    Thanks again for all the help already given.

    -- Jean-Luc
     
    , Oct 23, 2009
    #4
  5. Ted Zlatanov Guest

    On Fri, 23 Oct 2009 08:47:30 -0700 (PDT) "" <> wrote:

    jpc> I'm trying to write text to a piped writeHandle (created with the
    jpc> pipe() function) so that I can later read the text by extracting it
    jpc> from a readHandle. However, I discovered that if I write a lot of
    jpc> text to the pipe, my script just hangs. Here is an example program:

    One way to do this is to write a file name to the pipe. Then the
    client just looks in that file for the actual data when the name
    arrives.

    Depending on the size and frequency of the data updates, you could
    also use a SQLite database file. It supports access from multiple
    clients, so your client can just SELECT the new data if you have
    some way of detecting it (maybe pass the row IDs over the pipe),
    and the server just writes (INSERT/UPDATE) the data whenever it
    wants.
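
    As a rough sketch (this assumes DBD::SQLite is installed; the table
    and database file names are made up, and both sides run in one
    process here just to show the flow):

    use strict;
    use warnings;
    use DBI;

    pipe(my $readHandle, my $writeHandle) or die "pipe() failed: $!";
    my $oldHandle = select($writeHandle); $| = 1; select($oldHandle);

    my $dbh = DBI->connect("dbi:SQLite:dbname=shared.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do("CREATE TABLE IF NOT EXISTS messages " .
             "(id INTEGER PRIMARY KEY, body TEXT)");

    # "Server" side: INSERT the bulky data, send only the row id.
    $dbh->do("INSERT INTO messages (body) VALUES (?)", undef, "x" x 50_000);
    my $id = $dbh->last_insert_id(undef, undef, "messages", "id");
    print $writeHandle "$id\n";    # a few bytes: the pipe never fills up

    # "Client" side: read the tiny id, then SELECT the real data.
    chomp(my $row_id = <$readHandle>);
    my ($body) = $dbh->selectrow_array(
        "SELECT body FROM messages WHERE id = ?", undef, $row_id);
    print length($body), " bytes fetched\n";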

    Ted
     
    Ted Zlatanov, Oct 23, 2009
    #5
  6. C.DeRykus Guest

    On Oct 23, 1:00 pm, "" <> wrote:
    > ...
    >    However, ilovelinux's and Jürgen Exner's responses got me thinking
    > about a non-pipe() way of doing what I wanted, so I looked at "perldoc
    > -f open" and read up on writing to a scalar.  I was able to re-write
    > my code so that it still wrote output to a fileHandle, and that output
    > ended up in a scalar (and eventually into an array).
    >
    >    Here is that code:
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > print "Enter a number: ";
    > my $number = <STDIN>;
    > chomp($number);
    >
    > my @lines = do
    > {
    > use 5.008; # for writing-to-scalar support
    > my $output;
    > open(my $writeHandle, '>', \$output)
    > or die "Cannot write to \$output scalar: $!\n";
    >
    > print $writeHandle "$_\n" foreach 1 .. $number;
    >
    >    close($writeHandle);
    >
    >    # Split after all newlines, except for the
    >    # one at the very end of the $output string:
    >    split m/(?<=\n)(?!\z)/, $output;
    >
    > };
    >
    > print "Extracted output lines:\n @lines";
    >
    > __END__
    >
    >    This works great, and without the need to invoke a child thread.
    > However, according to the documentation this capability (writing to a
    > scalar) is only available in Perl 5.8 or later.
    >...


    Hm, if there's no IPC involved, can't you simply populate an array
    directly, eliminating filehandles, Perl version worries, and the
    'do' statement completely? Did I miss something?


    my @lines;
    push @lines, "$_\n" for 1 .. $number;
    print "Extracted output lines:\n @lines";

    --
    Charles DeRykus
     
    C.DeRykus, Oct 24, 2009
    #6
  7. Guest

    On Oct 23, 2:16 pm, Ben Morrow <> wrote:
    >
    > A pipe doesn't report EOF until there are no handles on it opened for
    > writing. The parent still has its write handle open, and for all the OS
    > knows it might be wanting to write to the pipe too.


    Makes sense.


    > You can use IO::Scalar under 5.6. (Indeed, you could simply use
    > IO::Scalar under all versions of perl: it's a little less efficient and
    > a little less pretty than the PerlIO-based solution in 5.8, but it will
    > work just fine.)


    That's a great suggestion! To use IO::Scalar in my program, I had
    to create two new IO::Scalars: one to write to, and one to read from.
    I edited my sample program to be:


    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Enter a number: ";
    my $number = <STDIN>;
    chomp($number);

    my @lines = do
    {
        use IO::Scalar;

        my $output;
        my $writeHandle = IO::Scalar->new(\$output);

        # Populate $output:
        print $writeHandle "$_\n" foreach 1 .. $number;
        close($writeHandle);

        # Populate @lines with the lines in $output:
        my $readHandle = IO::Scalar->new(\$output);
        <$readHandle>
    };

    print "Extracted output lines:\n @lines";

    __END__


    For some reason my program wouldn't work with just one IO::Scalar.
    Regardless, it works perfectly now, and without the need to fork a new
    process.

    Thanks again for your excellent response, Ben. Your advice was
    very helpful.

    -- Jean-Luc
     
    , Oct 26, 2009
    #7
  8. Guest

    > Quoth "" <>:
    >
    > > # Populate @lines with the lines in $output:
    > > my $readHandle = new IO::Scalar(\$output);
    > > <$readHandle>


    On Oct 26, 10:39 am, Ben Morrow <> wrote:
    >
    > Um, there's no need for this. Just use
    >
    > split /\n/, $output;


    That doesn't do the same thing. Splitting on /\n/ removes the
    newlines from the entries (and, since split() discards trailing
    empty fields, it would also silently drop any blank lines at the
    end of $output).

    I could have used this instead:

    split m/(?<=\n)(?!\z)/, $output;

    That way the $output is split after each newline, but only if that
    newline is not the last character of $output. (All newlines would be
    retained with their lines.)

    I'm not sure which is faster or more efficient, but I figured I'd
    avoid the look-behind and negative look-ahead, and instead use the
    (more familiar) diamond operator on a file handle to split out each
    line.


    > > For some reason my program wouldn't work with just one IO::Scalar.


    > Probably you have forgotten that you need to rewind the filehandle after
    > writing and before reading.


    Ah, you're right again. Now I can avoid the second IO::Scalar and
    use a seek() call instead:


    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Enter a number: ";
    my $number = <STDIN>;
    chomp($number);

    my @lines = do
    {
        use IO::Scalar;

        my $output;
        my $handle = IO::Scalar->new(\$output);

        # Print the lines into the $handle:
        print $handle "$_\n" foreach 1 .. $number;

        # Now rewind the handle and put its lines into @lines:
        seek($handle, 0, 0);
        <$handle>
    };

    print "Extracted output lines:\n @lines";

    __END__


    Thanks once again, Ben.

    -- Jean-Luc
     
    , Oct 26, 2009
    #8
  9. Guest

    On Oct 23, 8:44 pm, "C.DeRykus" <> wrote:
    >
    > Hm, if there's no IPC involved, can't you simply populate
    > an array directly...eliminating filehandles, Perl version
    > worries, and the 'do' statement completely. Did I miss
    > something else?


    I left out a few details, such as the fact that the routine I'm
    calling writes to a filehandle and contains over a thousand lines of
    code. (The routine is much larger than the original "foreach" loop I
    used as an example.) I could go through all the code and change it so
    that it pushes its lines onto an array, but then I'd have to change
    all the code that calls that routine as well.

    Or I could make a copy of that routine and change only that copy,
    but then any changes (major and minor) made to the original routine
    would have to be made a second time in the new routine. (I'd rather
    not maintain two almost identical large routines, if it can be
    avoided.)

    Of course, I could just hand the routine a write-filehandle to a
    temporary file on disk, but since I'd just have to read the file
    contents back in, I'd rather skip that step and avoid the disk I/O
    altogether. (Plus, there's no guarantee that the user has
    permission to write to a temporary file outside of /tmp.)

    Ideally, I would like to be able to write to a filehandle that
    doesn't require disk I/O. Creating a pipe() accomplishes that, but
    as I mentioned before, it requires fork()ing a child process to
    avoid hanging the program.

    The other solutions are to use open() to write to a scalar (which
    works, but only on Perl 5.8 and later) and to use IO::Scalar (which
    should work on Perl 5.6 and later). So that's why I'm currently
    sticking with IO::Scalar.

    If you know of a better way, let me know. (There may be an obvious
    way I'm just not seeing.)

    -- Jean-Luc
     
    , Oct 26, 2009
    #9
  10. John W. Krahn Guest

    wrote:
    >> Quoth "" <>:
    >>
    >>> # Populate @lines with the lines in $output:
    >>> my $readHandle = new IO::Scalar(\$output);
    >>> <$readHandle>

    >
    > On Oct 26, 10:39 am, Ben Morrow <> wrote:
    >> Um, there's no need for this. Just use
    >>
    >> split /\n/, $output;

    >
    > That doesn't do the same thing. Splitting on /\n/ removes the
    > newlines from the entries, and creates an extra final element that's
    > an empty string.
    >
    > I could have used this instead:
    >
    > split m/(?<=\n)(?!\z)/, $output;


    Or this:

    $output =~ /.*\n/g;



    John
    --
    The programmer is fighting against the two most
    destructive forces in the universe: entropy and
    human stupidity. -- Damian Conway
     
    John W. Krahn, Oct 27, 2009
    #10
  11. Guest

    On Oct 23, 10:27 am, ilovelinux <> wrote:
    >
    > Pipes have a limited capacity. See http://linux.die.net/man/7/pipe.
    > Posix prescribes a minimum capacity of 512, which is exactly
    > implemented on your windows machine:
    > $ perl -we 'print "$_\n" for 1..155'| wc -c
    > 512



    Thanks for the info. Now that I know this, I have a quick
    question:

    Say I'm writing to a pipe in a child process, while the parent is
    reading from the pipe. If the child writes an extremely long line of
    text to the pipe (like 50,000 characters ending in a newline), will
    that cause deadlock?

    I ask this because the parent generally reads one newline-terminated
    line at a time when using Perl's diamond operator. If the pipe
    capacity is 512 (or 4096) bytes, then the child will have filled
    that capacity well before it finishes writing the line (and
    therefore before the parent can read the line and drain the pipe).

    The good news is that I'm testing this by piping a lot of text with
    the line:

    print $writeHandle $_ x 10000, "\n" foreach 1 .. $number;

    and there doesn't seem to be a problem, despite the fact that the
    newline comes well after the 512/4096 pipe buffer limit you informed
    me of.

    The only explanation I can think of is that Perl itself has to read
    the pipe (and therefore clear its buffer) in order to see if a newline
    character is in the "pipeline". Maybe in doing so Perl transfers the
    text from the pipe's buffer to a Perl internal buffer, effectively
    clearing the pipe and preventing deadlock from happening.

    But that's my guess. I'd like to know what you (or anybody else)
    have to say about it. (Hopefully I made myself clear enough to
    understand.)

    Thanks for any advice.

    -- Jean-Luc
     
    , Oct 27, 2009
    #11
  12. Guest


    > wrote:
    > >
    > >    I could have used this instead:
    > >
    > >       split m/(?<=\n)(?!\z)/, $output;


    On Oct 26, 10:29 pm, "John W. Krahn" <> wrote:
    > Or this:
    >
    >          $output =~ /.*\n/g;



    Hey, that's clever! I like it!

    However, there's a tiny difference: Your way will discard the last
    line if $output does not end in a newline character, whereas the first
    way will keep the line.

    (Of course, this won't be an issue if $output is guaranteed to end
    in a newline.)
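
    Here's a quick illustration of the difference (a tiny example I
    made up for this post):

    my $output = "one\ntwo\nthree";    # note: no trailing newline

    my @a = ($output =~ /.*\n/g);            # ("one\n", "two\n") -- "three" is lost
    my @b = split m/(?<=\n)(?!\z)/, $output; # ("one\n", "two\n", "three")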

    -- Jean-Luc
     
    , Oct 27, 2009
    #12
  13. Guest

    On Oct 27, 10:30 am, Ben Morrow <> wrote:
    >
    >     split /^/m, $output;



    Excellent way to split out lines! Thanks!
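
    For the record, it keeps each newline with its line and produces no
    empty fields, even when the string lacks a trailing newline:

    my @lines = split /^/m, "one\ntwo\nthree"; # ("one\n", "two\n", "three")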

    -- Jean-Luc
     
    , Oct 27, 2009
    #13
  14. Guest

    On Oct 27, 10:27 am, Ben Morrow <> wrote:
    >
    > When you read from a filehandle using the <> operator,
    > perl actually reads large chunks and buffers the result, then goes
    > hunting through the buffer for a newline. If it doesn't find one, it
    > will keep reading chunks (and extending the buffer) until it does, so
    > reading really long lines needs lots of buffer space in the perl
    > process, not lots of room in the pipe.



    That makes sense. Thank you.
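
    To convince myself, I put together this little sketch (it assumes a
    Unix-style fork(); I haven't verified it on Windows):

    #!/usr/bin/perl
    use strict;
    use warnings;

    pipe(my $readHandle, my $writeHandle) or die "pipe() failed: $!";

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;

    if ($pid == 0)
    {
        # Child: write one line far larger than any pipe buffer.
        close($readHandle);
        print $writeHandle "x" x 500_000, "\n";
        exit(0);
    }

    # Parent: perl keeps draining the pipe into its own growing buffer
    # until it finally sees the newline, so this does not deadlock.
    close($writeHandle);
    my $line = <$readHandle>;
    waitpid($pid, 0);
    print length($line), " characters read as one line\n";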

    -- Jean-Luc
     
    , Oct 28, 2009
    #14
