Backtick command with long output super slow

Discussion in 'Perl Misc' started by jl_post@hotmail.com, Dec 4, 2012.

  1. Guest

    Dear Perl community,

    Recently I wrote a Perl script to run several commands, parse
    through their output, and display the results. One important line
    was:

    my $text = `$command`;

    Now, the command isn't trivial; it generates about 65 MB of output.
    When I ran my script, my script was essentially hanging on that line
    (I later realized that it wasn't hanging; it just took over 20 minutes
    to run that line of code).

    I tried running that exact command at a DOS command prompt
    (redirecting the output to a file), and it only took about 5 seconds
    to run.

    So I experimented around a bit. I replaced the above line with:

    my $text = "hello, world";
    `$command`; # note: not assigned to anything

    and the code ran in about five seconds.

    I then replaced it with:

    my $text = do
    {
        open(my $fh, "$command |") or die $!;
        local $/; # enable "slurp" mode
        <$fh>
    };

    and once again it took over 20 minutes.

    I then tried to use system() to call "$command > file.txt" and then
    read in "file.txt". That took about six or seven seconds.

    So for some reason I wasn't able to use backticks (or even qx//)
    without the operation taking up close to half an hour. To get
    (relatively) quick time, I had to write it out to a temporary file,
    then read that file in. (Note that this only was a problem with
    commands that returned lots of output, like around 60 megabytes.)
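The comparison described above can be reproduced with a small timing harness. This is only a sketch: the `timed` helper, the temporary file name, and the stand-in command (which prints about 1 MB by calling the running perl itself) are assumptions for illustration, not the original 65 MB command.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a code block and return (elapsed seconds, result).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Hypothetical stand-in for the real command: prints about 1 MB.
my $command = qq{"$^X" -e "print 'x' x 1000000"};

# Variant 1: capture the output directly with backticks.
my ($t_capture, $captured) = timed(sub { my $out = `$command`; $out });

# Variant 2: redirect to a temporary file, then slurp the file.
my $tmp = "timing-test.$$.tmp";
my ($t_file, $from_file) = timed(sub {
    system(qq{$command > "$tmp"}) == 0 or die "command failed: $?";
    open(my $fh, '<', $tmp) or die "Cannot read '$tmp': $!";
    local $/;                 # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
});
unlink $tmp;

printf "backticks: %.3f s, %d bytes\n", $t_capture, length $captured;
printf "temp file: %.3f s, %d bytes\n", $t_file,    length $from_file;
```

Raising the repeat count in the stand-in command toward 65_000_000 should show whether a given perl build has the slowdown.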

    So I wrote the following function to replace calling a command with
    backticks:

    sub getOutputOfCommand
    {
        my ($command) = @_;

        # I want to do this: my $text = `$command`;
        # Unfortunately, this runs super-slow on my version of
        # Perl (on Windows).
        # I discovered that just using system() to run the command
        # while redirecting the output to a file, and then reading in
        # that file is much, much quicker (at least by twenty times).
        #
        # So that's what I'm doing here: Writing the command's
        # output out to file, and reading it back in.
        # (I have no idea why it's faster than just $text = `command`.)

        my $text;

        if (0) # we're not doing it this way because it takes too long
        {
            $text = `$command`;
            die "\nError running command:\n\n $command\n\n" if $?;
        }
        else
        {
            # For some strange reason, this way is much quicker for me.

            my $temporaryFileName = ".temporary.$0.$$";
            system("$command > \"$temporaryFileName\"") == 0
                or die "Error running command:\n\n $command\n\n";
            $text = do
            {
                open(my $fh, $temporaryFileName)
                    or die "\nCannot read '$temporaryFileName': $!\n";
                local $/; # enable "slurp" mode
                <$fh>
            };
            unlink($temporaryFileName)
                or die "Cannot unlink '$temporaryFileName': $!\n";
        }

        return $text;
    }

    Now I can quickly get the output of a command with:

    my $text = getOutputOfCommand($command);
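A possible variation on the same workaround, sketched here with the core File::Temp module instead of a hand-built file name. The name `get_output_via_tempfile` is hypothetical, and this has not been timed against the function above. It only avoids name collisions and any trouble from path separators in `$0`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical variant of getOutputOfCommand() built on File::Temp.
sub get_output_via_tempfile {
    my ($command) = @_;

    # tempfile() creates a uniquely named file in the system temp directory.
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    close $tmp_fh;    # the child process writes to the file by name

    system(qq{$command > "$tmp_name"}) == 0
        or die "Error running command:\n\n  $command\n\n";

    open(my $fh, '<', $tmp_name)
        or die "Cannot read '$tmp_name': $!\n";
    local $/;         # slurp mode
    my $text = <$fh>;
    close $fh;
    unlink $tmp_name; # remove it now rather than waiting for program exit

    return $text;
}

# Usage with a small stand-in command (the running perl prints 'hello').
my $demo = get_output_via_tempfile(qq{"$^X" -e "print 'hello'"});
print "got: $demo\n";
```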

    Since I was using Strawberry Perl 5.12 on Windows, I decided to try
    running the script on Linux. Unfortunately, the same $command that
    worked on Windows wasn't working on Unix (not Perl's fault; for some
    reason it gave an error when running on Linux. Maybe the executable
    was a different version).

    After that, I decided to download a portable version of Strawberry
    Perl 5.16 and try running it with that. Evidently, that portable
    version DOES NOT have the same problem: calling a command with lots
    of output via backticks in Strawberry Perl 5.16 took just seconds
    instead of minutes.

    So does anyone know if there is a known problem with calling a
    command in backticks in Perl 5.12?

    For those who are interested, here is the output of "perl -v" when
    using Strawberry Perl 5.12:

    This is perl 5, version 12, subversion 1 (v5.12.1) built for MSWin32-x86-multi-thread

    and here is the output of "perl -v" when using portable Strawberry
    Perl 5.16:

    This is perl 5, version 16, subversion 2 (v5.16.2) built for MSWin32-x86-multi-thread

    Since I found several work-arounds to this problem, it's not
    imperative that I get a response. Still, it's nice to know about it
    in case I run across this issue in the future.

    Thanks,

    -- Jean-Luc
     
    , Dec 4, 2012
    #1

  2. Willem Guest

    wrote:
    ) Dear Perl community,
    )
    ) Recently I wrote a Perl script to run several commands, parse
    ) through their output, and display the results. One important line
    ) was:
    )
    ) my $text = `$command`;
    )
    ) Now, the command isn't trivial; it generates about 65 MB of output.
    ) When I ran my script, my script was essentially hanging on that line
    ) (I later realized that it wasn't hanging; it just took over 20 minutes
    ) to run that line of code).
    )
    ) I tried running that exact command at a DOS command prompt
    ) (redirecting the output to a file), and it only took about 5 seconds
    ) to run.
    )
    ) So I experimented around a bit. I replaced the above line with:
    )
    ) my $text = "hello, world";
    ) `$command`; # note: not assigned to anything
    )
    ) and the code ran in about five seconds.
    )
    ) I then replaced it with:
    )
    ) my $text = do
    ) {
    ) open(my $fh, "$command |") or die $!;
    ) local $/; # enable "slurp" mode
    ) <$fh>
    ) };
    )
    ) and once again it took over 20 minutes.

    It sounds like a string-growing issue.
    Have you tried reading it line by line?

    my @text = `$command`;
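A minimal sketch of what that would look like in a full script, assuming the rest of the code still wants one string at the end. The stand-in command below is hypothetical and just prints three short lines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real command: prints three short lines.
my $command = qq{"$^X" -e "print qq{line\\n} x 3"};

# List context: backticks return one element per line of output.
my @lines = `$command`;

# Rebuild a single string only if the rest of the script needs one.
my $text = join '', @lines;

printf "%d lines, %d bytes\n", scalar @lines, length $text;
```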


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Dec 4, 2012
    #2

  3. Uri Guttman Guest

    >>>>> "BM" == Ben Morrow <> writes:

    BM> I don't know of any specific problem with backticks, but there was a
    BM> problem at some point with perl's memory allocation strategy on Win32.
    BM> When growing a long string perl ended up doing the growing in far more
    BM> far smaller increments than was helpful; since the memory allocation
    BM> calls on Win32 are rather slow this could cause a significant slowdown.
    BM> I don't remember exactly when this was fixed; I thought it was before
    BM> 5.12, but I could be wrong.

    then wouldn't preallocating the buffer in the string help? not that i
    will play with winblows but it is an easy fix.

    OP: assign a long string first to the var and then call backticks:

    my $buffer = ' ' x 65_000_000 ;
    $buffer = `command` ;

    see if that speeds it up. if it does, it should be faster than using a
    temp file. also if you still want the temp file, use File::Slurp to read
    it in as it should be faster than perl I/O.
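A related sketch, in case plain assignment from backticks does not reuse the preallocated buffer: read the pipe in large chunks and append into a scalar that was grown once up front. The helper name `read_command_chunked` and the 1 MB chunk size are assumptions, and whether this avoids the slowdown on the affected Windows build is untested.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a command's output through a pipe in large
# fixed-size chunks, appending to a buffer that was grown once up front.
sub read_command_chunked {
    my ($command, $expected_size, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;             # assumed default: 1 MB per read()

    my $buffer = "\0" x $expected_size;  # grow the string once
    $buffer = '';                        # empty it; perl normally keeps the allocation

    open(my $fh, '-|', $command) or die "Cannot run '$command': $!";
    binmode $fh;
    while (1) {
        # The OFFSET argument makes read() append at the end of $buffer.
        my $got = read($fh, $buffer, $chunk_size, length $buffer);
        die "read failed: $!" unless defined $got;
        last if $got == 0;               # end of output
    }
    close $fh or die "Command failed: $command (status $?)\n";

    return $buffer;
}

# Usage with a stand-in command that prints 3 MB.
my $out = read_command_chunked(qq{"$^X" -e "print 'x' x 3000000"}, 3_000_000);
printf "read %d bytes\n", length $out;
```

Note that binmode means no newline translation happens on Windows, unlike backticks.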

    uri
     
    Uri Guttman, Dec 5, 2012
    #3
  4. Rainer Weikusat Guest

    Ben Morrow <> writes:
    > Quoth Willem <>:
    >> wrote:


    [...]

    >> ) my $text = `$command`;
    >> )
    >> ) Now, the command isn't trivial; it generates about 65 MB of output.
    >> ) When I ran my script, my script was essentially hanging on that line
    >> ) (I later realized that it wasn't hanging; it just took over 20 minutes
    >> ) to run that line of code).


    [...]

    >> It sounds like a string-growing issue.
    >> Have you tried reading it line by line?
    >>
    >> my @text = `$command`;

    >
    > That would almost certainly be worse: array-growing uses the same
    > allocation strategy as string-growing, plus you've introduced the
    > additional overhead of looking for newlines.


    I don't have the time to test this now but assuming the issue is what
    was hinted at, namely, O(n*n) realloc because of 'small', constant
    increments combined with an allocator which hasn't been tuned to work
    well despite some people's hell-bentness to do it the wrong way,
    ie, one which does a copying realloc rather often, expecting the 'read
    into array' code to perform better isn't completely unreasonable:
    Unless the lines are very short, an array storing 65M of text as lines is
    going to be a lot smaller than 65M (roughly, a pointer per line plus
    some constant 'management space', if the average line length is n,
    that would be 65M/(8n) [64bit]) so, extending it by copying the old
    contents to a new location will be faster and 'scanning for newlines'
    amounts to 0.5 copies (read every byte and compare it with some value)
    of the complete input data (actually splitting it into lines is
    another complete copy).
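The O(n*n) estimate above can be made concrete with a toy model that counts the bytes moved when a block grows in constant steps and every step is a copying realloc. The step size of 8192 bytes and the average line length of 80 are assumptions chosen for illustration. Real allocators often extend a block in place, so this is a worst case, not a measurement.

```perl
use strict;
use warnings;

# Total bytes copied while a block grows from 0 to $final bytes in
# constant steps of $step bytes, assuming each step copies the old block.
sub bytes_copied {
    my ($final, $step) = @_;
    my ($size, $copied) = (0, 0);
    while ($size < $final) {
        $copied += $size;    # old contents move to the new block
        $size   += $step;
    }
    return $copied;
}

my $total    = 65_000_000;   # output size from the original post
my $step     = 8_192;        # assumed growth increment
my $line_len = 80;           # assumed average line length
my $ptr_size = 8;            # pointer size on a 64-bit build

# Case 1: one big string holding all of the output.
printf "single string:    %.2f GB copied\n",
    bytes_copied($total, $step) / 1e9;

# Case 2: only the array of per-line pointers is regrown.
printf "array of pointers: %.2f GB copied\n",
    bytes_copied($ptr_size * $total / $line_len, $step) / 1e9;
```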
     
    Rainer Weikusat, Dec 5, 2012
    #4
  5. Rainer Weikusat Guest

    Rainer Weikusat <> writes:

    [...]

    > Unless the lines are very short, an array storing 65M of text as lines is
    > going to be a lot smaller than 65M (roughly, a pointer per line plus
    > some constant 'management space', if the average line length is n,
    > that would be 65M/(8n) [64bit])


    This should have been (8*65M)/n.
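As a worked instance of the corrected formula, assuming an average line length of n = 80 bytes (an illustrative value, not something measured from the original output):

```perl
use strict;
use warnings;

my $total_bytes  = 65_000_000;  # size of the command's output
my $pointer_size = 8;           # bytes per pointer on a 64-bit build
my $avg_line_len = 80;          # assumed average line length n

# (8 * 65M) / n: one pointer per line, ignoring per-element overhead.
my $array_bytes = $pointer_size * $total_bytes / $avg_line_len;

printf "pointer array: %d bytes for %d lines\n",
    $array_bytes, $total_bytes / $avg_line_len;
```

With these numbers the pointer array comes to 6,500,000 bytes, a tenth of the text itself.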
     
    Rainer Weikusat, Dec 5, 2012
    #5
  6. Willem Guest

    Ben Morrow wrote:
    )
    ) Quoth Willem <>:
    )> It sounds like a string-growing issue.
    )> Have you tried reading it line by line?
    )>
    )> my @text = `$command`;
    )
    ) That would almost certainly be worse: array-growing uses the same
    ) allocation strategy as string-growing, plus you've introduced the
    ) additional overhead of looking for newlines.

    But in the case of array-growing there won't be a gazillion copy operations
    on the whole string buffer, but only on the array, which is a lot smaller.


    SaSW, Willem
     
    Willem, Dec 5, 2012
    #6
