Backtick command with long output super slow

jl_post · Dec 4, 2012

Dear Perl community,

Recently I wrote a Perl script to run several commands, parse
through their output, and display the results. One important line
was:

my $text = `$command`;

Now, the command isn't trivial; it generates about 65 MB of output.
When I ran my script, my script was essentially hanging on that line
(I later realized that it wasn't hanging; it just took over 20 minutes
to run that line of code).

I tried running that exact command at a DOS command prompt
(redirecting the output to a file), and it only took about 5 seconds
to run.

So I experimented around a bit. I replaced the above line with:

my $text = "hello, world";
`$command`; # note: not assigned to anything

and the code ran in about five seconds.

I then replaced it with:

my $text = do
{
    open(my $fh, "$command |") or die $!;
    local $/; # enable "slurp" mode
    <$fh>
};

and once again it took over 20 minutes.

I then tried to use system() to call "$command > file.txt" and then
read in "file.txt". That took about six or seven seconds.

So for some reason I wasn't able to use backticks (or even qx//)
without the operation taking close to half an hour. To get a
(relatively) quick run time, I had to write the output to a temporary
file, then read that file in. (Note that this was only a problem with
commands that returned lots of output, like around 60 megabytes.)

So I wrote the following function to replace calling a command with
backticks:

sub getOutputOfCommand
{
    my ($command) = @_;

    # I want to do this: my $text = `$command`;
    # Unfortunately, this runs super-slow on my version of
    # Perl (on Windows).
    # I discovered that just using system() to run the command
    # while redirecting the output to a file, and then reading in
    # that file is much, much quicker (at least by twenty times).
    #
    # So that's what I'm doing here: Writing the command's
    # output out to file, and reading it back in.
    # (I have no idea why it's faster than just $text = `command`.)

    my $text;

    if (0) # we're not doing it this way because it takes too long
    {
        $text = `$command`;
        die "\nError running command:\n\n $command\n\n" if $?;
    }
    else
    {
        # For some strange reason, this way is much quicker for me.

        my $temporaryFileName = ".temporary.$0.$$";
        system("$command > \"$temporaryFileName\"") == 0
            or die "Error running command:\n\n $command\n\n";
        $text = do
        {
            open(my $fh, $temporaryFileName)
                or die "\nCannot read '$temporaryFileName': $!\n";
            local $/; # enable "slurp" mode
            <$fh>
        };
        unlink($temporaryFileName)
            or die "Cannot unlink '$temporaryFileName': $!\n";
    }

    return $text;
}

Now I can quickly get the output of a command with:

my $text = getOutputOfCommand($command);
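
As a hedged variation on the temporary-file workaround above: the file name there is built from $0 and $$, which could be awkward if $0 contains a directory path. A minimal sketch using the core File::Temp module to pick the name might look like the following. The function name get_output_via_tempfile is hypothetical (not from the original post), and the sketch assumes the shell understands "> file" redirection, as both cmd.exe and sh do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run $command with its output redirected to a
# temporary file, then slurp that file and return its contents.
sub get_output_via_tempfile {
    my ($command) = @_;

    # tempfile() creates the file and returns a handle plus its name;
    # UNLINK => 1 asks File::Temp to remove the file at program exit.
    my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
    close($tmp_fh) or die "Cannot close '$filename': $!\n";

    system(qq{$command > "$filename"}) == 0
        or die "Error running command:\n\n $command\n\n";

    open(my $in, '<', $filename)
        or die "Cannot read '$filename': $!\n";
    local $/;                 # enable "slurp" mode
    my $text = <$in>;
    close($in);

    return defined $text ? $text : '';
}

# Usage: capture the output of "perl -v" from the running interpreter.
my $text = get_output_via_tempfile(qq{"$^X" -v});
printf "Read %d bytes\n", length $text;
```

This keeps the same "redirect, then read back" approach as getOutputOfCommand; only the way the temporary file is named and cleaned up differs.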

Since I was using Strawberry Perl 5.12 on Windows, I decided to try
running the script on Linux. Unfortunately, the same $command that
worked on Windows wasn't working on Unix (not Perl's fault; for some
reason it gave an error when running on Linux. Maybe the executable
was a different version).

After that, I decided to download a portable version of Strawberry
Perl 5.16 and try running it with that. Evidently, that portable
version DOES NOT have the same problem (that is, calling a command
(with lots of output) via backticks in Strawberry Perl 5.16 took just
seconds (instead of minutes)).

So does anyone know if there is a known problem with calling a
command in backticks in Perl 5.12?

For those who are interested, here is the output of "perl -v" when
using Strawberry Perl 5.12:

This is perl 5, version 12, subversion 1 (v5.12.1) built for MSWin32-x86-multi-thread

and here is the output of "perl -v" when using portable Strawberry
Perl 5.16:

This is perl 5, version 16, subversion 2 (v5.16.2) built for MSWin32-x86-multi-thread

Since I found several work-arounds to this problem, it's not
imperative that I get a response. Still, it's nice to know about it
in case I run across this issue in the future.

Thanks,

-- Jean-Luc

Willem · Dec 4, 2012

(e-mail address removed) wrote:
) Dear Perl community,
)
) Recently I wrote a Perl script to run several commands, parse
) through their output, and display the results. One important line
) was:
)
) my $text = `$command`;
)
) Now, the command isn't trivial; it generates about 65 MB of output.
) When I ran my script, my script was essentially hanging on that line
) (I later realized that it wasn't hanging; it just took over 20 minutes
) to run that line of code).
)
) I tried running that exact command at a DOS command prompt
) (redirecting the output to a file), and it only took about 5 seconds
) to run.
)
) So I experimented around a bit. I replaced the above line with:
)
) my $text = "hello, world";
) `$command`; # note: not assigned to anything
)
) and the code ran in about five seconds.
)
) I then replaced it with:
)
) my $text = do
) {
) open(my $fh, "$command |") or die $!;
) local $/; # enable "slurp" mode
) <$fh>
) };
)
) and once again it took over 20 minutes.

It sounds like a string-growing issue.
Have you tried reading it line by line?

my @text = `$command`;

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Uri Guttman · Dec 5, 2012

BM> I don't know of any specific problem with backticks, but there was a
BM> problem at some point with perl's memory allocation strategy on Win32.
BM> When growing a long string perl ended up doing the growing in far more,
BM> far smaller increments than was helpful; since the memory allocation
BM> calls on Win32 are rather slow this could cause a significant slowdown.
BM> I don't remember exactly when this was fixed; I thought it was before
BM> 5.12, but I could be wrong.

then wouldn't preallocating the buffer in the string help? not that i
will play with winblows but it is an easy fix.

OP: assign a long string first to the var and then call backticks:

my $buffer = ' ' x 65_000_000 ;
$buffer = `command` ;

see if that speeds it up. if it does, it should be faster than using a
temp file. also if you still want the temp file, use File::Slurp to read
it in, as it should be faster than perl I/O.

uri
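
A minimal sketch of the preallocation idea, for anyone who wants to try it: it swaps in a technique slightly different from the two-line version above, reading the pipe with read() at an explicit offset so that the data is appended to the same scalar that was grown beforehand. Whether the emptied scalar really keeps its allocated capacity is an assumption about perl internals, so this may or may not help on an affected build. The function name slurp_command_preallocated and the 64 KB chunk size are hypothetical choices.

```perl
use strict;
use warnings;

# Hypothetical helper: read all output of $command into one scalar
# that has been grown to roughly $expected_size bytes beforehand.
sub slurp_command_preallocated {
    my ($command, $expected_size) = @_;

    # Grow the scalar once, then empty it. The hope (an assumption, not
    # a guarantee) is that perl keeps the allocated buffer around.
    my $buffer = ' ' x $expected_size;
    $buffer = '';

    open(my $fh, '-|', $command)
        or die "Cannot run '$command': $!\n";

    # Append 64 KB chunks at the current end of the buffer.
    while (1) {
        my $n = read($fh, $buffer, 65536, length $buffer);
        die "Read error: $!\n" unless defined $n;
        last if $n == 0;      # end of output
    }

    close($fh) or die "Command failed (status $?): $command\n";
    return $buffer;
}

# Usage: capture the output of "perl -v" from the running interpreter.
my $out = slurp_command_preallocated(qq{"$^X" -v}, 4096);
printf "Read %d bytes\n", length $out;
```

Timing this against plain backticks on the affected Perl would show whether allocation is actually the bottleneck.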

Rainer Weikusat · Dec 5, 2012

Ben Morrow said:
> Quoth Willem:
>> (e-mail address removed) wrote:
>>
>> [...]
>>
>> ) my $text = `$command`;
>> )
>> ) Now, the command isn't trivial; it generates about 65 MB of output.
>> ) When I ran my script, my script was essentially hanging on that line
>> ) (I later realized that it wasn't hanging; it just took over 20 minutes
>> ) to run that line of code).
>>
>> [...]
>>
>> It sounds like a string-growing issue.
>> Have you tried reading it line by line?
>>
>> my @text = `$command`;
>
> That would almost certainly be worse: array-growing uses the same
> allocation strategy as string-growing, plus you've introduced the
> additional overhead of looking for newlines.

I don't have the time to test this now but assuming the issue is what
was hinted at, namely, O(n*n) realloc because of 'small', constant
increments combined with an allocator which hasn't been tuned to work
well despite some people's hell-bentness to do it the wrong way,
ie, one which does a copying realloc rather often, expecting the 'read
into array' code to perform better isn't completely unreasonable:
Unless the lines are very short, an array storing 65M lines of text is
going to be a lot smaller than 65M (roughly, a pointer per line plus
some constant 'management space', if the average line length is n,
that would be 65M/(8n) [64bit]) so, extending it by copying the old
contents to a new location will be faster and 'scanning for newlines'
amounts to 0.5 copies (read every byte and compare it with some value)
of the complete input data (actually splitting it into lines is
another complete copy).

Rainer Weikusat · Dec 5, 2012

[...]

Unless the lines are very short, an array storing 65M lines of text is
going to be a lot smaller than 65M (roughly, a pointer per line plus
some constant 'management space', if the average line length is n,
that would be 65M/(8n) [64bit])

This should have been (8*65M)/n.
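
To put rough numbers on this argument, here is a small worked sketch. It models the worst case being discussed, where every growth step copies the existing contents to a new block, and compares fixed-size increments with doubling. It also evaluates the corrected estimate (8*65M)/n for the pointer array. The 8 KB step, the 80-byte average line length, and all function names are hypothetical values chosen for illustration; nothing here measures what perl on Win32 actually did.

```perl
use strict;
use warnings;

# Bytes copied if a buffer reaches $total bytes by growing in fixed
# steps of $step bytes and every step copies the old contents.
sub bytes_copied_fixed_step {
    my ($total, $step) = @_;
    my ($size, $copied) = (0, 0);
    while ($size < $total) {
        $copied += $size;     # old contents moved to the new block
        $size   += $step;
    }
    return $copied;
}

# Same model, but the buffer doubles in size at every step.
sub bytes_copied_doubling {
    my ($total, $start) = @_;
    my ($size, $copied) = ($start, 0);
    while ($size < $total) {
        $copied += $size;
        $size   *= 2;
    }
    return $copied;
}

# Size of an array holding one pointer per line: (ptr_size * total) / n.
sub pointer_array_bytes {
    my ($total, $avg_line_len, $ptr_size) = @_;
    return $ptr_size * $total / $avg_line_len;
}

my $total = 65_000_000;       # about 65 MB of command output
printf "fixed 8 KB steps: %.3g bytes copied\n",
    bytes_copied_fixed_step($total, 8192);
printf "doubling from 8 KB: %.3g bytes copied\n",
    bytes_copied_doubling($total, 8192);
printf "pointer array (80-byte lines, 8-byte pointers): %.3g bytes\n",
    pointer_array_bytes($total, 80, 8);
```

Under this model the fixed-step total grows quadratically with the output size while the doubling total stays within a small multiple of it, and the pointer array is about a tenth the size of the text for 80-byte lines.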

Willem · Dec 5, 2012

Ben Morrow wrote:
)
) Quoth Willem <(e-mail address removed)>:
)> It sounds like a string-growing issue.
)> Have you tried reading it line by line?
)>
)> my @text = `$command`;
)
) That would almost certainly be worse: array-growing uses the same
) allocation strategy as string-growing, plus you've introduced the
) additional overhead of looking for newlines.

But in the case of array-growing there won't be a gazillion copy operations
on the whole string buffer, but only on the array, which is a lot smaller.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
