Backtick command with long output super slow

jl_post

Dear Perl community,

Recently I wrote a Perl script to run several commands, parse
through their output, and display the results. One important line
was:

my $text = `$command`;

Now, the command isn't trivial; it generates about 65 MB of output.
When I ran my script, my script was essentially hanging on that line
(I later realized that it wasn't hanging; it just took over 20 minutes
to run that line of code).

I tried running that exact command at a DOS command prompt
(redirecting the output to a file), and it only took about 5 seconds
to run.

So I experimented around a bit. I replaced the above line with:

my $text = "hello, world";
`$command`; # note: not assigned to anything

and the code ran in about five seconds.

I then replaced it with:

my $text = do
{
    open(my $fh, "$command |") or die $!;
    local $/; # enable "slurp" mode
    <$fh>
};

and once again it took over 20 minutes.

I then tried to use system() to call "$command > file.txt" and then
read in "file.txt". That took about six or seven seconds.

So for some reason I wasn't able to use backticks (or even qx//)
without the operation taking up close to half an hour. To get
(relatively) quick time, I had to write it out to a temporary file,
then read that file in. (Note that this only was a problem with
commands that returned lots of output, like around 60 megabytes.)
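The comparison described above can be reproduced with a small timing harness. This is only a minimal sketch: the helper name time_capture and the placeholder command (which just prints about 1 MB of "x") are assumptions for illustration, not part of the original script, and the real command would be substituted in.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a code reference once; return (elapsed seconds, length of its result).
sub time_capture {
    my ($code) = @_;
    my $start = time;
    my $text  = $code->();
    return (time - $start, length $text);
}

# Placeholder command (an assumption): prints roughly 1 MB of output.
my $command = qq{"$^X" -e "print q(x) x 1_000_000"};

my %methods = (
    backticks  => sub { my $t = `$command`; $t },
    pipe_slurp => sub {
        open(my $fh, '-|', $command) or die "Cannot run '$command': $!";
        local $/;    # slurp mode
        my $t = <$fh>;
        close $fh;
        $t;
    },
);

for my $name (sort keys %methods) {
    my ($secs, $bytes) = time_capture($methods{$name});
    printf "%-10s %8d bytes in %.3f s\n", $name, $bytes, $secs;
}
```

Other capture methods (for example the temporary-file approach) can be added as further entries in %methods and timed the same way.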

So I wrote the following function to replace calling a command with
backticks:

sub getOutputOfCommand
{
    my ($command) = @_;

    # I want to do this: my $text = `$command`;
    # Unfortunately, this runs super-slow on my version of
    # Perl (on Windows).
    # I discovered that just using system() to run the command
    # while redirecting the output to a file, and then reading in
    # that file is much, much quicker (at least by twenty times).
    #
    # So that's what I'm doing here: Writing the command's
    # output out to file, and reading it back in.
    # (I have no idea why it's faster than just $text = `command`.)

    my $text;

    if (0) # we're not doing it this way because it takes too long
    {
        $text = `$command`;
        die "\nError running command:\n\n $command\n\n" if $?;
    }
    else
    {
        # For some strange reason, this way is much quicker for me.

        my $temporaryFileName = ".temporary.$0.$$";
        system("$command > \"$temporaryFileName\"") == 0
            or die "Error running command:\n\n $command\n\n";
        $text = do
        {
            open(my $fh, $temporaryFileName)
                or die "\nCannot read '$temporaryFileName': $!\n";
            local $/; # enable "slurp" mode
            <$fh>
        };
        unlink($temporaryFileName)
            or die "Cannot unlink '$temporaryFileName': $!\n";
    }

    return $text;
}

Now I can quickly get the output of a command with:

my $text = getOutputOfCommand($command);
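A variant of the same workaround can let the core File::Temp module pick the temporary file name, which sidesteps trouble when $0 contains a directory path. This is a sketch under that assumption only; the function name get_output_via_tempfile and the file name template are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub get_output_via_tempfile {
    my ($command) = @_;

    # tempfile() creates a uniquely named file in the system temp directory.
    # UNLINK => 1 asks File::Temp to remove it at program exit as a fallback.
    my ($tmp_fh, $tmp_name) = tempfile('cmdout_XXXXXX', TMPDIR => 1, UNLINK => 1);
    close $tmp_fh;    # close our handle so the child process can write the file

    system(qq{$command > "$tmp_name"}) == 0
        or die "Error running command:\n\n $command\n\n";

    open(my $fh, '<', $tmp_name)
        or die "Cannot read '$tmp_name': $!\n";
    local $/;         # enable "slurp" mode
    my $text = <$fh>;
    close $fh;

    unlink $tmp_name; # remove the file right away rather than waiting for exit
    return $text;
}
```

It is called the same way as the original function: my $text = get_output_via_tempfile($command);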

Since I was using Strawberry Perl 5.12 on Windows, I decided to try
running the script on Linux. Unfortunately, the same $command that
worked on Windows wasn't working on Linux (not Perl's fault; for some
reason the command itself gave an error there. Maybe the executable
was a different version).

After that, I decided to download a portable version of Strawberry
Perl 5.16 and try running it with that. Evidently, that portable
version DOES NOT have the same problem: calling a command with lots
of output via backticks in Strawberry Perl 5.16 took just seconds
instead of minutes.

So does anyone know if there is a known problem with calling a
command in backticks in Perl 5.12?

For those who are interested, here is the output of "perl -v" when
using Strawberry Perl 5.12:

This is perl 5, version 12, subversion 1 (v5.12.1) built for MSWin32-
x86-multi-thread

and here is the output of "perl -v" when using portable Strawberry
Perl 5.16:

This is perl 5, version 16, subversion 2 (v5.16.2) built for MSWin32-
x86-multi-thread

Since I found several work-arounds to this problem, it's not
imperative that I get a response. Still, it's nice to know about it
in case I run across this issue in the future.

Thanks,

-- Jean-Luc
 
Willem

(e-mail address removed) wrote:
) Dear Perl community,
)
) Recently I wrote a Perl script to run several commands, parse
) through their output, and display the results. One important line
) was:
)
) my $text = `$command`;
)
) Now, the command isn't trivial; it generates about 65 MB of output.
) When I ran my script, my script was essentially hanging on that line
) (I later realized that it wasn't hanging; it just took over 20 minutes
) to run that line of code).
)
) I tried running that exact command at a DOS command prompt
) (redirecting the output to a file), and it only took about 5 seconds
) to run.
)
) So I experimented around a bit. I replaced the above line with:
)
) my $text = "hello, world";
) `$command`; # note: not assigned to anything
)
) and the code ran in about five seconds.
)
) I then replaced it with:
)
) my $text = do
) {
) open(my $fh, "$command |") or die $!;
) local $/; # enable "slurp" mode
) <$fh>
) };
)
) and once again it took over 20 minutes.

It sounds like a string-growing issue.
Have you tried reading it line by line?

my @text = `$command`;


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Uri Guttman

BM> I don't know of any specific problem with backticks, but there was a
BM> problem at some point with perl's memory allocation strategy on Win32.
BM> When growing a long string perl ended up doing the growing in far more,
BM> far smaller increments than was helpful; since the memory allocation
BM> calls on Win32 are rather slow this could cause a significant slowdown.
BM> I don't remember exactly when this was fixed; I thought it was before
BM> 5.12, but I could be wrong.

then wouldn't preallocating the buffer in the string help? not that i
will play with winblows but it is an easy fix.

OP: assign a long string first to the var and then call backticks:

my $buffer = ' ' x 65_000_000 ;
$buffer = `command` ;

see if that speeds it up. if it does, it should be faster than using a
temp file. also if you still want the temp file, use File::Slurp to read
it in, as it should be faster than perl I/O.
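Whether assigning the backtick result to a presized variable actually reuses that buffer depends on perl internals. A variant that appends into the presized buffer explicitly reads the pipe in chunks. This is a rough sketch, not a tested fix for the Win32 slowdown: the function name, the 64 KB chunk size, and the size hint are all assumptions, and it relies on perl normally keeping a scalar's allocation when the scalar is set to the empty string.

```perl
use strict;
use warnings;

# Read everything a command prints into a buffer that was sized up front.
# $expected_bytes is only a hint; the buffer can still grow past it.
sub read_command_preallocated {
    my ($command, $expected_bytes) = @_;

    my $buffer = ' ' x $expected_bytes;    # force one large allocation
    $buffer = '';                          # empty it; the allocation is normally kept

    open(my $fh, '-|', $command) or die "Cannot run '$command': $!";

    # Append 64 KB chunks at the current end of the buffer.
    while (1) {
        my $got = read($fh, $buffer, 65_536, length $buffer);
        die "Read error: $!" unless defined $got;
        last if $got == 0;                 # end of the command's output
    }

    close $fh or die "Command failed (status $?): $command\n";
    return $buffer;
}
```

Usage would be something like: my $text = read_command_preallocated($command, 65_000_000);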

uri
 
Rainer Weikusat

Ben Morrow wrote:
> Quoth Willem:
>> (e-mail address removed) wrote:
>> [...]
>> ) my $text = `$command`;
>> )
>> ) Now, the command isn't trivial; it generates about 65 MB of output.
>> ) When I ran my script, my script was essentially hanging on that line
>> ) (I later realized that it wasn't hanging; it just took over 20 minutes
>> ) to run that line of code).
>> [...]
>> It sounds like a string-growing issue.
>> Have you tried reading it line by line?
>>
>> my @text = `$command`;
>
> That would almost certainly be worse: array-growing uses the same
> allocation strategy as string-growing, plus you've introduced the
> additional overhead of looking for newlines.

I don't have the time to test this now but assuming the issue is what
was hinted at, namely, O(n*n) realloc because of 'small', constant
increments combined with an allocator which hasn't been tuned to work
well despite some people's hell-bentness to do it the wrong way,
ie, one which does a copying realloc rather often, expecting the 'read
into array' code to perform better isn't completely unreasonable:
Unless the lines are very short, an array storing 65M lines of text is
going to be a lot smaller than 65M (roughly, a pointer per line plus
some constant 'management space', if the average line length is n,
that would be 65M/(8n) [64bit]) so, extending it by copying the old
contents to a new location will be faster and 'scanning for newlines'
amounts to 0.5 copies (read every byte and compare it with some value)
of the complete input data (actually splitting it into lines is
another complete copy).
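To put rough numbers on the O(n*n) concern, here is a back-of-the-envelope sketch. It assumes the worst case described above, where every growth step copies the old contents to a new location, and it uses a hypothetical fixed increment of 64 KB. Neither the increment nor the function names come from perl itself.

```perl
use strict;
use warnings;

# Total bytes moved if the buffer grows by a fixed step and every
# growth copies the existing contents (worst case).
sub bytes_copied_fixed {
    my ($total, $step) = @_;
    my ($size, $copied) = (0, 0);
    while ($size < $total) {
        $copied += $size;    # old contents are copied to the new block
        $size   += $step;
    }
    return $copied;
}

# Same accounting, but the buffer doubles on every growth.
sub bytes_copied_doubling {
    my ($total, $initial) = @_;
    my ($size, $copied) = ($initial, 0);
    while ($size < $total) {
        $copied += $size;
        $size   *= 2;
    }
    return $copied;
}

my $total = 65_000_000;    # roughly the output size from the thread
my $step  = 64 * 1024;     # hypothetical fixed increment

printf "fixed +%d bytes:   %.1f GB copied\n",
    $step, bytes_copied_fixed($total, $step) / 1e9;
printf "doubling from %d: %.1f MB copied\n",
    $step, bytes_copied_doubling($total, $step) / 1e6;
```

With a fixed step the copied volume grows with the square of the output size, while doubling keeps it proportional to the output size. A smaller increment makes the fixed-step case correspondingly worse.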
 
Rainer Weikusat

[...]
Unless the lines are very short, an array storing 65M lines of text is
going to be a lot smaller than 65M (roughly, a pointer per line plus
some constant 'management space', if the average line length is n,
that would be 65M/(8n) [64bit])

This should have been (8*65M)/n.
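As a worked instance of the corrected estimate (8*65M)/n, the following sketch computes the space taken by the line pointers alone for a few average line lengths. The function name and the sample line lengths are illustrative assumptions, and the 'management space' and any per-line overhead are deliberately left out.

```perl
use strict;
use warnings;

# Bytes needed just for the per-line pointers:
# (pointer size * total bytes) / average line length.
sub pointer_bytes {
    my ($total_bytes, $avg_line_len, $ptr_size) = @_;
    return $ptr_size * $total_bytes / $avg_line_len;
}

my $total = 65_000_000;    # about 65M of output
for my $n (20, 80, 200) {  # sample average line lengths (assumed)
    printf "average line length %3d: about %.1f MB of pointers\n",
        $n, pointer_bytes($total, $n, 8) / 1e6;
}
```

For average lines longer than 8 bytes the pointer array is smaller than the text itself, which is the point of the comparison.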
 
Willem

Ben Morrow wrote:
)
) Quoth Willem <[email protected]>:
)> It sounds like a string-growing issue.
)> Have you tried reading it line by line?
)>
)> my @text = `$command`;
)
) That would almost certainly be worse: array-growing uses the same
) allocation strategy as string-growing, plus you've introduced the
) additional overhead of looking for newlines.

But in the case of array-growing there won't be a gazillion copy operations
on the whole string buffer, but only on the array, which is a lot smaller.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
