Performance question


Dave Sill

One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.

----
I manufactured a 401x401 (linearly, 160,801 elements) array, @judy,
each element having string values like

01000000001110000000000000000001

I needed to make a comma-delimited ASCII file of this data.

I decided a single I/O write of a string would be the fastest, so I
made a string:

$str = "";
foreach $i (0..$#judy-1)
{
    $str = $str . "$judy[$i],";
}
$str = $str . "$judy[$#judy]";
open(OUT, ">$output_file");
print OUT $str;
close(OUT);
`gzip -f $output_file`;

This took 16 minutes.

I tried it the slow way:

open(OUT, ">$output_file");
foreach $i (0..$#judy-1)
{
    print OUT "$judy[$i],";
}
print OUT "$judy[$#judy]";
close(OUT);
`gzip -f $output_file`;

With 160K I/Os, this took about 3 seconds.

The .gz files were different, but diff said that, uncompressed, they
were the same.
 

Paul Lalli

Dave said:
One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.

<snip>

Your user has an odd definition of "faster" and "slower". I don't know
what would make the user think that storing the entire 160,801-element
array in memory TWICE would be faster than just printing what's needed
when it's needed.

In the first algorithm, the user is storing one large string, and each
time through the loop, appending to that string. Towards the end, this
means storing over (160,000 x 32) bytes in a single scalar, and asking
perl to append to the end of that string. Then finally you ask perl to
make one absurdly large I/O access.

In the second algorithm, you're simply printing 32 bytes repeatedly.

Neither of those approaches is especially good Perl code, of course.
The first would be better written:

my $str = join (',', @judy);
open my $out, '>', $output_file or die "Can't open output: $!";
print $out $str;
close $out;
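
(join makes a single pass over the array, so there's none of the
repeated copying the original loop does.)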

The second would be better written ($, is the output field separator;
localizing it keeps the change confined to the block):

open my $out, '>', $output_file or die "Can't open output: $!";
{
    local $, = ',';
    print $out @judy;
}
close $out;

I would suggest your user use the Benchmark module to determine which of
these is actually faster.
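
For instance, a sketch along these lines (the test array is a made-up
stand-in, scaled down from 160,801 elements so the slow version
finishes in reasonable time; recent perls may optimize the
copy-and-append form, so results will vary by version):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in data: 10,000 copies of a 32-character string.
my @judy = ('01000000001110000000000000000001') x 10_000;

cmpthese(-3, {
    copy_append => sub {       # $str = $str . "..." -- may copy the string each pass
        my $str = '';
        $str = $str . "$_," for @judy;
    },
    inplace_append => sub {    # $str .= "..." -- grows the buffer in place
        my $str = '';
        $str .= "$_," for @judy;
    },
    join_once => sub {         # build the whole string in one pass
        my $str = join ',', @judy;
    },
});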

Paul Lalli
 

xhoster

Dave Sill said:
<snip>
$str="";
foreach $i(0..$#judy-1)
{
$str=$str."$judy[$i],"

This has to copy the contents of $str (which towards the end is quite
huge) each time through the loop. Maybe even twice. Using:

$str .= "$judy[$i],";

is tremendously faster, because it just tacks something onto the end of the
string when possible, rather than copying the entire string each time. (I
would have thought perl would have optimized the first into the second, but
apparently it doesn't. Maybe such optimization would cause overloading to
break.) There's a quick timing sketch at the end of this post.

}
$str=$str."$judy[$#judy]";

Of course, I do have to wonder why you don't just use
$str = join ",", @judy;
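
As promised, a rough timing sketch (the array is scaled down from
160,801 elements; on perls that do optimize $str = $str . "...", the
two times will look similar):

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my @judy = ('01000000001110000000000000000001') x 20_000;

my $t0 = [gettimeofday];
my $copy = '';
$copy = $copy . "$_," for @judy;    # copies the whole string each pass
printf "copy-and-append: %.2fs\n", tv_interval($t0);

$t0 = [gettimeofday];
my $inplace = '';
$inplace .= "$_," for @judy;        # grows the buffer in place
printf "in-place append:  %.2fs\n", tv_interval($t0);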

Xho
 

xhoster

Paul Lalli said:
In the first algorithm, the user is storing one large string, and each
time through the loop, appending to that string. Towards the end, this
means storing over (160,000 x 32) bytes in a single scalar, and asking
perl to append to the end of that string.

If he were doing that, it wouldn't be so bad. Perl is pretty good at
handling that. But he isn't asking Perl to append to the end of that
string, but rather to copy that string and then append to the end of that
copy.
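
Back of the envelope: the final string is about 160,801 x 33 bytes,
or roughly 5 MB. If each pass through the loop copies what's been
built so far, the total moved is about half the final size times the
number of passes -- around 2.5 MB x 160,801, on the order of 400 GB of
memory-to-memory copying. That's where the 16 minutes went.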

Then finally you ask perl to
make one absurdly large I/O access.

There is nothing absurd about the size of the I/O. The whole string is
only about 5 MB.

Xho
 

robic0

One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.
<snip>

I won't be critical of anything beyond this point in your description.
The fact of the matter is that the idea of pre-defining and allocating
large, multi-dimensional arrays is strictly the mental masturbation of
college professors and has nothing at all to do with real-world
programming!!

If the black-box idea is to get data, perform an operation on it, then
put the results somewhere, then this is done on the micro level -- not
the macro level...
Imagine a CPU holding an entire "exe" in its cache before it is
written to memory and executed.
 
