Performance question


Dave Sill

One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.

----
I manufactured a 401x401 (linearly, 160,801 elements) array, @judy,
each element having string values like

01000000001110000000000000000001

I needed to make a comma-delimited ASCII file of this data.

I decided a single I/O write of a string would be the fastest, so I
made a string:

$str = "";
foreach $i (0..$#judy-1)
{
    $str = $str . "$judy[$i],";
}
$str = $str . "$judy[$#judy]";
open(OUT, ">$output_file");
print OUT $str;
close(OUT);
`gzip -f $output_file`;

This took 16 minutes.

I tried it the slow way:

open(OUT, ">$output_file");
foreach $i (0..$#judy-1)
{
    print OUT "$judy[$i],";
}
print OUT "$judy[$#judy]";
close(OUT);
`gzip -f $output_file`;

With 160K I/Os, this took about 3 seconds.

The .gz files were different, but diff said that, uncompressed, they
were the same.
 

Paul Lalli

Dave said:
One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.

<snip>

Your user has an odd definition of "faster" and "slower". I don't know
what would make the user think that storing the entire 160,801-element
array in memory TWICE would be faster than just printing what's needed
when it's needed.

In the first algorithm, the user is storing one large string, and each
time through the loop, appending to that string. Towards the end, this
means storing over (160,000 x 32) bytes in a single scalar, and asking
perl to append to the end of that string. Then finally you ask perl to
make one absurdly large I/O access.

In the second algorithm, you're simply printing 32 bytes repeatedly.

Neither of those approaches is especially good Perl code, of course.
The first would be better written:

my $str = join (',', @judy);
open my $out, '>', $output_file or die "Can't open output: $!";
print $out $str;
close $out;
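
(join makes a single pass over the array, so there's none of the
repeated copying the original loop does.)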

The second would be better written ($, is the output field separator;
localizing it keeps the change confined to the block):

open my $out, '>', $output_file or die "Can't open output: $!";
{
    local $, = ',';
    print $out @judy;
}
close $out;

I would suggest your user use the Benchmark module to determine which of
these is actually faster.
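
For instance, a sketch along these lines (the test array is a made-up
stand-in, scaled down from 160,801 elements so the slow version
finishes in reasonable time; recent perls may optimize the
copy-and-append form, so results will vary by version):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in data: 10,000 copies of a 32-character string.
my @judy = ('01000000001110000000000000000001') x 10_000;

cmpthese(-3, {
    copy_append => sub {       # $str = $str . "..." -- may copy the string each pass
        my $str = '';
        $str = $str . "$_," for @judy;
    },
    inplace_append => sub {    # $str .= "..." -- grows the buffer in place
        my $str = '';
        $str .= "$_," for @judy;
    },
    join_once => sub {         # build the whole string in one pass
        my $str = join ',', @judy;
    },
});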

Paul Lalli
 

xhoster

Dave Sill said:
<snip>
$str="";
foreach $i(0..$#judy-1)
{
$str=$str."$judy[$i],"

This has to copy the contents of $str (which towards the end is quite
huge) each time through the loop. Maybe even twice. Using:

$str .= "$judy[$i],";

is tremendously faster, because it just tacks something onto the end of the
string when possible, rather than copying the entire string each time. (I
would have thought perl would have optimized the first into the second, but
apparently it doesn't. Maybe such optimization would cause overloading to
break.) There's a quick timing sketch at the end of this post.

}
$str=$str."$judy[$#judy]";

Of course, I do have to wonder why you don't just use
$str = join ",", @judy;
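
As promised, a rough timing sketch (the array is scaled down from
160,801 elements; on perls that do optimize $str = $str . "...", the
two times will look similar):

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my @judy = ('01000000001110000000000000000001') x 20_000;

my $t0 = [gettimeofday];
my $copy = '';
$copy = $copy . "$_," for @judy;    # copies the whole string each pass
printf "copy-and-append: %.2fs\n", tv_interval($t0);

$t0 = [gettimeofday];
my $inplace = '';
$inplace .= "$_," for @judy;        # grows the buffer in place
printf "in-place append:  %.2fs\n", tv_interval($t0);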

Xho
 

xhoster

Paul Lalli said:
In the first algorithm, the user is storing one large string, and each
time through the loop, appending to that string. Towards the end, this
means storing over (160,000 x 32) bytes in a single scalar, and asking
perl to append to the end of that string.

If he were doing that, it wouldn't be so bad. Perl is pretty good at
handling that. But he isn't asking Perl to append to the end of that
string, but rather to copy that string and then append to the end of that
copy.
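
Back of the envelope: the final string is about 160,801 x 33 bytes,
or roughly 5 MB. If each pass through the loop copies what's been
built so far, the total moved is about half the final size times the
number of passes -- around 2.5 MB x 160,801, on the order of 400 GB of
memory-to-memory copying. That's where the 16 minutes went.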

Then finally you ask perl to
make one absurdly large I/O access.

There is nothing absurd about the size of the I/O. The whole string is
only about 5 MB.

Xho
 

robic0

One of my users made the following observation. I'm only an
occasional, lightweight Perl user, so I can't explain what he's
seeing. Can anyone shed some light on it? H/W is a pretty large/fast
Dell server running RHEL 3.
<snip>

I won't be critical of anything beyond this point in your description.
The fact of the matter is that the idea of pre-defining and allocating
large, multi-dimensional arrays is strictly the mental masturbation of
college professors and has nothing at all to do with real-world
programming!!

If the black-box idea is to get data, perform an operation on it, then
put the results somewhere, then this is done on the micro level -- not
the macro level...
Imagine a CPU holding an entire "exe" in its cache before it is
written to memory and executed.
 
