Perl Performance Tuning

occitan

Hi,

Makepp is a big heavy-duty program, where speed is a must. A lot of
effort has been put into optimizing it. This documents some general
things we have found:

http://makepp.sourceforge.net/1.50/perl_performance.html

This should be required reading for all Perl hackers, including those
who write the, alas, sometimes suboptimal documentation.

I realize that the findings don't always reflect conventional wisdom.
But Perl doesn't lie. If one technique is measurably slower than
another, there must be a compelling reason, beyond personal taste, to
still use it. No flame war -- objective discussion only, please!

You are especially welcome to provide other findings, preferably with
code that demonstrates them.

best regards
Daniel
 
Uri Guttman

o> http://makepp.sourceforge.net/1.50/perl_performance.html

o> This should be required reading for all Perl hackers, including those
o> who write the alas sometimes suboptimal documentation.

there are some odd points in that and plenty of stuff you missed. also
the wording is very bizarre in some places.

@_ is not called the stack. it is @_. calling with &foo just leaves @_
as it is and so it can be used by the callee. but as that is a bug
waiting to happen, i wouldn't use it just for speedups.
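a tiny sketch of the &foo behavior in question (the sub names are invented for illustration):

```perl
use strict;
use warnings;

sub inner { return scalar @_ }   # reports how many args it sees

sub outer {
    # calling &inner with no parens passes the current @_ through
    # unchanged -- no new argument list is built, which is the
    # speedup (and the trap) being discussed.
    return &inner;
}

print outer( 1, 2, 3 ), "\n";    # prints 3: inner saw outer's @_
```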

as for passing around long strings, using $_[0] to access them directly
is also a bug waiting to happen. just pass them by reference and
all is well. you get a proper named variable and no extra copying of
the string in the arg list.
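a small sketch of that pass-by-reference style (the sub is hypothetical):

```perl
use strict;
use warnings;

# Taking a scalar reference means only the reference is copied when
# unpacking @_; the long string itself is never duplicated.
sub count_lines {
    my ($text_ref) = @_;                   # copies a reference, not the text
    my $count = () = $$text_ref =~ /\n/g;  # count newlines
    return $count;
}

my $big_string = "line 1\nline 2\nline 3\n";
print count_lines( \$big_string ), "\n";   # prints 3
```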

who in their right mind uses lots of bit flags for booleans in perl? in
c it makes sense for packing reasons but with perl that won't save
storage since SV's all take up overhead way beyond a few bits. maybe you
mistakenly tried that and it was slow but i wouldn't have even ventured
down that path to begin with.

choosing for vs map is more than just speed. it has to do with side
effects or generating a result list. and the size of the input list
matters too.

use undef instead of 0 makes no sense. you seem to be boolean flag crazy
if that matters to you. as such i would redesign the code to not need so
many flags. flags are not perlish, especially flags in conditionals and
loops. and i believe a basic sv can store an integer or undef (a flag)
using the same space. only a hash or array element could hold undef in
less space, and only if it is done a certain way. $foo{bar} =
undef will use the same space as 0, as it will allocate an SV for
that. but @foo{@keys} = () will not allocate sv's and will save some
space, but that means boolean tests on it must be done with exists().
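a short sketch of that hash-slice idiom and the exists() requirement it imposes (key names are arbitrary; whether it actually saves SV allocations varies by perl version, so measure before relying on it):

```perl
use strict;
use warnings;

my @keys = qw(alpha beta gamma);
my %flag;

# Slice-assigning an empty list creates the keys with undef values.
@flag{@keys} = ();

# A truth test on the value fails (undef is false), so membership
# must be checked with exists() instead.
print exists $flag{beta} ? "set\n"  : "unset\n";   # prints "set"
print $flag{beta}        ? "true\n" : "false\n";   # prints "false"
```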

the amount of time you spend on such optimizations would be
better spent on better design and code. and i am not even talking better
algorithms. you can implement the same algorithm in many different ways
and with many different speed results.

avoiding hashes is a very foolish thing to say. arrays as objects are
not much faster at all. look at the brutal history of pseudohashes in
perl. they were invented to do exactly what you claim is important and
they never proved to be so much faster. they only caused nasty hard to
fix bugs when they were used outside their very tiny design envelope. i
would wager that it is how you used the hashes that caused your
problems. this goes again to a better design and not avoiding hashes.

there are so many other ways to optimize programs. i would first do a
full design and code review and then see what comes out of that. then
you can run profiling on the programs and focus on speeding up only the
largest cpu hogs and not all those little things you seem to mention. i
would prefer to waste some cpu in order to have better and more
maintainable code. if you really need speed, use xs not perl.

if you want a proper design and/or code review, let me know.

uri
 
robic0

Don't believe a word this guy says, his knowledge of Perl does not
justify his messianic complex. He knows little about whatever the subject
is. Search for his replies to the RXParse module posted here and you will
see that he is a total sham!

robic0 @ yahoo.com
 
Uri Guttman

MD> Do you have a moment to elaborate on this?

assuming the output list will be a similar or the same size as the input
list, map will allocate a list on the stack to hold its generated list
and this can be very expensive when you approach the max amount of ram
you have. even with shorter lists, there are times where map will take
longer than the equivalent explicit for/push loop (which is really what
map is). a for loop with a block only enters it once but a map block is
entered in each iteration so that can affect things too. my real point
is that choosing map vs for based on speed is wrongheaded. you choose
them based on whether you want a generated list from read only input (use
map) or r/w access with no direct output (use the for modifier or a for
loop).

the semantics of the code, telling the maintainer what is really
going on (output or not, read only vs rw), matter more than whatever minor
speed win one may have over the other. if you don't need to generate
output, then for should win on speed too. if you generate output, map will
usually win over calling push inside a for (the push is more perl ops than
the single call to map. counting op calls is the best rule of thumb for
benchmark guessing but it can be wrong too).
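one way to check the map-vs-push claim on a given machine is the core Benchmark module (the list size and loop body here are arbitrary):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @input = 1 .. 10_000;

# Compare map against the equivalent explicit for/push loop;
# cmpthese(-2, ...) runs each sub for about 2 CPU seconds and
# prints a rate comparison table.
cmpthese( -2, {
    map_list => sub { my @out = map { $_ * 2 } @input },
    for_push => sub {
        my @out;
        push @out, $_ * 2 for @input;
    },
} );
```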

MD> What were pseudohashes?

you don't want to know. they are well documented but i think finally
deprecated in the 5.8.* perls and i believe they will be deleted from
5.10. look for them in the docs in any older perl and don't say i didn't
warn you. pseudohashes were a horrible idea to somehow support symbolic
access to object attributes but with the speed of arrays. it tended to
make munging the attribute names a pain and it wasn't enough faster
to be worth the other pains.

and i skimmed some of the makepp code (the main script) and i can see
many ways to speed it up. and its design and code could be improved a
good deal IMNSHO. a quick hit which won't save much speed but annoys the
hell out of me. there are several long if/else/elsif sections which do
option parsing. if an option passes just do a return or next or last
inside the block. then each section becomes a simple if with no
else. this is much cleaner and easier to maintain.

elsif (/^-k$/ || /^--keep[-_]going$/) {
    $MakeEvent::exit_on_error = 0;
    $keep_going = 1;
}
elsif (/^--log(?:=(.*))?$/) {   # Log file specified?
    $logfile = $1 || shift @ARGV;
}
elsif (/^-n$/ || /^--just[-_]print$/ || /^--dry[-_]run$/ || /^--recon$/) {
    $dry_run = 1;
}
elsif (/^--no[-_]?log$/) {
    $log_level = 0;             # Turn off logging.
}



if (/^-k$/ || /^--keep[-_]going$/) {
    $MakeEvent::exit_on_error = 0;
    $keep_going = 1;
    next;
}

if (/^--log(?:=(.*))?$/) {      # Log file specified?
    $logfile = $1 || shift @ARGV;
    next;
}

if (/^-n$/ || /^--just[-_]print$/ || /^--dry[-_]run$/ || /^--recon$/) {
    $dry_run = 1;
    next;
}

if (/^--no[-_]?log$/) {
    $log_level = 0;             # Turn off logging.
    next;
}

with the nexts, you don't have to scan to the bottom of the elsifs to
see what happens there. most likely it will be the end of the loop so
the next tells you that right there. also each if has the same shape: you
don't have to maintain many elsif's and the single final else, and you can
switch the order of option checking without editing the whole block.

the massive use of globals and no strict or warnings are also major
problems with this code. in their pursuit of speed they dropped many
quality software engineering practices. IMO they could have had speed and
better code with a better design. minor speedups like some of the ones
advocated aren't worth the loss of code quality.

uri
 
robic0

More 'My Dicks Bigger Than His/Yours' by Uri Guttman.......
 
Uri Guttman

MD> I'm horrified to discover that you think I'm the OP. I am simply
MD> curious about the differences between map vs. foreach and would never
MD> be so misguided as to recommend some of the bizzarities he does.

i never assumed you were the OP. i was just emphasizing my points about
his perl speed page and what it says about map/for.

uri
 
robic0

UG> loops. and i believe a basic sv can store an integer or undef (a flag)
UG> using the same space.

IZ> undef may use smaller space than 0;

This alone makes you a candidate for the nut-house !!!!!!!!!!!!

[snip]
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Uri Guttman]

UG> loops. and i believe a basic sv can store an integer or undef (a flag)
UG> using the same space.

undef may use smaller space than 0; but only if the value was never
initialized (even clearing with undef $var won't help).

UG> avoiding hashes is a very foolish thing to say. arrays as objects are
UG> not much faster at all.

BS. IIRC, access to arrays is about 2.5 times faster. (Due to bugs,
this is not applicable to lexical arrays.)

UG> there are so many other ways to optimize programs. i would first do a
UG> full design and code review and then see what comes out of that. then
UG> you can run profiling on the programs and focus on speeding up only the
UG> largest cpu hogs and not all those little things you seem to
UG> mention.

Except that IN SOME SITUATIONS all these "little things" may be the
"largest CPU hogs".

UG> i would prefer to waste some cpu in order to have better and more
UG> maintainable code.

Sometimes CPU is more important (but [fortunately] this situation is
rare with Perl code).

Hope this helps,
Ilya
 
robic0


Hey, pseudo code with annotation. You are the most full-of-shit poster
on this news group. The arrogance stated is surely supplanted by your
overpowering ignorance and utter lack of code production that can
either help this individual or produce a module on cpan that can.
If the cpan module don't work, then shut the motherfucking up.
Oh, when ur unemployed like me, time is a relative thing.....
 
A. Sinan Unur

(e-mail address removed) wrote in @i40g2000cwc.googlegroups.com:

Funny,

http://sourceforge.net/project/showfiles.php?group_id=43679

Only available version is 1.40.1
o> This should be required reading for all Perl hackers,

A little arrogant, aren't we?

<blockquote>
Time your solution
A loop like

time perl -Mstrict -we 'my <initialization>; for( 0..999_999 ) {
<code> }'
</blockquote>

How about?

use Benchmark;
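a minimal sketch of that Benchmark alternative to the shell `time` one-liner (the timed body is a placeholder):

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Same spirit as `time perl -Mstrict -we '... for (0..999_999) {...}'`,
# but timed in-process by the core Benchmark module, which reports
# wallclock, user, and system time plus iterations per second.
timethis( 1_000_000, sub {
    my $x = 2**10;    # <initialization>/<code> stand-in
} );
```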
o> still use it. No flame war -- objective discussion only, please!

You propose that I stop using all the things that make Perl useful to
me: No hashes, no strings, no objects.

I honestly do not know how to write a make utility: I have a feeling one
needs a fair bit of Graph Theory to get it right, but I might be wrong.

But, in the applications I have written, I have never bothered to
measure how long a map took versus a for: I used whatever was readable
and appropriate. The greatest improvements in my code came from
improvements in algorithms.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
robic0

ASU> You propose that I stop using all the things that make Perl useful to
ASU> me: No hashes, no strings, no objects.

I also propose that you stop using your brain, which is of no use and
failed you just after birth (yea, you know what I'm talking about).....

ASU> I honestly do not know how to write a make utility: I have a feeling one
ASU> needs a fair bit of Graph Theory to get it right, but I might be wrong.

Its a wonder that you can successfully wipe your ass, or does your mother
do that for you?

ASU> But, in the applications I have written, I have never bothered to
ASU> measure how long a map took versus a for: I used whatever was readable
ASU> and appropriate. The greatest improvements in my code came from
ASU> improvements in algorithms.

You have never ever written an application. That word is reserved for
programmers !!!
 
robic0

1-liners don't make a dent in 8,000 lines of a Perl application !
The sooner you concede that, the sooner you may smell fresh oxygen
outside of the womb of your rectum !!
 
Ala Qumsieh

o> This should be required reading for all Perl hackers, including those
o> who write the alas sometimes suboptimal documentation.

Anybody can, and is even encouraged to, submit patches to the perldocs.
If you feel something is suboptimal, fix it, and submit a patch.
o> I realize that the findings don't always reflect conventional wisdom.
o> But Perl doesn't lie. If one technique is measurably slower than
o> another, there must be a compelling reason, beyond personal taste, to
o> still use it. No flame war -- objective discussion only, please!

Here are my comments:

- Measure, don't guess:

I agree about profiling. I would use the Benchmark module, though.

- Regexps:

Good points all round.

- Functions:

Object orientation isn't such a bad thing all the time. The overhead
taken by method lookup can be greatly offset by code clarity and
maintainability. You just need to know when to go OO and when not to.

If you are really anal about the amount of time it takes to shift() an
element off of @_, as opposed to access it directly, then you shouldn't
be working with Perl. Such trivial "optimizations" will not result in
any substantial speedup.

As for using fewer functions, my advice is to be logical about it. I
use functions to group a chunk of related and re-usable code that has a
defined functionality (hence the word "function"). It makes things
easier and clearer. For example, I would have two different functions to
calculate the area and center of mass of a polygon. I wouldn't combine
both into the same function, even though I might always be calling them
as a pair. It helps maintainability, and doesn't really add that much
overhead.

- Miscellaneous

Avoiding hashes is the silliest thing you can say to a Perl programmer.
Sometimes, it is easier to use an array, but other times it is best to
use hashes. The correct advice is to not always use hashes, but use
whatever is best for the task.

Same thing with strings.

for() loops have side effects that are different from map()'s. Again,
use whatever is best for the task.

Overall, there is some good advice, but it seems that you are solving
the wrong problems by focusing on trivial "optimizations" that can have
bad consequences in the future. It is best to find your real
bottlenecks, and revise your slow algorithms instead.

--Ala
 
occitan

Hi,

I mean when for and map look equally well suited for the problem at
hand. Only trying to code it both ways and measuring the impact
counts.

I tried pseudohashes for our big central data tree, and makepp became
30% slower. Their promise of compile-time verified class fields is
nonsense, because at compile time it is mostly not known what class my
$o->{FIELD} should be looked up in.

I am talking about plain arrays, which as Ilya confirmed, are clearly
faster.

We try to use strict and warnings in all modules, but I guess someone
should sit down and check where that is missing.

That option handling has long been in my sights for rewriting. But it
happens only once at startup (and again for old fashioned recursive
makefiles -- for which makepp has a far more elegant alternative), so
it's not a top priority.

best regards
Daniel
 
Anno Siegel

Ilya Zakharevich said:
IZ> [...]
UG> avoiding hashes is a very foolish thing to say. arrays as objects are
UG> not much faster at all.
IZ> BS. IIRC, access to arrays is about 2.5 times faster. (Due to bugs,
IZ> this is not applicable to lexical arrays.)

This is true for array- vs. hash access itself, but Uri was talking of
arrays vs. hashes as objects. Data access itself usually only accounts
for a small part of an object's performance. Switching from hashes to
arrays in a class implementation will make the class only slightly faster,
certainly not by a factor of 2.5.
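for concreteness, a sketch of the two object layouts under discussion (the class names are invented; only the element access differs):

```perl
use strict;
use warnings;

# Hash-based object: named fields, self-documenting access.
package Point::Hash;
sub new { my $class = shift; bless { x => $_[0], y => $_[1] }, $class }
sub x   { $_[0]{x} }

# Array-based object: faster raw element access, but the fields are
# just index positions that the maintainer must keep track of.
package Point::Array;
use constant { X => 0, Y => 1 };
sub new { my $class = shift; bless [ $_[0], $_[1] ], $class }
sub x   { $_[0][X] }

package main;
print Point::Hash->new( 3, 4 )->x,  "\n";   # prints 3
print Point::Array->new( 3, 4 )->x, "\n";   # prints 3
```

Method dispatch, bless, and call overhead are identical in both; only the `{x}` vs `[X]` lookup changes, which is why the whole-class speedup is small.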

Anno
 
Dr.Ruud

Uri Guttman schreef:
[makepp]
elsif (/^-k$/ || /^--keep[-_]going$/) {
    $MakeEvent::exit_on_error = 0;
    $keep_going = 1;
}
elsif (/^--log(?:=(.*))?$/) {   # Log file specified?
    $logfile = $1 || shift @ARGV;
}
elsif (/^-n$/ || /^--just[-_]print$/ || /^--dry[-_]run$/ || /^--recon$/) {
    $dry_run = 1;
}
elsif (/^--no[-_]?log$/) {
    $log_level = 0;             # Turn off logging.
}

When I see code like that, I would think:

my @cfg =
    ( [ qr/^ (?: -k | --keep[-_]going ) $/x
      , sub { $MakeEvent::exit_on_error = 0;
              $keep_going = 1 }
      ]
    , [ qr/^ --log (?:=(.*))? $/x                   # Log file specified?
      , sub { $logfile = $1 || shift @ARGV }
      ]
    , [ qr/^ (?: -n | --just[-_]print | --dry[-_]run | --recon ) $/x
      , sub { $dry_run = 1 }
      ]
    , [ qr/^ --no [-_]? log $/x                     # Turn off logging.
      , sub { $log_level = 0 }
      ]
    );

for a moment, and then would think about more layers:

{
    /^ (?: -k | --keep[-_]going ) $/x
        and $cfg{'keepgoing'} = 1;

    /^ --log (?:=(.*))? $/x
        and $cfg{'logfile'} = ($1 || shift @ARGV);

    /^ (?: -n | --just[-_]print | --dry[-_]run | --recon ) $/x
        and $cfg{'dryrun'} = 1;

    /^ --no [-_]? log $/x
        and $cfg{'loglevel'} = 0;
}

and then would go and use
http://search.cpan.org/search?module=Getopt::Long

This is all code that runs only once, at the start, so one should format
it as readable and maintainable as human(e)ly possible.
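A sketch of the same four options under Getopt::Long (the option specs are my guess at reasonable equivalents, not makepp's actual interface; GetOptionsFromArray is used so the argument list is explicit):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my @args = ( '-k', '--log', 'make.log', '--dry-run' );
my %cfg;

# Getopt::Long handles the aliasing, bundling, and =value splitting
# that the hand-rolled regexes above reimplement.
GetOptionsFromArray( \@args,
    'k|keep-going'               => sub { $cfg{keepgoing} = 1 },
    'log=s'                      => \$cfg{logfile},
    'n|just-print|dry-run|recon' => sub { $cfg{dryrun}    = 1 },
    'no-log|nolog'               => sub { $cfg{loglevel}  = 0 },
) or die "bad options\n";

print "$cfg{logfile}\n";   # prints "make.log"
```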
 
Peter J. Holzer

Uri said:
o> http://makepp.sourceforge.net/1.50/perl_performance.html

o> This should be required reading for all Perl hackers, including those
o> who write the alas sometimes suboptimal documentation.

I agree with most of what you wrote but I have to comment on this:
UG> who in their right mind uses lots of bit flags for booleans in perl? in
UG> c it makes sense for packing reasons but with perl that won't save
UG> storage since SV's all take up overhead way beyond a few bits.

That's an excellent reason to use bit flags. In C the smallest object
type is a char, which is 8 bits on most platforms. So if you
store each flag in a char, you waste only 7 bits per flag. In perl you
need an SV, which took about 40 bytes on average last time I looked.
Typically you store your flags in a hash, which adds more overhead, so
you may waste on the order of 1000 bits for each bit of information.
Not a problem if you have a few thousand flags, but if you have a few
million, it adds up.
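For the few-million-flags case, the built-in vec() packs booleans one bit apiece into a single string; a small illustrative sketch:

```perl
use strict;
use warnings;

# Pack a million boolean flags into one bit string: 125_000 bytes
# total, versus tens of megabytes for a hash of SVs.
my $flags = '';
vec( $flags, 999_999, 1 ) = 0;   # pre-extend to a million bits

vec( $flags, 42, 1 ) = 1;        # set flag 42
vec( $flags, 7,  1 ) = 1;        # set flag 7

print vec( $flags, 42, 1 ), "\n";      # prints 1
print vec( $flags, 43, 1 ), "\n";      # prints 0
print length($flags), " bytes\n";      # 125000 bytes
```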

hp
 
Peter J. Holzer

Uri said:
UG> there are several long if/else/elsif sections which do option parsing.
UG> if an option passes just do a return or next or last inside the block.
UG> then each section becomes a simple if with no else. this is much
UG> cleaner and easier to maintain.
UG>
UG>     elsif (/^-k$/ || /^--keep[-_]going$/) {
UG>         $MakeEvent::exit_on_error = 0;
UG>         $keep_going = 1;
UG>     }
UG>     elsif (/^--log(?:=(.*))?$/) { # Log file specified?
UG>         $logfile = $1 || shift @ARGV;
UG>     } [...]
UG>
UG>     if (/^-k$/ || /^--keep[-_]going$/) {
UG>         $MakeEvent::exit_on_error = 0;
UG>         $keep_going = 1;
UG>         next ;
UG>     }
UG>
UG>     if (/^--log(?:=(.*))?$/) { # Log file specified?
UG>         $logfile = $1 || shift @ARGV;
UG>         next ;
UG>     }
UG> [...]

Even cleaner would be a lookup table:

sub set_keep_going {
    $MakeEvent::exit_on_error = 0;
    $keep_going = 1;
}

sub set_log {
    $logfile = $_[0] || shift @ARGV;
}
....

my %opts = (
    '-k'           => \&set_keep_going,   # or use anonymous subs
    '--keep_going' => \&set_keep_going,
    '--keep-going' => \&set_keep_going,
    '--log'        => \&set_log,
    ....
);

for (@ARGV) {
    my ($opt, $arg);
    ($opt, $arg) = /^(-[^=]*)=(.*)/ or
        ($opt)   = /^(-.*)/;

    if (!defined($opt)) {
        # no option
    } elsif ($opts{$opt}) {
        $opts{$opt}->($arg);
        next;
    } else {
        # invalid option
    }
}

Although in the case of command line options, it is probably best to
just use Getopt::Long instead of reinventing the wheel.

hp
 
Uri Guttman

o> This should be required reading for all Perl hackers, including those
o> who write the alas sometimes suboptimal documentation.

PJH> I agree with most of what you wrote but I have to comment on this:
PJH> That's an excellent reason to use bit flags. In C the smallest object
PJH> type is a char, which is 8 bits on most platforms. So if you
PJH> store each flag in a char, you waste only 7 bits per flag. In perl you
PJH> need an SV, which took about 40 bytes on average last time I looked.
PJH> Typically you store your flags in a hash, which adds more overhead, so
PJH> you may waste on the order of 1000 bits for each bit of information.
PJH> Not a problem if you have a few thousand flags, but if you have a few
PJH> million, it adds up.

my point is that needing large numbers of bit flags says that the design
is flawed. this doesn't cover the case where you are munging large
numbers of boolean flags for statistics or related work (and those apps
should use pdl or bit::vector). but for a reasonable sized set of flags,
it is better to use a simpler coding technique than to deal with the
trickiness of bit sized flags. using bits for a small number of flags is
micro-optimization and shouldn't be done unless the app is ridiculously
tight on space. and i bet even then there will be much better places to
save on storage than with the bits.

uri
 
