Optimization request

G

Guru03

For you, it's faster this:

s/>/>/g;

or this:

s/>/>/g if index ($_, '>');

and why?
 
D

David K. Wall

Guru03 said:
For you, it's faster this:

s/>/>/g;

or this:

s/>/>/g if index ($_, '>');

and why?

Use the first one, no need to search the string twice.

"Premature optimization is the root of all evil."
 
T

Tore Aursand

For you, it's faster this:

s/>/>/g;

or this:

s/>/>/g if index ($_, '>');

and why?

Isn't it obvious that the first one is faster? Your last line of code
requires Perl to do _two_ things, while the first one only do one thing;

#!/usr/bin/perl
#
use strict;
use warnings;
use Benchmark qw( timethese );

timethese(1_000_000, {
'Method #1' => \&method_1,
'Method #2' => \&method_2,
});

sub method_1 {
$html =~ s,<,&lt;,g;
$html =~ s,>,&gt;,g;
}

sub method_2 {
$html =~ s,<,&lt;, if ( index($html, '<') );
$html =~ s,>,&lt;, if ( index($html, '>') );
}

This gives me the following:

Benchmark: timing 1000000 iterations of Method #1, Method #2...
Method #1: 2 wallclock secs (1.43 usr + 0.00 sys = 1.43 CPU) @ 699300.70/s
Method #2: 3 wallclock secs (2.73 usr + 0.00 sys = 2.73 CPU) @ 366300.37/s
 
A

Aaron Sherman

David K. Wall said:
Guru03 said:
For you, it's faster this:
[...]
Use the first one, no need to search the string twice.

Something to keep in mind is that a) Perl doesn't always work that way
and b) when you answer someone's question, you might consider
answering the one they asked.

He didn't ask "which of these would you use", he asked (in admittedly
broken english, but still quite comprehensibly) which of them was
faster.

You can't answer that qualitatively.

Of course, others have demonstrated the flaw in the example, but this
is an important thing to keep your eye on with Perl. You could have a
case where the obvious simply isn't because perl has hundreds of
special cases that it optimizes for. If you happen to trigger such an
optimization with a complex, seemingly slow expression and not with a
simpler one... well, your results will be surprising, and you might
find yourself coming here and asking "For you, it's faster .... Why?"
 
D

David K. Wall

"David K. Wall" <[email protected]> wrote in message news:
Guru03 said:
For you, it's faster this:
[...]
Use the first one, no need to search the string twice.

Something to keep in mind is that a) Perl doesn't always work that way
and b) when you answer someone's question, you might consider
answering the one they asked.

He didn't ask "which of these would you use", he asked (in admittedly
broken english, but still quite comprehensibly) which of them was
faster.

Sometimes I mistake a poster's intent by reading a question too quickly or too
carelessly, but not this time. Right or wrong, I *was* replying to the question
of which was faster. Might I suggest that you read my post too quickly?
 
E

Eric Bohlman

Isn't it obvious that the first one is faster? Your last line of code
requires Perl to do _two_ things, while the first one only do one
thing;

It's actually not all that obvious, since both methods only require perl to
do *one* thing for inputs that don't contain any &gt;s. Plainly if each
input processed contains one or more &gt;s then your observation will be
correct (assuming there's no possible interaction between the index() and
the substitution). If not, it will depend on whether or not a non-matching
substitution that searches for a simple literal is faster or slower than a
call to index(), and whether or not this difference scales up or down with
the length of the string to be searched.

It's often the case that one method is more efficient in most circumstances
but less efficient in others. For example, the general runtime behavior of
an insertion sort is O(N^2) whereas the general runtime behavior of a
quicksort is O(N log N), but an insertion sort can actually run faster in
the case (which used to be quite common in the days of tapes and cards)
where the data to be sorted consists of a large amount of in-order data
followed by a much smaller amount of unsorted data.
 
A

Aaron Sherman

David K. Wall said:
Might I suggest that you read my post too quickly?

No, I did not. He didn't ask which you would prefer, or how you would
explain a speed difference, he asked which was faster for you, and you
didn't answer that.

I only jump on this because I've asked similar questions over the
years, and I always get people who answer based on gut reaction. With
Perl especially, where the language is defined by a single
implementation with a maze of special-case optimizations, you simply
cannot look at a piece of code and say, "well that one has more steps,
so it's less efficient." It might well work that way in that case, but
unless you run the numbers, you have no real idea.
 
T

thumb_42

Guru03 said:
For you, it's faster this:

s/>/&gt;/g;

or this:

s/>/&gt;/g if index ($_, '>');

and why?

It's going to depend on the application, how large the string is, how
frequently >'s appear in the text, which version of perl you're using
and so forth. If you do it once or twice it'd almost certainly be slower
just because of the time it'd take perl to compile the code.

I believe index() uses the boyer moore(sp?) algorithm to locate strings,
which is faster with longer key strings, but '>' is only one character so
the algorithm isn't going to help. (Does perl detect this and switch to
another algorithm? anyone know?) Not sure what perl's regex handling does,
as far as short-circuit testing, but I suspect it changes slightly with each
perl version. So.. in some machines it may be faster other places it'd be
slower.

If you only do it a few thousand times, it's definately not going to be
worth it. Might also look into the 'o' modifier.

It's a fun thing to think about, but another fun thing to think about is
that, by the time you've read this message, perl could have probably converted
the whole string to an array of characters, iterated over each one replacing >
with '&gt;' several hundred million times on a low end 486.

Jamie
 
D

David K. Wall

Aaron Sherman said:
No, I did not. He didn't ask which you would prefer, or how you would
explain a speed difference, he asked which was faster for you, and you
didn't answer that.

I only jump on this because I've asked similar questions over the
years, and I always get people who answer based on gut reaction. With
Perl especially, where the language is defined by a single
implementation with a maze of special-case optimizations, you simply
cannot look at a piece of code and say, "well that one has more steps,
so it's less efficient." It might well work that way in that case, but
unless you run the numbers, you have no real idea.

I read the OP as asking "which is faster and why?" and responded with what
I thought was correct and why I thought so. It's not the technical
information you posted that I object to. I want corrections to any
technical mistakes I make in my posts.

The reason I responded to your post is because it seemed (and seems) to
criticize me and not my answer to the OP. I guess I tried too hard to be
polite and didn't make that point.

Just drop it, ok? You convinced me the first time, I don't need yet another
sermon.
 
A

Aaron Sherman

David K. Wall said:
The reason I responded to your post is because it seemed (and seems) to
criticize me and not my answer to the OP. I guess I tried too hard to be
polite and didn't make that point.

If you heard me criticize YOU at any point, please quote it. I really
don't know you, so I don't have anything to criticize. Stick to the
topic at hand, please. The topic was requests for performance
information, not infighting.

My distinction was on the difference between "which is faster" and
"which would you expect to be faster". That's all. Nothing personal.
Just drop it, ok? You convinced me the first time, I don't need yet another
sermon.

Ah, the old "this is the last word, so don't reply!" ;-)
 
D

David K. Wall

[posted and mailed]

If you heard me criticize YOU at any point, please quote it.

"b) when you answer someone's question, you might consider
answering the one they asked."

I did try to answer the one that was asked. My answer was wrong, you
corrected me. No problem. If you had simply said "that's not what he
asked" it wouldn't have bothered me. But I did "consider answering" the
question that was asked, otherwise I wouldn't have posted. (Well, unless I
had a related comment, but then I would have indicated in some way that it
was just a comment and not intended to be an answer.)
My distinction was on the difference between "which is faster" and
"which would you expect to be faster". That's all. Nothing personal.

Ok, that's good enough for me. Thank you.
Ah, the old "this is the last word, so don't reply!" ;-)

No, it's just that I tried to tell you that, intentionally or not, you had
offended me, and you responded with another sermon on Perl internals,
something that I had no intention of disputing.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top