$& imposes a considerable performance penalty they say

Dan Jacobson

$ man perlvar
$& The string matched by the last successful pattern match...
The use of this variable anywhere in a program imposes a con-
siderable performance penalty on all regular expression
matches. See "BUGS".
$ time echo x|perl -wpe 's/(x)/a$1y/'
axy
real 0m0.011s
user 0m0.003s
sys 0m0.004s
$ time echo x|perl -wpe 's/x/a$&y/'
axy
real 0m0.007s
user 0m0.001s
sys 0m0.006s

I'm not sure which of the times means money, but if it is real, then
what's the deal?
 
Gunnar Hjalmarsson

Dan said:
> $ man perlvar
> $& The string matched by the last successful pattern match...
> The use of this variable anywhere in a program imposes a con-
> siderable performance penalty on all regular expression
> matches. See "BUGS".
> $ time echo x|perl -wpe 's/(x)/a$1y/'
> axy
> real 0m0.011s
> user 0m0.003s
> sys 0m0.004s
> $ time echo x|perl -wpe 's/x/a$&y/'
> axy
> real 0m0.007s
> user 0m0.001s
> sys 0m0.006s
>
> I'm not sure which of the times means money, but if it is real, then
> what's the deal?

Even if I have never tried to quantify the claimed performance penalty
caused by $&, I realize that your above examples are not sufficient for
drawing any conclusions. The point, if I have understood it correctly,
is that the use of $& *once* enables capturing for *all* regular
expressions in the program, also those without capturing parentheses or
capturing through $&.
 
Uri Guttman

GH> Even if I have never tried to quantify the claimed performance
GH> penalty caused by $&, I realize that your above examples are not
GH> sufficient for drawing any conclusions. The point, if I have
GH> understood it correctly, is that the use of $& *once* enables
GH> capturing for *all* regular expressions in the program, also those
GH> without capturing parentheses or capturing through $&.

to clarify that, $& is a way to capture the entire match. it is similar
to enclosing the regex in () and using $1. so by itself it is useful
(golfers like it :). but in order to work properly it has a global side
effect. since it always has the full match from the last regex, and it
is a global var, if you use it once ANYWHERE in your code, the matched
string (btw, this really only matters with s/// since it can change the
original string) must be copied for all s/// even if you don't have any
capturing parens. so in general, don't use it, use explicit capturing
parens which will only cause the s/// with them to copy the original
string.

the OP's wimpy test didn't even come close to showing this issue. it
would need to be something which did s/// without capturing and either
$& being mentioned or not. and it would need many more runs than 1 to
show the difference. of course benchmark.pm is the way to do that as
timing a script will show nothing but compiler time and has no accuracy
at the required level.
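
A minimal sketch of the kind of test described above, using Benchmark's timeit and timestr. The sub name replace_one_at_a_time and the "amp" command-line switch are made up for illustration, and the string length and run count are arbitrary. Because the effect is global to the interpreter, the two cases have to be separate runs of the script (once with no argument, once with "amp") rather than two subs compared in one process. This assumes a perl old enough to have the penalty; as far as I know, perl 5.20 and later use copy-on-write and the two runs may show little or no difference there.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical switch: pass "amp" as the first argument to make perl
# compile (but never call) a sub that mentions $&.
my $mention_amp = @ARGV && $ARGV[0] eq 'amp';
if ($mention_amp) {
    my $never_called = eval q{ sub { $& } };
}

# Replace one "x" per s/// call, so every substitution is a separate match.
# There are no capturing parentheses anywhere in this script.
sub replace_one_at_a_time {
    my ($text) = @_;
    my $count = 0;
    $count++ while $text =~ s/x/y/;
    return $count;
}

# The sub works on its own copy, so $input is the same for every run.
my $input  = 'x' x 5_000;
my $timing = timeit(10, sub { replace_one_at_a_time($input) });

printf "%s: %s\n",
    ($mention_amp ? 'with $& mentioned' : 'without $&'),
    timestr($timing);
```

Comparing the two printed lines from the two runs is the whole test; the absolute numbers depend on the machine and the perl version.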

uri
 
Eric Schwartz

Uri Guttman said:
> so in general, don't use [$&], use explicit capturing
> parens which will only cause the s/// with them to copy the original
> string.

I don't have such an old perl to hand, but perlre points out that:

As of 5.005, $& is not so costly as the other two.

(meaning $' and $`)

How much less costly is it?

As a side note: Thanks to Abigail, mostly, one alteration I've made to
my personal programming practises lately is that I've started using
things like $&, shelling out, etc., more often in cases where the code
isn't time-critical (which is, frankly, most of the time). I've found
that it will often save me mental effort, and in many cases makes
the code clearer than a more conventional approach might.

Recently, for instance, I replaced a shell script that examined a
Linux system, and printed out what cards it thought were in which
slots, with a Perl program that does all sorts of conventionally 'bad'
things, like using $&, lots of `find -name ... | grep | sort -u`, and
the like because I was trying, as much as possible, to stick with the
logic of the shell script, and I figured "Heck, I'll optimize it
later, and pass around arrayrefs instead of calling `lspci`
everywhere, and use File::Find, and stop with the $&."

Before I even got around to it, I ran some benchmarks, and I still cut
down the average run time from 10 seconds to 3, so I give myself a
free pass for using those constructs in that context. I realize that
is not disagreeing with you, just that sometimes, the performance hit
of using $&, or shelling out even when there's a perfectly good module
available, isn't significant.

My advice would be to use them wherever you like, but be aware that
they can indeed cause performance problems. Even so, I'd still
profile your program before rushing to those as the first cure to poor
performance-- you may well find, as I have, that poor algorithms or
inefficient data structures are far more detrimental to your program's
run than $& could ever be.

-=Eric
 
Anno Siegel

Uri Guttman said:
> GH> Even if I have never tried to quantify the claimed performance
> GH> penalty caused by $&, I realize that your above examples are not
> GH> sufficient for drawing any conclusions. The point, if I have
> GH> understood it correctly, is that the use of $& *once* enables
> GH> capturing for *all* regular expressions in the program, also those
> GH> without capturing parentheses or capturing through $&.
>
> to clarify that, $& is a way to capture the entire match. it is similar
> to enclosing the regex in () and using $1. so by itself it is useful
> (golfers like it :). but in order to work properly it has a global side
> effect. since it always has the full match from the last regex, and it
> is a global var, if you use it once ANYWHERE in your code, the matched
> string (btw, this really only matters with s/// since it can change the
> original string) must be copied for all s/// even if you don't have any
> capturing parens. so in general, don't use it, use explicit capturing
> parens which will only cause the s/// with them to copy the original
> string.
>
> the OP's wimpy test didn't even come close to showing this issue. it

Here's a similarly wimpy test that does show the difference:

time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
0.290u 0.030s 0:00.32 100.0%

time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
2.910u 0.030s 0:02.96 99.3%

You want a long string to match over to see the difference. The
point is that after use of $&, all of $`, $& and $' are active, and
so the whole string is copied on every match, as opposed to only
the match itself with "()".

Some weeks ago we had a case here where someone did that with a
multi-gigabyte string...

Anno
 
Uri Guttman

AS> Here's a similarly wimpy test that does show the difference:

AS> time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
AS> 0.290u 0.030s 0:00.32 100.0%

AS> time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
AS> 2.910u 0.030s 0:02.96 99.3%

AS> You want a long string to match over to see the difference. The
AS> point is that after use of $&, all of $`, $& and $' are active, and
AS> so the whole string is copied on every match, as opposed to only
AS> the match itself with "()".

try some minor changes. move the $& to somewhere else and use $1 in both
cases. that will show its global nature. and another variant would be to
not even grab when using $& and it will also do a full copy.
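
A rough sketch of the first variant, for anyone who wants to try it: both child programs use $1 in the loop, and the only difference is a sub that mentions $& and is never called. The helper names child_program and time_child are invented for this example, and timing whole child processes with Time::HiRes is crude (it includes interpreter startup), so treat the numbers as an indication only. It assumes a perl old enough to have the penalty; newer perls with copy-on-write may show no gap.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build the text of a child program. Both versions loop with $1; when
# $mention_amp is true, a sub that is never called mentions $& as well.
sub child_program {
    my ($mention_amp) = @_;
    my $body = q{$_ = "x" x 10_000; $1 while /(x)/g; };
    my $tail = $mention_amp ? q{sub never_called { $& }} : q{};
    return $body . $tail;
}

# Run the program text in a fresh perl ($^X is the running interpreter)
# and return the elapsed wall-clock time in seconds.
sub time_child {
    my ($program) = @_;
    my $start = time;
    system($^X, '-e', $program) == 0
        or die "child perl failed: $?";
    return time - $start;
}

for my $mention_amp (0, 1) {
    printf "%-24s %.3fs\n",
        ($mention_amp ? '$& mentioned elsewhere' : 'no $& anywhere'),
        time_child(child_program($mention_amp));
}
```

For the second variant, dropping the parentheses and the $1 from the loop body in child_program should be the only change needed.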

AS> Some weeks ago we had a case here where someone did that with a
AS> multi-gigabyte string...

yow!

uri
 
