$& imposes a considerable performance penalty they say

Discussion in 'Perl Misc' started by Dan Jacobson, Nov 5, 2004.

  1. Dan Jacobson

    Dan Jacobson Guest

    $ man perlvar
    $& The string matched by the last successful pattern match...
    The use of this variable anywhere in a program imposes a con-
    siderable performance penalty on all regular expression
    matches. See "BUGS".
    $ time echo x|perl -wpe 's/(x)/a$1y/'
    axy
    real 0m0.011s
    user 0m0.003s
    sys 0m0.004s
    $ time echo x|perl -wpe 's/x/a$&y/'
    axy
    real 0m0.007s
    user 0m0.001s
    sys 0m0.006s

    I'm not sure which of the times means money, but if it is real, then
    what's the deal?
    Dan Jacobson, Nov 5, 2004
    #1

  2. Dan Jacobson wrote:
    > $ man perlvar
    > $& The string matched by the last successful pattern match...
    > The use of this variable anywhere in a program imposes a con-
    > siderable performance penalty on all regular expression
    > matches. See "BUGS".
    > $ time echo x|perl -wpe 's/(x)/a$1y/'
    > axy
    > real 0m0.011s
    > user 0m0.003s
    > sys 0m0.004s
    > $ time echo x|perl -wpe 's/x/a$&y/'
    > axy
    > real 0m0.007s
    > user 0m0.001s
    > sys 0m0.006s
    >
    > I'm not sure which of the times means money, but if it is real, then
    > what's the deal?


    Even if I have never tried to quantify the claimed performance penalty
    caused by $&, I realize that your above examples are not sufficient for
    drawing any conclusions. The point, if I have understood it correctly,
    is that the use of $& *once* enables capturing for *all* regular
    expressions in the program, also those without capturing parentheses or
    capturing through $&.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Nov 5, 2004
    #2

  3. Uri Guttman

    Uri Guttman Guest

    >>>>> "GH" == Gunnar Hjalmarsson <> writes:

    GH> Even if I have never tried to quantify the claimed performance
    GH> penalty caused by $&, I realize that your above examples are not
    GH> sufficient for drawing any conclusions. The point, if I have
    GH> understood it correctly, is that the use of $& *once* enables
    GH> capturing for *all* regular expressions in the program, also those
    GH> without capturing parentheses or capturing through $&.

    to clarify that, $& is a way to capture the entire match. it is similar
    to enclosing the regex in () and using $1. so by itself it is useful
    (golfers like it :). but in order to work properly it has a global side
    effect. since it always has the full match from the last regex, and it
    is a global var, if you use it once ANYWHERE in your code, the matched
    string (btw, this really only matters with s/// since it can change the
    original string) must be copied for all s/// even if you don't have any
    capturing parens. so in general, don't use it, use explicit capturing
    parens which will only cause the s/// with them to copy the original
    string.

    the OP's wimpy test didn't even come close to showing this issue. it
    would need to be something which did s/// without capturing and either
    $& being mentioned or not. and it would need many more runs than 1 to
    show the difference. of course Benchmark.pm is the way to do that as
    timing a script will show nothing but compiler time and has no accuracy
    at the required level.

    uri
    Uri Guttman, Nov 5, 2004
    #3
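
A rough sketch of the kind of test described above, using the Benchmark module: a match loop with no capturing parens, timed over many iterations, with $& either mentioned somewhere in the program or not. The script name `bench.pl` and the `amp` argument are made up for illustration, and hiding $& inside a string eval is just one assumed way to keep it out of the program unless asked for. The two cases have to be run as separate processes, since one mention of $& affects every match in the program. On perl 5.20 and later, where copy-on-write is enabled by default, the two runs may well come out about the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical switch: run once as `perl bench.pl`, once as `perl bench.pl amp`.
my $use_amp = @ARGV && $ARGV[0] eq 'amp';

# The string eval means perl only ever sees $& when the switch is given.
if ($use_amp) {
    "a" =~ /a/;
    my $seen = eval q{ $& };
}

my $text = "x" x 10_000;

# Match repeatedly over a long string, with no capturing parens
# and no use of $& inside the timed code itself.
my $t = timeit(20, sub {
    my $count = 0;
    $count++ while $text =~ /x/g;
});

print(($use_amp ? 'with $&:    ' : 'without $&: '), timestr($t), "\n");
```

Comparing the two printed lines from the two runs gives a fairer picture than timing a single one-liner, though the absolute numbers will depend heavily on the perl version and the length of the string.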
  4. Uri Guttman <> writes:
    > so in general, don't use [$&], use explicit capturing
    > parens which will only cause the s/// with them to copy the original
    > string.


    I don't have such an old perl to hand, but perlre points out that:

    As of 5.005, $& is not so costly as the other two.

    (meaning $' and $`)

    How much less costly is it?

    As a side note: Thanks to Abigail, mostly, one alteration I've made to
    my personal programming practices lately is that I've started using
    things like $&, shelling out, etc., more often in cases where the code
    isn't time-critical (which is, frankly, most of the time). I've found
    that it will often save me mental effort and time, and in many cases
    makes the code clearer than a more conventional approach might dictate.

    Recently, for instance, I replaced a shell script that examined a
    Linux system, and printed out what cards it thought were in which
    slots, with a Perl program that does all sorts of conventionally 'bad'
    things, like using $&, lots of `find -name ... | grep | sort -u`, and
    the like because I was trying, as much as possible, to stick with the
    logic of the shell script, and I figured "Heck, I'll optimize it
    later, and pass around arrayrefs instead of calling `lspci`
    everywhere, and use File::Find, and stop with the $&."

    Before I even got around to it, I ran some benchmarks, and I still cut
    down the average run time from 10 seconds to 3, so I give myself a
    free pass for using those constructs in that context. I realize that
    is not disagreeing with you, just that sometimes, the performance hit
    of using $&, or shelling out even when there's a perfectly good module
    available, isn't significant.

    My advice would be to use them wherever you like, but be aware that
    they can indeed cause performance problems. Even so, I'd still
    profile your program before rushing to those as the first cure to poor
    performance-- you may well find, as I have, that poor algorithms or
    inefficient data structures are far more detrimental to your program's
    run than $& could ever be.

    -=Eric
    --
    Come to think of it, there are already a million monkeys on a million
    typewriters, and Usenet is NOTHING like Shakespeare.
    -- Blair Houghton.
    Eric Schwartz, Nov 5, 2004
    #4
  5. Anno Siegel

    Anno Siegel Guest

    Uri Guttman <> wrote in comp.lang.perl.misc:
    > >>>>> "GH" == Gunnar Hjalmarsson <> writes:

    >
    > GH> Even if I have never tried to quantify the claimed performance
    > GH> penalty caused by $&, I realize that your above examples are not
    > GH> sufficient for drawing any conclusions. The point, if I have
    > GH> understood it correctly, is that the use of $& *once* enables
    > GH> capturing for *all* regular expressions in the program, also those
    > GH> without capturing parentheses or capturing through $&.
    >
    > to clarify that, $& is a way to capture the entire match. it is similar
    > to enclosing the regex in () and using $1. so by itself it is useful
    > (golfers like it :). but in order to work properly it has a global side
    > effect. since it always has the full match from the last regex, and it
    > is a global var, if you use it once ANYWHERE in your code, the matched
    > string (btw, this really only matters with s/// since it can change the
    > original string) must be copied for all s/// even if you don't have any
    > capturing parens. so in general, don't use it, use explicit capturing
    > parens which will only cause the s/// with them to copy the original
    > string.
    >
    > the OP's wimpy test didn't even come close to showing this issue. it


    Here's a similarly wimpy test that does show the difference:

    time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
    0.290u 0.030s 0:00.32 100.0%

    time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
    2.910u 0.030s 0:02.96 99.3%

    You want a long string to match over to see the difference. The
    point is that after use of $&, all of $`, $& and $' are active, and
    so the whole string is copied on every match, as opposed to only
    the match itself with "()".

    Some weeks ago we had a case here where someone did that with a
    multi-gigabyte string...

    Anno
    Anno Siegel, Nov 9, 2004
    #5
  6. Uri Guttman

    Uri Guttman Guest

    >>>>> "AS" == Anno Siegel <-berlin.de> writes:

    >> the OP's wimpy test didn't even come close to showing this issue. it


    AS> Here's a similarly wimpy test that does show the difference:

    AS> time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
    AS> 0.290u 0.030s 0:00.32 100.0%

    AS> time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
    AS> 2.910u 0.030s 0:02.96 99.3%

    AS> You want a long string to match over to see the difference. The
    AS> point is that after use of $&, all of $`, $& and $' are active, and
    AS> so the whole string is copied on every match, as opposed to only
    AS> the match itself with "()".

    try some minor changes. move the $& to somewhere else and use $1 in both
    cases. that will show its global nature. and another variant would be to
    not even grab when using $& and it will also do a full copy.

    AS> Some weeks ago we had a case here where someone did that with a
    AS> multi-gigabyte string...

    yow!

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Nov 9, 2004
    #6
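
The variants suggested in this last post could be tried with something like the driver below. It is only a sketch: the variant labels are made up, and each one-liner is an assumed reading of "move the $& somewhere else" and "not even grab". Every variant runs in its own perl process (via `$^X`), because a single mention of $& affects all matches in the program that contains it. The wall-clock figures include interpreter startup, so only large differences mean anything, and on perl 5.20 or later the variants may show no real difference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# One-liners to compare; each is run in a separate child perl.
my %variant = (
    'baseline: $1 only'        => '$_ = "x" x 10_000; $1 while /(x)/g',
    '$1 in loop, $& elsewhere' => '$_ = "x" x 10_000; $1 while /(x)/g; $&',
    'no capture, $& elsewhere' => '$_ = "x" x 10_000; 1 while /x/g; $&',
);

my %elapsed;
for my $name (sort keys %variant) {
    my $start = time;
    # $^X is the path of the perl that is running this script.
    system($^X, '-e', $variant{$name}) == 0
        or die "variant '$name' failed: $?";
    $elapsed{$name} = time - $start;
    printf "%-26s %.3fs\n", $name, $elapsed{$name};
}
```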
