Efficiency of s///e?

Discussion in 'Perl Misc' started by Tim McDaniel, May 17, 2013.

  1. Tim McDaniel

    Tim McDaniel Guest

    There's a sub in our code base which has something like

    my $prevent_infinite_loop = 0;
    while ($prevent_infinite_loop++ < 1000 && $text =~ /(complicated) (regular) (expression)/) {
    ... various calculations on $1, $2, $3, ... resulting in $replacement;
    $text =~ s/(complicated) (regular) (expression)/$replacement/;
    # that's exactly the same regular expression as above
    }

    (though I think that, with the particular pattern, an infinite loop is
    impossible.) To add new features and for maintainability,
    I've developed

    $text =~ s{(simpler)}{
    my $found = $1;
    ... various calculations involving split on $found, unshift, ...
    $replacement;
    }eg;

    I'm wondering about the efficiency of this approach, particularly
    s{}{}e. For instance, is the right-hand side code compiled at run
    time, or at compile time? Any other major concerns? We still use
    Perl 5.8.8 for now, alas.

    --
    Tim McDaniel,
     
    Tim McDaniel, May 17, 2013
    #1

  2. On 5/16/2013 8:36 PM, Tim McDaniel wrote:
    > There's a sub in our code base which has something like
    >
    > my $prevent_infinite_loop = 0;
    > while ($prevent_infinite_loop++ < 1000 && $text =~ /(complicated) (regular) (expression)/) {
    > ... various calculations on $1, $2, $3, ... resulting in $replacement;
    > $text =~ s/(complicated) (regular) (expression)/$replacement/;
    > # that's exactly the same regular expression as above
    > }
    >
    > (though I think that, with the particular pattern, an infinite loop is
    > impossible.) To add new features and for maintainability,
    > I've developed
    >
    > $text =~ s{(simpler)}{
    > my $found = $1;
    > ... various calculations involving split on $found, unshift, ...
    > $replacement;
    > }eg;
    >
    > I'm wondering about the efficiency of this approach, particularly
    > s{}{}e. For instance, is the right-hand side code compiled at run
    > time, or at compile time? Any other major concerns? We still use
    > Perl 5.8.8 for now, alas.
    >


    It's syntax checked and compiled at compile time along with the rest of
    your program.
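A minimal sketch of one way to check this, assuming nothing beyond core Perl (the variable names are made up for illustration). The string eval stands in for "a program being compiled": if the replacement block were compiled lazily at match time, the assignment before it would run first. Because the block is compiled together with the surrounding code, the syntax error aborts everything before any statement executes.

```perl
use strict;
use warnings;

my $ran = 0;

# The replacement block below is deliberately broken ("1 +" has no
# right-hand operand). q{} keeps the text uninterpolated until eval.
my $ok = eval q{
    $ran = 1;                # would execute first if compilation succeeded
    my $t = "abc";
    $t =~ s{b}{ 1 + }e;      # syntax error inside the s///e replacement
    1;
};
my $err = $@;

print "statements executed: ", ($ran ? "yes" : "no"), "\n";
print "compile error seen:  ", ($err ? "yes" : "no"), "\n";
```

By contrast, s///ee does evaluate a string at run time on every match, so that form is the one to watch for cost.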

    The only gotcha IMO is the replacement morphing into a long,
    hard-to-unravel mess that's hard on the eyes and tough to debug.
    Even commented, a big multi-line s/pattern/replacement/ becomes
    vertigo-inducing.

    s{ ... }
    { $1 ... # blah
    .... # more blah...
    ...
    ....
    }gex;

    At some point, a plain old "if" block seems better.
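One way to keep the replacement readable, sketched here under assumptions: move the body into a named sub and leave only a call inside s{}{}e. The sub name swap_pair and the sample data are hypothetical, chosen only to make the example runnable.

```perl
use strict;
use warnings;

# Hypothetical helper: all the "various calculations" live here,
# where they can be tested and debugged like ordinary code.
sub swap_pair {
    my ($found) = @_;
    my @parts = split /,/, $found;
    return join ',', reverse @parts;
}

my $text = "1,2 3,4";

# The replacement side is now a single expression.
$text =~ s{(\d+,\d+)}{ swap_pair($1) }eg;

print "$text\n";
```

The helper can also be called directly from a test, which is harder to do with code buried in a replacement block.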

    --
    Charles DeRykus
     
    Charles DeRykus, May 17, 2013
    #2

  3. Tim McDaniel

    Tim McDaniel Guest

    In article <>,
    Ben Morrow <> wrote:
    >A block like
    >
    > s{...}{
    > # code
    > }gex;


    You don't need the "x" to allow arbitrary formatting on the right-hand
    side, only for the left, right?

    >isn't entirely different from an if: to some extent it's just a
    >brace-delimited block like any other. However, given that it does
    >actually have slightly strange parsing rules,


    Oh, please don't leave it like that, you code-tease! Is it because
    the terminator, "}" here, can screw it up?
    s{...}{
    $blort .= "}";
    }

    would screw up by terminating at the apparently "inner" "}"?
    (perlop, Gory details of parsing quoted constructs) Or anything else?

    --
    Tim McDaniel,
     
    Tim McDaniel, May 17, 2013
    #3
  4. On 5/17/2013 9:57 AM, Tim McDaniel wrote:
    > In article <>,
    > Ben Morrow <> wrote:
    >> A block like
    >>
    >> s{...}{
    >> # code
    >> }gex;

    >
    > You don't need the "x" to allow arbitrary formatting on the right-hand
    > side, only for the left, right?


    Yes. I meant to show comments on the left. See what I mean about vertigo...

    >
    >> isn't entirely different from an if: to some extent it's just a
    >> brace-delimited block like any other. However, given that it does
    >> actually have slightly strange parsing rules,

    >
    > Oh, please don't leave it like that, you code-tease! Is it because
    > the terminator, "}" here, can screw it up?
    > s{...}{
    > $blort .= "}";
    > }
    >
    > would screw up by terminating at the apparently "inner" "}"?
    > (perlop, Gory details of parsing quoted constructs) Or anything else?


    Yes, usually, you'd have to escape delimiters, "\}" or "\{"

    Though, if you have a matching pair, you're ok:

    { $blort .= "{}"; }e

    Yet, escaping one, but not both, there's a problem again:

    { $blort .= "\{}"; }e; # verboten!
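A small runnable sketch of the two safe cases, assuming only core Perl (variable names are illustrative): a balanced pair of braces inside the replacement, and a lone closing brace made harmless by choosing a different delimiter for the replacement part.

```perl
use strict;
use warnings;

# Balanced braces inside the replacement: the delimiter scan counts
# nesting, so the inner {} does not end the replacement early.
my $balanced = "x";
$balanced =~ s{x}{ "{}" }e;

# A lone "}" is fine when the replacement is delimited by <...>,
# because braces are then ordinary characters to the scanner.
my $alt = "x";
$alt =~ s{x}< "}" >e;

print "balanced: $balanced\n";
print "alternate delimiter: $alt\n";
```

Switching delimiters moves the problem rather than removing it: with <...> the code inside can no longer contain an unbalanced > or <.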



    --
    Charles DeRykus
     
    Charles DeRykus, May 17, 2013
    #4
  5. with <> Ben Morrow wrote:
    >
    > Quoth Charles DeRykus <>:
    >> On 5/16/2013 8:36 PM, Tim McDaniel wrote:
    >> > There's a sub in our code base which has something like
    >> >
    >> > my $prevent_infinite_loop = 0;
    >> > while ($prevent_infinite_loop++ < 1000 && $text =~ /(complicated) (regular) (expression)/) {
    >> > ... various calculations on $1, $2, $3, ... resulting in $replacement;
    >> > $text =~ s/(complicated) (regular) (expression)/$replacement/;
    >> > # that's exactly the same regular expression as above
    >> > }
    >> >
    >> > (though I think that, with the particular pattern, an infinite loop is
    >> > impossible.) To add new features and for maintainability,
    >> > I've developed
    >> >
    >> > $text =~ s{(simpler)}{
    >> > my $found = $1;
    >> > ... various calculations involving split on $found, unshift, ...
    >> > $replacement;
    >> > }eg;

    >
    > You can also use @+ and substr, as was discussed here a little while
    > ago:
    >
    > while (... and $text =~ /.../) {
    > my ($start, $length) = ($-[0], $+[0] - $-[0]);
    >
    > # calculate $replacement
    >
    > substr $text, $start, $length, $replacement;
    > }
    >


    Care to explain why you replaced $prevent_infinite_loop with an ellipsis?

    #!/usr/bin/perl

    use strict;
    use warnings;

    my( $aa, $ab ) = qw/ aaaa bbbb /;

    my $prevent_infinite_loop;
    while( ++$prevent_infinite_loop < 1000 && $aa =~ /aa/ ) {
    my( $start, $length ) = ( $-[0], $+[0] - $-[0] );
    my $replacement = 'a';
    substr $aa, $start, $length, $replacement;
    print "aa: $aa\n";
    }

    $ab =~ s{bb}{ print "ab (before): $ab\n"; 'b' }ge;
    print "ab (now): $ab\n";

    __END__

    {2809:6} [0:0]% p.x foo.tI7BTR.pl
    p.x:1: no such file or directory: ./Build
    aa: aaa
    aa: aa
    aa: a
    ab (before): bbbb
    ab (before): bbbb
    ab (now): bb
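As a side note on the substr form, here is a minimal sketch with a match that does not start at offset 0 (the sample string is made up). The length handed to substr is the match end minus the match start, $+[0] - $-[0]; with matches at offset 0, as in the run above, a sum and a difference give the same number, so the distinction only shows up further into the string.

```perl
use strict;
use warnings;

my $text = "xxaaxx";

if ( $text =~ /aa/ ) {
    # @- holds match start offsets, @+ holds match end offsets.
    my ( $start, $length ) = ( $-[0], $+[0] - $-[0] );

    # Four-argument substr replaces $length characters at $start.
    substr $text, $start, $length, 'b';
}

print "$text\n";
```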


    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, May 18, 2013
    #5
  6. szr

    szr Guest

    On 5/17/2013 12:35 PM, Ben Morrow wrote:
    > The handling of the closing terminator is what I was referring to. The
    > first unbalanced, unescaped } will close the s{}{}e, and \} will be
    > converted to } before the read-this-as-Perl parser sees it. Fortunately
    > there are relatively few real situations where this matters, though it's
    > easy to create artificial situations with bizarre results, like
    >
    > s{...}{ { "foo" \} # } }e
    >
    > It's also worth noting that \\ is *not* converted to \, even though \\}
    > prevents the backslash from escaping the }. As I said, slightly strange
    > rules...


    The same also seems to be true of eval( expr ), such as in:

    eval qq{ { "foo" \} # } };

    Remove the closing curly brace immediately after the # and you get a

    Can't find string terminator "}" ...

    error, not unlike what happens when the same is done in a substitution
    like yours above.

    Also, keep in mind, in the substitution, you can use other delimiters
    besides { ... }, such as:

    $s =~ s{ ... }< { "foo" } # } >e

    In which case escaping the } that comes before the # actually causes an
    error:

    syntax error at line ..., near ""foo" \"

    Although the same is not completely true of qq<...>, as both

    eval qq< { "foo" \} # } >;

    and

    eval qq< { "foo" } # } >;

    yield no errors or warnings and return the string: foo

    Not sure what exactly accounts for this difference though.
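One plausible account, offered as a sketch rather than a definitive answer: qq<...> builds a double-quoted string, so \} goes through ordinary string escape processing and becomes a plain } before eval ever sees the text. The replacement part of s{...}<...>e is handed over as code, so the backslash survives there and the parser trips on it. The snippet below only checks the string half of that claim; the variable names are illustrative.

```perl
use strict;
use warnings;

# In a double-quoted string, a backslash before a non-special
# character simply yields that character.
my $escaped = qq< { "foo" \} # } >;
my $plain   = qq< { "foo" } # } >;

print "escaped: [$escaped]\n";
print "plain:   [$plain]\n";
print "identical: ", ( $escaped eq $plain ? "yes" : "no" ), "\n";
```

If the two strings are identical, eval receives the same source text in both cases, which would explain why neither form complains.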

    --
    szr
     
    szr, May 18, 2013
    #6
  7. with <> Ben Morrow wrote:

    *SKIP*
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> my( $aa, $ab ) = qw/ aaaa bbbb /;
    >>
    >> my $prevent_infinite_loop;
    >> while( ++$prevent_infinite_loop < 1000 && $aa =~ /aa/ ) {
    >> my( $start, $length ) = ( $-[0], $+[0] - $-[0] );
    >> my $replacement = 'a';
    >> substr $aa, $start, $length, $replacement;
    >> print "aa: $aa\n";
    >> }
    >>
    >> $ab =~ s{bb}{ print "ab (before): $ab\n"; 'b' }ge;
    >> print "ab (now): $ab\n";
    >>
    >> __END__
    >>
    >> {2809:6} [0:0]% p.x foo.tI7BTR.pl
    >> p.x:1: no such file or directory: ./Build
    >> aa: aaa
    >> aa: aa
    >> aa: a
    >> ab (before): bbbb
    >> ab (before): bbbb
    >> ab (now): bb

    >
    > I'm not sure what this is supposed to be demonstrating...


    I'm trying to show that

    while( m// ) { code(); s/// }

    differs from

    s//code()/eg


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, May 19, 2013
    #7
  8. Eric Pozharski <> writes:
    > with <> Ben Morrow wrote:
    >>> #!/usr/bin/perl
    >>>
    >>> use strict;
    >>> use warnings;
    >>>
    >>> my( $aa, $ab ) = qw/ aaaa bbbb /;
    >>>
    >>> my $prevent_infinite_loop;
    >>> while( ++$prevent_infinite_loop < 1000 && $aa =~ /aa/ ) {
    >>> my( $start, $length ) = ( $-[0], $+[0] - $-[0] );
    >>> my $replacement = 'a';
    >>> substr $aa, $start, $length, $replacement;
    >>> print "aa: $aa\n";
    >>> }
    >>>
    >>> $ab =~ s{bb}{ print "ab (before): $ab\n"; 'b' }ge;
    >>> print "ab (now): $ab\n";
    >>>
    >>> __END__
    >>>
    >>> {2809:6} [0:0]% p.x foo.tI7BTR.pl
    >>> p.x:1: no such file or directory: ./Build
    >>> aa: aaa
    >>> aa: aa
    >>> aa: a
    >>> ab (before): bbbb
    >>> ab (before): bbbb
    >>> ab (now): bb

    >>
    >> I'm not sure what this is supposed to be demonstrating...

    >
    > I'm trying to show that
    >
    > while( m// ) { code(); s/// }
    >
    > differs from
    >
    > s//code()/eg


    In particular, s///g scans the text once from left to right and
    replaces the matches it finds in the original input string. The loop
    rescans the text from the start on each iteration, possibly performing
    replacements on the results of earlier replacements.
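A short sketch of that difference, using a made-up input where the replacement itself matches the pattern. The guard limit of 5 is arbitrary and only there so the loop stops.

```perl
use strict;
use warnings;

# Loop form: each iteration rescans from the start, so the X's
# produced by earlier replacements are matched again.
my $looped = "aXb";
my $guard  = 0;
while ( $guard++ < 5 && $looped =~ /X/ ) {
    $looped =~ s/X/XX/;
}

# s///g form: one left-to-right pass over the original string;
# the inserted text is never rescanned.
my $global = "aXb";
$global =~ s/X/XX/g;

print "loop:   $looped\n";
print "global: $global\n";
```

Without the guard the loop form would never finish on this input, while the s///g form terminates after a single pass.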
     
    Rainer Weikusat, May 19, 2013
    #8
  9. with <> Ben Morrow wrote:

    *SKIP*
    > OK. Then I'm not sure *why* you're demonstrating that, since I wasn't
    > questioning it.


    Good. Now I have a better understanding of your way of thinking. It
    won't cause problems anymore.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, May 20, 2013
    #9
  10. Tim McDaniel

    Tim McDaniel Guest

    In article <>,
    Eric Pozharski <> wrote:
    >I'm trying to show that
    >
    > while( m// ) { code(); s/// }
    >
    >differs from
    >
    > s//code()/eg


    Oh, rescanning! Yes, in general that should indeed be considered, and
    thank you for mentioning it -- s///g doesn't rescan and so avoids any
    infinite loop problems.

    (If you're curious about my original problem that prompted this:
    rescanning doesn't produce any different results for my case -- the
    right-hand side of the s/// cannot produce text that would match
    again. The pattern it's looking for is very roughly
    [[ (word character or space) * | (word character or space) * ]]
    and the replacement is one of the alternatives, which therefore
    cannot have [ or ].)
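A rough sketch of that shape in s{}{}eg form. The selection rule here (take the left alternative unless it is empty) is purely hypothetical, as is the sample text; the real calculation is not shown in this thread.

```perl
use strict;
use warnings;

my $text = "see [[foo|bar]] and [[|baz qux]]";

# [[ left | right ]] where each side is word characters or spaces.
# The replacement is one of the two captures, so it can never
# contain [ or ] and therefore cannot match again.
$text =~ s{\[\[([\w ]*)\|([\w ]*)\]\]}{
    length($1) ? $1 : $2     # hypothetical rule for choosing a side
}eg;

print "$text\n";
```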

    --
    Tim McDaniel,
     
    Tim McDaniel, May 20, 2013
    #10
