Efficiency of s///e?

Tim McDaniel · May 16, 2013

There's a sub in our code base which has something like

my $prevent_infinite_loop = 0;
while ($prevent_infinite_loop++ < 1000 && $text =~ /(complicated) (regular) (expression)/) {
... various calculations on $1, $2, $3, ... resulting in $replacement;
$text =~ s/(complicated) (regular) (expression)/$replacement/;
# that's exactly the same regular expression as above
}

(though I think that, with the particular pattern, an infinite loop is
impossible.) To add new features and for maintainability,
I've developed

$text =~ s{(simpler)}{
my $found = $1;
... various calculations involving split on $found, unshift, ...
$replacement;
}eg;

I'm wondering about the efficiency of this approach, particularly
s{}{}e. For instance, is the right-hand side code compiled at run
time, or at compile time? Any other major concerns? We still use
Perl 5.8.8 for now, alas.

Charles DeRykus · May 17, 2013

It's syntax checked and compiled at compile time along with the rest of
your program.
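
One way to see this for yourself, as a rough sketch rather than anything authoritative: put an s///e with deliberately broken replacement code inside a branch that never runs, and compile it with a string eval. The names $source, $ok and $error are made up for the illustration. If the replacement were only compiled when the substitution executed, the snippet would run cleanly, since the branch is never taken.

```perl
use strict;
use warnings;

# The substitution sits in a branch that never runs, and its replacement
# code ("1 +") is deliberately incomplete.
my $source = <<'END_SOURCE';
if (0) {
    my $t = "x";
    $t =~ s{x}{ 1 + }e;
}
1;
END_SOURCE

my $ok    = eval $source;   # string eval compiles the whole snippet first
my $error = $@;

print defined $ok
    ? "compiled and ran\n"
    : "rejected at compile time: $error";
```

The eval fails before any of the snippet executes, which is consistent with the replacement being parsed together with the surrounding code. Only s///ee defers a compile to run time, because the second "e" evals the result of the first.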

The only gotcha IMO is the replacement morphing into a long,
hard-to-unravel mess that's hard on the eyes and tough to debug.
Even commented, a big multi-line s/pattern/replacement/ becomes
vertigo-inducing.

s{ ... }
{ $1 ... # blah
.... # more blah...
...
....
}gex;

At some point, a plain old "if" block seems better.
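
A middle ground, sketched here under assumptions: keep the s{}{}e, but move the body into a named sub so the replacement part is a single call. The sub name replacement_for, the sample text, and the pattern (loosely modelled on a "[[a|b]]" style of markup) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the "various calculations" live here, where they
# can be formatted, commented and tested like any other sub.
sub replacement_for {
    my ($found) = @_;
    my @parts = split /\|/, $found;   # alternatives are separated by "|"
    return uc $parts[0];              # stand-in for the real calculation
}

my $text = 'see [[cat|dog]] and [[fish|bird]]';

# /x only affects the pattern; the replacement is ordinary Perl code.
$text =~ s{\[\[ ([\w\s|]*) \]\]}{ replacement_for($1) }gex;

print "$text\n";   # see CAT and FISH
```

This also keeps string literals containing braces out of the replacement part, so the delimiter rules stop mattering.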

Tim McDaniel · May 17, 2013

A block like

s{...}{
# code
}gex;

You don't need the "x" to allow arbitrary formatting on the right-hand
side, only for the left, right?

isn't entirely different from an if: to some extent it's just a
brace-delimited block like any other. However, given that it does
actually have slightly strange parsing rules,

Oh, please don't leave it like that, you code-tease! Is it because
the terminator, "}" here, can screw it up?
s{...}{
$blort .= "}";
}

would screw up by terminating at the apparently "inner" "}"?
(perlop, Gory details of parsing quoted constructs) Or anything else?

Charles DeRykus · May 17, 2013

You don't need the "x" to allow arbitrary formatting on the right-hand
side, only for the left, right?

Yes. I meant show comments on the left. See what I mean about vertigo...

Oh, please don't leave it like that, you code-tease! Is it because
the terminator, "}" here, can screw it up?
s{...}{
$blort .= "}";
}

would screw up by terminating at the apparently "inner" "}"?
(perlop, Gory details of parsing quoted constructs) Or anything else?

Yes, usually, you'd have to escape delimiters, "\}" or "\{"

Though, if you have a matching pair, you're ok:

{ $blort .= "{}"; }e

Yet, escaping one, but not both, there's a problem again:

{ $blort .= "\{}"; }e; # verboten!
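
A small sketch of the two safe cases, with made-up variable names: a balanced pair inside the replacement is fine, and a lone closing brace can be kept out of the replacement text by defining it elsewhere.

```perl
use strict;
use warnings;

# Defined outside the s{}{}e, so the delimiter scan never sees this brace.
my $close = '}';

# Balanced pair inside the replacement: the scan counts { and } and ends
# at the final, matching brace.
(my $balanced = 'x') =~ s{x}{ "{}" }e;

# Lone closing brace supplied through a variable instead of a literal.
(my $unbalanced = 'x') =~ s{x}{ "end" . $close }e;

print "$balanced\n";     # {}
print "$unbalanced\n";   # end}
```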

Eric Pozharski · May 18, 2013

An earlier reply said:
You can also use @+ and substr, as was discussed here a little while
ago:

while (... and $text =~ /.../) {
my ($start, $length) = ($-[0], $+[0] - $-[0]);

# calculate $replacement

substr $text, $start, $length, $replacement;
}

Care to explain why you replaced $prevent_infinite_loop with an ellipsis?

#!/usr/bin/perl

use strict;
use warnings;

my( $aa, $ab ) = qw/ aaaa bbbb /;

my $prevent_infinite_loop;
while( ++$prevent_infinite_loop < 1000 && $aa =~ /aa/ ) {
my( $start, $length ) = ( $-[0], $+[0] - $-[0] );  # length = match end - match start
my $replacement = 'a';
substr $aa, $start, $length, $replacement;
print "aa: $aa\n";
}

$ab =~ s{bb}{ print "ab (before): $ab\n"; 'b' }ge;
print "ab (now): $ab\n";

__END__

{2809:6} [0:0]% p.x foo.tI7BTR.pl
p.x:1: no such file or directory: ./Build
aa: aaa
aa: aa
aa: a
ab (before): bbbb
ab (before): bbbb
ab (now): bb

*CUT*

szr · May 18, 2013

The handling of the closing terminator is what I was referring to. The
first unbalanced, unescaped } will close the s{}{}e, and \} will be
converted to } before the read-this-as-Perl parser sees it. Fortunately
there are relatively few real situations where this matters, though it's
easy to create artificial situations with bizarre results, like

s{...}{ { "foo" \} # } }e

It's also worth noting that \\ is *not* converted to \, even though \\}
prevents the backslash from escaping the }. As I said, slightly strange
rules...

The same also seems to be true of eval( expr ), such as in:

eval qq{ { "foo" \} # } };

Remove the closing curly brace immediately after the # and you get a

Can't find string terminator "}" ...

error, not unlike what happens when the same is done in a substitution
like yours above.

Also, keep in mind, in the substitution, you can use other delimiters
besides { ... }, such as:

$s =~ s{ ... }< { "foo" } # } >e

In which case escaping that } that comes before the # actually causes an
error:

syntax error at line ..., near ""foo" \"

Although, the same is not completely true of qq<...>, as both

eval qq< { "foo" \} # } >;

and

eval qq< { "foo" } # } >;

yield no errors or warnings and return the string: foo

Not sure what exactly accounts for this difference though.

Eric Pozharski · May 19, 2013

*SKIP*

I'm not sure what this is supposed to be demonstrating...

I'm trying to show that

while( m// ) { code(); s/// }

differs from

s//code()/eg

Rainer Weikusat · May 19, 2013

In particular, s///g scans the text from left to right and replaces
matches it found in the original input string. The loop will rescan
the text upon each iteration, possibly performing replacements on the
results of earlier replacements.
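
To make the difference concrete, here is a minimal sketch with a made-up rewrite table (%expand is purely illustrative) in which one replacement produces text that the pattern matches again:

```perl
use strict;
use warnings;

# Hypothetical rewrite table: "A" becomes "B", and "B" becomes "c".
my %expand = (A => 'B', B => 'c');

# Loop version: every iteration rescans from the start of the string,
# so the "B" produced from "A" is itself replaced later.
my $rescanned = 'A-B';
my $guard     = 0;
while ($guard++ < 1000 && $rescanned =~ /([AB])/) {
    my $replacement = $expand{$1};
    $rescanned =~ s/[AB]/$replacement/;
}

# s///ge version: one left-to-right pass over the original string;
# replacement text is never matched again.
(my $one_pass = 'A-B') =~ s{([AB])}{ $expand{$1} }ge;

print "loop:   $rescanned\n";   # c-c
print "s///ge: $one_pass\n";    # B-c
```

The two forms only agree when the replacement can never match the pattern, so it's worth checking that before converting a loop into s///ge.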

Eric Pozharski · May 20, 2013

*SKIP*

OK. Then I'm not sure *why* you're demonstrating that, since I wasn't
questioning it.

Good. Now I have a better understanding of your way of thinking. It
won't make problems anymore.

Tim McDaniel · May 20, 2013

Oh, rescanning! Yes, in general that should indeed be considered, and
thank you for mentioning it -- s///g doesn't rescan and so avoids any
infinite loop problems.

(If you're curious about my original problem that prompted this:
rescanning doesn't produce any different results for my case -- the
right-hand side of the s/// cannot produce text that would match
again. The pattern it's looking for is very roughly
[[ (word character or space) * | (word character or space) * ]]
and the replacement is one of the alternatives, which therefore
cannot have [ or ].)
