Efficiency of s///e?

Tim McDaniel

There's a sub in our code base which has something like

my $prevent_infinite_loop = 0;
while ($prevent_infinite_loop++ < 1000 && $text =~ /(complicated) (regular) (expression)/) {
... various calculations on $1, $2, $3, ... resulting in $replacement;
$text =~ s/(complicated) (regular) (expression)/$replacement/;
# that's exactly the same regular expression as above
}

(though I think that, with the particular pattern, an infinite loop is
impossible.) To add new features and for maintainability,
I've developed

$text =~ s{(simpler)}{
my $found = $1;
... various calculations involving split on $found, unshift, ...
$replacement;
}eg;

I'm wondering about the efficiency of this approach, particularly
s{}{}e. For instance, is the right-hand side code compiled at run
time, or at compile time? Any other major concerns? We still use
Perl 5.8.8 for now, alas.
 
Charles DeRykus

It's syntax checked and compiled at compile time along with the rest of
your program.
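
A rough way to see this for yourself, as a sketch rather than a benchmark: compile two snippets with a string eval, one with a deliberately broken /e replacement. The names @ran, $broken and $working are made up for the illustration. If the replacement were only parsed when the substitution executed, the push ahead of the broken s///e would still run; it should not, because the whole snippet is rejected at compile time.

```perl
use strict;
use warnings;

our @ran;    # records which snippets actually started executing

# The replacement "1 +" is not a valid expression. It is rejected while
# the string is being compiled, so the push on the first line never runs.
my $broken = eval q{
    push @ran, 'broken';
    my $t = 'x';
    $t =~ s/x/1 +/e;
    $t;
};
my $error = $@;

# A well-formed replacement is compiled along with the rest of the
# snippet and then simply executed when the match succeeds.
my $working = eval q{
    push @ran, 'working';
    my $t = 'x';
    $t =~ s/x/1 + 1/e;
    $t;
};

print "broken compiled: ", (defined $broken ? "yes" : "no"), "\n";
print "snippets run:    @ran\n";
print "working result:  $working\n";
```

Only 'working' should show up in @ran. The run-time cost of s///e is then the cost of running the already compiled block once per match; a doubled /ee is the form that evaluates a string at run time.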

The only gotcha IMO is the replacement morphing into a long,
hard-to-unravel mess that's hard on the eyes and tough to debug.
Even commented, a big multi-line s/pattern/replacement/ becomes vertigo
inducing.

s{ ... }
{ $1 ... # blah
.... # more blah...
...
....
}gex;

At some point, a plain old "if" block seems better.
 
Tim McDaniel

A block like

s{...}{
# code
}gex;

You don't need the "x" to allow arbitrary formatting on the right-hand
side, only for the left, right?

isn't entirely different from an if: to some extent it's just a
brace-delimited block like any other. However, given that it does
actually have slightly strange parsing rules,

Oh, please don't leave it like that, you code-tease! Is it because
the terminator, "}" here, can screw it up?
s{...}{
$blort .= "}";
}

would screw up by terminating at the apparently "inner" "}"?
(perlop, Gory details of parsing quoted constructs) Or anything else?
 
Charles DeRykus

You don't need the "x" to allow arbitrary formatting on the right-hand
side, only for the left, right?

Yes. I meant to show comments on the left. See what I mean about vertigo...

Oh, please don't leave it like that, you code-tease! Is it because
the terminator, "}" here, can screw it up?
s{...}{
$blort .= "}";
}

would screw up by terminating at the apparently "inner" "}"?
(perlop, Gory details of parsing quoted constructs) Or anything else?

Yes, usually you'd have to escape the delimiters: "\}" or "\{".

Though, if you have a matching pair, you're ok:

{ $blort .= "{}"; }e

Yet, escaping one, but not both, there's a problem again:

{ $blort .= "\{}"; }e; # verboten!
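
For what it's worth, here is a small sketch of three ways to stay out of trouble. The sample text and the helper append_brace are made up for the illustration: keep the braces balanced, switch to delimiters that aren't braces, or move the code into a named sub so the replacement itself contains no stray braces.

```perl
use strict;
use warnings;

my $text = 'a b';

# Balanced braces inside the replacement are fine: the delimiter scan
# counts the nested pair and stops at the real closing brace.
(my $balanced = $text) =~ s{(\w)}{ "{$1}" }eg;

# A lone "}" is harmless when the delimiters are not braces...
(my $slashes = $text) =~ s/(\w)/ $1 . '}' /eg;

# ...or when the logic lives in a named sub (append_brace is a
# hypothetical helper), so the replacement holds no braces at all.
sub append_brace { my ($word) = @_; return $word . '}' }
(my $via_sub = $text) =~ s{(\w)}{ append_brace($1) }eg;

print "$balanced\n$slashes\n$via_sub\n";
```

The named sub also helps with the readability complaint above, since the s{}{}e stays a one-liner however large the calculation gets.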
 
Eric Pozharski

with said:
You can also use @+ and substr, as was discussed here a little while
ago:

while (... and $text =~ /.../) {
my ($start, $length) = ($-[0], $+[0] - $-[0]);

# calculate $replacement

substr $text, $start, $length, $replacement;
}

Care to explain why you replaced $prevent_infinite_loop with an ellipsis?

#!/usr/bin/perl

use strict;
use warnings;

my( $aa, $ab ) = qw/ aaaa bbbb /;

my $prevent_infinite_loop;
while( ++$prevent_infinite_loop < 1000 && $aa =~ /aa/ ) {
my( $start, $length ) = ( $-[0], $+[0] - $-[0] ); # length = match end - match start
my $replacement = 'a';
substr $aa, $start, $length, $replacement;
print "aa: $aa\n";
}

$ab =~ s{bb}{ print "ab (before): $ab\n"; 'b' }ge;
print "ab (now): $ab\n";

__END__

{2809:6} [0:0]% p.x foo.tI7BTR.pl
p.x:1: no such file or directory: ./Build
aa: aaa
aa: aa
aa: a
ab (before): bbbb
ab (before): bbbb
ab (now): bb


*CUT*
 
szr

The handling of the closing terminator is what I was referring to. The
first unbalanced, unescaped } will close the s{}{}e, and \} will be
converted to } before the read-this-as-Perl parser sees it. Fortunately
there are relatively few real situations where this matters, though it's
easy to create artificial situations with bizarre results, like

s{...}{ { "foo" \} # } }e

It's also worth noting that \\ is *not* converted to \, even though \\}
prevents the backslash from escaping the }. As I said, slightly strange
rules...

The same also seems to be true of eval( expr ), such as in:

eval qq{ { "foo" \} # } };

Remove the closing curly brace immediately after the # and you get a

Can't find string terminator "}" ...

error, not unlike what happens when the same is done in a substitution
like yours above.

Also, keep in mind, in the substitution, you can use other delimiters
besides { ... }, such as:

$s =~ s{ ... }< { "foo" } # } >e

In which case escaping the } that comes before the # actually causes an
error:

syntax error at line ..., near ""foo" \"

Although the same is not completely true of qq<...>, as both

eval qq< { "foo" \} # } >;

and

eval qq< { "foo" } # } >;

yield no errors or warnings and return the string: foo

Not sure what exactly accounts for this difference though.
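
One plausible explanation, offered as a guess rather than a definitive answer: qq<...> is a double-quoted string, so "\}" is reduced to a plain "}" before eval ever sees the text, and the two strings end up identical. With s{...}<...>e the replacement is code, and since } is not the delimiter there, the backslash stays in and the parser trips over it. The variable names below are made up for the check.

```perl
use strict;
use warnings;

# In a double-quoted string "\}" is just an escaped "}", so the
# backslash is gone before eval parses anything.
my $escaped   = qq< { "foo" \} # } >;
my $unescaped = qq< { "foo" } # } >;

print "identical source text\n" if $escaped eq $unescaped;

# Both strings are then the same small program: a block followed by
# a comment.
my $result = eval $escaped;
print "eval result: ", (defined $result ? $result : 'undef'), "\n";
```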
 
Eric Pozharski

*SKIP*
I'm not sure what this is supposed to be demonstrating...

I'm trying to show that

while( m// ) { code(); s/// }

differs from

s//code()/eg
 
Rainer Weikusat

Eric Pozharski said:
with said:
I'm not sure what this is supposed to be demonstrating...

I'm trying to show that

while( m// ) { code(); s/// }

differs from

s//code()/eg

In particular, s///g scans the text from left to right and replaces
matches it found in the original input string. The loop will rescan
the text upon each iteration, possibly performing replacements on the
results of earlier replacements.
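
A compact illustration of the difference, using made-up input where the replacement text can itself become part of a new match:

```perl
use strict;
use warnings;

# Loop form: every iteration matches against the current text, so the
# output of one replacement can take part in the next match.
my $looped = 'aab';
my $guard  = 0;
while ($guard++ < 1000 && $looped =~ /ab/) {
    $looped =~ s/ab/b/;
}

# Single pass: s///g carries on after each match in the original
# string and never looks at text it has already produced.
(my $single_pass = 'aab') =~ s/ab/b/g;

print "loop:  $looped\n";         # loop:  b
print "s///g: $single_pass\n";    # s///g: ab
```

The loop turns 'aab' into 'ab' and then into 'b', while the single pass stops at 'ab'.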
 
Eric Pozharski

*SKIP*
OK. Then I'm not sure *why* you're demonstrating that, since I wasn't
questioning it.

Good. Now I have a better understanding of your way of thinking. It
won't make problems anymore.
 
Tim McDaniel

I'm trying to show that

while( m// ) { code(); s/// }

differs from

s//code()/eg

Oh, rescanning! Yes, in general that should indeed be considered, and
thank you for mentioning it -- s///g doesn't rescan and so avoids any
infinite loop problems.

(If you're curious about my original problem that prompted this:
rescanning doesn't produce any different results for my case -- the
right-hand side of the s/// cannot produce text that would match
again. The pattern it's looking for is very roughly
[[ (word character or space) * | (word character or space) * ]]
and the replacement is one of the alternatives, which therefore
cannot have [ or ].)
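
A minimal sketch of that shape, with an invented sample string and an invented selection rule (the real calculations are not shown in this thread). Since neither alternative can contain [ or ], a single s///eg pass and the rescanning loop end up with the same text.

```perl
use strict;
use warnings;

my $text = 'See [[first choice|second choice]] and [[|fallback]].';

$text =~ s{\[\[([\w ]*)\|([\w ]*)\]\]}{
    my ($left, $right) = ($1, $2);
    # Hypothetical rule: prefer the left alternative unless it is empty.
    length($left) ? $left : $right;
}eg;

print "$text\n";
```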
 
