Resetting //g

Roy Johnson

If you short-circuit out of a global pattern match like so:
    for (1..$n) {
        $str =~ /($pat)/g;
        $NthMatch = $1;
    }
where there are more than $n matches, the next time you do
$str =~ /($pat)/g;
even if it's in a completely different block of code, the matching is
going to pick up where it left off. Is there a way to reset it, short
of whiling away the rest of the matches? (I tried several arguments
for the reset function.)

Incidentally, the best way to get the $nth match of $pat in $str is
$str =~ /(?:.*?($pat)){$n}/;
but I'm still curious about short-circuited global matches.
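
For reference, a minimal sketch of that quantified-group idiom wrapped in a sub. The name `nth_match` is hypothetical, and the `/s` flag is an addition that the one-liner above does not have: without it, `.` will not cross a newline, so matches on earlier lines could not be skipped. When the string holds fewer than `$n` matches, the match fails and the sub returns undef.

```perl
use strict;
use warnings;

my $str = "abc\nabbc\nabbbbc";
my $pat = qr/ab+/;

# Hypothetical helper: return the $n-th match of $re in $string,
# or undef when there are fewer than $n matches.
sub nth_match {
    my ($string, $re, $n) = @_;
    # /s lets '.' match newlines, so the lazy .*? can skip across lines.
    # After {$n} repetitions, $1 holds the capture from the last one.
    my ($match) = $string =~ /(?:.*?($re)){$n}/s;
    return $match;
}

my $third  = nth_match($str, $pat, 3);   # 'abbbb'
my $fourth = nth_match($str, $pat, 4);   # undef: only three matches exist

print "third:  $third\n";
print "fourth: ", (defined $fourth ? $fourth : '(none)'), "\n";
```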
 
Ben Morrow

> If you short-circuit out of a global pattern match like so:
>     for (1..$n) {
>         $str =~ /($pat)/g;
>         $NthMatch = $1;
>     }
> where there are more than $n matches, the next time you do
>     $str =~ /($pat)/g;
> even if it's in a completely different block of code, the matching is
> going to pick up where it left off. Is there a way to reset it, short
> of whiling away the rest of the matches? (I tried several arguments
> for the reset function.)

From perldoc perlop:

| The position after the last match can be read or set using the pos()
| function; see "pos" in perlfunc.

/Nota bene/ that you call pos on $str, not $pat.

> Incidentally, the best way to get the $nth match of $pat in $str is
>     $str =~ /(?:.*?($pat)){$n}/;
> but I'm still curious about short-circuited global matches.

I would have said a better way would be
$nthmatch = ($str =~ /($pat)/g)[$n];
, not least because it actually works, but maybe that's just me... :)

Ben
 
Jeff 'japhy' Pinyan

[posted & mailed]

> If you short-circuit out of a global pattern match like so:
>     for (1..$n) {
>         $str =~ /($pat)/g;
>         $NthMatch = $1;
>     }
> where there are more than $n matches, the next time you do
>     $str =~ /($pat)/g;
> even if it's in a completely different block of code, the matching is
> going to pick up where it left off. Is there a way to reset it, short
> of whiling away the rest of the matches? (I tried several arguments
> for the reset function.)

You can set pos($str) to undef.

    for (1 .. $n) {
        $str =~ /($pat)/g;
        $last = $1;
    }
    undef pos($str);

The /g flag makes the regex start looking at pos($str) next time; setting
it to undef makes it start looking at the beginning of the string again.
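
A small sketch of that behaviour, for anyone who wants to watch pos($str) move. The sample string and the variable names are made up for illustration; the reset here uses plain assignment, `pos($str) = undef`, and assigning 0 restarts the search at the beginning just the same.

```perl
use strict;
use warnings;

my $str = 'abcabbcabbbbcab';
my $pat = qr/ab+/;

# Each //g match outside list context leaves pos($str) just past the match.
$str =~ /($pat)/g;                  # matches 'ab'  at offset 0
$str =~ /($pat)/g;                  # matches 'abb' at offset 3
my $pos_after_two = pos($str);      # offset just past 'abb'

# Reset: the next //g on $str starts from the beginning again.
pos($str) = undef;
my $pos_after_reset = pos($str);    # undef once more

$str =~ /($pat)/g;
my $first_again = $1;               # back to the first match

print "pos after two matches: $pos_after_two\n";
print "first match after reset: $first_again\n";
```

Note that the position belongs to the string, not to the pattern, so a different regex applied to $str with /g would also start from the saved offset.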
 
Roy Johnson

Ben Morrow said:
> From perldoc perlop:
>
> | The position after the last match can be read or set using the pos()
> | function; see "pos" in perlfunc.

I shoulda thought of that. Thanks.

> /Nota bene/ that you call pos on $str, not $pat.

>> Incidentally, the best way to get the $nth match of $pat in $str is
>>     $str =~ /(?:.*?($pat)){$n}/;
>> but I'm still curious about short-circuited global matches.
>
> I would have said a better way would be
>     $nthmatch = ($str =~ /($pat)/g)[$n];
> , not least because it actually works, but maybe that's just me... :)

What makes you think that my pattern doesn't work? It does, while
yours actually doesn't: you need to index $n-1 unless you've reset $[
to 1. The difference is that yours does about twice the work, and so
takes about twice as long. The aborting for loop is even slower.
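
To make the off-by-one concrete, here is a short sketch using the same string and pattern as the benchmark; the variable names are only illustrative. It also shows why the list-context form needs no reset afterwards: the match runs until it fails, which clears pos($str).

```perl
use strict;
use warnings;

my $str = 'abcabbcabbbbcabcabbcab';
my $pat = qr/ab+/;
my $n   = 3;

# In list context, //g returns every capture at once.
my @all = $str =~ /($pat)/g;   # ('ab', 'abb', 'abbbb', 'ab', 'abb', 'ab')

my $nth      = $all[$n - 1];   # third match: 'abbbb'
my $off_by_1 = $all[$n];       # fourth match: 'ab'

# The list-context match ran to exhaustion, so no position is left behind.
my $pos_is_clear = !defined pos($str);

print "match $n: $nth\n";
print "index $n gives: $off_by_1\n";
```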

Some benchmark code for your amusement:

#!perl

use strict;
use warnings;
use Benchmark;

my $str='abcabbcabbbbcabcabbcab';
my $n = 3; ## Find the $nth occurrence
my $pat = qr/ab+/; ## of this pattern

sub pat_n;
sub for_g;
sub m_g;

print "pat_n Match $n in $str is ", pat_n, "\n";
print "for_g Match $n in $str is ", for_g, "\n";
print "m_g Match $n in $str is ", m_g, "\n";

timethese( 100_000, {
    '$pat{$n}' => \&pat_n,
    'for //g'  => \&for_g,
    'm_g'      => \&m_g,
});

sub pat_n {
    $str =~ /(?:.*?($pat)){$n}/;
}

sub for_g {
    my $NthMatch;
    for (1..$n) {
        $str =~ /($pat)/g;
        $NthMatch = $1;
    }
    pos($str) = 0;
    $NthMatch;
}

sub m_g {
    ($str =~ /($pat)/g)[$n-1];
}
 
Ben Morrow

> Ben Morrow said:
>>> Incidentally, the best way to get the $nth match of $pat in $str is
>>>     $str =~ /(?:.*?($pat)){$n}/;
>>> but I'm still curious about short-circuited global matches.
>>
>> I would have said a better way would be
>>     $nthmatch = ($str =~ /($pat)/g)[$n];
>> , not least because it actually works, but maybe that's just me... :)
>
> What makes you think that my pattern doesn't work?

Sorry, I must have misread it... or something. I thought it would fail
on inputs like
ab ab ab abb
and get the 'abb' instead of the third 'ab', but I was wrong.
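
For the record, a quick check of that input, as a sketch with illustrative variable names. Both the quantified-group pattern and the list slice (indexed with $n-1) pick out the third 'ab':

```perl
use strict;
use warnings;

my $str = 'ab ab ab abb';
my $pat = qr/ab+/;
my $n   = 3;

# Quantified group: $1 holds the capture from the last of the $n repetitions.
my ($via_quantifier) = $str =~ /(?:.*?($pat)){$n}/;

# List-context //g, sliced with a zero-based index.
my $via_list = ($str =~ /($pat)/g)[$n - 1];

print "quantifier: $via_quantifier\n";
print "list slice: $via_list\n";
```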

> It does, while yours actually doesn't: you need to index $n-1 unless
> you've reset $[ to 1.

Yes, of course... :(

> The difference is that yours does about twice the work, and so
> takes about twice as long. The aborting for loop is even slower.
>
> Some benchmark code for your amusement:

Actually, on my machine my code runs slowest of the three for that
input... :)

Ben
 
