Resetting //g

Roy Johnson · Oct 29, 2003

If you short-circuit out of a global pattern match like so:
for (1..$n) {
    $str =~ /($pat)/g;
    $NthMatch = $1;
}
where there are more than $n matches, the next time you do
$str =~ /($pat)/g;
even if it's in a completely different block of code, the matching is
going to pick up where it left off. Is there a way to reset it, short
of whiling away the rest of the matches? (I tried several arguments
for the reset function.)
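To make the question concrete, here is a minimal Perl sketch of the leftover position. It uses a hypothetical four-match string. The loop stops after the second match, pos($str) stays just past it, and the next scalar-context //g on $str resumes from there instead of from the start.

```perl
use strict;
use warnings;

# Hypothetical sample data: four matches of /ab+/ in $str
my $str = 'ab abb abbb abbbb';
my $pat = qr/ab+/;
my $n   = 2;

# Stop after the $n-th match, as in the question
my $nth;
for (1 .. $n) {
    $str =~ /($pat)/g;    # scalar-context //g: each match resumes at pos($str)
    $nth = $1;
}
my $left_at = pos($str);  # offset just past the $n-th match

# Later, "a completely different block of code" using the same variable
my $next;
if ($str =~ /($pat)/g) {
    $next = $1;           # the third match, not the first one
}
print "nth=$nth left_at=$left_at next=$next\n";
```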

Incidentally, the best way to get the $nth match of $pat in $str is
$str =~ /(?:.*?($pat)){$n}/;
but I'm still curious about short-circuited global matches.
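Here is a sketch of Roy's one-pattern idiom wrapped in a hypothetical helper, nth_match(). The /s modifier is an addition of this sketch and does not appear in the post. It lets .*? also skip over newlines. Without it, the idiom assumes a single-line $str.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n-th (1-based) match of $pat in $str,
# or undef if there are fewer than $n matches.
# /s is added here so that '.' can also skip over newlines.
sub nth_match {
    my ($str, $pat, $n) = @_;
    # Each repetition lazily skips ahead to the next match of $pat; after
    # {$n} repetitions, $1 holds the capture from the last repetition.
    return $str =~ /(?:.*?($pat)){$n}/s ? $1 : undef;
}

print nth_match('abcabbcabbbbcabcabbcab', qr/ab+/, 3), "\n";
```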

Ben Morrow · Oct 29, 2003

> If you short-circuit out of a global pattern match like so:
> for (1..$n) {
>     $str =~ /($pat)/g;
>     $NthMatch = $1;
> }
> where there are more than $n matches, the next time you do
> $str =~ /($pat)/g;
> even if it's in a completely different block of code, the matching is
> going to pick up where it left off. Is there a way to reset it, short
> of whiling away the rest of the matches? (I tried several arguments
> for the reset function.)

From perldoc perlop:

| The position after the last match can be read or set using the pos()
| function; see "pos" in perlfunc.

/Nota bene/ that you call pos on $str, not $pat.
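The following short sketch illustrates Ben's point using hypothetical sample strings. The saved //g position belongs to the target variable, so you read and reset it with pos($str). A second variable holding the same text keeps its own, separate position.

```perl
use strict;
use warnings;

my $pat   = qr/ab+/;
my $str   = 'abcabbcab';
my $other = 'abcabbcab';    # same contents, separate variable

$str =~ /($pat)/g;          # "ab" at offsets 0..1
$str =~ /($pat)/g;          # "abb" at offsets 3..5
my $pos_str = pos($str);    # 6: offset just past the last match

# The saved position lives on the target variable, not on $pat:
# a //g match on $other starts from its own, still unset, position.
my $first_other = $other =~ /($pat)/g ? $1 : undef;
my $pos_other   = pos($other);

print "pos(\$str)=$pos_str first(\$other)=$first_other pos(\$other)=$pos_other\n";
```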

> Incidentally, the best way to get the $nth match of $pat in $str is
> $str =~ /(?:.*?($pat)){$n}/;
> but I'm still curious about short-circuited global matches.

I would have said a better way would be
$nthmatch = ($str =~ /($pat)/g)[$n];
, not least because it actually works, but maybe that's just me...
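This sketch shows the list-context version with the 0-based index that Roy points out further down ($n - 1 rather than $n). A list-context //g keeps matching until it fails, and a failed match without /c resets the position. As a result, it leaves pos($str) undefined afterwards.

```perl
use strict;
use warnings;

my $str = 'abcabbcabbbbcabcabbcab';
my $pat = qr/ab+/;
my $n   = 3;

# List-context //g returns every capture at once; list indices start
# at 0, so the $n-th match is element $n - 1.
my $nth = ($str =~ /($pat)/g)[$n - 1];

# The list-context match ran until it failed, so no position is left over.
my $status = defined pos($str) ? 'set' : 'reset';
print "$nth (pos is $status)\n";
```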

Ben

Jeff 'japhy' Pinyan · Oct 29, 2003

[posted & mailed]

> If you short-circuit out of a global pattern match like so:
> for (1..$n) {
>     $str =~ /($pat)/g;
>     $NthMatch = $1;
> }
> where there are more than $n matches, the next time you do
> $str =~ /($pat)/g;
> even if it's in a completely different block of code, the matching is
> going to pick up where it left off. Is there a way to reset it, short
> of whiling away the rest of the matches? (I tried several arguments
> for the reset function.)

You can set pos($str) to undef.

for (1 .. $n) {
    $str =~ /($pat)/g;
    $last = $1;
}
undef pos($str);

The /g flag makes the regex start looking at pos($str) next time; setting
it to undef makes it start looking at the beginning of the string again.
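Here is a minimal sketch of japhy's fix on the same kind of hypothetical input. After the early exit, clearing pos($str) makes the next //g start at the beginning again. The sketch writes this as the assignment pos($str) = undef, whereas the post uses undef pos($str).

```perl
use strict;
use warnings;

# Hypothetical sample data: four matches of /ab+/
my $str = 'ab abb abbb abbbb';
my $pat = qr/ab+/;
my $n   = 2;

# Short-circuited scalar-context //g loop, as in the question
my $last;
for (1 .. $n) {
    $str =~ /($pat)/g;
    $last = $1;
}
pos($str) = undef;    # forget the saved //g position for $str

# The next //g match starts from the beginning of the string again
my $first = $str =~ /($pat)/g ? $1 : undef;
print "$last then $first\n";
```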

Roy Johnson · Oct 29, 2003

Ben Morrow said:
> From perldoc perlop:
>
> | The position after the last match can be read or set using the pos()
> | function; see "pos" in perlfunc.

I shoulda thought of that. Thanks.

> /Nota bene/ that you call pos on $str, not $pat.

>> Incidentally, the best way to get the $nth match of $pat in $str is
>> $str =~ /(?:.*?($pat)){$n}/;
>> but I'm still curious about short-circuited global matches.

> I would have said a better way would be
> $nthmatch = ($str =~ /($pat)/g)[$n];
> , not least because it actually works, but maybe that's just me...

What makes you think that my pattern doesn't work? It does, while
yours actually doesn't: you need to index $n-1 unless you've reset $[
to 1. The difference is that yours does about twice the work, and so
takes about twice as long. The aborting for loop is even slower.

Some benchmark code for your amusement:

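#!perl

use strict;
use warnings;
use Benchmark;    ## exports timethese() by default

my $str = 'abcabbcabbbbcabcabbcab';
my $n   = 3;          ## Find the $nth occurrence
my $pat = qr/ab+/;    ## of this pattern

sub pat_n;
sub for_g;
sub m_g;

## Explicit () makes each bareword an unambiguous sub call
print "pat_n Match $n in $str is ", pat_n(), "\n";
print "for_g Match $n in $str is ", for_g(), "\n";
print "m_g Match $n in $str is ", m_g(), "\n";

timethese( 100_000, {
    '$pat{$n}' => \&pat_n,
    'for //g'  => \&for_g,
    'm_g'      => \&m_g,
});

## One match: lazily skip to the next occurrence of $pat, $n times;
## $1 holds the capture from the last repetition
sub pat_n {
    $str =~ /(?:.*?($pat)){$n}/;
}

## Scalar-context //g loop that stops after $n matches, then resets
## pos($str) so the next call starts at the beginning again
sub for_g {
    my $NthMatch;
    for (1..$n) {
        $str =~ /($pat)/g;
        $NthMatch = $1;
    }
    pos($str) = 0;
    $NthMatch;
}

## List-context //g collects every match; list indices are 0-based
sub m_g {
    ($str =~ /($pat)/g)[$n-1];
}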
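Before timing, it may be worth confirming that the three approaches agree. This sketch uses hypothetical parameterized rewrites of pat_n, for_g and m_g that take the string, pattern and count as arguments instead of using globals. It also includes a second, made-up input. The for_g rewrite returns undef early if there are fewer than $n matches, which the original does not do.

```perl
use strict;
use warnings;

# Hypothetical parameterized versions of the three benchmarked approaches
sub pat_n {
    my ($str, $pat, $n) = @_;
    return $str =~ /(?:.*?($pat)){$n}/ ? $1 : undef;
}

sub for_g {
    my ($str, $pat, $n) = @_;    # $str is a fresh copy, so pos() starts unset
    my $nth;
    for (1 .. $n) {
        $str =~ /($pat)/g or return undef;    # fewer than $n matches
        $nth = $1;
    }
    return $nth;
}

sub m_g {
    my ($str, $pat, $n) = @_;
    my $nth = ($str =~ /($pat)/g)[$n - 1];    # 0-based list index
    return $nth;
}

# The benchmark string, plus a hypothetical second input
my @cases = (
    [ 'abcabbcabbbbcabcabbcab', qr/ab+/, 3 ],
    [ 'ab-abb-abbb',            qr/ab+/, 2 ],
);
for my $case (@cases) {
    my @got = (pat_n(@$case), for_g(@$case), m_g(@$case));
    print "$case->[0]: @got\n";
}
```

Because each sub receives its own copy of the string, the for_g rewrite never sees a stale pos() left by a previous call. It therefore does not need the pos($str) = 0 reset.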

Ben Morrow · Oct 29, 2003

Roy Johnson said:
> Ben Morrow said:

>>> Incidentally, the best way to get the $nth match of $pat in $str is
>>> $str =~ /(?:.*?($pat)){$n}/;
>>> but I'm still curious about short-circuited global matches.

>> I would have said a better way would be
>> $nthmatch = ($str =~ /($pat)/g)[$n];
>> , not least because it actually works, but maybe that's just me...

> What makes you think that my pattern doesn't work?

Sorry, I must have misread it... or something. I thought it would fail
on inputs like
ab ab ab abb
and get the 'abb' instead of the third 'ab', but I was wrong.
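Here is a small sketch of why the pattern handles Ben's input correctly. Each lazy .*? stops at the earliest next match, and a capture inside a repeated group keeps the value from the last repetition. The greedy variant is included only for contrast and is not anyone's proposal in the thread. Because it backtracks from the end of the string, it lands on the last three matches and returns 'abb'.

```perl
use strict;
use warnings;

my $str = 'ab ab ab abb';    # Ben Morrow's example input
my $pat = qr/ab+/;
my $n   = 3;

# Lazy .*? : each repetition stops at the earliest next match,
# so $1 ends up as the third "ab".
my ($lazy) = $str =~ /(?:.*?($pat)){$n}/;

# Greedy .* (contrast only): the first repetition skips as far as it can
# while still leaving room for two more matches, so the repetitions land
# on the last three matches instead of the first three.
my ($greedy) = $str =~ /(?:.*($pat)){$n}/;

print "lazy=$lazy greedy=$greedy\n";
```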

> It does, while yours actually doesn't: you need to index $n-1 unless
> you've reset $[ to 1.

Yes, of course...

> The difference is that yours does about twice the work, and so
> takes about twice as long. The aborting for loop is even slower.

> Some benchmark code for your amusement:

Actually, on my machine my code runs slowest of the three for that
input...

Ben
