Can I quickly identify what part of a conditional regular expression matches?

Alf McLaughlin · Feb 9, 2006

Hello all!
I will be as brief as possible:

my $regexp = 'g[bx]|c[ak]|f[zm]';   # Let's say I have this regular expression
my $string = 'gbyyyyycayyyyyyfz';   # and I want to find all the occurrences in this string

# so I do this:

while ($string =~ /(?=($regexp))/ig) {
    print "$1\n";
}

That prints out the following: "gb\nca\nfz\n"

But instead of printing these, I would like to print the part of the
regular expression that actually matched (as efficiently as possible!):
"g[bx], c[ak], f[zm]"

I imagine the part of the regular expression that matches could be
captured efficiently, much as the actual match is put into $1.

Many thanks!

-Alf McLaughlin

it_says_BALLS_on_your forehead · Feb 9, 2006

Here's a brute-force way:

my $regexp = 'g[bx]|c[ak]|f[zm]';
my $string = 'gbyyyyycayyyyyyfz';
my @exs = split(/\|/, $regexp);   # one entry per alternative

for my $ex (@exs) {
    while ( $string =~ m/$ex/ig ) {
        print "$ex\n";
    }
}
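
One thing to watch with the loop-per-alternative approach: it reports hits grouped by alternative, not in the order they occur in the string. If string order matters, a minimal sketch along these lines might help. It records the start offset of each hit and sorts afterwards. The helper name find_by_position is made up for illustration, the test string is the one Xicheng uses later in the thread, and splitting on "|" is only safe while no alternative contains a "|" of its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [offset, alternative, matched text] triples sorted by offset.
sub find_by_position {
    my ($string, @alternatives) = @_;
    my @hits;
    for my $alt (@alternatives) {
        while ($string =~ /$alt/ig) {
            # $-[0] and $+[0] are the start and end offsets of the last match
            push @hits, [ $-[0], $alt, substr($string, $-[0], $+[0] - $-[0]) ];
        }
    }
    return sort { $a->[0] <=> $b->[0] } @hits;
}

my @alts = split /\|/, 'g[bx]|c[ak]|f[zm]';
my @hits = find_by_position('gbyyyfmyycayyyyyyfz', @alts);

for my $hit (@hits) {
    printf "%2d  %s  matched '%s'\n", @$hit;
}
```

This still scans the string once per alternative, so it is no faster than the loop above; it only restores the ordering.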

Matt Garrish · Feb 10, 2006

I suppose the following would work (but I can't see why you care which part
matches!):

my $string = 'gbyyyyycayyyyyyfz';

my $re1 = 'g[bx]';
my $re2 = 'c[ak]';
my $re3 = 'f[zm]';

while ($string =~ /($re1)|($re2)|($re3)/g) {
    # test with defined() so that a matched "0" is not mistaken for a miss
    print defined($1) ? "$re1 : $1" : (defined($2) ? "$re2 : $2" : "$re3 : $3"), "\n";
}

Xicheng · Feb 10, 2006

=============
use strict;
use warnings;
my $regexp = qr/
    g[bx](?{print "match g[bx]:"}) |
    c[ak](?{print "match c[ak]:"}) |
    f[zm](?{print "match f[zm]:"})
/x;
my $string = 'gbyyyfmyycayyyyyyfz';

while ($string =~ /(?>($regexp))/ig) {   # no backtracking inside the parentheses
    print "$1\n";
}
======printout========
match g[bx]:gb
match f[zm]:fm
match c[ak]:ca
match f[zm]:fz
=================

Xicheng

Xicheng · Feb 10, 2006


I just found it could be simpler if you use $^R, say:
===================
use strict; use warnings;
my $regexp = qr/(?>
    g[bx](?{"g[bx]"}) |
    c[ak](?{"c[ak]"}) |
    f[zm](?{"f[zm]"})
)/x;
my $string = 'gbyyyfmyycayyyyyyfz';
while ($string =~ /($regexp)/ig) {
    print "'$1' matches $^R\n";
}
========printout============
'gb' matches g[bx]
'fm' matches f[zm]
'ca' matches c[ak]
'fz' matches f[zm]
=========================

Xicheng
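
If the alternatives live in a list rather than being typed into the pattern by hand, the same $^R idea can probably be generated. The following is only a sketch under some assumptions: the @part list and the idea of storing an index (rather than the pattern text) in $^R are mine, not from the post above. Because the (?{...}) blocks now arrive through string interpolation, Perl requires `use re 'eval'`, which should only be enabled when the pattern strings are trusted.

```perl
use strict;
use warnings;
use re 'eval';   # required: the (?{...}) blocks reach the pattern via interpolation

# Assumed input: the alternatives as a plain list
my @part = ('g[bx]', 'c[ak]', 'f[zm]');

# Append a code block to each alternative that leaves its index in $^R,
# producing e.g. "g[bx](?{ 0 })|c[ak](?{ 1 })|f[zm](?{ 2 })"
my $alternation = join '|', map { "$part[$_](?{ $_ })" } 0 .. $#part;
my $regexp = qr/(?>$alternation)/;

my $string = 'gbyyyfmyycayyyyyyfz';
my @found;
while ($string =~ /($regexp)/ig) {
    my $index = $^R;    # copy right away, before any other regex runs
    push @found, "'$1' matches $part[$index]";
}
print "$_\n" for @found;
```

Storing the index keeps the code block trivial and avoids quoting problems if an alternative contains a double quote or a brace.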


Alf McLaughlin · Feb 10, 2006

Very cool! Thanks, Alf

Josef Moellers · Feb 10, 2006

You might want to look at the $& entry in perldoc perlvar. It contains
"The string matched by the last successful pattern match".
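
For what it's worth, a quick sketch of what $& yields here. As far as I can tell it gives the matched text (the same information as $1 in the original loop), not the alternative that produced it. The sketch matches the alternation directly rather than inside a lookahead, since with /(?=(...))/ the overall match is empty and $& would be empty too.

```perl
use strict;
use warnings;

my $string = 'gbyyyyycayyyyyyfz';
my @matched;
while ($string =~ /g[bx]|c[ak]|f[zm]/ig) {
    push @matched, $&;    # the text that matched, e.g. "gb", not "g[bx]"
}
print join(', ', @matched), "\n";
```

perlvar also warns that on Perls before 5.20, mentioning $& anywhere in a program slows down every regex match, which matters if efficiency is the goal.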

Anno Siegel · Feb 10, 2006


Here is a more general version of Matt Garrish's solution that works with
any number of alternatives:

# A list of alternatives
my @part = qw( g[bx] c[ak] f[zm] );

# Build the regex. Each alternative gets capturing parens
my $regexp = join '|', map "($_)", @part;

my $string = 'gbyyyyycayyyyyyfz';
while ($string =~ /(?=($regexp))/ig) {
    # find which part matched, ignoring the outermost capture
    my ($i) = grep defined $-[$_], 2 .. $#-;
    print "$1 matched by $part[$i - 2]\n";
}

If the regex is more complex than a linear sequence of alternatives,
a solution with code insertions (a la Xicheng elsewhere in this thread)
is probably better.

Anno
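
On Perl 5.10 or later, named captures might make the bookkeeping a little more readable than indexing into @-. This is a swapped-in variant of the same idea, not the code above: each alternative gets a named group, and %+ then only has a key for the group that took part in the match. The group names p0, p1, ... are invented for this sketch.

```perl
use strict;
use warnings;

my @part = qw( g[bx] c[ak] f[zm] );

# Wrap each alternative in a named group: (?<p0>g[bx])|(?<p1>c[ak])|...
my $alternation = join '|', map { "(?<p$_>$part[$_])" } 0 .. $#part;

my $string = 'gbyyyyycayyyyyyfz';
my @report;
while ($string =~ /(?=($alternation))/ig) {
    my $text   = $1;               # outermost capture: the matched text
    my ($name) = keys %+;          # only groups that captured appear in %+
    my $i      = substr $name, 1;  # strip the leading "p" to get the index
    push @report, "$text matched by $part[$i]";
}
print "$_\n" for @report;
```

Like the numbered version, this assumes the alternatives contain no named groups of their own.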

Alf McLaughlin · Feb 10, 2006

Excellent! Even more flexible. Thanks, Alf
