Can I quickly identify what part of a conditional regular expression matches?

Alf McLaughlin

Hello all!
I will be as brief as possible:

# Let's say I have this regular expression:
my $regexp = 'g[bx]|c[ak]|f[zm]';
# and I want to find all the occurrences in the following string:
my $string = 'gbyyyyycayyyyyyfz';

# so I do this:

while ($string =~ /(?=($regexp))/ig) {
    print "$1\n";
}

That prints out the following: "gb\nca\nfz\n"

But instead of printing these out, I would like to print out the actual
part of the regular expression that matched (as efficiently as
possible!): "g[bx], c[ak], f[zm]"

I can imagine that the part of the regular expression that matches
might be efficiently captured, much like the actual match is dumped into
$1.

Many thanks!

-Alf McLaughlin
 
it_says_BALLS_on_your forehead

Here's a brute-force way:

my $regexp = 'g[bx]|c[ak]|f[zm]';
my $string = 'gbyyyyycayyyyyyfz';
my @exs = split(/\|/, $regexp);

# try each alternative on its own and report it once per match
for (@exs) {
    while ( $string =~ m/$_/ig ) {
        print "$_\n";
    }
}
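
One thing to note about a loop of this kind: it reports hits grouped by alternative rather than in the order they occur in the string, and splitting on '|' assumes that no alternative contains a '|' of its own. Here is a minimal sketch, not from the original post, that records where each match starts and sorts on that offset afterwards. The helper name hits_in_order is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): try each alternative separately,
# remember where every match starts, then return the hits in string order.
sub hits_in_order {
    my ( $string, @alts ) = @_;
    my @hits;
    for my $alt (@alts) {
        while ( $string =~ /$alt/ig ) {
            # $-[0] and $+[0] are the start and end offsets of the whole match
            my $text = substr $string, $-[0], $+[0] - $-[0];
            push @hits, [ $-[0], $text, $alt ];
        }
    }
    return sort { $a->[0] <=> $b->[0] } @hits;
}

my @alts = split /\|/, 'g[bx]|c[ak]|f[zm]';
for my $hit ( hits_in_order( 'gbyyyfmyycayyyyyyfz', @alts ) ) {
    printf "%2d: '%s' matched by %s\n", @$hit;
}
```

Each alternative still costs one full pass over the string, so this stays a brute-force approach; it only fixes the reporting order.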
 
Matt Garrish

I suppose the following would work (but I can't see why you care which part
matches!):

my $string = 'gbyyyyycayyyyyyfz';

my $re1 = 'g[bx]';
my $re2 = 'c[ak]';
my $re3 = 'f[zm]';

while ($string =~ /($re1)|($re2)|($re3)/g) {
    # test with defined() so a match such as "0" is not mistaken for a failure
    print defined $1 ? "$re1 : $1" : (defined $2 ? "$re2 : $2" : "$re3 : $3"), "\n";
}
 
Xicheng

=============
use strict;
use warnings;
my $regexp = qr/
    g[bx](?{print "match g[bx]:"}) |
    c[ak](?{print "match c[ak]:"}) |
    f[zm](?{print "match f[zm]:"})
/x;
my $string = 'gbyyyfmyycayyyyyyfz';

# (?>...) prevents backtracking inside the parentheses
while ($string =~ /(?>($regexp))/ig) {
    print "$1\n";
}
======printout========
match g[bx]:gb
match f[zm]:fm
match c[ak]:ca
match f[zm]:fz
=================

Xicheng
 
Xicheng

I just found it could be simpler if you use $^R, say:
===================
use strict; use warnings;
my $regexp = qr/(?>
    g[bx](?{"g[bx]"}) |
    c[ak](?{"c[ak]"}) |
    f[zm](?{"f[zm]"})
)/x;
my $string = 'gbyyyfmyycayyyyyyfz';
while ($string =~ /($regexp)/ig) {
    print "'$1' matches $^R\n";
}
========printout============
'gb' matches g[bx]
'fm' matches f[zm]
'ca' matches c[ak]
'fz' matches f[zm]
=========================

Xicheng
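
If the alternatives live in a list rather than in a hand-written pattern, the same $^R idea can be generated. What follows is only a sketch under a few assumptions: the helper name build_tagged_regex is hypothetical, each code block returns the index of its alternative rather than its text, and the alternatives come from a trusted source, because "use re 'eval'" lets pattern text run arbitrary Perl code.

```perl
use strict;
use warnings;
# Needed because the code blocks are assembled into the pattern at run time.
use re 'eval';

# Hypothetical helper (not from the thread): follow every alternative with a
# code block whose value, the alternative's index, ends up in $^R.
sub build_tagged_regex {
    my @alts = @_;
    my $body = join '|', map "(?:$alts[$_])(?{ $_ })", 0 .. $#alts;
    return qr/(?>$body)/;
}

my @alts   = ( 'g[bx]', 'c[ak]', 'f[zm]' );
my $regexp = build_tagged_regex(@alts);
my $string = 'gbyyyfmyycayyyyyyfz';

my @report;
while ( $string =~ /($regexp)/g ) {
    my $alt = $alts[$^R];    # look the alternative up by the index left in $^R
    push @report, "'$1' matches $alt";
}
print "$_\n" for @report;
```

With the string used above, this should report the same four matches as the printout from the hand-written pattern.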

Josef Moellers

You might want to look at the $& entry in perldoc perlvar. It contains
"The string matched by the last successful pattern match".
 
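For comparison, a minimal sketch of what $& holds here, assuming the pattern and string from the original question. $& is the text that matched, so it gives the same information as $1 in the original loop and not the alternative that produced it.

```perl
use strict;
use warnings;

my $regexp = 'g[bx]|c[ak]|f[zm]';
my $string = 'gbyyyyycayyyyyyfz';

my @seen;
while ( $string =~ /$regexp/ig ) {
    push @seen, $&;    # matched text only, e.g. "gb", never "g[bx]"
}
print join( ', ', @seen ), "\n";    # prints: gb, ca, fz
```

perlvar also warns that in Perl versions before 5.20, mentioning $& anywhere in a program can slow down every other regex match, which matters if efficiency is the goal.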

Anno Siegel

Here is a more general solution than Matt Garrish's; it works with any number of alternatives:

# A list of alternatives
my @part = qw( g[bx] c[ak] f[zm]);

# build the regex. Each alternative gets capturing parens
my $regexp = join '|', map "($_)", @part;

my $string = 'gbyyyyycayyyyyyfz';
while ( $string =~ /(?=($regexp))/ig) {
    # find which part matched, ignoring the outermost capture
    my ( $i) = grep defined $-[ $_], 2 .. $#-;
    print "$1 matched by $part[ $i - 2]\n";
}

If the regex is more complex than a linear sequence of alternatives,
a solution with code insertions (a la Xicheng elsewhere in this thread)
is probably better.

Anno
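
The numbered-group lookup above relies on every alternative contributing exactly one capture group; an alternative with parentheses of its own shifts the numbering. One possible middle ground short of code insertions, assuming Perl 5.10 or later, is to give each alternative a named group and read the name back from %+. This is a sketch only: the group names p0, p1, ... are invented for the example, and the lookahead from the original loop is left out for brevity.

```perl
use strict;
use warnings;

# Alternatives may now contain capture groups of their own.
my @part = ( 'g(b|x)', 'c[ak]', 'f[zm]' );

# Wrap alternative number N in a named group "pN" (names are made up here).
my $regexp = join '|', map "(?<p$_>$part[$_])", 0 .. $#part;

my $string = 'gbyyyyycayyyyyyfz';
my @report;
while ( $string =~ /$regexp/g ) {
    # %+ lists only the named groups that took part in this match
    my ($name) = keys %+;
    my $text   = $+{$name};
    my $index  = substr $name, 1;    # strip the leading "p"
    push @report, "$text matched by $part[$index]";
}
print "$_\n" for @report;
```

This assumes the alternatives themselves use no named groups; if they do, the keys of %+ would need filtering.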
 
