How to dissect a Regexp object?

kj

Is there a way to tell if a given Regexp object, generated at
runtime, includes at least one pair of capture parentheses?

More generally, is there any documentation for the Regexp class?
(I'm referring to the class alluded to by the output of, e.g., ref
qr//). Running perldoc Regexp fails ("no docs found"), and perldoc
perlre does not say much at all about this class as such.

TIA!

Kynn
 
C.DeRykus

See perldoc perlop (the section "Regexp Quote-Like Operators").

The regex object is viewable as a string:

$ perl -le '$regex = qr/ab(\d+)/; print $regex'
(?-xism:ab(\d+))
 
Ilya Zakharevich

The Regexp 'class' doesn't have any methods[1], and isn't really useable
as a class at all. In Perl 5.12 it will be mostly replaced by a new
REGEXP svtype, which was what it should have been from the beginning.
(qr// will still return a ref blessed into Regexp, for compatibility.)

Is there any way to find that an object is of a REGEXP svtype (without
using overload::StrVal)? Without this, serialization is not possible;
witness failure of FreezeThaw...

Or, at least, get hints that it "might be REGEXP" (with no false
negatives), so that a call to overload::StrVal() is needed...
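One possible answer to the question above, offered as a sketch rather than a definitive recipe: the re module has provided is_regexp since Perl 5.10, and its documentation says the function is not confused by overloading or blessing. The helper name looks_like_regexp and the class name My::Pattern are made up for illustration.

```perl
use strict;
use warnings;
use re 'is_regexp';    # assumption: Perl 5.10 or later

# Hypothetical helper: true if $thing is a compiled regex, whatever
# package it has been blessed into.
sub looks_like_regexp {
    my ($thing) = @_;
    return is_regexp($thing) ? 1 : 0;
}

my $plain     = qr/ab(\d+)/;
my $reblessed = bless qr/x/, 'My::Pattern';    # hypothetical class name

for my $case ([ 'qr// object'    => $plain ],
              [ 'reblessed qr//' => $reblessed ],
              [ 'plain string'   => 'ab(\d+)' ],
              [ 'scalar ref'     => \'ab(\d+)' ]) {
    my ($label, $value) = @$case;
    printf "%-15s %s\n", $label,
        looks_like_regexp($value) ? 'regexp' : 'not a regexp';
}
```

A serializer could call such a check first and only fall back to string inspection for values that pass it.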

Yours,
Ilya
 
C.DeRykus

If you have a reasonably recent Perl (5.10 or later), you might be
able to use re::regexp_pattern in list context to check for parens.

On a Win32 Strawberry Perl 5.10.1, for instance:

c:\strawberry\perl\bin\perl.exe -le "
use re 'regexp_pattern';
$r = qr/ab(\d+)/;
($pat) = regexp_pattern($r);
print $pat"
ab(\d+)

So you could parse $pat for capturing parens. You'd need to
exclude escaped parentheses and the (?...) constructs that don't
capture, such as (?:...) and the lookaround assertions, but that's
left as an exercise for the reader :)
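A different route that avoids parsing the pattern text altogether, sketched here as an alternative and not as what the poster had in mind: match the empty string against an alternation whose first branch is empty, then read $#+. Per perlvar, after a successful match $#+ is the number of groups in the pattern, whether or not they took part in the match. The helper names capture_count and has_captures are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of capture groups in a compiled regex.
# "" =~ /|$re/ always succeeds through the empty first alternative, and
# $#+ then reports how many groups the whole pattern contains.
sub capture_count {
    my ($re) = @_;
    "" =~ /|$re/ or die "the empty alternative should always match";
    return $#+;
}

# Hypothetical helper: does the regex contain at least one capture group?
sub has_captures {
    my ($re) = @_;
    return capture_count($re) > 0 ? 1 : 0;
}

for my $re (qr/ab(\d+)/, qr/ab\d+/, qr/\\(.)/, qr/a(?:b)[(]c/,
            qr/(?<year>\d{4})-(\d\d)/) {
    printf "%-30s groups: %d\n", $re, capture_count($re);
}
```

Because the regex engine does the counting, escaped parentheses, character classes and named captures need no special handling.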
 
sln

It's not too hard to analyse the string returned by qr//
to get the start (and thereby the count) of capture groups.
Getting the actual group text requires some recursion and thought.

use strict;
use warnings;

my $tmp = qr/\(\$th (i(s))(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\))/x;
my @capt;

# Record the offset just past every unescaped "(" not followed by "?".
while ($tmp =~ /( (?<!\\)\((?!\?) )/xg ) {
    push @capt, pos($tmp);
}
print "$tmp\n";
my ($i,$last) = (1,1);

# Print a running digit under each capture group's opening parenthesis.
for my $p (@capt) {
    print (' 'x ($p - $last), $i++ % 10);
    $last = $p+1;
}
print "\nFound ",scalar @capt, " capture groups\n";

__END__

(?x-ism:\(\$th (i(s))(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\)))
               1 2   3 4   5 6      7       8 9   0 1
Found 11 capture groups
 
kj

Thanks for this code! Now I must study it.

~K
 
Martijn Lievaart

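I think sln's pattern will fail on the regexp /\\(.)/: the parenthesis
there is preceded by a backslash, but that backslash is itself escaped,
so the group is a real capture and the lookbehind wrongly rejects it.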

M4
 
sln

Correct. Inserting (?:\\.)* should fix it.
See if this will fail on anything.

-sln

use strict;
use warnings;

my $tmp = qr/\(\$th(\\(?:.) \\(.\) \\\(.)(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\)))/x;
my @capt;

# /(?<!\\)(?:\\.)*\((?!\?)/
# -------------------------
my $grprx = qr/
    (?<!\\)     # Not an escape behind us
    (?:\\.)*    # 0 or more escape + any char
    \(          # (
    (?!\?)      # Not a ? in front of us
/x;

# Record the offset just past each capture group's opening parenthesis.
while ($tmp =~ /($grprx)/g ) {
    # print "'$1'\n";
    push @capt, pos($tmp);
}
print "$tmp\n";
my ($i,$last) = (1,1);

# Print a running digit under each capture group's opening parenthesis.
for my $p (@capt) {
    print (' 'x ($p - $last), $i++ % 10);
    $last = $p+1;
}
print "\nFound ",scalar @capt, " capture groups\n";

__END__

(?x-ism:\(\$th(\\(?:.) \\(.\) \\\(.)(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\))))
              1          2          3 4   5 6      7       8 9   0 1
Found 11 capture groups
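On "see if this will fail on anything": a small probe, sketched under the assumption that the scan is run over the stringified pattern as in the post. It reuses $grprx unchanged; the helper scanned_count, the probe patterns and the hand-counted group totals are illustrative additions, not part of the original code.

```perl
use strict;
use warnings;

# The group-finding pattern from the post above, unchanged.
my $grprx = qr/
    (?<!\\)     # Not an escape behind us
    (?:\\.)*    # 0 or more escape + any char
    \(          # (
    (?!\?)      # Not a ? in front of us
/x;

# Hypothetical helper: how many group openers the textual scan reports.
sub scanned_count {
    my ($re)  = @_;
    my $text  = "$re";                      # stringified pattern
    my $count = () = $text =~ /$grprx/g;    # count every match
    return $count;
}

my $commented = qr/
    a   # an x-mode comment with ( in it
    b
/x;

# [ description, pattern, number of groups counted by hand ]
my @probes = (
    [ 'escaped backslash, then group', qr/\\(.)/,      1 ],
    [ 'paren in a character class',    qr/[(]x/,       0 ],
    [ 'named capture',                 qr/(?<name>x)/, 1 ],
    [ 'paren in an x-mode comment',    $commented,     0 ],
);

for my $probe (@probes) {
    my ($what, $re, $by_hand) = @$probe;
    my $scanned = scanned_count($re);
    printf "%-32s scan=%d by hand=%d %s\n",
        $what, $scanned, $by_hand, $scanned == $by_hand ? 'ok' : 'MISCOUNT';
}
```

The escaped-backslash case is handled, but character classes, named captures and x-mode comments still trip a purely textual scan, so it is best treated as a heuristic.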
 