How to dissect a Regexp object?

Discussion in 'Perl Misc' started by kj, Jan 22, 2010.

  1. kj

    Is there a way to tell if a given Regexp object, generated at
    runtime, includes at least one pair of capture parentheses?

    More generally, is there any documentation for the Regexp class?
    (I'm referring to the class alluded to by the output of, e.g., ref
    qr//). Running perldoc Regexp fails ("no docs found"), and perldoc
    perlre does not say much at all about this class as such.

    TIA!

    Kynn
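Another approach, not taken in the thread, is to let the regex engine do the counting. The trick is to match the empty string against /|$re/. Its empty first alternative always succeeds, and afterwards $#+ holds what perlvar calls the number of subgroups in the last successful match. This is only a minimal sketch. capture_count is a hypothetical name, and the named-capture example assumes Perl 5.10 or later. A pattern containing (?{ }) code blocks would also need use re 'eval' when interpolated.

```perl
use strict;
use warnings;

# Hypothetical helper: number of capture groups in a compiled regex.
sub capture_count {
    my ($re) = @_;
    # The empty first alternative guarantees a successful match on "",
    # so @+ (and hence $#+) now describes a match of this pattern.
    "" =~ /|$re/;
    return $#+;    # groups in the pattern, whether or not they matched
}

for my $re (qr/abc/, qr/ab(\d+)/, qr/(a)(?:b)(?<n>c)/) {
    printf "%s has %d capture group(s)\n", $re, capture_count($re);
}
```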
     
    kj, Jan 22, 2010
    #1

  2. C.DeRykus

    On Jan 22, 9:54 am, kj <> wrote:
    > Is there a way to tell if a given Regexp object, generated at
    > runtime, includes at least one pair of capture parentheses?
    >
    > More generally, is there any documentation for the Regexp class?
    > (I'm referring to the class alluded to by the output of, e.g., ref
    > qr//).  Running perldoc Regexp fails ("no docs found"), and perldoc
    > perlre does not say much at all about this class as such.
    >


    perldoc perlop (see: Regexp Quote-Like Operators)

    The regex object is viewable as a string:

    $ perl -le '$regex = qr/ab(\d+)/; print $regex'
    (?-xism:ab(\d+))
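The string form also carries the flags, so it can be compiled again. The small sketch below is not from the thread. Its comment shows the (?^i:...) style that Perl 5.14 and later print. Perl 5.10 prints the (?-xism:...) style shown above.

```perl
use strict;
use warnings;

my $regex = qr/ab(\d+)/i;
print ref($regex), "\n";    # Regexp
my $str  = "$regex";        # (?^i:ab(\d+)) on Perl 5.14 and later
my $copy = qr/$str/;        # recompile from the string form
print "captured $1\n" if "AB42" =~ $copy;
```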

    --
    Charles DeRykus
     
    C.DeRykus, Jan 22, 2010
    #2

  3. Ilya Zakharevich

    On 2010-01-22, Ben Morrow <> wrote:
    > The Regexp 'class' doesn't have any methods[1], and isn't really useable
    > as a class at all. In Perl 5.12 it will be mostly replaced by a new
    > REGEXP svtype, which was what it should have been from the beginning.
    > (qr// will still return a ref blessed into Regexp, for compatibility.)


    Is there any way to find out whether an object is of the REGEXP svtype
    (without using overload::StrVal)? Without this, serialization is not
    possible; witness the failure of FreezeThaw...

    Or, at least, to get a hint that it "might be REGEXP" (with no false
    negatives), so that I know when a call to overload::StrVal() is needed...

    Yours,
    Ilya
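Later Perls offer direct answers, shown in the sketch below (not from the thread). The re module exports is_regexp (Perl 5.10 and later). It tells whether a reference points to a compiled regex, whatever package the reference is blessed into. From Perl 5.12, Scalar::Util's reftype also returns REGEXP for such references. My::Class is a hypothetical package, used only to show re-blessing. On 5.12 and later the loop should report yes/REGEXP for the first two references and no/HASH for the hash blessed into Regexp.

```perl
use strict;
use warnings;
use re qw(is_regexp);
use Scalar::Util qw(blessed reftype);

my $plain   = qr/a(b)c/;
my $rebless = bless qr/x/, 'My::Class';    # hypothetical class, still a regex
my $fake    = bless {}, 'Regexp';          # claims to be a Regexp, is a hash

for my $thing ($plain, $rebless, $fake) {
    printf "%-9s is_regexp=%s reftype=%s\n",
        blessed($thing), (is_regexp($thing) ? 'yes' : 'no'), reftype($thing);
}
```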
     
    Ilya Zakharevich, Jan 23, 2010
    #3
  4. C.DeRykus

    On Jan 22, 9:54 am, kj <> wrote:
    > Is there a way to tell if a given Regexp object, generated at
    > runtime, includes at least one pair of capture parentheses?
    > ...


    If you have a recent Perl (5.10 or later), you may be able to
    use re::regexp_pattern in list context to check for parens.

    For instance, on Strawberry Perl 5.10.1 for Win32:

    c:\strawberry\perl\bin\perl.exe -le "
    use re 'regexp_pattern';
    $r = qr/ab(\d+)/;
    ($pat) = regexp_pattern($r);
    print $pat"
    ab(\d+)

    So you could parse $pat for capturing parens. You'd need to
    exclude the (?...) constructs, such as (?:...) and lookarounds,
    but that's left as an exercise for the reader :)
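Here is one possible answer to the exercise, as a sketch (not from the thread; count_capture_parens is a hypothetical name). It takes the bare pattern from regexp_pattern and walks it with a \G tokenizer. The tokenizer skips escapes and character classes and counts every ( that is not followed by ? or *. It also counts the named forms (?<name>, (?'name' and (?P<name>. It assumes there are no /x comments, no (?#...) comments and no \Q...\E spans. It also overcounts inside (?|...) branch-reset groups.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Hypothetical helper: count capturing groups by scanning the pattern text.
sub count_capture_parens {
    my ($re) = @_;
    my ($pat) = regexp_pattern($re);    # list context: (pattern, modifiers)
    my $count = 0;
    while ($pat =~ m{
        \G (?:
            \\ .                                                      # escaped character
          | \[ \^? \]? (?: \\ . | \[: [^:\]]* :\] | [^\]\\] )* \]     # character class
          | ( \( (?! [?*] ) )                                         # plain capture group
          | ( \( \? (?: P? < (?! [=!] ) | ' ) )                       # named capture group
          | .                                                         # anything else
        )
    }gsx) {
        $count++ if defined $1 or defined $2;
    }
    return $count;
}

print count_capture_parens(qr/ab(\d+)/), "\n";
print count_capture_parens(qr/\\(.)[(](?:x)(?<n>y)/), "\n";
```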

    --
    Charles DeRykus
     
    C.DeRykus, Jan 23, 2010
    #4
  5. sln

    On Fri, 22 Jan 2010 17:54:49 +0000 (UTC), kj <> wrote:

    >Is there a way to tell if a given Regexp object, generated at
    >runtime, includes at least one pair of capture parentheses?
    >
    >More generally, is there any documentation for the Regexp class?
    >(I'm referring to the class alluded to by the output of, e.g., ref
    >qr//). Running perldoc Regexp fails ("no docs found"), and perldoc
    >perlre does not say much at all about this class as such.
    >
    >TIA!
    >
    >Kynn


    It's not too hard to analyse the string returned by qr//
    to get the start (and thereby the count) of capture groups.
    To get the actual group text requires some recursion and thought.

    use strict;
    use warnings;

    my $tmp = qr/\(\$th (i(s))(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\))/x;
    my @capt;

    # Record the offset just past each "(" that is not escaped
    # and not followed by "?".
    while ($tmp =~ /( (?<!\\)\((?!\?) )/xg ) {
        push @capt, pos($tmp);
    }
    print "$tmp\n";
    my ($i,$last) = (1,1);

    # Print each group's number (mod 10) under its opening paren.
    for my $p (@capt) {
        print ' ' x ($p - $last), $i++ % 10;
        $last = $p+1;
    }
    print "\nFound ",scalar @capt, " capture groups\n";

    __END__

    (?x-ism:\(\$th (i(s))(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\)))
                   1 2   3 4   5 6      7       8 9   0 1
    Found 11 capture groups
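Getting the group text is mostly bookkeeping. The sketch below (not from the thread; capture_group_texts is a hypothetical name) keeps a stack of open parens in place of recursion. Each capturing open paren reserves a slot, and the matching close paren fills it with the source text. The scan skips escapes and character classes, but it shares the usual blind spots: /x comments, (?#...) comments, \Q...\E and (?|...) are not handled.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Hypothetical helper: source text of each capture group, in the order the
# groups open. A stack of open parens stands in for recursion.
sub capture_group_texts {
    my ($re) = @_;
    my ($pat) = regexp_pattern($re);
    my (@stack, @groups);
    while ($pat =~ m{
        \G (?:
            \\ .                                                      # escaped character
          | \[ \^? \]? (?: \\ . | \[: [^:\]]* :\] | [^\]\\] )* \]     # character class
          | ( \( (?! [?*] ) | \( \? (?: P? < (?! [=!] ) | ' ) )       # capturing open
          | ( \( )                                                    # other open paren
          | ( \) )                                                    # close paren
          | .                                                         # anything else
        )
    }gsx) {
        if (defined $1) {
            push @groups, undef;                 # reserve the slot in open order
            push @stack, [ $#groups, $-[0] ];    # [group index, start offset]
        }
        elsif (defined $2) {
            push @stack, [ undef, $-[0] ];       # non-capturing: nesting only
        }
        elsif (defined $3) {
            my $open = pop @stack or next;       # ignore an unbalanced ')'
            my ($idx, $start) = @$open;
            $groups[$idx] = substr($pat, $start, $+[0] - $start) if defined $idx;
        }
    }
    return @groups;
}

my @texts = capture_group_texts(qr/a(b(c))(?:d)(e)/);
print "$_\n" for @texts;
```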
     
    sln, Jan 23, 2010
    #5
  6. kj

    In <> writes:

    >On Fri, 22 Jan 2010 17:54:49 +0000 (UTC), kj <> wrote:

    >>Is there a way to tell if a given Regexp object, generated at
    >>runtime, includes at least one pair of capture parentheses?
    >>[...]

    >Its not too hard to analyse the string returned by qr//
    >to get the start (and thereby the count) of capture groups.
    >To get the actual group text requires some recursion and thought.
    >[code snipped]

    Thanks for this code! Now I must study it.

    ~K
     
    kj, Jan 23, 2010
    #6
  7. Martijn Lievaart

    On Fri, 22 Jan 2010 21:29:30 -0800, sln wrote:

    > Its not too hard to analyse the string returned by qr// to get the start
    > (and thereby the count) of capture groups. To get the actual group text
    > requires some recursion and thought.
    >
    > [code snipped]


    I think this will fail on the regexp /\\(.)/.

    M4
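Martijn's counterexample is easy to confirm. The sketch below (not from the thread) runs the paren matcher from the first version of the code over the bare text of qr/\\(.)/, fetched with regexp_pattern (Perl 5.10 or later). It finds no groups, because the lookbehind rejects the ( that follows the escaped backslash.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# The paren matcher from the first version of the code
my $first_try = qr/(?<!\\)\((?!\?)/;

my ($pat) = regexp_pattern(qr/\\(.)/);    # bare pattern text: \\(.)
my $found = () = $pat =~ /$first_try/g;   # number of matches
print "pattern $pat: found $found capture group(s), expected 1\n";
```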
     
    Martijn Lievaart, Jan 24, 2010
    #7
  8. sln

    On Sun, 24 Jan 2010 07:42:40 +0100, Martijn Lievaart <> wrote:

    >On Fri, 22 Jan 2010 21:29:30 -0800, sln wrote:
    >
    >> Its not too hard to analyse the string returned by qr// to get the start
    >> (and thereby the count) of capture groups. [...]
    >> [code snipped]
    >
    >I think this will fail on the regexp /\\(.)/.

    Correct. Inserting (?:\\.)* before the \( should fix it: escaped
    pairs such as \\ are consumed first, so a paren after them still counts.
    See if this fails on anything.

    -sln

    use strict;
    use warnings;

    my $tmp = qr/\(\$th(\\(?:.) \\(.\) \\\(.)(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\)))/x;
    my @capt;

    # /(?<!\\)(?:\\.)*\((?!\?)/
    # -------------------------
    my $grprx = qr/
        (?<!\\)   # Not an escape behind us
        (?:\\.)*  # 0 or more escape + any char
        \(        # (
        (?!\?)    # Not a ? in front of us
    /x;

    while ($tmp =~ /($grprx)/g ) {
        # print "'$1'\n";
        push @capt, pos($tmp);
    }
    print "$tmp\n";
    my ($i,$last) = (1,1);

    for my $p (@capt) {
        print ' ' x ($p - $last), $i++ % 10;
        $last = $p+1;
    }
    print "\nFound ",scalar @capt, " capture groups\n";

    __END__

    (?x-ism:\(\$th(\\(?:.) \\(.\) \\\(.)(i(s))(i(s))(?:(i\(s)\)(i(s))(i(s))\))))
                  1          2          3 4   5 6      7       8 9   0 1
    Found 11 capture groups
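The fixed pattern handles Martijn's case, but any purely textual scan still has blind spots. The sketch below is not from the thread, and naive_count is a hypothetical name. It applies $grprx to a few bare patterns and compares the result with hand-written expected counts. The last two cases show two of those blind spots: a paren inside a character class gets counted, and a named capture (?<n>...) gets skipped.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# $grprx from the post above, written without the /x comments
my $grprx = qr/(?<!\\)(?:\\.)*\((?!\?)/;

# Hypothetical helper: number of $grprx matches in the bare pattern text
sub naive_count {
    my ($re) = @_;
    my ($pat) = regexp_pattern($re);
    my $n = () = $pat =~ /$grprx/g;
    return $n;
}

# [ regex, number of capture groups it really has ]
my @cases = (
    [ qr/\\(.)/,   1 ],    # Martijn's case: escaped backslash, then a group
    [ qr/a\(b(c)/, 1 ],    # escaped paren, then a group
    [ qr/[(]x/,    0 ],    # paren inside a character class
    [ qr/(?<n>a)/, 1 ],    # named capture starts with (?
);

for my $case (@cases) {
    my ($re, $expected) = @$case;
    my $found = naive_count($re);
    printf "%-16s found %d, expected %d%s\n",
        "$re", $found, $expected, ($found == $expected ? '' : '  <-- mismatch');
}
```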
     
    sln, Jan 24, 2010
    #8

Similar Threads
  1. Greg Hurrell
    Replies: 4, Views: 163
    James Edward Gray II, Feb 14, 2007
  2. Mikel Lindsaar
    Replies: 0, Views: 491
    Mikel Lindsaar, Mar 31, 2008
  3. Joao Silva
    Replies: 16, Views: 363
    7stud --, Aug 21, 2009
  4. Uldis Bojars
    Replies: 2, Views: 193
    Janwillem Borleffs, Dec 17, 2006
  5. Matěj Cepl

    new RegExp().test() or just RegExp().test()

    Matěj Cepl, Nov 24, 2009, in forum: Javascript
    Replies: 3, Views: 181
    Matěj Cepl, Nov 24, 2009