Can I quickly identify what part of a conditional regular expression matches?

Discussion in 'Perl Misc' started by Alf McLaughlin, Feb 9, 2006.

  1. Hello all!
    I will be as brief as possible. Let's say I have this regular
    expression:

    my $regexp = 'g[bx]|c[ak]|f[zm]';

    and I want to find all the occurrences in the following string:

    my $string = 'gbyyyyycayyyyyyfz';

    So I do this:

    while ($string =~ /(?=($regexp))/ig) {
        print "$1\n";
    }

    That prints out the following: "gb\nca\nfz\n"

    But instead of printing these, I would like to print the actual
    part of the regular expression that matched (as efficiently as
    possible!): "g[bx], c[ak], f[zm]"

    I can imagine that the part of the regular expression that matched
    might be captured efficiently, much like the actual match is dumped into
    $1.

    Many thanks!

    -Alf McLaughlin
     
    Alf McLaughlin, Feb 9, 2006
    #1

  2. Alf McLaughlin wrote:
    > Hello all!
    > I will be as brief as possible:
    >
    > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > expression:
    > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > occurences in the following string:
    >
    > #so i do this:
    >
    > while($string =~ /(?=($regexp))/ig) {
    > print "$1\n";
    > }
    >
    > that prints out the following: "gb\nca\fz\n"
    >
    > but, instead of printing these out I would like to print out the actual
    > part of the regular expression that matches (as efficiently as
    > possible!): "g[bx], c[ak], f[zm]"
    >
    > i can imagine that the part of the regular expression that matches
    > might be efficiently captured much like the actual match is dumped into
    > $1.


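    Here's a brute-force way: split the pattern on '|' and scan the string
    once per alternative.

    my $string = 'gbyyyyycayyyyyyfz';
    my $regexp = 'g[bx]|c[ak]|f[zm]';
    my @exs = split /\|/, $regexp;

    for my $ex (@exs) {
        while ($string =~ /$ex/ig) {
            print "$ex\n";
        }
    }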
     
    it_says_BALLS_on_your forehead, Feb 9, 2006
    #2
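A note on the brute-force loop above: because it scans the string once per alternative, hits come out grouped by alternative rather than in the order they occur in the string. One possible way around that, sketched below, is to record each hit's start offset and sort afterwards. The helper name matches_in_order is made up for illustration and does not come from the thread; the sample string is the one used later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): scan once per alternative,
# remember where each hit starts, then return the alternatives in string order.
sub matches_in_order {
    my ($string, @alts) = @_;
    my @hits;
    for my $alt (@alts) {
        while ($string =~ /$alt/ig) {
            push @hits, [ $-[0], $alt ];   # $-[0] is the start offset of this match
        }
    }
    # Sort by start offset and keep only the alternative text.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @hits;
}

print "$_\n" for matches_in_order('gbyyyfmyycayyyyyyfz', split /\|/, 'g[bx]|c[ak]|f[zm]');
```

Splitting on '|' only works while the pattern is a flat list of alternatives; a '|' nested inside a group would be split as well.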

  3. Matt Garrish

    "Alf McLaughlin" <> wrote in message
    news:...
    > Hello all!
    > I will be as brief as possible:
    >
    > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > expression:
    > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > occurences in the following string:
    >
    > #so i do this:
    >
    > while($string =~ /(?=($regexp))/ig) {
    > print "$1\n";
    > }
    >
    > that prints out the following: "gb\nca\fz\n"
    >
    > but, instead of printing these out I would like to print out the actual
    > part of the regular expression that matches (as efficiently as
    > possible!): "g[bx], c[ak], f[zm]"
    >
    > i can imagine that the part of the regular expression that matches
    > might be efficiently captured much like the actual match is dumped into
    > $1.
    >


    I suppose the following would work (but I can't see why you care which part
    matches!):

    my $string = 'gbyyyyycayyyyyyfz';

    my $re1 = 'g[bx]';
    my $re2 = 'c[ak]';
    my $re3 = 'f[zm]';

    while ($string =~ /($re1)|($re2)|($re3)/g) {
        # test with defined() so that a match such as "0" is not mistaken for a miss
        print defined $1 ? "$re1 : $1"
            : defined $2 ? "$re2 : $2"
            :              "$re3 : $3", "\n";
    }
     
    Matt Garrish, Feb 10, 2006
    #3
  4. Xicheng

    Alf McLaughlin wrote:
    > Hello all!
    > I will be as brief as possible:
    >
    > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > expression:
    > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > occurences in the following string:
    >
    > #so i do this:
    >
    > while($string =~ /(?=($regexp))/ig) {
    > print "$1\n";
    > }
    >
    > that prints out the following: "gb\nca\fz\n"
    >
    > but, instead of printing these out I would like to print out the actual
    > part of the regular expression that matches (as efficiently as
    > possible!): "g[bx], c[ak], f[zm]"

    > i can imagine that the part of the regular expression that matches
    > might be efficiently captured much like the actual match is dumped into
    > $1.

    =============
    use strict;
    use warnings;

    # Each alternative announces itself from an embedded code block.
    my $regexp = qr/
        g[bx](?{print "match g[bx]:"}) |
        c[ak](?{print "match c[ak]:"}) |
        f[zm](?{print "match f[zm]:"})
    /x;
    my $string = 'gbyyyfmyycayyyyyyfz';

    # (?>...) means no backtracking inside the parentheses
    while ($string =~ /(?>($regexp))/ig) {
        print "$1\n";
    }
    ======printout========
    match g[bx]:gb
    match f[zm]:fm
    match c[ak]:ca
    match f[zm]:fz
    =================

    Xicheng

    > Many thanks!
    >
    > -Alf McLaughlin
     
    Xicheng, Feb 10, 2006
    #4
  5. Xicheng

    Xicheng wrote:
    > Alf McLaughlin wrote:
    > > Hello all!
    > > I will be as brief as possible:
    > >
    > > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > > expression:
    > > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > > occurences in the following string:
    > >
    > > #so i do this:
    > >
    > > while($string =~ /(?=($regexp))/ig) {
    > > print "$1\n";
    > > }
    > >
    > > that prints out the following: "gb\nca\fz\n"
    > >
    > > but, instead of printing these out I would like to print out the actual
    > > part of the regular expression that matches (as efficiently as
    > > possible!): "g[bx], c[ak], f[zm]"

    > > i can imagine that the part of the regular expression that matches
    > > might be efficiently captured much like the actual match is dumped into
    > > $1.


    I just found it could be simpler if you use $^R, say:
    ===================
    use strict; use warnings;
    my $regexp = qr/(?>
    g[bx](?{"g[bx]"}) |
    c[ak](?{"c[ak]"}) |
    f[zm](?{"f[zm]"})
    )/x;
    my $string = 'gbyyyfmyycayyyyyyfz';
    while($string =~ /($regexp)/ig) {
    print "'$1' matches $^R\n";
    }
    ========printout============
    'gb' matches g[bx]
    'fm' matches f[zm]
    'ca' matches c[ak]
    'fz' matches f[zm]
    =========================

    Xicheng

    > =============
    > use strict;
    > use warnings;
    > my $regexp = qr/
    > g[bx](?{print "match g[bx]:"}) |
    > c[ak](?{print "match c[ak]:"}) |
    > f[zm](?{print "match f[zm]:"})
    > /x; #Let's say I have this regular
    > my $string = 'gbyyyfmyycayyyyyyfz'; #and I want to find out all the
    >
    > while($string =~ /(?>($regexp))/ig) { # no backtracking inside the
    > parenthesis
    > print "$1\n";
    > }
    > ======printout========
    > match g[bx]:gb
    > match f[zm]:fm
    > match c[ak]:ca
    > match f[zm]:fz
    > =================
    >
    > Xicheng
    >
    > > Many thanks!
    > >
    > > -Alf McLaughlin
     
    Xicheng, Feb 10, 2006
    #5
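For readers following the $^R idea above: $^R holds the value of the last (?{ ... }) block that ran in a successful match, so each alternative can carry its own label. The sketch below wraps that in a sub that returns the pairs instead of printing them. The name label_matches is invented here, and the code assumes a Perl recent enough to accept literal code blocks in a pattern without "use re 'eval'".

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns a list of
# [matched_text, alternative_label] pairs, in the order the matches occur.
sub label_matches {
    my ($string) = @_;
    my @found;
    # Each alternative ends with a code block whose value becomes $^R.
    while ($string =~ /( (?> g[bx] (?{ 'g[bx]' })
                           | c[ak] (?{ 'c[ak]' })
                           | f[zm] (?{ 'f[zm]' }) ) )/xig) {
        push @found, [ $1, $^R ];
    }
    return @found;
}

printf "'%s' matches %s\n", @$_ for label_matches('gbyyyfmyycayyyyyyfz');
```

Returning the pairs rather than printing inside the loop keeps the labelling separate from whatever is done with the results.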
  7. Very cool! Thanks, Alf

    > I just found it could be simplier if your use $^R, say:
    > ===================
    > use strict; use warnings;
    > my $regexp = qr/(?>
    > g[bx](?{"g[bx]"}) |
    > c[ak](?{"c[ak]"}) |
    > f[zm](?{"f[zm]"})
    > )/x;
    > my $string = 'gbyyyfmyycayyyyyyfz';
    > while($string =~ /($regexp)/ig) {
    > print "'$1' matches $^R\n";
    > }
     
    Alf McLaughlin, Feb 10, 2006
    #7
  8. Re: Can I quickly identify what part of a conditional regular expression matches?

    Alf McLaughlin wrote:
    > Hello all!
    > I will be as brief as possible:
    >
    > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > expression:
    > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > occurences in the following string:
    >
    > #so i do this:
    >
    > while($string =~ /(?=($regexp))/ig) {
    > print "$1\n";
    > }
    >
    > that prints out the following: "gb\nca\fz\n"
    >
    > but, instead of printing these out I would like to print out the actual
    > part of the regular expression that matches (as efficiently as
    > possible!): "g[bx], c[ak], f[zm]"
    >
    > i can imagine that the part of the regular expression that matches
    > might be efficiently captured much like the actual match is dumped into
    > $1.


    You might want to look at the $& entry in perldoc perlvar. It contains
    "The string matched by the last successful pattern match".

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
     
    Josef Moellers, Feb 10, 2006
    #8
  9. Anno Siegel

    Matt Garrish <> wrote in comp.lang.perl.misc:
    >
    > "Alf McLaughlin" <> wrote in message
    > news:...
    > > Hello all!
    > > I will be as brief as possible:
    > >
    > > my $regexp = 'g[bx]|c[ak]|f[zm]'; #Let's say I have this regular
    > > expression:
    > > my $string = 'gbyyyyycayyyyyyfz'; #and I want to find out all the
    > > occurences in the following string:
    > >
    > > #so i do this:
    > >
    > > while($string =~ /(?=($regexp))/ig) {
    > > print "$1\n";
    > > }
    > >
    > > that prints out the following: "gb\nca\fz\n"
    > >
    > > but, instead of printing these out I would like to print out the actual
    > > part of the regular expression that matches (as efficiently as
    > > possible!): "g[bx], c[ak], f[zm]"
    > >
    > > i can imagine that the part of the regular expression that matches
    > > might be efficiently captured much like the actual match is dumped into
    > > $1.
    > >

    >
    > I suppose the following would work (but I can't see why you care which part
    > matches!):
    >
    > my $string = 'gbyyyyycayyyyyyfz';
    >
    > my $re1 = 'g[bx]';
    > my $re2 = 'c[ak]';
    > my $re3 = 'f[zm]';
    >
    > while ($string =~ /($re1)|($re2)|($re3)/g) {
    > print $1 ? "$re1 : $1" : ($2 ? "$re2 : $2" : "$re3 : $3"), "\n";
    > }


    Here is a more general solution that works with any number of alternatives:

    # A list of alternatives
    my @part = qw( g[bx] c[ak] f[zm]);

    # build the regex. Each alternative gets capturing parens
    my $regexp = join '|', map "($_)", @part;

    my $string = 'gbyyyyycayyyyyyfz';
    while ( $string =~ /(?=($regexp))/ig) {
        # find which part matched, ignoring the outermost capture
        my ( $i) = grep defined $-[ $_], 2 .. $#-;
        print "$1 matched by $part[ $i - 2]\n";
    }

    If the regex is more complex than a linear sequence of alternatives,
    a solution with code insertions (a la Xicheng elsewhere in this thread)
    is probably better.

    Anno
    --
    $_='Just another Perl hacker'; print +( join( '', map { eval $_; $@ }
    'use warnings FATAL => "all"; printf "%-1s", "\n"', 'use strict; a',
    'use warnings FATAL => "all"; "@x"', '1->m') =~
    m|${ s/(.)/($1).*/g; \ $_ }|is),',';
     
    Anno Siegel, Feb 10, 2006
    #9
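A variation on the capture-per-alternative idea above, for Perl 5.10 or later: named captures let %+ report which alternative took part in the match, so no index arithmetic is needed. This is only a sketch; the group names (gbx, cak, fzm), the %label_for table and the sub name named_matches are assumptions made for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Assumed mapping from capture-group name to the alternative's source text.
my %label_for = (gbx => 'g[bx]', cak => 'c[ak]', fzm => 'f[zm]');

# Hypothetical helper: returns [matched_text, alternative_label] pairs.
sub named_matches {
    my ($string) = @_;
    my @found;
    while ($string =~ /(?<gbx>g[bx])|(?<cak>c[ak])|(?<fzm>f[zm])/ig) {
        # %+ only has keys for named groups that captured something,
        # and exactly one alternative can match at a time here.
        my ($name) = keys %+;
        push @found, [ $+{$name}, $label_for{$name} ];
    }
    return @found;
}

printf "%s matched by %s\n", @$_ for named_matches('gbyyyyycayyyyyyfz');
```

As with the index-based version, this suits a flat list of alternatives; for more complex patterns the code-block approach from earlier in the thread is probably still the better fit.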
  10. Excellent! Even more flexible. thanks, Alf
     
    Alf McLaughlin, Feb 10, 2006
    #10
