FAQ 6.20 What good is "\G" in a regular expression?

Discussion in 'Perl Misc' started by PerlFAQ Server, Mar 3, 2011.

  1. This is an excerpt from the latest version of perlfaq6.pod, which
    comes with the standard Perl distribution. These postings aim to
    reduce the number of repeated questions as well as allow the community
    to review and update the answers. The latest version of the complete
    perlfaq is at http://faq.perl.org .

    --------------------------------------------------------------------

    6.20: What good is "\G" in a regular expression?

    You use the "\G" anchor to start the next match on the same string where
    the last match left off. The regular expression engine cannot skip over
    any characters to find the next match with this anchor, so "\G" is
    similar to the beginning of string anchor, "^". The "\G" anchor is
    typically used with the "g" flag. It uses the value of "pos()" as the
    position to start the next match. As the match operator makes successive
    matches, it updates "pos()" with the position of the next character past
    the last match (or the first character of the next match, depending on
    how you like to look at it). Each string has its own "pos()" value.
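
    As a minimal sketch of how "pos()" moves, the following loop records
    the position after each successful match. The sample strings and the
    variable names are assumptions chosen for illustration; they are not
    part of the FAQ text.

```perl
use strict;
use warnings;

my $string = "aa bb cc";
my @positions;

# In scalar context, each successful /g match moves pos($string)
# to the offset just past the matched text.
while ( $string =~ m/(\w\w)/g ) {
    push @positions, pos($string);
    print "matched '$1', pos() is now ", pos($string), "\n";
}

# A different string keeps its own, independent position.
my $other   = "xyz";
my $matched = $other =~ m/x/g;
my $other_pos = pos($other);
print "pos() of the other string: $other_pos\n";
```

    The recorded positions are 2, 5, and 8. Once the loop's final match
    fails, "pos($string)" is undefined again, while "pos($other)" is 1.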

    Suppose you want to match all of the consecutive pairs of digits in a
    string like "1122a44" and stop matching when you encounter a non-digit.
    You want to match 11 and 22, but the letter "a" shows up between 22 and
    44 and you want to stop there. Simply matching pairs of digits skips
    over the "a" and still matches 44.

    $_ = "1122a44";
    my @pairs = m/(\d\d)/g; # qw( 11 22 44 )

    If you use the "\G" anchor, you force the match after 22 to start with
    the "a". The regular expression cannot match there since it does not
    find a digit, so the next match fails and the match operator returns the
    pairs it already found.

    $_ = "1122a44";
    my @pairs = m/\G(\d\d)/g; # qw( 11 22 )
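
    The comparison with "^" can be checked with a string that starts with
    a non-digit. This is only a sketch; the string "a1122" and the variable
    names are assumptions made for the example.

```perl
use strict;
use warnings;

my $string = "a1122";

# Without \G the engine may skip the leading "a" to find the pairs.
my @unanchored = $string =~ m/(\d\d)/g;

# With \G the first attempt must match at position 0, where the "a" is,
# so the match fails immediately and the list is empty.
my @anchored = $string =~ m/\G(\d\d)/g;

print "unanchored: @unanchored\n";
print "anchored:   ", scalar(@anchored), " matches\n";
```

    The unanchored pattern finds 11 and 22, while the anchored one finds
    nothing because it is not allowed to move past the "a".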

    You can also use the "\G" anchor in scalar context. You still need the
    "g" flag.

    $_ = "1122a44";
    while( m/\G(\d\d)/g )
    {
        print "Found $1\n";
    }

    After the match fails at the letter "a", perl resets "pos()" and the
    next match on the same string starts at the beginning.

    $_ = "1122a44";
    while( m/\G(\d\d)/g )
    {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g; # finds "11"

    You can disable "pos()" resets on fail with the "c" flag, documented in
    perlop and perlreref. Subsequent matches start where the last successful
    match ended (the value of "pos()") even if a match on the same string
    has failed in the meantime. In this case, the match after the "while()"
    loop starts at the "a" (where the last match stopped), and since it does
    not use any anchor it can skip over the "a" to find 44.

    $_ = "1122a44";
    while( m/\G(\d\d)/gc )
    {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g; # finds "44"
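
    To see the effect of the "c" flag directly, one option is to inspect
    "pos()" right after the loop's final, failing match. The helper name
    pos_after_failed_match is hypothetical and exists only for this sketch.

```perl
use strict;
use warnings;

# Run the \G pair-matching loop on a private copy of the input and return
# pos() after the loop's last (failing) match. $keep_pos selects the "c" flag.
# The sub name is an assumption made for this example.
sub pos_after_failed_match {
    my ( $string, $keep_pos ) = @_;
    if ($keep_pos) {
        1 while $string =~ m/\G\d\d/gc;
    }
    else {
        1 while $string =~ m/\G\d\d/g;
    }
    return pos($string);
}

my $without_c = pos_after_failed_match( "1122a44", 0 );
my $with_c    = pos_after_failed_match( "1122a44", 1 );

print "without c: ", ( defined $without_c ? $without_c : "undef" ), "\n";
print "with c:    ", ( defined $with_c    ? $with_c    : "undef" ), "\n";
```

    Without the flag the position is undefined after the failure; with it
    the position stays at 4, the offset of the "a".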

    Typically you use the "\G" anchor with the "c" flag when you want to try
    a different match if one fails, such as in a tokenizer. Jeffrey Friedl
    offers this example, which works in Perl 5.004 or later.

    while (<>) {
        chomp;
        PARSER: {
            m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
            m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
            m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
            m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
        }
    }

    For each line, the "PARSER" loop first tries to match a series of digits
    followed by a word boundary. This match has to start at the place the
    last match left off (or the beginning of the string on the first match).
    Since "m/ \G( \d+\b )/gcx" uses the "c" flag, if the string does not
    match that regular expression, perl does not reset "pos()" and the next
    match starts at the same position to try a different pattern.
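
    The same technique can be wrapped in a subroutine that collects tokens
    instead of printing them. This is a sketch under assumptions: the sub
    name tokenize, the [type, text] pair layout, and the sample input are
    all invented for illustration. The patterns are the ones from the
    example above.

```perl
use strict;
use warnings;

# tokenize() is a hypothetical helper: it walks one line with \G and /gc
# and returns a list of [type, text] pairs.
sub tokenize {
    my ($line) = @_;
    my @tokens;

    PARSER: {
        # Each pattern must match exactly where the previous match ended.
        # Because of /c, a failed attempt leaves pos($line) untouched, so
        # the next pattern is tried at the same position.
        if ( $line =~ m/ \G( \d+\b )/gcx )    { push @tokens, [ number => $1 ]; redo PARSER; }
        if ( $line =~ m/ \G( \w+ )/gcx )      { push @tokens, [ word   => $1 ]; redo PARSER; }
        if ( $line =~ m/ \G( \s+ )/gcx )      { push @tokens, [ space  => $1 ]; redo PARSER; }
        if ( $line =~ m/ \G( [^\w\d]+ )/gcx ) { push @tokens, [ other  => $1 ]; redo PARSER; }
    }

    # The block is left only when no pattern matched at the current position.
    return @tokens;
}

my @tokens = tokenize("foo 42, bar7");
print "$_->[0]: '$_->[1]'\n" for @tokens;
```

    For this input the sub returns five tokens. The comma and the space
    after it come back as a single "other" token, because "[^\w\d]" also
    matches whitespace.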



    --------------------------------------------------------------------

    The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
    are not necessarily experts in every domain where Perl might show up,
    so please include as much information as possible and relevant in any
    corrections. The perlfaq-workers also don't have access to every
    operating system or platform, so please include relevant details for
    corrections to examples that do not work on particular platforms.
    Working code is greatly appreciated.

    If you'd like to help maintain the perlfaq, see the details in
    perlfaq.pod.