greedy v. non-greedy matching

Discussion in 'Perl Misc' started by Matt Garrish, Feb 16, 2004.

  1. Matt Garrish

    Matt Garrish Guest

    Would anyone care to enlighten me as to why the (.*?) pattern matches
    greedily in the following example:

    my $text =<<TEXT;
    I wouldn't expect the following text to match
    xyz 12345 abc
    but it does and I lose this text as well
    xyz 12345 abc
    xyz 12345 abc
    xyz 12345 abc
    TEXT

    $text =~ s/(xyz(.*?)abc\s*)+$//s;

    print $text;


    But if I change the regex to:

    $text =~ s/(xyz(.*?)abc\s*)\1+$//s;

    It works as expected.

    Matt
    Matt Garrish, Feb 16, 2004
    #1

  2. Gunnar Hjalmarsson

    Matt Garrish wrote:
    > Would anyone care to enlighten me as to why the (.*?) pattern
    > matches greedily in the following example:
    >
    > my $text =<<TEXT;
    > I wouldn't expect the following text to match
    > xyz 12345 abc
    > but it does and I lose this text as well
    > xyz 12345 abc
    > xyz 12345 abc
    > xyz 12345 abc
    > TEXT
    >
    > $text =~ s/(xyz(.*?)abc\s*)+$//s;


    It doesn't. Making it non-greedy does not change the fact that it
    matches the *first occurrence* of the pattern.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Feb 16, 2004
    #2

  3. Anno Siegel

    Anno Siegel Guest

    Matt Garrish <> wrote in comp.lang.perl.misc:
    > Would anyone care to enlighten me as to why the (.*?) pattern matches
    > greedily in the following example:
    >
    > my $text =<<TEXT;
    > I wouldn't expect the following text to match


    [...]

    Greedy vs. non-greedy never decides *if* a pattern matches, it can only
    modify *what* it matches. So your expectation is unjustified.
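
    A minimal sketch of that point, using a made-up sample string (not the
    text from the original post): all three patterns below match, and the
    quantifier only changes what ends up in the capture. Once the pattern
    is anchored with $, the non-greedy version has to stretch as far as
    the greedy one.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $s = "xyz 1 abc junk xyz 2 abc";

# Greedy: .* grabs as much as possible, so the capture runs to the last "abc".
my ($greedy) = $s =~ /xyz(.*)abc/;

# Non-greedy: .*? stops at the first "abc" that lets the match succeed.
my ($lazy) = $s =~ /xyz(.*?)abc/;

# Non-greedy but anchored: the first "abc" is not at the end of the string,
# so .*? is extended until an "abc" followed by the end is found.
my ($anchored) = $s =~ /xyz(.*?)abc$/;

print "greedy:   [$greedy]\n";
print "lazy:     [$lazy]\n";
print "anchored: [$anchored]\n";
```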

    Anno
    Anno Siegel, Feb 16, 2004
    #3
  4. fifo

    fifo Guest

    At 2004-02-16 07:52 -0500, Matt Garrish wrote:
    > Would anyone care to enlighten me as to why the (.*?) pattern matches
    > greedily in the following example:
    >
    > my $text =<<TEXT;
    > I wouldn't expect the following text to match
    > xyz 12345 abc
    > but it does and I lose this text as well
    > xyz 12345 abc
    > xyz 12345 abc
    > xyz 12345 abc
    > TEXT
    >
    > $text =~ s/(xyz(.*?)abc\s*)+$//s;
    >
    > print $text;
    >


    You're trying to match the sub-expression /(xyz(.*?)abc\s*)/ repeatedly,
    up to the end of the string.

    This initially matches the first "xyz 12345 abc\n", but that is
    followed neither by the end of the string nor by something that matches
    the expression again. Hence we have to backtrack, and we find that if
    we use the /(.*?)/ part to match a bit more of the string, the
    expression will next match this:

    xyz 12345 abc
    but it does and I lose this text as well
    xyz 12345 abc

    Now this _is_ followed by two more "xyz 12345 abc\n" strings, each of
    which also matches the above sub-expression so we're done.
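
    The walk-through above can be checked with a short script. This is
    only a sketch: the variable names are my own, and the sample text is
    rebuilt with string concatenation instead of the heredoc from the
    original post. It reports how much text the whole match covers and
    what the quantified group holds afterwards.

```perl
use strict;
use warnings;

# The sample text from the original post, built without a heredoc.
my $first = "I wouldn't expect the following text to match\n";
my $text  = $first
          . "xyz 12345 abc\n"
          . "but it does and I lose this text as well\n"
          . ("xyz 12345 abc\n" x 3);

my ($matched, $last_group, $start) = ('', '', -1);
if ($text =~ /(xyz(.*?)abc\s*)+$/s) {
    # @- and @+ hold the start and end offsets of the whole match.
    $start      = $-[0];
    $matched    = substr($text, $-[0], $+[0] - $-[0]);
    # $1 keeps only the final repetition of the quantified group.
    $last_group = $1;
}

# Count the newlines inside the matched text.
my $line_count = () = $matched =~ /\n/g;

print "whole match spans $line_count lines:\n$matched";
print "last repetition of the group: $last_group";
```

    The match begins at the first "xyz" and runs to the end of the string,
    so the "but it does" line is inside it, even though $1 afterwards
    shows just one "xyz 12345 abc" line.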

    >
    > But if I change the regex to:
    >
    > $text =~ s/(xyz(.*?)abc\s*)\1+$//s;
    >
    > It works as expected.
    >


    This expression requires that whatever matches /(xyz(.*?)abc\s*)/ is
    repeated verbatim (at least once) up to the end of the string. This
    can't happen when that sub-expression's match includes the "but it
    does" line, since that line doesn't occur again afterwards.
    fifo, Feb 16, 2004
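
    A sketch of the backreference version explained in this post, run
    against the same sample text (rebuilt here by concatenation; variable
    names are my own). It also tries one variation that is not from the
    thread and is only my assumption about the original intent: dropping
    the /s modifier, so that "." cannot cross a newline and each
    repetition of the group has to stay on one line.

```perl
use strict;
use warnings;

# The sample text from the original post, built without a heredoc.
my $text = "I wouldn't expect the following text to match\n"
         . "xyz 12345 abc\n"
         . "but it does and I lose this text as well\n"
         . ("xyz 12345 abc\n" x 3);

# Backreference version from the thread: \1 must repeat the first
# group's text exactly, so the group cannot swallow the odd line.
(my $with_backref = $text) =~ s/(xyz(.*?)abc\s*)\1+$//s;

# Assumed alternative (not from the thread): same pattern as the
# original attempt, but without /s, so .*? stops at the newline.
(my $without_s = $text) =~ s/(xyz(.*?)abc\s*)+$//;

print "with backreference:\n$with_backref\n";
print "without /s:\n$without_s";
```

    On this input both substitutions strip only the trailing run of three
    "xyz 12345 abc" lines. They are not equivalent in general: the
    backreference version needs at least two identical repetitions, while
    the version without /s accepts a single one and allows the text
    between "xyz" and "abc" to differ from line to line.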
    #4
  5. Matt Garrish

    Matt Garrish Guest

    "Anno Siegel" <-berlin.de> wrote in message
    news:c0qgod$for$-Berlin.DE...
    > Matt Garrish <> wrote in comp.lang.perl.misc:
    > > Would anyone care to enlighten me as to why the (.*?) pattern matches
    > > greedily in the following example:
    > >
    > > my $text =<<TEXT;
    > > I wouldn't expect the following text to match

    >
    > [...]
    >
    > Greedy vs. non-greedy never decides *if* a pattern matches, it can only
    > modify *what* it matches. So your expectation is unjustified.
    >


    Yeah, it was too early in the morning to be thinking about regexes. I was
    thinking that the outer grouping would limit the match to multiple instances
    of "xyz...abc" up to the end of the string, instead of still matching from
    the first "xyz" to the last "abc".

    Matt
    Matt Garrish, Feb 16, 2004
    #5
