greedy v. non-greedy matching

Matt Garrish

Would anyone care to enlighten me as to why the (.*?) pattern matches
greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

$text =~ s/(xyz(.*?)abc\s*)+$//s;

print $text;


But if I change the regex to:

$text =~ s/(xyz(.*?)abc\s*)\1+$//s;

It works as expected.

Matt
 
Gunnar Hjalmarsson

Matt said:
Would anyone care to enlighten me as to why the (.*?) pattern
matches greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

$text =~ s/(xyz(.*?)abc\s*)+$//s;

It doesn't. Making it non-greedy does not change the fact that it
matches the *first occurrence* of the pattern.
 
Anno Siegel

Matt Garrish said:
Would anyone care to enlighten me as to why the (.*?) pattern matches
greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match

[...]

Greedy vs. non-greedy never decides *if* a pattern matches, it can only
modify *what* it matches. So your expectation is unjustified.

Anno
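
A minimal sketch of the point above, using a made-up two-record string (not taken from the original post). Both quantifiers find a match; only the extent of the match differs. Once an end anchor is added, the non-greedy version has to stretch until the whole pattern can succeed.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $s = "xyz 1 abc xyz 2 abc";

# Both patterns match; they differ only in how much they take.
my ($greedy) = $s =~ /(xyz.*abc)/;    # runs to the last "abc"
my ($lazy)   = $s =~ /(xyz.*?abc)/;   # stops at the first "abc"

# With "$", the lazy quantifier keeps expanding until the anchor fits.
my ($anchored) = $s =~ /(xyz.*?abc)$/;

print "greedy:   [$greedy]\n";
print "lazy:     [$lazy]\n";
print "anchored: [$anchored]\n";
```

In all three cases the match starts at the leftmost "xyz"; the quantifier only controls where it ends.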
 
fifo

Would anyone care to enlighten me as to why the (.*?) pattern matches
greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

$text =~ s/(xyz(.*?)abc\s*)+$//s;

print $text;

You're trying to match the sub-expression /(xyz(.*?)abc\s*)/ repeatedly,
up to the end of the string.

This initially matches the first "xyz 12345 abc\n", but that is followed
neither by the end of the string nor by something that matches the
expression again. Hence we have to backtrack, and we find that if
we use the /(.*?)/ part to match a bit more of the string, the
expression will next match this:

xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc

Now this _is_ followed by two more "xyz 12345 abc\n" strings, each of
which also matches the above sub-expression so we're done.
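
To see the backtracking result for yourself, here is a small sketch that runs the original substitution and prints what was removed. The variable name $removed is mine, and $& is used only to expose the matched text.

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# Same substitution as in the question; $& holds the matched text.
my $removed = '';
if ($text =~ s/(xyz(.*?)abc\s*)+$//s) {
    $removed = $&;
}

print "left:\n$text";
print "removed:\n$removed";
```

The removed text begins at the first "xyz" and includes the "but it does" line, so only the first line of the input is left.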

But if I change the regex to:

$text =~ s/(xyz(.*?)abc\s*)\1+$//s;

It works as expected.

This expression requires that whatever matches /(xyz(.*?)abc\s*)/ is
repeated verbatim (at least once) up to the end of the string. This
doesn't happen when that sub-expression matches the "but it does" line,
since that text doesn't occur again later in the string.
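
A short sketch of the backreference version, again on the text from the question. The second string ($other) is a hypothetical input I added to show a side effect: because \1 demands a verbatim repeat, records that differ from each other are not removed.

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# Group 1 must be followed by one or more exact copies of itself.
my $group = '';
if ($text =~ s/(xyz(.*?)abc\s*)\1+$//s) {
    $group = $1;
}
print "group 1: [$group]\n";
print "left:\n$text";

# Hypothetical input: the two records differ, so \1 cannot repeat.
my $other   = "keep\nxyz 1 abc\nxyz 2 abc\n";
my $changed = ($other =~ s/(xyz(.*?)abc\s*)\1+$//s) ? 1 : 0;
print "other changed: $changed\n";
```

Note that \1+ also means a single trailing record on its own would not be removed, since at least one repeat is required.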
 
Matt Garrish

Anno Siegel said:
Matt Garrish said:
Would anyone care to enlighten me as to why the (.*?) pattern matches
greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match

[...]

Greedy vs. non-greedy never decides *if* a pattern matches, it can only
modify *what* it matches. So your expectation is unjustified.

Yeah, it was too early in the morning to be thinking about regexes. I was
thinking that the outer grouping would limit the match to multiple instances
of "xyz...abc" up to the end of the string, instead of still matching from
the first "xyz" to the last "abc".

Matt
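
For what it's worth, one way to get the behaviour described above (only a trailing run of "xyz...abc" records is removed) is to stop each repetition from running past its own "abc". This is a sketch of one option, not something proposed in the thread: it swaps (.*?) for a negative lookahead that refuses to step onto an "abc".

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# (?:(?!abc).)* takes one character at a time but never starts an "abc",
# so each repetition ends at the first "abc" after its "xyz".
$text =~ s/(?:xyz(?:(?!abc).)*abc\s*)+$//s;

print $text;
```

Unlike the \1 version, this does not require the records to be identical to each other, only that each one runs from "xyz" to the nearest "abc".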
 
