Something confusing about non-greedy reg exp match

gburdell1 · Sep 7, 2009

If I do this:

import re
a=re.search(r'hello.*?money', 'hello how are you hello funny money')

I would expect a.group(0) to be "hello funny money", since .*? is a
non-greedy match. But instead, I get the whole sentence, "hello how
are you hello funny money".

Is this expected behavior? How can I specify the correct regexp so
that I get "hello funny money" ?

r · Sep 7, 2009

Here's one way, but it depends greatly on the actual pattern you
seek...

re.search(r'hello \w+ money', 'hello how are you hello funny money').group()

Your regex matches the whole string because it means "hello" followed
by any number of *anythings* up to "money". You see?

wisdom = get_enlightend(http://jjsenlightenments.blogspot.com/)

Mark Tolonen · Sep 7, 2009

A non-greedy match matches the fewest characters before matching the text
*after* the non-greedy match. For example:
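>>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
>>> a.group(0)
'hello how are you hello funny money'
>>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
>>> a.group(0)
'hello how are you hello funny money and more money'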

This is why it is difficult to use regular expressions to match nested
objects like parentheses or XML tags. In your case you'll need something
extra to not match the first hello.
'hello funny money'
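
The code that produced the output above did not survive in this copy of the thread. As a minimal sketch of one possible "something extra" (an assumption, not necessarily what Mark posted), a greedy prefix can be placed in front of a capturing group so that the captured "hello" is the last one still followed by "money":

```python
import re

text = 'hello how are you hello funny money'

# The greedy ".*" first swallows as much as it can, then backtracks until
# "hello.*?money" can match, so the captured "hello" is the last one
# that is still followed by "money".
m = re.search(r'.*(hello.*?money)', text)
print(m.group(1))  # hello funny money
```

Note that this returns only one match per search, so it does not by itself handle several "hello ... money" pairs on the same line.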

-Mark

r · Sep 7, 2009

EDIT:
Your regex matches the whole string because it means...

"hello" followed by any number of *anythings* up to the first
occurrence of "money"

You see?

George Burdell · Sep 7, 2009

I see now. I also understand r's response. But what if there are many
"hello"'s before "money," and I don't know how many there are? In
other words, I want to find every occurrence of "money," and for each
occurrence, I want to scan back to the first occurrence of "hello."
How can this be done?

George Burdell · Sep 7, 2009

I should say "closest" occurrence of "hello," to be more clear.

George Burdell · Sep 7, 2009

I see now. I also understand r's response. But what if there are many
"hello"'s before "money," and I don't know how many there are? In
other words, I want to find every occurrence of "money," and for each
occurrence, I want to scan in the reverse (left) direction to the
closest occurrence of "hello." How can this be done?

7stud · Sep 7, 2009

Is this expected behavior?

Yes. search() finds the *first* (leftmost) match. The non-greedy
quantifier does not transform search() into a function that finds all
possible matches and then picks the shortest one. Instead, the
non-greedy quantifier causes search() to return the shortest possible
match at that first starting position (versus the default, which is
the longest possible match there). In your case, there is only one
possible match starting at the first "hello", so the non-greedy
quantifier changes nothing.
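
A quick way to check this is to compare the spans of the lazy and greedy searches on the original string. This is only an illustrative sketch; the variable names are made up for the example:

```python
import re

text = 'hello how are you hello funny money'

lazy = re.search(r'hello.*?money', text)
greedy = re.search(r'hello.*money', text)

# Both searches start at index 0, the leftmost "hello". Since the string
# contains only one "money", both must also end at the same place.
print(lazy.span(), greedy.span())  # (0, 35) (0, 35)
```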

MRAB · Sep 7, 2009

George said:
I see now. I also understand r's response. But what if there are many
"hello"'s before "money," and I don't know how many there are? In
other words, I want to find every occurrence of "money," and for each
occurrence, I want to scan in the reverse (left) direction to the
closest occurrence of "hello." How can this be done?

'hello funny money'
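
The code for this reply is missing from this copy of the thread. One pattern that gives the result above, offered here as a sketch and not necessarily what MRAB posted, uses a negative lookahead so that the text between "hello" and "money" can never contain another "hello":

```python
import re

# "(?:(?!hello).)*?" consumes one character at a time, but only at
# positions where another "hello" does not start, so a match can never
# span a second "hello".
pattern = re.compile(r'hello(?:(?!hello).)*?money')

text = 'hello how are you hello funny money, hello hello more money'
matches = pattern.findall(text)
print(matches)  # ['hello funny money', 'hello more money']
```

Because findall() reports every non-overlapping match, this covers the "every occurrence of money" part of the question as well.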

Paul McGuire · Sep 7, 2009

By recognising the task: not expression matching, but lexing and
parsing. For which you might find the ‘pyparsing’ library of use
<URL:http://pyparsing.wikispaces.com/>.

Even pyparsing has to go through some gyrations to do this sort of
"match, then backup" parsing. Here is my solution:
[['hello funny money']]

SkipTo is analogous to the OP's .*?, but the failOn attribute adds the
logic "if this string is found before matching the target string, then
fail". So pyparsing scans through the string, matches the first
"hello", attempts to skip to the next occurrence of "money", but finds
another "hello" first, so this parse fails. Then the scan continues
until the next "hello" is found, and this time, SkipTo successfully
finds "money" without first hitting a "hello". I then had to wrap the
whole thing in a helper method originalTextFor, otherwise I get an
ugly grouping of separate strings.
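
The scanning logic described above can be imitated in plain Python with str.find(). This is a rough sketch of the idea only, not pyparsing's implementation; the function name find_closest_pairs and its parameters are invented for this example:

```python
def find_closest_pairs(text, start="hello", target="money"):
    """Return each start...target span that contains no second `start`."""
    results = []
    pos = text.find(start)
    while pos != -1:
        after = pos + len(start)
        target_at = text.find(target, after)
        if target_at == -1:
            break  # no target left anywhere, so no further match is possible
        next_start = text.find(start, after)
        if next_start != -1 and next_start < target_at:
            # Another start word comes first: this attempt fails,
            # and scanning resumes from that later start word.
            pos = next_start
            continue
        end = target_at + len(target)
        results.append(text[pos:end])
        pos = text.find(start, end)
    return results


print(find_closest_pairs('hello how are you hello funny money'))
# ['hello funny money']
```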

So I still don't really have any kind of "backup after matching"
parsing, I just turned this into a qualified forward match. One could
do a similar thing with a parse action. If you could attach some kind
of validating function to a field within a regex, you could have done
the same thing there.
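
Python's re module has no hook for a validating function inside the pattern, but a similar effect can be had by filtering candidates after the fact. The sketch below assumes a hypothetical helper, is_valid, and collects candidates through a lookahead so that every "hello" gets a chance to start one:

```python
import re


def is_valid(candidate):
    # Reject a candidate that contains a second "hello" after the first.
    return 'hello' not in candidate[len('hello'):]


text = 'hello how are you hello funny money'

# The lookahead wrapper lets candidates start at every "hello",
# including ones that sit inside an earlier, rejected candidate.
candidates = [m.group(1) for m in re.finditer(r'(?=(hello.*?money))', text)]
valid = [c for c in candidates if is_valid(c)]
print(valid)  # ['hello funny money']
```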

-- Paul
