Regex Question

Bill Mill

Hello all,

I've got a test script:

==== start python code =====

import re

tests2 = ["item1: alpha; item2: beta. item3 - gamma--",
          "item1: alpha; item3 - gamma--"]

def test_re(regex):
    # Compile once, then search each test string and print any captured groups.
    r = re.compile(regex, re.MULTILINE)
    for test in tests2:
        res = r.search(test)
        if res:
            print(res.groups())
        else:
            print("Failed")

==== end python code ====

And a simple question:

Why does the first regex that follows successfully grab "beta", while
the second one doesn't?

In [131]: test_re(r"(?:item2: (.*?)\.)")
('beta',)
Failed

In [132]: test_re(r"(?:item2: (.*?)\.)?")
(None,)
(None,)

Shouldn't the '?' greedily grab the group match?

Thanks
Bill Mill
bill.mill at gmail.com
 
James Stroud

The question mark matches zero or one occurrence. The first match found is
an empty one at the start of the string, which satisfies the zero condition,
so the group captures nothing. Perhaps you mean "+"?

e.g.

py> import re
py> rgx = re.compile('(1)?')
py> rgx.search('a1').groups()
(None,)
py> rgx = re.compile('(1)+')
py> rgx.search('a1').groups()
('1',)
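
The same effect can be seen on the first test string from the original post. This is only a sketch: the helper name describe is made up for illustration, and it simply reports where the first match lands and what the group captured.

```python
import re

text = "item1: alpha; item2: beta. item3 - gamma--"

optional = re.compile(r"(?:item2: (.*?)\.)?")   # whole group may match zero times
required = re.compile(r"(?:item2: (.*?)\.)")    # group must match once

def describe(pattern, s):
    """Return (span, groups) for the first match, or None if search fails."""
    m = pattern.search(s)
    return (m.span(), m.groups()) if m else None

# The optional pattern succeeds immediately with an empty match at index 0.
opt_result = describe(optional, text)
# The required pattern has to keep scanning until it reaches "item2: ".
req_result = describe(required, text)

print(opt_result)
print(req_result)
```

The empty span at the start of the string is why the group comes back as None even though "item2: beta." appears later in the text.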

James
 
Bill Mill

But shouldn't the ? be greedy, and thus prefer the one match to the
zero? This is my sticking point - I've seen that plus works, and this
just confuses me more.

-Bill Mill
bill.mill at gmail.com
 
Gabriel Genellina

Bill said:
But shouldn't the ? be greedy, and thus prefer the one match to the
zero? This is my sticking point - I've seen that plus works, and this
just confuses me more.

Perhaps you have misunderstood what search does.
search(pattern, string[, flags])
Scan through string looking for a location where the regular
expression pattern produces a match

'1?' means 0 or 1 times '1', i.e., nothing or a single '1'.
At the start of the target string, 'a1', we have nothing, so the re
matches, and returns that occurrence. It doesn't matter that a few
characters later there is *another* match, even if it is longer; once
a match is found, the scan is done.
If you want "the longest match of all possible matches along the
string", search() will not give it to you. Use findall() or finditer()
to collect every non-overlapping match and pick the longest yourself.
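
A rough sketch of both ways around this, using the test strings from the original post. The names longest_match and prefixed are invented for illustration, and the lazy ".*?" prefix is just one possible rewrite of the pattern, not the only one.

```python
import re

tests2 = ["item1: alpha; item2: beta. item3 - gamma--",
          "item1: alpha; item3 - gamma--"]

optional = re.compile(r"(?:item2: (.*?)\.)?")

def longest_match(pattern, s):
    """Return the longest of all non-overlapping matches in s, or None."""
    return max(pattern.finditer(s),
               key=lambda m: m.end() - m.start(),
               default=None)

for s in tests2:
    m = longest_match(optional, s)
    print(m.groups())

# Alternative: let the optional group itself skip ahead with a lazy prefix,
# so the match that starts at index 0 can still reach "item2: " if present.
prefixed = re.compile(r"(?:.*?item2: (.*?)\.)?")
prefixed_groups = [prefixed.search(s).groups() for s in tests2]
print(prefixed_groups)
```

With the prefix, search() still stops at the first location that matches (index 0), but the greedy '?' now has a way to succeed there, so it prefers the one-occurrence branch and falls back to the empty match only when "item2: " is absent.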


--
Gabriel Genellina
Softlab SRL






 
Bill Mill

That is exactly what I misunderstood. Thank you very much.

-Bill Mill
bill.mill at gmail.com
 
