regex alternation problem

Jesse Aldridge

import re

s1 = "I am an american"

s2 = "I am american an "

for s in [s1, s2]:
    print re.findall(" (am|an) ", s)

# Results:
# ['am']
# ['am', 'an']
 
Eugene Perederey

According to documentation re.findall takes a compiled pattern as a
first argument. So try
patt = re.compile(r'(am|an)')
re.findall(patt, s1)
re.findall(patt, s2)

2009/4/18 Jesse Aldridge said:
import re

s1 = "I am an american"

s2 = "I am american an "

for s in [s1, s2]:
   print re.findall(" (am|an) ", s)

# Results:
# ['am']
# ['am', 'an']
 
Robert Kern

Eugene Perederey said:
According to documentation re.findall takes a compiled pattern as a
first argument. So try
patt = re.compile(r'(am|an)')
re.findall(patt, s1)
re.findall(patt, s2)

No, it will take a string pattern, too.
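
To illustrate the point, here is a minimal sketch comparing the three ways of calling findall. The variable names are made up for this example, and it uses print() as a function so that it runs on Python 3, unlike the Python 2 snippets quoted in this thread.

```python
import re

s1 = "I am an american"

# The pattern from the original post, as a plain string.
pattern_text = " (am|an) "

# The same pattern, compiled ahead of time.
compiled = re.compile(pattern_text)

from_string = re.findall(pattern_text, s1)   # module function, string pattern
from_compiled = re.findall(compiled, s1)     # module function, compiled pattern
from_method = compiled.findall(s1)           # method on the compiled pattern

print(from_string, from_compiled, from_method)
```

All three calls should give the same list, so compiling the pattern does not change what the original code finds.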

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Robert Kern

import re

s1 = "I am an american"

s2 = "I am american an "

for s in [s1, s2]:
    print re.findall(" (am|an) ", s)

# Results:
# ['am']
# ['am', 'an']

findall() finds non-overlapping matches. " am  an " (with two spaces between
the words) would work, but not " am an ", where both words need the same space.

Instead of including explicit spaces in your pattern, I suggest using the \b
"word boundary" special instruction.
>>> for s in [s1, s2]:
...     print re.findall(r"\b(am|an)\b", s)
...
['am', 'an']
['am', 'an']
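
As a side-by-side check, the following sketch runs the original space-delimited pattern and the \b version on both strings. The helper names find_spaced and find_bounded are invented for this example, and print() is used as a function so it runs on Python 3.

```python
import re

s1 = "I am an american"
s2 = "I am american an "

def find_spaced(text):
    # Original pattern: each match consumes a leading and a trailing space.
    return re.findall(r" (am|an) ", text)

def find_bounded(text):
    # \b is zero-width: it asserts a word boundary without consuming a character.
    return re.findall(r"\b(am|an)\b", text)

for s in (s1, s2):
    print(repr(s), find_spaced(s), find_bounded(s))
```

Because \b consumes nothing, adjacent words no longer compete for the space between them.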

--
Robert Kern
 
Tim Chase

s1 = "I am an american"
s2 = "I am american an "

for s in [s1, s2]:
    print re.findall(" (am|an) ", s)

# Results:
# ['am']
# ['am', 'an']

In your first case, the regexp is consuming the " am " (four
characters, two of which are spaces), leaving no leading space
for the second one to find. You might try using \b as a
word-boundary:

re.findall(r"\b(am|an)\b", s)
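
One way to see the consumed characters directly is to print the span of each match with re.finditer. This is only a sketch: the helper name spans is made up for the example, and it uses Python 3 style print().

```python
import re

s1 = "I am an american"
s2 = "I am american an "

def spans(pattern, text):
    # Return (start, end, matched_text) for every non-overlapping match.
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]

# With explicit spaces, the first match in s1 covers indexes 1 to 5,
# so the next search starts at the "a" of "an" with no space before it.
print(spans(r" (am|an) ", s1))
print(spans(r" (am|an) ", s2))

# With \b, only the two letters of each word are consumed.
print(spans(r"\b(am|an)\b", s1))
```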

-tkc
 
Paul McGuire

import re

s1 = "I am an american"

s2 = "I am american an "

for s in [s1, s2]:
    print re.findall(" (am|an) ", s)

# Results:
# ['am']
# ['am', 'an']

Does it help if you expand your RE to its full expression, with '_'s
where the blanks go:

"_am_" or "_an_"

Now look for these in "I_am_an_american". After the first "_am_" is
processed, findall picks up at the leading 'a' of 'an', and there is
no leading blank, so no match. If you search through
"I_am_american_an_", both "am" and "an" have surrounding spaces, so
both match.

Instead of using explicit spaces, try using '\b' meaning word break:

>>> import re
>>> re.findall(r"\b(am|an)\b", "I am an american")
['am', 'an']
>>> re.findall(r"\b(am|an)\b", "I am american an")
['am', 'an']
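
If literal spaces really are required around each word, a different technique (not one suggested in this thread) is to check the trailing space with a lookahead so that it is not consumed. A rough sketch, with the name shared_space invented for the example and print() used as a function for Python 3:

```python
import re

s1 = "I am an american"
s2 = "I am american an "

# Assumption: this pattern is an illustration, not from the thread.
# (?= ) checks for a following space without consuming it, so the same
# space can still act as the leading space of the next match.
shared_space = re.compile(r" (am|an)(?= )")

for s in (s1, s2):
    print(shared_space.findall(s))

# Unlike \b, this still demands a real space on both sides, so words at
# the start of the string or next to punctuation are skipped.
print(shared_space.findall("am I an, or not"))
print(re.findall(r"\b(am|an)\b", "am I an, or not"))
```

The trade-off is that \b is usually the simpler choice, while the lookahead keeps the stricter "surrounded by spaces" meaning of the original pattern.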

-- Paul




Your find pattern includes (and consumes) a leading AND trailing space
around each word. In the first string "I am an american", there is a
leading and trailing space around "am", but the trailing space for
"am" is the leading space for "an", so " an "
 
Paul McGuire

-- Paul

Your find pattern includes (and consumes) a leading AND trailing space
around each word.  In the first string "I am an american", there is a
leading and trailing space around "am", but the trailing space for
"am" is the leading space for "an", so " an "
Oops, sorry, ignore debris after sig...
 
