regex alternation problem

Discussion in 'Python' started by Jesse Aldridge, Apr 17, 2009.

  1. import re

    s1 = "I am an american"

    s2 = "I am american an "

    for s in [s1, s2]:
        print re.findall(" (am|an) ", s)

    # Results:
    # ['am']
    # ['am', 'an']

    -------

    I want the results to be the same for each string. What am I doing
    wrong?
    Jesse Aldridge, Apr 17, 2009
    #1

  2. According to documentation re.findall takes a compiled pattern as a
    first argument. So try
    patt = re.compile(r'(am|an)')
    re.findall(patt, s1)
    re.findall(patt, s2)

    2009/4/18 Jesse Aldridge <>:
    > import re
    >
    > s1 = "I am an american"
    >
    > s2 = "I am american an "
    >
    > for s in [s1, s2]:
    >    print re.findall(" (am|an) ", s)
    >
    > # Results:
    > # ['am']
    > # ['am', 'an']
    >
    > -------
    >
    > I want the results to be the same for each string.  What am I doing
    > wrong?
    --
    Sincerely yours, Eugene Perederey
    Eugene Perederey, Apr 17, 2009
    #2

  3. On 2009-04-17 16:57, Eugene Perederey wrote:
    > According to documentation re.findall takes a compiled pattern as a
    > first argument. So try
    > patt = re.compile(r'(am|an)')
    > re.findall(patt, s1)
    > re.findall(patt, s2)


    No, it will take a string pattern, too.
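
To check this point, here is a minimal sketch (written for Python 3, so `print` is a function, unlike the Python 2 code in this thread). It calls `re.findall` once with a plain string and once with a compiled pattern, using the pattern from the original question. The variable names `from_string` and `from_compiled` are only illustrative.

```python
import re

s1 = "I am an american"

# Passing the pattern as a plain string works.
from_string = re.findall(" (am|an) ", s1)

# Passing a precompiled pattern object works as well.
patt = re.compile(" (am|an) ")
from_compiled = re.findall(patt, s1)

print(from_string)    # ['am']
print(from_compiled)  # ['am']
```

Both calls give the same list, so compiling the pattern first does not change the result being asked about.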

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 17, 2009
    #3
  4. On 2009-04-17 16:49, Jesse Aldridge wrote:
    > import re
    >
    > s1 = "I am an american"
    >
    > s2 = "I am american an "
    >
    > for s in [s1, s2]:
    >     print re.findall(" (am|an) ", s)
    >
    > # Results:
    > # ['am']
    > # ['am', 'an']
    >
    > -------
    >
    > I want the results to be the same for each string. What am I doing
    > wrong?


    findall() finds non-overlapping matches. " am  an " (with two spaces between
    the words) would work, but not " am an " (with only one).
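
A small sketch of that difference, assuming Python 3. The second test string is built with `" " * 2` so the double space stays visible; the names `one_space`, `two_spaces`, `res_one` and `res_two` are made up for this illustration.

```python
import re

pattern = " (am|an) "

# "am" and "an" share a single space here.
one_space = "I am an american"

# Here each word has its own leading and trailing space.
two_spaces = "I am" + " " * 2 + "an american"

res_one = re.findall(pattern, one_space)
res_two = re.findall(pattern, two_spaces)

print(res_one)  # ['am']
print(res_two)  # ['am', 'an']
```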

    Instead of including explicit spaces in your pattern, I suggest using the \b
    "word boundary" special instruction.

    >>> for s in [s1, s2]:
    ...     print re.findall(r"\b(am|an)\b", s)
    ...
    ['am', 'an']
    ['am', 'an']

    --
    Robert Kern

    Robert Kern, Apr 17, 2009
    #4
  5. > s1 = "I am an american"
    >
    > s2 = "I am american an "
    >
    > for s in [s1, s2]:
    >     print re.findall(" (am|an) ", s)
    >
    > # Results:
    > # ['am']
    > # ['am', 'an']
    >
    > -------
    >
    > I want the results to be the same for each string. What am I doing
    > wrong?


    In your first case, the regexp is consuming the " am " (four
    characters, two of which are spaces), leaving no leading space
    for the second one to find. You might try using \b as a
    word-boundary:

    re.findall(r"\b(am|an)\b", s)
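
A different technique from the `\b` suggestion, shown only as an illustration: if the spaces must stay in the pattern, a lookahead can check for the trailing space without consuming it. This is a Python 3 sketch, and the `results` list is a name introduced here.

```python
import re

s1 = "I am an american"
s2 = "I am american an "

# (?= ) asserts that a space follows without consuming it, so the same
# space can still serve as the leading space of the next match.
pattern = re.compile(r" (am|an)(?= )")

results = [pattern.findall(s) for s in [s1, s2]]
for found in results:
    print(found)
# ['am', 'an']
# ['am', 'an']
```

Unlike `\b`, this version still requires a literal space on both sides, so it would miss a word at the very start or end of the string or next to punctuation.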

    -tkc
    Tim Chase, Apr 17, 2009
    #5
  6. On Apr 17, 4:49 pm, Jesse Aldridge <> wrote:
    > import re
    >
    > s1 = "I am an american"
    >
    > s2 = "I am american an "
    >
    > for s in [s1, s2]:
    >     print re.findall(" (am|an) ", s)
    >
    > # Results:
    > # ['am']
    > # ['am', 'an']
    >
    > -------
    >
    > I want the results to be the same for each string.  What am I doing
    > wrong?


    Does it help if you expand your RE to its full expression, with '_'s
    where the blanks go:

    "_am_" or "_an_"

    Now look for these in "I_am_an_american". After the first "_am_" is
    processed, findall picks up at the leading 'a' of 'an', and there is
    no leading blank, so no match. If you search through
    "I_am_american_an_", both "am" and "an" have surrounding spaces, so
    both match.
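
The walk-through above can be made visible with `re.finditer`, which reports where each match starts and ends. This is a Python 3 sketch; the helper `spans` is hypothetical and exists only for this example.

```python
import re

s1 = "I am an american"
s2 = "I am american an "


def spans(pattern, text):
    """Return (matched_text, start, end) for each non-overlapping match."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


print(spans(" (am|an) ", s1))  # [(' am ', 1, 5)]
print(spans(" (am|an) ", s2))  # [(' am ', 1, 5), (' an ', 13, 17)]

# The scan of s1 resumes at index 5, where no leading blank is left.
print(repr(s1[5:]))  # 'an american'
```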

    Instead of using explicit spaces, try using '\b' meaning word break:

    >>> import re
    >>> re.findall(r"\b(am|an)\b", "I am an american")
    ['am', 'an']
    >>> re.findall(r"\b(am|an)\b", "I am american an")
    ['am', 'an']

    -- Paul




    Your find pattern includes (and consumes) a leading AND trailing space
    around each word. In the first string "I am an american", there is a
    leading and trailing space around "am", but the trailing space for
    "am" is the leading space for "an", so " an "
    Paul McGuire, Apr 17, 2009
    #6
  7. On Apr 17, 5:28 pm, Paul McGuire <> wrote:
    > -- Paul
    >
    > Your find pattern includes (and consumes) a leading AND trailing space
    > around each word.  In the first string "I am an american", there is a
    > leading and trailing space around "am", but the trailing space for
    > "am" is the leading space for "an", so " an "
    >

    Oops, sorry, ignore debris after sig...
    Paul McGuire, Apr 17, 2009
    #7
  8. On Apr 17, 5:30 pm, Paul McGuire <> wrote:
    > On Apr 17, 5:28 pm, Paul McGuire <> wrote:
    > > -- Paul
    > >
    > > Your find pattern includes (and consumes) a leading AND trailing space
    > > around each word.  In the first string "I am an american", there is a
    > > leading and trailing space around "am", but the trailing space for
    > > "am" is the leading space for "an", so " an "

    >
    > Oops, sorry, ignore debris after sig...


    Alright, I got it. Thanks for the help guys.
    Jesse Aldridge, Apr 18, 2009
    #8