Hopefully simple regular expression question

Discussion in 'Python' started by peterbe@gmail.com, Jun 14, 2005.

  1. Guest

    I want to match a word against a string such that 'peter' is found in
    "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
    "hey peterbe," because the word has to stand on its own. The following
    code works for a single word:

    import re

    def createStandaloneWordRegex(word):
        """ return a regular expression that can find 'peter' only if it's
        written alone (next to space, start of string, end of string,
        comma, etc) but not if inside another word like peterbe """
        return re.compile(r"""
            (
            ^ %s
            (?=\W | $)
            |
            (?<=\W)
            %s
            (?=\W | $)
            )
            """ % (word, word), re.I|re.L|re.M|re.X)


    def test_createStandaloneWordRegex():
        def T(word, text):
            print(createStandaloneWordRegex(word).findall(text))

        T("peter", "So Peter Bengtsson wrote this")
        T("peter", "peter")
        T("peter bengtsson", "So Peter Bengtsson wrote this")

    The result of running this is::

    ['Peter']
    ['peter']
    [] <--- this is the problem!!


    It works if the parameter is just one word (e.g. 'peter') but stops
    working when it's an expression (e.g. 'peter bengtsson').

    How do I modify my regular expression to match on expressions as well
    as just single words?
     
    , Jun 14, 2005
    #1

  2. John Machin Guest

    wrote:
    > I want to match a word against a string such that 'peter' is found in
    > "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
    > "hey peterbe," because the word has to stand on its own. The following
    > code works for a single word:
    >
    > def createStandaloneWordRegex(word):
    > """ return a regular expression that can find 'peter' only if it's
    > written
    > alone (next to space, start of string, end of string, comma, etc)
    > but
    > not if inside another word like peterbe """
    > return re.compile(r"""
    > (
    > ^ %s
    > (?=\W | $)
    > |
    > (?<=\W)
    > %s
    > (?=\W | $)
    > )
    > """% (word, word), re.I|re.L|re.M|re.X)
    >
    >
    > def test_createStandaloneWordRegex():
    > def T(word, text):
    > print createStandaloneWordRegex(word).findall(text)
    >
    > T("peter", "So Peter Bengtsson wrote this")
    > T("peter", "peter")
    > T("peter bengtsson", "So Peter Bengtsson wrote this")
    >
    > The result of running this is::
    >
    > ['Peter']
    > ['peter']
    > [] <--- this is the problem!!
    >
    >
    > It works if the parameter is just one word (eg. 'peter') but stops
    > working when it's an expression (eg. 'peter bengtsson')


    No, not when it's an "expression" (whatever that means), but when the
    parameter contains whitespace, which is ignored in verbose mode.

    >
    > How do I modify my regular expression to match on expressions as well
    > as just single words??
    >


    If you must stick with re.X, you must escape any whitespace characters
    in your "word" -- see re.escape().
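
    A minimal sketch of the re.escape() route, written for Python 3. The
    helper name standalone_word_regex is made up for illustration, and
    re.L is left out because Python 3 rejects that flag for str patterns:

```python
import re

def standalone_word_regex(word):
    """Verbose-mode pattern with the search text escaped (hypothetical helper)."""
    # re.escape() backslash-escapes the space, so re.X no longer discards it.
    escaped = re.escape(word)
    return re.compile(r"""
        (
            ^ %s (?=\W | $)
          |
            (?<=\W) %s (?=\W | $)
        )
        """ % (escaped, escaped), re.I | re.M | re.X)

text = "So Peter Bengtsson wrote this"

# Unescaped under re.X: the space is dropped, so this looks for "peterbengtsson".
unescaped = re.compile("peter bengtsson", re.I | re.X)
print(unescaped.findall(text))                                  # []

print(standalone_word_regex("peter bengtsson").findall(text))  # ['Peter Bengtsson']
print(standalone_word_regex("peter").findall("hey peterbe,"))  # []
```

    Escaping also protects against regex metacharacters in the search
    text, which the unescaped %s substitution does not.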

    Alternatively (1), drop re.X but this is ugly:

    regex_text_no_X = r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

    Alternatively (2), consider using the \b gadget; this appears to give
    the same answers as the baroque method:

    regex_text_no_flab = r"\b%s\b" % word
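
    A quick way to check the "same answers" claim on the examples from
    this thread, as a Python 3 sketch. The function names and the cases
    list are only for illustration, and re.L is omitted:

```python
import re

def lookaround_text(word):
    # The lookahead/lookbehind pattern without re.X, as given above.
    return r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

def boundary_text(word):
    # The \b version, as given above.
    return r"\b%s\b" % word

cases = [
    ("peter", "peter bengtsson"),
    ("peter", " hey peter,"),
    ("peter", "thepeter bengtsson"),
    ("peter", "hey peterbe,"),
    ("peter bengtsson", "So Peter Bengtsson wrote this"),
]

results = []
for word, text in cases:
    a = re.compile(lookaround_text(word), re.I | re.M).findall(text)
    b = re.compile(boundary_text(word), re.I).findall(text)
    results.append((a, b))
    print(repr(word), repr(text), a, b)
```

    Both patterns agree on these five cases. They can disagree when the
    search text itself starts or ends with a non-word character (an
    escaped "c++", say), because \b needs a word character on one side.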


    HTH,
    John
     
    John Machin, Jun 14, 2005
    #2

  3. Kalle Anke Guest

    On Tue, 14 Jun 2005 13:01:58 +0200, wrote
    (in article <>):

    > How do I modify my regular expression to match on expressions as well
    > as just single words??


    import re

    def createStandaloneWordRegex(word):
        """ return a regular expression that can find 'peter' only if it's
        written alone (next to space, start of string, end of string,
        comma, etc) but not if inside another word like peterbe """

        return re.compile(r'\b' + word + r'\b', re.I)


    def test_createStandaloneWordRegex():
        def T(word, text):
            print(createStandaloneWordRegex(word).findall(text))

        T("peter", "So Peter Bengtsson wrote this")
        T("peter", "peter")
        T("peter bengtsson", "So Peter Bengtsson wrote this")

    test_createStandaloneWordRegex()

    Works?
     
    Kalle Anke, Jun 14, 2005
    #3
  4. Christos TZOTZIOY Georgiou

    On 14 Jun 2005 04:01:58 -0700, rumours say that ""
    <> might have written:

    >I want to match a word against a string such that 'peter' is found in
    >"peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
    >"hey peterbe," because the word has to stand on its own. The following
    >code works for a single word:


    [snip]

    use \b before and after the word you search, for example:

    rePeter= re.compile(r"\bpeter\b", re.I)
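
    One detail worth spelling out for \b patterns: in a plain (non-raw)
    Python string literal, \b is the backspace character, so the pattern
    has to be a raw string for the re module to see a word boundary. A
    small Python 3 sketch (the variable names are only for illustration):

```python
import re

text = "So Peter Bengtsson wrote this"

# Without the r prefix, Python turns \b into a backspace character (\x08),
# so the regex looks for literal backspaces around "peter".
plain = re.compile("\bpeter\b", re.I)

# With the r prefix, the backslash reaches the re module and \b is a word boundary.
raw = re.compile(r"\bpeter\b", re.I)

print(len("\b"), len(r"\b"))   # 1 2
print(plain.findall(text))     # []
print(raw.findall(text))       # ['Peter']
```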

    In the documentation for the re module, Subsection 4.2.1 is Regular
    Expression Syntax; it'll help a lot if you read it.

    Cheers.
    --
    TZOTZIOY, I speak England very best.
    "Be strict when sending and tolerant when receiving." (from RFC1958)
    I really should keep that in mind when talking with people, actually...
     
    Christos TZOTZIOY Georgiou, Jun 14, 2005
    #4
  5. Guest

    Thank you! I had totally forgot about that. It works.
     
    , Jun 14, 2005
    #5