trying to find repeated substrings with regular expression

Discussion in 'Python' started by Robert Dodier, Mar 13, 2006.

  1. Hello all,

    I'm trying to find substrings that look like 'FOO blah blah blah'
    in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
    FOO blah3a blah3b blah3b' I want to get three substrings,
    'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.

    I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
    and everything I've tried either matches too much or too little.

    I've decided it's easier for me just to search for FOO, and then
    break up the string based on the locations of FOO.

    But I'd like to better understand regular expressions.
    Can someone suggest a regular expression which will return
    groups corresponding to the FOO substrings above?

    Thanks for any insights, I appreciate it a lot.

    Robert Dodier
     
    Robert Dodier, Mar 13, 2006
    #1

  2. Robert Dodier wrote:

    > Hello all,
    >
    > I'm trying to find substrings that look like 'FOO blah blah blah'
    > in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
    > FOO blah3a blah3b blah3b' I want to get three substrings,
    > 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.


    > [...]


    > Can someone suggest a regular expression which will return
    > groups corresponding to the FOO substrings above?


    FOO.*?(?=(?:FOO|$))
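
A minimal sketch of how this pattern behaves with re.findall(), assuming the sample string from the question. The greedy variant is included only for contrast and is not part of the suggestion above; the names `lazy` and `greedy` are illustrative.

```python
import re

s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'

# Lazy .*? grows one character at a time and stops as soon as the
# lookahead succeeds, i.e. just before the next FOO or at the end.
lazy = re.findall(r'FOO.*?(?=(?:FOO|$))', s)

# Greedy .* runs to the end of the string first, where $ already
# satisfies the lookahead, so everything from the first FOO onward
# comes back as a single match.
greedy = re.findall(r'FOO.*(?=(?:FOO|$))', s)

print(lazy)
print(greedy)
```

The lookahead consumes nothing, so the next search can start right at the following FOO.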
    --
    Giovanni Bajo
     
    Giovanni Bajo, Mar 13, 2006
    #2

  3. Kent Johnson

    Robert Dodier wrote:
    > Hello all,
    >
    > I'm trying to find substrings that look like 'FOO blah blah blah'
    > in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
    > FOO blah3a blah3b blah3b' I want to get three substrings,
    > 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.
    >
    > I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
    > and everything I've tried either matches too much or too little.


    FOO(.*?)(?=FOO|$)
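
One thing to watch with this form, sketched here under the assumption that it is used with re.findall(): when a pattern contains a single capturing group, findall() returns the group contents rather than the whole match, so the FOO prefix is dropped. Taking group(0) from re.finditer() returns the full substrings. The variable names and the strip() call are illustrative additions.

```python
import re

s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
pattern = r'FOO(.*?)(?=FOO|$)'

# With one capturing group, findall() yields only what the group matched.
tails = re.findall(pattern, s)

# group(0) is the entire match; strip() removes the trailing space that
# precedes the next FOO.
whole = [m.group(0).strip() for m in re.finditer(pattern, s)]

print(tails)
print(whole)
```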


    > I've decided it's easier for me just to search for FOO, and then
    > break up the string based on the locations of FOO.


    Use re.split() for this.
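
A possible sketch of the re.split() route, assuming the sample string from the question. Because re.split() discards the delimiter, the marker is put back by hand; `pieces` and `substrings` are illustrative names.

```python
import re

s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'

# The first piece is the text before the first FOO; the remaining
# pieces are the tails that followed each FOO.
pieces = re.split('FOO', s)

# Skip the leading piece and restore the marker on each tail.
substrings = ['FOO' + tail for tail in pieces[1:]]

print(substrings)
```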

    Kent
     
    Kent Johnson, Mar 13, 2006
    #3
  4. Guest

    Robert Dodier wrote:

    > I've decided it's easier for me just to search for FOO, and then
    > break up the string based on the locations of FOO.
    >
    > But I'd like to better understand regular expressions.


    Those who cannot learn regular expressions are doomed to repeat string
    searches. Which is not such a bad thing.

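    txt = "blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b"

    def fa(s, pat):
        # Peel off pieces starting with pat, working from the right.
        retlist = []
        try:
            while True:
                i = s.rindex(pat)  # raises ValueError once pat is gone
                retlist.insert(0, s[i:])
                s = s[:i]
        except ValueError:
            return retlist

    print(fa(txt, "FOO"))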
     
    , Mar 13, 2006
    #4
  5. [Robert Dodier]
    > I'm trying to find substrings that look like 'FOO blah blah blah'
    > in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
    > FOO blah3a blah3b blah3b' I want to get three substrings,
    > 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.


    No need for regular expressions on this one:

    >>> s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
    >>> ['FOO' + tail for tail in s.split('FOO')[1:]]
    ['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']


    >
    > I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
    > and everything I've tried either matches too much or too little.


    The regular expression way is to find the target phrase followed by any
    text followed by the target phrase. The first two are in a group and
    the last is not included in the result group. The any-text section is
    non-greedy:

    >>> import re
    >>> re.findall('(FOO.*?)(?=FOO|$)', s)
    ['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
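
For comparison, a hedged sketch of why the quoted attempt falls short and how its negative-lookahead idea can still work. This is an illustration, not something proposed in the thread: the names `m` and `tempered` are made up, and the trimmed pattern is one possible variant.

```python
import re

s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'

# The original attempt: the leading .* is greedy, so it backs off only
# as far as the last FOO, and a repeated group keeps only its final
# iteration anyway.
m = re.match(r'.*(FOO((?!FOO).)*)+.*', s)
print(m.group(1))

# Dropping the surrounding .* and the outer repetition, and letting
# findall() do the repeating, returns every piece. (?:(?!FOO).)*
# consumes one character at a time while the text ahead is not FOO.
tempered = re.findall(r'FOO(?:(?!FOO).)*', s)
print(tempered)
```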


    Raymond
     
    Raymond Hettinger, Mar 14, 2006
    #5