python replace/sub/wildcard/regex issue

Discussion in 'Python' started by tom, Jan 19, 2010.

  1. tom

    tom Guest

    hi...

    trying to figure out how to solve what should be an easy python/regex/
    wildcard/replace issue.

    i've tried a number of different approaches.. so i must be missing
    something...

    my initial sample text is:

    Soo Choi</span>LONGEDITBOX">Apryl Berney
    Soo Choi</span>LONGEDITBOX">Joel Franks
    Joel Franks</span>GEDITBOX">Alexander Yamato

    and i'm trying to get

    Soo Choi foo Apryl Berney
    Soo Choi foo Joel Franks
    Joel Franks foo Alexander Yamato

    the issue i'm facing is how to match from "</" up to and including '">'
    and replace that whole span, including everything in between...

    i've tried derivations of

    name=re.sub("</s[^>]*\">"," foo ",name)

    but i'm missing something...

    thoughts... thanks

    tom
     
    tom, Jan 19, 2010
    #1

  2. tom

    Chris Rebert Guest

    On Mon, Jan 18, 2010 at 8:04 PM, tom <> wrote:
    > the issue i'm facing.. is how to start at "</" and end at '">' and
    > substitute inclusive of the stuff inside the regex...


    "Some people, when confronted with a problem, think 'I know, I'll use
    regular expressions.' Now they have two problems."

    Assuming your sample text is representative of all your text:

    new_text = "\n".join(line[:line.index('<')] +
    line[line.rindex('>')+1:] for line in your_text.split('\n'))

    Cheers,
    Chris
    --
    http://blog.rebertia.com
     
    Chris Rebert, Jan 19, 2010
    #2

  3. tom

    alex23 Guest

    On Jan 19, 2:04 pm, tom <> wrote:
    > trying to figure out how to solve what should be an easy python/regex/
    > wildcard/replace issue.
    > but i'm missing something...


    Well, some would say you've missed the most obvious solution of _not_
    using regexps :)

    I'd probably do it via string methods wrapped up in a helper function:

    >>> def extract(text):
    ...     first, rest = text.split('<', 1)
    ...     ignore, last = rest.rsplit('>', 1)
    ...     return '%s foo %s' % (first, last)
    ...
    >>> extract('Soo Choi</span>LONGEDITBOX">Apryl Berney')
    'Soo Choi foo Apryl Berney'
    >>> extract('Soo Choi</span>LONGEDITBOX">Joel Franks')
    'Soo Choi foo Joel Franks'
    >>> extract('Joel Franks</span>GEDITBOX">Alexander Yamato')
    'Joel Franks foo Alexander Yamato'
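A variant of the same idea, swapping in `str.partition`/`str.rpartition` (a substitution for illustration, not what was posted above). Unlike `split`, these always return a 3-tuple, so a line with no markup does not make the tuple unpacking raise ValueError. The helper name `extract_safe` and its choice to return such lines unchanged are hypothetical.

```python
def extract_safe(text, sep=" foo "):
    """Replace everything from the first '<' to the last '>' after it with sep.

    Hypothetical behaviour for this sketch: lines without a '<' ... '>'
    run are returned unchanged instead of raising an error.
    """
    first, lt, rest = text.partition("<")
    if not lt:
        return text  # no '<' in the line at all
    _, gt, last = rest.rpartition(">")
    if not gt:
        return text  # a '<' but no '>' after it
    return first + sep + last


for line in ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
             'Joel Franks</span>GEDITBOX">Alexander Yamato',
             'no markup here']:
    print(extract_safe(line))
```

The third sample line shows the pass-through case; with the original `extract`, that line would raise ValueError.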
     
    alex23, Jan 19, 2010
    #3
  4. tom

    Chris Rebert Guest

    On Mon, Jan 18, 2010 at 8:31 PM, Chris Rebert <> wrote:
    > new_text = "\n".join(line[:line.index('<')] + line[line.rindex('>')+1:] for line in your_text.split('\n'))


    Erm, remembering to intersperse the "foo" (should be all 1-line, bloody Gmail):
    new_text = "\n".join(line[:line.index('<')] + " foo " +
    line[line.rindex('>')+1:] for line in your_text.split('\n'))
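A self-contained run of the corrected one-liner, assuming `your_text` (a placeholder in the post) holds the three sample lines joined by newlines. Because the generator expression sits inside the parentheses of `join(...)`, wrapping it over several lines is legal Python, so the Gmail line break does no harm.

```python
# Assumed sample input; in the thread, your_text stands for the poster's data.
your_text = ('Soo Choi</span>LONGEDITBOX">Apryl Berney\n'
             'Soo Choi</span>LONGEDITBOX">Joel Franks\n'
             'Joel Franks</span>GEDITBOX">Alexander Yamato')

# For each line: text before the first '<', then " foo ",
# then text after the last '>'.
# str.index/str.rindex raise ValueError if a line lacks '<' or '>'.
new_text = "\n".join(line[:line.index('<')] + " foo " +
                     line[line.rindex('>') + 1:]
                     for line in your_text.split('\n'))

print(new_text)
```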

    Or just use alex23's method, which seems all-round superior. :)

    Cheers,
    Chris
     
    Chris Rebert, Jan 19, 2010
    #4
  5. tom

    dippim Guest

    On Jan 18, 11:04 pm, tom <> wrote:
    > i've tried derivations of
    >
    > name=re.sub("</s[^>]*\">"," foo ",name)


    The problem here is that </s matches correctly, but [^>]* consumes
    everything that isn't > and stops at the first > it hits. So [^>]*
    consumes "pan" in each case, and the pattern then needs \"> next. The
    next character is >, not ", so the match fails. It never gets as far
    as the second >.
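A quick sketch of the explanation above on one sample line: the original pattern leaves the string untouched, while two alternative patterns (suggestions for illustration, not from the thread) can cross the first `>` and stop at `">`.

```python
import re

name = 'Soo Choi</span>LONGEDITBOX">Apryl Berney'

# Original pattern: '[^>]*' stops at the '>' of '</span>', the required
# '">' never lines up, so re.sub finds no match and returns name unchanged.
print(re.sub("</s[^>]*\">", " foo ", name))

# Non-greedy '.*?' may run past the first '>' and stops at the
# earliest '">' it finds.
print(re.sub(r'</.*?">', ' foo ', name))

# Same result without a non-greedy qualifier: consume anything except '"'.
print(re.sub(r'</[^"]*">', ' foo ', name))
```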

    I agree with Chris Rebert: regexes are risky because it isn't always
    clear which strings a pattern will actually match (see the explanation
    above :). Also, when the number of comparisons you have to do isn't
    high, they can be inefficient. However, for your limited set of
    examples the following should work:

Apryl Berney',">
    import re

    aList = ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
             'Soo Choi</span>LONGEDITBOX">Joel Franks',
             'Joel Franks</span>GEDITBOX">Alexander Yamato']

    # Greedy: match from the first '<' to the last '>' in each string.
    matcher = re.compile(r"<[\w\W]*>")

    newList = []
    for x in aList:
        newList.append(matcher.sub(" foo ", x))

    print(newList)
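The loop above applies the pattern to one string at a time, and that matters: `[\w\W]` matches any character, including a newline. The sketch below assumes, hypothetically, that the three names arrive as one multi-line string `text` rather than a list, and contrasts that pattern with `<.*>`, whose `.` does not match a newline unless `re.DOTALL` is set.

```python
import re

# Hypothetical input: all three records in a single string.
text = ('Soo Choi</span>LONGEDITBOX">Apryl Berney\n'
        'Soo Choi</span>LONGEDITBOX">Joel Franks\n'
        'Joel Franks</span>GEDITBOX">Alexander Yamato')

# [\w\W] also matches '\n', so the greedy match runs from the first '<'
# (line 1) to the last '>' (line 3) and swallows the middle record.
print(re.sub(r"<[\w\W]*>", " foo ", text))

# '.' stops at '\n' by default, so each match stays on its own line:
# first '<' to last '>' of every line.
print(re.sub(r"<.*>", " foo ", text))
```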

    David
     
    dippim, Jan 19, 2010
    #5

Similar Threads
  1. Ben
    Replies:
    2
    Views:
    936
  2. Replies:
    3
    Views:
    808
    Reedick, Andrew
    Jul 1, 2008
  3. Replies:
    7
    Views:
    879
  4. Lawrence D'Oliveiro

    Death To Sub-Sub-Sub-Directories!

    Lawrence D'Oliveiro, May 5, 2011, in forum: Java
    Replies:
    92
    Views:
    2,128
    Lawrence D'Oliveiro
    May 20, 2011
  5. seven.reeds

    regex multi-line match/replace issue

    seven.reeds, Apr 24, 2006, in forum: Perl Misc
    Replies:
    6
    Views:
    166
    Anno Siegel
    Apr 25, 2006