string stripping issues

Discussion in 'Python' started by orangeDinosaur, Mar 2, 2006.

  1. Hello,

    I am encountering a behavior I can't think of a reason for. Sometimes,
    when I use the .strip method for strings, it takes away more than what
    I've specified. For example:

    >>> a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
    >>> a.strip(' <TD WIDTH=175><FONT SIZE=2>')

    returns:

    'ughes. John</FONT></TD>\r\n'

    However, if I take another string, for example:

    >>> b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
    >>> b.strip(' <TD WIDTH=175><FONT SIZE=2>')

    returns:

    'Kim, Dong-Hyun</FONT></TD>\r\n'

    I don't understand why in one case it eats up the 'H' but in the next
    case it leaves the 'K' alone.
     
    orangeDinosaur, Mar 2, 2006
    #1

  2. orangeDinosaur wrote:
    > I don't understand why in one case it eats up the 'H' but in the next
    > case it leaves the 'K' alone.



    That method... I do not think it means what you think it means. The
    argument to str.strip is a *set* of characters, e.g.:

    >>> foo = 'abababaXabbaXabababbbb'
    >>> foo.strip('ab')
    'XabbaX'
    >>> foo.strip('aabababaab') # no difference!
    'XabbaX'
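
    A small sketch of that point, assuming nothing beyond the built-in str
    type: because the argument is treated as a set, the order and repetition
    of its characters should make no difference, and a hand-written loop over
    set(chars) gives the same result. The helper name strip_chars is made up
    for illustration and is not part of Python.

```python
def strip_chars(text, chars):
    """Illustrative pure-Python stand-in for text.strip(chars) (hypothetical helper)."""
    charset = set(chars)
    start, end = 0, len(text)
    # Advance past leading characters that belong to the set.
    while start < end and text[start] in charset:
        start += 1
    # Back up over trailing characters that belong to the set.
    while end > start and text[end - 1] in charset:
        end -= 1
    return text[start:end]


foo = 'abababaXabbaXabababbbb'
results = [
    foo.strip('ab'),
    foo.strip('ba'),           # different order, same set
    foo.strip('aabababaab'),   # repeated characters, same set
    strip_chars(foo, 'ab'),    # hand-written equivalent
]
print(results)
```

    All four entries come out identical, which is the behavior the original
    question ran into.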

    For more info, see the string method docs:
    http://docs.python.org/lib/string-methods.html
    To do what you're trying to do, try this:

    >>> prefix = 'hello '
    >>> bar = 'hello world!'
    >>> if bar.startswith(prefix): bar = bar[:len(prefix)]

    ...
    >>> bar

    'world!'

    --Ben
     
    Ben Cartwright, Mar 2, 2006
    #2

  3. From the Python manual:

    strip([chars])
    The chars argument is not a prefix or suffix; rather, all combinations
    of its values are stripped:
    >>> 'www.example.com'.strip('cmowz.')
    'example'

    In your case, since the letter 'H' is in your [chars] and the name
    starts with an 'H', it gets stripped; in the second string the first
    letter is a 'K', which is not in [chars], so stripping stops there.
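
    To see this directly, here is a rough sketch that collects the leading
    run of characters belonging to the set, which is the part strip() would
    remove from the left. The helper leading_run is a hypothetical name used
    only for this illustration.

```python
from itertools import takewhile


def leading_run(text, chars):
    """Return the leading characters of text that are all members of chars."""
    charset = set(chars)
    return ''.join(takewhile(lambda ch: ch in charset, text))


chars = ' <TD WIDTH=175><FONT SIZE=2>'
a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'

# 'H' occurs in WIDTH, so it is in the set; 'K' occurs nowhere in chars.
print('H' in chars, 'K' in chars)
print(repr(leading_run(a, chars)))  # the tag text plus the 'H' of Hughes
print(repr(leading_run(b, chars)))  # the tag text only
```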
    Maybe you can use:

    >>> a[28:]
    'Hughes. John</FONT></TD>\r\n'
    >>> b[28:]
    'Kim, Dong-Hyun</FONT></TD>\r\n'

    but maybe what you REALLY want is:

    >>> a[28:-14]
    'Hughes. John'
    >>> b[28:-14]
    'Kim, Dong-Hyun'
     
    ianaré, Mar 2, 2006
    #3
  4. Ben Cartwright wrote:
    > To do what you're trying to do, try this:
    >
    > >>> prefix = 'hello '
    > >>> bar = 'hello world!'
    > >>> if bar.startswith(prefix): bar = bar[:len(prefix)]

    > ...
    > >>> bar

    > 'world!'



    Apologies, that should be:
    >>> prefix = 'hello '
    >>> bar = 'hello world!'
    >>> if bar.startswith(prefix): bar = bar[len(prefix):]

    ...
    >>> bar

    'world!'
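
    The same idea can be wrapped in small helpers, sketched here under the
    assumption that you want the string returned unchanged when the prefix or
    suffix is absent. The names remove_prefix and remove_suffix are
    illustrative only; on Python 3.9 and later the built-in
    str.removeprefix() and str.removesuffix() methods cover the same ground.

```python
def remove_prefix(text, prefix):
    """Drop prefix from the start of text if present; otherwise return text as is."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text, suffix):
    """Drop suffix from the end of text if present; otherwise return text as is."""
    # The emptiness check matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
name = remove_suffix(remove_prefix(a, ' <TD WIDTH=175><FONT SIZE=2>'),
                     '</FONT></TD>\r\n')
print(name)
```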

    --Ben
     
    Ben Cartwright, Mar 2, 2006
    #4
  5. thanks!
     
    orangeDinosaur, Mar 2, 2006
    #5
  6. This seems like a web page parsing question. If you know the
    delimiting token strings, another approach is:

    a.split(' <TD WIDTH=175><FONT SIZE=2>')[1].split('</FONT></TD>\r\n')[0]
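
    The chained split() raises IndexError when the opening delimiter is
    missing. A possible variant, sketched with str.partition() and a
    hypothetical helper name extract_between, returns None in that case
    instead:

```python
def extract_between(text, start, end):
    """Return the text between the first start and the following end, or None."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # opening delimiter not present
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # closing delimiter not present
    return value


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
start, end = ' <TD WIDTH=175><FONT SIZE=2>', '</FONT></TD>\r\n'
print(extract_between(a, start, end))
print(extract_between(b, start, end))
print(extract_between('no markup here', start, end))
```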
     
    P Boy, Mar 3, 2006
    #6
  7. Ben Cartwright wrote:
    > Apologies, that should be:
    > >>> prefix = 'hello '
    > >>> bar = 'hello world!'
    > >>> if bar.startswith(prefix): bar = bar[len(prefix):]

    > ...
    > >>> bar

    > 'world!'
    >


    or instead of:

    a.strip(' <TD WIDTH=175><FONT SIZE=2>')

    use:

    a.replace(' <TD WIDTH=175><FONT SIZE=2>','')
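
    One thing to keep in mind with replace(): it removes every occurrence of
    the substring, not just one at the start. A short sketch, using a made-up
    row that contains the tag text twice, shows the difference and the
    optional count argument:

```python
prefix = ' <TD WIDTH=175><FONT SIZE=2>'
# Hypothetical input in which the tag text appears twice.
row = prefix + 'Hughes. John' + prefix + 'Kim, Dong-Hyun'

everywhere = row.replace(prefix, '')     # removes both occurrences
first_only = row.replace(prefix, '', 1)  # removes only the first occurrence

print(repr(everywhere))
print(repr(first_only))
```

    Even with a count of 1, the first occurrence need not be at the start of
    the string, so a startswith() check is still the safer test for a true
    prefix.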

    Iain
     
    Iain King, Mar 3, 2006
    #7
  8. orangeDinosaur wrote:
    > I don't understand why in one case it eats up the 'H' but in the next
    > case it leaves the 'K' alone.
    >

    Others have explained the exact problem; I'll make a suggestion.
    Take a few minutes to look at BeautifulSoup. It parses HTML code
    and allows for extraction of data from strings like this in a
    very easy-to-use way. If this is a one-off thing, don't bother.
    If you do this commonly, BeautifulSoup is worth a little study.
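
    BeautifulSoup is a third-party package, so as a stand-in the sketch below
    uses the standard library's html.parser module to illustrate the general
    idea of letting a parser separate text from tags. It is not
    BeautifulSoup's API; the class TextCollector and function cell_text are
    hypothetical names for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-blank text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs such as the leading space and trailing '\r\n'.
        text = data.strip()
        if text:
            self.chunks.append(text)


def cell_text(fragment):
    """Return a list of the text pieces contained in an HTML fragment."""
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()
    return parser.chunks


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
print(cell_text(a))
print(cell_text(b))
```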

    -Larry Bates
     
    Larry Bates, Mar 3, 2006
    #8