string stripping issues

Discussion in 'Python' started by orangeDinosaur, Mar 2, 2006.

  1. Hello,

    I am encountering a behavior I can't think of a reason for. Sometimes,
    when I use the .strip method on strings, it takes away more than what
    I've specified. For example:

    returns:

    'ughes. John</FONT></TD>\r\n'

    However, if I take another string, for example:

    returns:

    'Kim, Dong-Hyun</FONT></TD>\r\n'

    I don't understand why in one case it eats up the 'H' but in the next
    case it leaves the 'K' alone.
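
    The behavior described above can be reproduced with a short sketch. The
    input strings are assumptions, reconstructed from the two outputs shown and
    from the strip argument quoted later in this thread; PREFIX, a, and b are
    illustrative names only.

```python
# Assumed inputs: the cell markup quoted later in the thread, followed by a name.
PREFIX = ' <TD WIDTH=175><FONT SIZE=2>'
a = PREFIX + 'Hughes. John</FONT></TD>\r\n'
b = PREFIX + 'Kim, Dong-Hyun</FONT></TD>\r\n'

# strip() treats its argument as a set of characters, not as a literal prefix.
stripped_a = a.strip(PREFIX)
stripped_b = b.strip(PREFIX)

print(repr(stripped_a))  # 'ughes. John</FONT></TD>\r\n'
print(repr(stripped_b))  # 'Kim, Dong-Hyun</FONT></TD>\r\n'

# 'H' occurs in the argument (inside WIDTH); 'K' does not.
print('H' in PREFIX, 'K' in PREFIX)  # True False
```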
     
    orangeDinosaur, Mar 2, 2006
    #1

  2. That method... I do not think it means what you think it means. The
    argument to str.strip is a *set* of characters, e.g.:
    'XabbaX'

    For more info, see the string method docs:
    http://docs.python.org/lib/string-methods.html
    To do what you're trying to do, try this:
    'world!'
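
    A minimal sketch of both points, since they are easy to confuse. The sample
    strings are assumptions chosen only to produce the two results quoted
    above, and remove_prefix is a hypothetical helper, not a standard library
    function.

```python
# strip() removes any leading/trailing characters that appear in the argument,
# in any order and any number of times; characters in the middle are untouched.
set_result = 'abcXabbaXcba'.strip('abc')
print(set_result)  # XabbaX


def remove_prefix(s, prefix):
    """Return s without the literal prefix, or s unchanged if it is absent."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


prefix_result = remove_prefix('hello world!', 'hello ')
print(prefix_result)  # world!
```

    On Python 3.9 and later, the built-in str.removeprefix does the same job as
    the helper.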

    --Ben
     
    Ben Cartwright, Mar 2, 2006
    #2

  3. From the Python manual:

    strip( [chars])
    The chars argument is not a prefix or suffix; rather, all combinations
    of its values are stripped: 'www.example.com'.strip('cmowz.') returns
    'example'

    In your case, the letter 'H' is in your [chars] (it appears in WIDTH) and
    the name starts with an 'H', so it gets stripped. In the second case the
    first letter is a 'K', which is not in [chars], so stripping stops there.
    Maybe you can use:
    'Kim, Dong-Hyun</FONT></TD>\r\n'

    but maybe what you REALLY want is:
    'Kim, Dong-Hyun'
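
    One way those two results could be obtained, as a sketch rather than a
    definitive recipe. The input string and the names PREFIX and a are
    assumptions based on the strings quoted in this thread, and the regular
    expression assumes the name always sits inside a single FONT element.

```python
import re

PREFIX = ' <TD WIDTH=175><FONT SIZE=2>'
a = PREFIX + 'Kim, Dong-Hyun</FONT></TD>\r\n'

# Drop only the literal prefix, leaving the trailing markup in place.
without_prefix = a[len(PREFIX):] if a.startswith(PREFIX) else a
print(repr(without_prefix))  # 'Kim, Dong-Hyun</FONT></TD>\r\n'

# Keep just the text between the FONT tags.
match = re.search(r'<FONT[^>]*>(.*?)</FONT>', a)
name = match.group(1) if match else None
print(repr(name))  # 'Kim, Dong-Hyun'
```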
     
    ianaré, Mar 2, 2006
    #3

  4. Apologies, that should be:
    'world!'

    --Ben
     
    Ben Cartwright, Mar 2, 2006
    #4
  5. thanks!
     
    orangeDinosaur, Mar 2, 2006
    #5
  6. This seems like a web page parsing question. Another approach, if you
    know the delimiting token strings, is:

    a.split(' <TD WIDTH=175><FONT SIZE=2>')[1].split('</FONT></TD>\r\n')[0]
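
    A sketch of the split approach on an assumed input line. The chained
    split raises IndexError when the start token is missing, so the
    hypothetical helper between (not from the original post) shows one way to
    return None instead, using str.partition.

```python
# Assumed input, built from the strings quoted in this thread.
a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
START = ' <TD WIDTH=175><FONT SIZE=2>'
END = '</FONT></TD>\r\n'


def between(s, start, end):
    """Return the text between start and end, or None if either is missing."""
    _head, found_start, rest = s.partition(start)
    if not found_start:
        return None
    body, found_end, _tail = rest.partition(end)
    return body if found_end else None


chained = a.split(START)[1].split(END)[0]
print(chained)                                # Hughes. John
print(between(a, START, END))                 # Hughes. John
print(between('no markup here', START, END))  # None
```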
     
    P Boy, Mar 3, 2006
    #6
  7. Or instead of:

    a.strip(' <TD WIDTH=175><FONT SIZE=2>')

    use:

    a.replace(' <TD WIDTH=175><FONT SIZE=2>','')
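
    A short sketch of how replace behaves here, on assumed input strings. Note
    that replace removes every occurrence of the substring, not only a leading
    one; the optional count argument limits that.

```python
PREFIX = ' <TD WIDTH=175><FONT SIZE=2>'
a = PREFIX + 'Hughes. John</FONT></TD>\r\n'

# The literal substring is removed and the leading 'H' survives.
replaced = a.replace(PREFIX, '')
print(repr(replaced))  # 'Hughes. John</FONT></TD>\r\n'

# Assumed two-cell row: without a count, both occurrences are removed.
row = PREFIX + 'Hughes. John</FONT></TD>' + PREFIX + 'Kim, Dong-Hyun</FONT></TD>'
all_removed = row.replace(PREFIX, '')
first_removed = row.replace(PREFIX, '', 1)
print(all_removed)  # Hughes. John</FONT></TD>Kim, Dong-Hyun</FONT></TD>
```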

    Iain
     
    Iain King, Mar 3, 2006
    #7
  8. Others have explained the exact problem; I'll make a suggestion.
    Take a few minutes to look at BeautifulSoup. It parses HTML code
    and allows for extractions of data from strings like this in a
    very easy to use way. If this is a one-off thing, don't bother.
    If you do this commonly, BeautifulSoup is worth a little study.
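
    To give a feel for parser-based extraction without installing anything,
    here is a rough sketch that uses the standard library's html.parser as a
    stand-in; it is not BeautifulSoup and does not show its API. The class
    name FontTextParser and the input line are assumptions for illustration.

```python
from html.parser import HTMLParser


class FontTextParser(HTMLParser):
    """Collect the text found inside <font> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <font> elements are currently open
        self.texts = []  # text chunks seen while inside a <font>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag == 'font':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'font' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)


parser = FontTextParser()
parser.feed(' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n')
parser.close()
print(parser.texts)  # ['Kim, Dong-Hyun']
```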

    -Larry Bates
     
    Larry Bates, Mar 3, 2006
    #8