string stripping issues

orangeDinosaur

Hello,

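I am encountering a behavior I can't think of a reason for. Sometimes,
when I use the .strip method on strings, it takes away more than what
I've specified. For example:

a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'

a.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns: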

'ughes. John</FONT></TD>\r\n'

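However, if I take another string, for example:

b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'

b.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns: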

'Kim, Dong-Hyun</FONT></TD>\r\n'

I don't understand why in one case it eats up the 'H' but in the next
case it leaves the 'K' alone.
 
Ben Cartwright

That method... I do not think it means what you think it means. The
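argument to str.strip is a *set* of characters, e.g.:

>>> foo = 'abababaXabbaXabababbbb'
>>> foo.strip('ab')
'XabbaX'
>>> foo.strip('aabababaab') # no difference!
'XabbaX'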
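To make the set semantics concrete, here is a rough pure-Python equivalent of `str.strip(chars)`. It is only an illustrative sketch (Python 3; `strip_chars` is a hypothetical helper, not part of the standard library, and it ignores the no-argument whitespace case). Note that it never compares substrings, only single characters against the set:

```python
def strip_chars(s, chars):
    """Illustrative equivalent of s.strip(chars) for an explicit chars string."""
    allowed = set(chars)  # only membership matters, not order or repetition
    start, end = 0, len(s)
    # Advance from the left while the current character is in the set.
    while start < end and s[start] in allowed:
        start += 1
    # Retreat from the right while the last character is in the set.
    while end > start and s[end - 1] in allowed:
        end -= 1
    return s[start:end]


foo = 'abababaXabbaXabababbbb'
print(strip_chars(foo, 'ab'))          # XabbaX
print(strip_chars(foo, 'aabababaab'))  # XabbaX (same set: {'a', 'b'})
```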

For more info, see the string method docs:
http://docs.python.org/lib/string-methods.html

To do what you're trying to do, try this:
>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[:len(prefix)]
...
>>> bar
'world!'

--Ben
 
ianaré

from the python manual:

strip([chars])
The chars argument is not a prefix or suffix; rather, all combinations
of its values are stripped:
>>> 'www.example.com'.strip('cmowz.')
'example'

In your case, since the letter 'H' is in your [chars] (it comes from
'WIDTH') and the name starts with an H, it gets stripped; in the second
string the first letter of the name is a K, which is not in [chars], so
stripping stops there.
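A quick way to see this for the strings in the original post is to check which characters of the two names belong to the set that `strip()` is given (a Python 3 sketch; the variable names are just for illustration):

```python
chars = ' <TD WIDTH=175><FONT SIZE=2>'
a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'

# 'H' is in the set because it appears in "WIDTH"; 'K' appears nowhere.
print('H' in chars, 'K' in chars)  # True False

# So the leading 'H' of "Hughes" is stripped, but stripping stops at the 'K'.
# The right-hand end is untouched in both cases because '\n' is not in the set.
print(repr(a.strip(chars)))  # 'ughes. John</FONT></TD>\r\n'
print(repr(b.strip(chars)))  # 'Kim, Dong-Hyun</FONT></TD>\r\n'
```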
Maybe you can use:
'Kim, Dong-Hyun</FONT></TD>\r\n'

but maybe what you REALLY want is:
>>> a[31:-14]
'Hughes. John'
>>> b[31:-14]
'Kim, Dong-Hyun'
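Hard-coded offsets such as 31 and -14 only work if every line has exactly the same markup, including the same amount of leading whitespace (as the strings are quoted in this thread, the prefix is 28 characters long and the suffix 14). A slightly sturdier sketch computes the offsets from the expected markup and complains if it doesn't match. `cell_text`, `PREFIX` and `SUFFIX` are hypothetical names, and the markup values are assumed from the original post:

```python
PREFIX = ' <TD WIDTH=175><FONT SIZE=2>'  # assumed leading markup
SUFFIX = '</FONT></TD>\r\n'              # assumed trailing markup


def cell_text(s, prefix=PREFIX, suffix=SUFFIX):
    """Remove an exact prefix and suffix, deriving the slice from their lengths."""
    if len(s) < len(prefix) + len(suffix) or not (
        s.startswith(prefix) and s.endswith(suffix)
    ):
        raise ValueError('unexpected markup: %r' % (s,))
    return s[len(prefix):len(s) - len(suffix)]


print(cell_text(' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'))
# Hughes. John
```

Raising an error on unexpected input is a design choice: a silent wrong slice is harder to notice than an exception.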
 
Ben Cartwright

Apologies, I had the slice the wrong way round; that should be:
>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[len(prefix):]
...
>>> bar
'world!'

--Ben
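The corrected version can be wrapped in a small helper. This is a sketch in Python 3; `remove_prefix` is a hypothetical name. Python 3.9 and later also provide `str.removeprefix()` and `str.removesuffix()`, which perform the same exact-substring removal:

```python
import sys


def remove_prefix(s, prefix):
    """Remove `prefix` from the start of `s` once, if it is there.

    This matches the whole substring, unlike strip()/lstrip(), which treat
    their argument as a set of characters.
    """
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


print(remove_prefix('hello world!', 'hello '))    # world!
print(remove_prefix('hello world!', 'goodbye '))  # hello world! (no match, unchanged)

# Python 3.9+ has this built in as str.removeprefix() (and str.removesuffix()).
if sys.version_info >= (3, 9):
    print('hello world!'.removeprefix('hello '))  # world!
```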
 
P Boy

This seems like a web page parsing question. If you know the delimiting
strings, another approach is:

a.split(' <TD WIDTH=175><FONT SIZE=2>')[1].split('</FONT></TD>\r\n')[0]
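A variant of the same idea uses `str.partition()`, which splits at the first occurrence of a separator and also reports whether it was found, so a missing token gives `None` instead of the `IndexError` that `split(...)[1]` raises. A Python 3 sketch; `between` is a hypothetical helper, and the shorter `<FONT SIZE=2>` / `</FONT>` delimiters are an assumption chosen so that varying leading whitespace doesn't matter:

```python
def between(s, start, end):
    """Return the text between the first `start` and the following `end`, or None."""
    _, found, rest = s.partition(start)
    if not found:
        return None  # start token missing
    text, found, _ = rest.partition(end)
    return text if found else None  # None if end token missing


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
print(between(a, '<FONT SIZE=2>', '</FONT>'))            # Hughes. John
print(between('no markup', '<FONT SIZE=2>', '</FONT>'))  # None
```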
 
Iain King

Ben said:
Ben said:
orangeDinosaur said:
I am encountering a behavior I can't think of a reason for. Sometimes,
when I use the .strip method on strings, it takes away more than what
I've specified. For example:

a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'

a.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns:

'ughes. John</FONT></TD>\r\n'

However, if I take another string, for example:

b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'

b.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns:

'Kim, Dong-Hyun</FONT></TD>\r\n'

I don't understand why in one case it eats up the 'H' but in the next
case it leaves the 'K' alone.


That method... I do not think it means what you think it means. The
argument to str.strip is a *set* of characters, e.g.:
foo = 'abababaXabbaXabababbbb'
foo.strip('ab')
'XabbaX'
foo.strip('aabababaab') # no difference!
'XabbaX'

For more info, see the string method docs:
http://docs.python.org/lib/string-methods.html
To do what you're trying to do, try this:
prefix = 'hello '
bar = 'hello world!'
if bar.startswith(prefix): bar = bar[:len(prefix)]
bar
'world!'


Apologies, that should be:
prefix = 'hello '
bar = 'hello world!'
if bar.startswith(prefix): bar = bar[len(prefix):]
bar
'world!'

or instead of:

a.strip(' <TD WIDTH=175><FONT SIZE=2>')

use:

a.replace(' <TD WIDTH=175><FONT SIZE=2>','')

Iain
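Unlike `strip()`, `str.replace()` matches the whole substring. Two things are worth knowing, though: it removes every occurrence anywhere in the string, not just at the start, and you can limit it with the optional count argument. A short Python 3 illustration (the suffix value is taken from the original post):

```python
prefix = ' <TD WIDTH=175><FONT SIZE=2>'
suffix = '</FONT></TD>\r\n'
a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'

print(repr(a.replace(prefix, '')))  # 'Hughes. John</FONT></TD>\r\n'

# Removing both pieces of markup, at most once each:
print(a.replace(prefix, '', 1).replace(suffix, '', 1))  # Hughes. John

# replace() is not anchored: every match is removed unless a count is given.
s = 'ab-ab-ab'
print(s.replace('ab', ''))     # --
print(s.replace('ab', '', 1))  # -ab-ab
```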
 
Larry Bates

Others have explained the exact problem, so I'll make a suggestion.
Take a few minutes to look at BeautifulSoup. It parses HTML and
makes extracting data from strings like this very easy. If this is
a one-off thing, don't bother. If you do this commonly,
BeautifulSoup is worth a little study.

-Larry Bates
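For comparison, here is what a parser-based approach can look like using only the standard library's `html.parser` module (Python 3). It is swapped in here so the sketch has no third-party dependency; it is not BeautifulSoup. `FontTextParser` and `font_texts` are hypothetical names, and the sketch assumes the wanted text always sits inside a `<FONT>` element, as in the original post:

```python
from html.parser import HTMLParser


class FontTextParser(HTMLParser):
    """Collect the non-blank text found inside <font> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <font> elements we are currently inside
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'font':  # HTMLParser reports tag names in lower case
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'font' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def font_texts(html):
    parser = FontTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.texts


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
print(font_texts(a + b))  # ['Hughes. John', 'Kim, Dong-Hyun']
```

With BeautifulSoup 4 installed, the rough equivalent is `[f.get_text() for f in BeautifulSoup(html, 'html.parser').find_all('font')]`; either way, the parser rather than hand-counted offsets deals with the markup.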
 
