string.replace non-ascii characters

  • Thread starter Samuel Karl Peterson

Samuel Karl Peterson

Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to deal with
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from ebay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Some googling seemed to indicate other people have reported similar
troubles:

http://mail.python.org/pipermail/python-list/2006-July/391617.html

Anyone have any enlightening advice for me?
 

Steven Bethard

Samuel said:
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to deal with
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from ebay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Was it something like this?
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

You might get that if you're mixing str and unicode. If both strings are
of one type or the other, you should be okay:
''
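
For anyone reading this on Python 3, here is a minimal sketch of the same mix-up. It assumes Python 3, where the two types are bytes and str and mixing them raises a TypeError up front rather than the UnicodeDecodeError shown above. The sample data and variable names are made up for illustration.

```python
# Python 3 sketch: replace() wants its arguments to match the receiver's type.
raw = b"price:\xa010"             # bytes, as they might arrive from a download
text = raw.decode("iso-8859-1")   # str; "\xa0" is now U+00A0 NO-BREAK SPACE

# Same type on both sides: fine.
bytes_result = raw.replace(b"\xa0", b" ")
text_result = text.replace("\xa0", " ")

# Mixed types: rejected with TypeError.
try:
    raw.replace("\xa0", " ")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```

The point is the same in either version: pick one type for the haystack and the needle, and convert explicitly at the boundary.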

STeVe
 

Samuel Karl Peterson

Was it something like this?

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Yeah that looks like exactly what was happening, thank you. I wonder
why I had a unicode string though. I thought urllib2 always spat out
a plain string. Oh well.

u'\xa0'.encode('latin-1').replace('\xa0', " ")

Hooray.
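
An alternative to encoding back to latin-1 is to decode once and keep everything as text from then on. This is only a sketch in Python 3, under the assumption that the page bytes really are iso-8859-1; `clean_nbsp` and `page_bytes` are hypothetical names standing in for whatever the download returned, and no network call is made here.

```python
def clean_nbsp(page_bytes, charset="iso-8859-1"):
    """Decode downloaded bytes and turn no-break spaces into plain spaces."""
    text = page_bytes.decode(charset)
    # Both the receiver and the arguments are str, so no implicit conversion occurs.
    return text.replace("\u00a0", " ")


# Hypothetical sample data in place of a real downloaded page.
page_bytes = b"Current\xa0bid:\xa0$5.00"
cleaned = clean_nbsp(page_bytes)
```

If the server declares a different charset, that value would need to be passed in instead of the default assumed above.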
 

Gabriel Genellina

On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:

Sorry to steal the thread! This is only related to your signature:
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown

I just did that last week. Around 250 useless lines removed from a
1000-line module. I think the original coder didn't read the tutorial past the
dictionary examples: *all* functions returned a dictionary or list of
dictionaries! Of course using different names for the same thing here and
there, ugh... I just threw in a few classes and containers and removed all
the nonsensical packing/unpacking of data going back and forth, for a net
decrease of 25% in size (and a great increase in robustness,
maintainability, etc).
If I were paid for the number of lines *written* that would not be a great
deal :)
 

Steven D'Aprano

On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:

Sorry to steal the thread! This is only related to your signature:

I just did that last week. Around 250 useless lines removed from a
1000-line module.
[snip]

Hot out of uni, my first programming job was assisting a consultant who
was writing an application in Apple's "Hypertalk", a so-called "fourth
generation language" with an English-like syntax, aimed at non-programmers.

Virtually the first thing I did was refactor part of his code that looked
something like this:

set the name of button id 1 to 1
set the name of button id 2 to 2
set the name of button id 3 to 3
....
set the name of button id 399 to 399
set the name of button id 400 to 400


into something like this:

for i = 1 to 400:
    set the name of button id i to i
 

Duncan Booth

Gabriel Genellina said:
If I were paid for the number of lines *written* that would not be a
great deal :)

You don't by any chance get paid by the number of posts to c.l.python?
 

John Machin

I was thinking the same thing.

O maker of the monstrous millisecond-muncher, I was thinking that you
were paid by the number of times that you typed 3600000 :)
 

Gabriel Genellina

On Mon, 12 Feb 2007 07:44:14 -0300, Duncan Booth wrote:
You don't by any chance get paid by the number of posts to c.l.python?

I post a few messages but certainly I'm not the most prolific poster here!
 
