string.replace non-ascii characters


Samuel Karl Peterson

Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. It seems, at random, not to handle
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Some googling seemed to indicate other people have reported similar
troubles:

http://mail.python.org/pipermail/python-list/2006-July/391617.html

Anyone have any enlightening advice for me?
 

Steven Bethard

Samuel said:
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. It seems, at random, not to handle
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Was it something like this?
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

You might get that if you're mixing str and unicode. If both strings are
of one type or the other, you should be okay:
''
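In Python 2, calling replace on a unicode string with a plain str argument implicitly decodes that str with the ASCII codec, so the error only appears when the str happens to contain a byte above 127. That is why it can look random. Python 3 removed the implicit coercion: text is `str`, raw data is `bytes`, and mixing them in `replace` raises `TypeError` every time. A minimal Python 3 sketch of the same rule ("keep both operands the same type"); the sample string is made up for illustration:

```python
def mixed_replace_fails() -> bool:
    """Return True if mixing bytes and str in replace() raises TypeError."""
    data = b"price:\xa0100"  # raw bytes, e.g. as read from a network response
    try:
        data.replace("\xa0", " ")  # str arguments passed to bytes.replace
    except TypeError:
        return True
    return False

# Same-type operands work, for both bytes and text.
print(b"price:\xa0100".replace(b"\xa0", b" "))  # b'price: 100'
print("price:\xa0100".replace("\xa0", " "))     # price: 100
print(mixed_replace_fails())                    # True
```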

STeVe
 

Samuel Karl Peterson

Was it something like this?

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Yeah that looks like exactly what was happening, thank you. I wonder
why I had a unicode string though. I thought urllib2 always spat out
a plain string. Oh well.

u'\xa0'.encode('latin-1').replace('\xa0', " ")

Hooray.
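The one-liner above works under Python 2 by encoding the unicode string to latin-1 so that both operands are plain str. In Python 3, `urllib.request.urlopen(...).read()` returns bytes, and the cleaner route is usually to decode once at the boundary and then work entirely in text. A sketch under the assumption that the page is ISO-8859-1 encoded (real code would rather take the charset from the response, e.g. `response.headers.get_content_charset()`); `clean_nbsp` is a hypothetical helper name:

```python
def clean_nbsp(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode raw page bytes and replace non-breaking spaces (U+00A0)
    with ordinary spaces. The default charset is an assumption."""
    text = raw.decode(charset)          # bytes -> str, once, at the boundary
    return text.replace("\u00a0", " ")  # str.replace with str arguments only

page = b"Buy it now:\xa0$10"  # hypothetical downloaded bytes
print(clean_nbsp(page))       # Buy it now: $10
```

Decoding with the right charset matters: in a UTF-8 page the same character arrives as the two bytes `b"\xc2\xa0"`, so a byte-level replace of `b"\xa0"` would leave a stray `\xc2` behind.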
 

Gabriel Genellina

On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:

Sorry to steal the thread! This is only related to your signature:
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown

I just did that last week. Around 250 useless lines removed from a 1000-line
module. I think the original coder didn't read the tutorial past the
dictionary examples: *all* functions returned a dictionary or list of
dictionaries! Of course using different names for the same thing here and
there, ugh... I just threw in a few classes and containers and removed all
the nonsensical packing/unpacking of data going back and forth, for a net
decrease of 25% in size (and a great increase in robustness,
maintainability, etc).
If I were paid by the number of lines *written*, that would not be a great
deal :)
 

Steven D'Aprano

On Mon, 12 Feb 2007 02:38:29 -0300, Samuel Karl Peterson wrote:

Sorry to steal the thread! This is only related to your signature:


I just did that last week. Around 250 useless lines removed from a 1000-line
module.
[snip]

Hot out of uni, my first programming job was assisting a consultant who
was writing an application in Apple's "Hypertalk", a so-called "fourth
generation language" with an English-like syntax, aimed at non-programmers.

Virtually the first thing I did was refactor part of his code that looked
something like this:

set the name of button id 1 to 1
set the name of button id 2 to 2
set the name of button id 3 to 3
....
set the name of button id 399 to 399
set the name of button id 400 to 400


into something like this:

repeat with i = 1 to 400
  set the name of button id i to i
end repeat
 

Duncan Booth

Gabriel Genellina said:
If I were paid for the number of lines *written* that would not be a
great deal :)

You don't by any chance get paid by the number of posts to c.l.python?
 

John Machin

I was thinking the same thing.

O maker of the monstrous millisecond-muncher, I was thinking that you
were paid by the number of times that you typed 3600000 :)
 

Gabriel Genellina

On Mon, 12 Feb 2007 07:44:14 -0300, Duncan Booth wrote:
You don't by any chance get paid by the number of posts to c.l.python?

I do post a few messages, but I'm certainly not the most prolific poster here!