Wichert Akkerman
I'm struggling with what should be a trivial problem, but I can't seem to
come up with a proper solution: I am working on a CGI script that takes UTF-8
input from a browser. The input is nicely percent-encoded, so you get something
like this:
firstname=t%C3%A9st
where %C3%A9 is the UTF-8 encoding of a single character (é). Passing this
through urllib.unquote does not help:
u't%C3%A9st'
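To see why this is tricky, it helps to look at the bytes involved. The sketch below uses Python 3's `urllib.parse` (a newer module than the `urllib.unquote` discussed here) and assumes the field value is "tést". UTF-8 encodes é (U+00E9) as the two bytes 0xC3 0xA9, and percent-encoding escapes each byte on its own, so one character turns into two escapes.

```python
from urllib.parse import quote

# Assumed example value; é is U+00E9.
value = "tést"

# UTF-8 turns the single character é into two bytes.
utf8_bytes = value.encode("utf-8")
print(utf8_bytes)        # b't\xc3\xa9st'

# Percent-encoding escapes each of those bytes separately.
encoded = quote(value)   # quote() encodes str input as UTF-8 by default
print(encoded)           # t%C3%A9st
```

Four characters become five bytes, and the two bytes of é become the two escapes %C3 and %A9.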
The problem turned out to be that urllib.unquote() processes its input
character by character, which breaks when it calls chr() on each escaped
value: it gets a value that is neither valid ASCII (it is outside the legal
range) nor valid Unicode (it is only half of a UTF-8 character), and as a
result it fails:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
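A common way out is to undo the percent-escapes at the byte level first and only then decode the complete byte string as UTF-8, so both halves of é are decoded together. In Python 2 terms that means calling `urllib.unquote` on a plain byte string (`str`) rather than a `unicode` object and then calling `.decode('utf-8')` on the result. The sketch below shows the same idea in Python 3, where `urllib.parse.unquote_to_bytes` does the first step and `parse_qs` can apply UTF-8 decoding to a whole query string; `decode_field` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def decode_field(raw: str) -> str:
    """Hypothetical helper: percent-decode to bytes, then UTF-8 decode."""
    # Step 1: turn every %XX escape into one raw byte, without
    # interpreting the bytes as characters yet.
    raw_bytes = unquote_to_bytes(raw)
    # Step 2: decode the whole byte string at once, so multi-byte
    # sequences such as 0xC3 0xA9 stay together.
    return raw_bytes.decode("utf-8")


print(decode_field("t%C3%A9st"))   # tést

# Treating each byte as its own character is what goes wrong: Latin-1
# does exactly that and yields two wrong characters (mojibake).
print(unquote_to_bytes("t%C3%A9st").decode("latin-1"))  # tÃ©st

# For a whole CGI query string, parse_qs can do both steps per field.
form = parse_qs("firstname=t%C3%A9st", encoding="utf-8")
print(form["firstname"][0])   # tést
```

In Python 3 the plain `urllib.parse.unquote('t%C3%A9st')` also returns "tést", because its `encoding` parameter defaults to "utf-8".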
I can't seem to find a working method to do this conversion correctly.
Can someone point me in the right direction? (and please cc me on
replies since I'm not currently subscribed to this list/newsgroup).
Wichert.