utf8 encoding problem

Wichert Akkerman

I'm struggling with what should be a trivial problem but I can't seem to
come up with a proper solution: I am working on a CGI that takes utf-8
input from a browser. The input is nicely encoded so you get something
like this:

firstname=t%C3%A9s

where %C3%A9 is a single character in utf-8 encoding. Passing this
through urllib.unquote does not help:
u't%C3%A9st'

The problem turned out to be that urllib.unquote() processes its
input character by character, which breaks when it tries to call
chr() for a character: it gets a character which is not valid ascii
(outside the legal range) or valid unicode (it's only half a utf-8
character) and as a result it fails:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
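
To illustrate the failure described above, here is a minimal sketch of why a single byte of the %C3%A9 pair cannot be decoded on its own. It is written for Python 3 (the thread itself predates it), and try_decode is a hypothetical helper made up for this example, not part of urllib.

```python
# The two bytes behind the escape sequence %C3%A9.
pair = bytes([0xC3, 0xA9])

def try_decode(data, codec):
    """Hypothetical helper: decode data, or report why decoding failed."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason

# One byte at a time: 0xC3 is outside ASCII and is only half a UTF-8 character.
print(try_decode(pair[:1], "ascii"))
print(try_decode(pair[:1], "utf-8"))

# Both bytes together decode to a single character.
print(ascii(try_decode(pair, "utf-8")))
```

Decoding only works once both bytes are available, which is why the unquoting has to produce the raw bytes first.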


I can't seem to find a working method to do this conversion correctly.
Can someone point me in the right direction? (and please cc me on
replies since I'm not currently subscribed to this list/newsgroup).

Wichert.
 
Erik Max Francis

Wichert said:
I'm struggling with what should be a trivial problem but I can't seem to
come up with a proper solution: I am working on a CGI that takes utf-8
input from a browser. The input is nicely encoded so you get something
like this:

firstname=t%C3%A9s

where %C3%A9 is a single character in utf-8 encoding. Passing this
through urllib.unquote does not help:

u't%C3%A9st'

Unquote it as a normal (byte) string, then decode the result from utf-8 to Unicode.
u't\xe9s'
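
The two steps can be sketched roughly as follows. This is a Python 3 rendering, since the thread's own interpreter session did not survive here; it uses urllib.parse.unquote_to_bytes in place of the old urllib.unquote, and unquote_utf8 is a hypothetical name chosen for this example.

```python
from urllib.parse import unquote_to_bytes

def unquote_utf8(quoted):
    """Hypothetical helper: percent-decode to raw bytes, then decode as UTF-8."""
    # Step 1: undo the %XX escapes without interpreting the bytes as text.
    raw = unquote_to_bytes(quoted)
    # Step 2: decode the complete byte sequence, so multi-byte characters
    # are seen whole rather than one byte at a time.
    return raw.decode("utf-8")

# Assumption: under Python 2 the equivalent would be
# urllib.unquote(str(value)).decode('utf-8').
print(ascii(unquote_utf8("t%C3%A9s")))
```

The point of the ordering is that the escapes are resolved while the data is still a byte string, and the conversion to Unicode happens only once, on the complete result.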
 
