Double decoding of strings??

manuzhai · Dec 5, 2005

Hi all,

I have a bit of a problem. I'm trying to use Python to work with some
data which turns out to be garbage. Ultimately, I think the solution
will be to .decode('utf-8') a string twice, but Python doesn't like
doing this the second time. That could possibly be understandable, but
then why does the unicode object have a .decode() method at all?

At first I get 'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden'.
I .decode('utf-8') this to u'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'.
I then try to .decode('utf-8') this again, but that gives an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Program Files\Python\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)

If I copy/paste 'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'
and try to .decode('utf-8') it, that works fine, and it gets me the
result I want, which is u'WVL Algemeen Altru\xefsme genormeerd Afbeelden'.

Why does it work this way? How can I make it work?

Regards,

Manuzhai
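
A note on the traceback above: in Python 2, calling .decode() on a unicode object first converts it to a byte string with the default codec, which is ASCII, and that hidden step is what raises the UnicodeEncodeError. The pasted literal works because it is a plain byte string, so no implicit encode happens. The sketch below reproduces the failure under the assumption of Python 3, where str has no .decode() method; the explicit .encode("ascii") stands in for Python 2's implicit step, and the variable names are only illustrative.

```python
# Bytes as received: UTF-8 text that was UTF-8 encoded a second time.
raw = b"WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden"

# The first decode succeeds, but the result still holds U+00C3 and U+00AF.
once = raw.decode("utf-8")

# Stand-in for Python 2's implicit unicode -> str conversion
# (assumption: the default codec is ASCII, as the traceback suggests).
try:
    once.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The offending characters sit at the positions named in the traceback.
print(error.start, error.end - 1)
```

The reported positions match the "position 18-19" in the original error message, which points at the two non-ASCII characters left over after the first decode.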

Peter Otten · Dec 5, 2005

> Ultimately, I think the solution will be to .decode('utf-8') a string
> twice

Try

>>> "Altru\xc3\x83\xc2\xafsme".decode("utf8").encode("latin1").decode("utf8")
u'Altru\xefsme'

Altruïsme

Peter
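
The round trip in the reply works because latin1 maps the code points 0 to 255 one-to-one onto the byte values 0 to 255, so encoding the once-decoded text with it recovers the inner UTF-8 bytes unchanged. Below is a minimal sketch of the same idea for Python 3; the helper name undo_double_utf8 is hypothetical and not part of any library, and it assumes the data went through exactly one extra UTF-8 encoding of bytes read as latin1.

```python
def undo_double_utf8(raw):
    """Hypothetical helper: strip one extra layer of UTF-8 encoding."""
    # Outer layer: bytes -> text that still looks like mojibake.
    text = raw.decode("utf-8")
    # latin-1 turns each code point 0..255 back into the same byte value.
    inner = text.encode("latin-1")
    # Inner layer: the original UTF-8 bytes -> the intended text.
    return inner.decode("utf-8")


fixed = undo_double_utf8(b"Altru\xc3\x83\xc2\xafsme")
print(fixed)
```

If the intermediate step had used a different single-byte codec such as cp1252, some characters would fall outside the latin1 range and the middle .encode() would raise UnicodeEncodeError, so the choice of codec in that step is an assumption to check against the actual data.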
