Double decoding of strings??

manuzhai

Hi all,

I have a bit of a problem. I'm trying to use Python to work with some
data which turns out to be garbage. Ultimately, I think the solution
will be to .decode('utf-8') a string twice, but Python doesn't like
doing this the second time. That could possibly be understandable, but
then why does the unicode object have a .decode() method at all?

I get 'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden' at first.
I .decode('utf-8') this to u'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'.
I then try to .decode('utf-8') this again, but that gives an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Program Files\Python\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)

If I copy/paste 'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'
and try to .decode('utf-8') it, that works fine, and it gets me the
result I want, which is u'WVL Algemeen Altru\xefsme genormeerd Afbeelden'.

Why does it work this way? How can I make it work?

Regards,

Manuzhai
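
A note on why the second call fails: in Python 2, calling .decode() on a unicode object first has to turn it back into a byte string, and it does that implicitly with the default ASCII codec. That hidden encode step is what raises the UnicodeEncodeError at positions 18-19, before any UTF-8 decoding happens. The copy/pasted literal works because it is a plain byte string, so no implicit encode is needed. The sketch below reproduces only the hidden step. It assumes Python 3 (the thread itself uses Python 2), where the encode has to be written out explicitly, and the variable names are illustrative.

```python
# The intermediate text from the post: two separate characters
# (U+00C3 and U+00AF) sit where a single "ï" should be.
half_decoded = 'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'

# Python 2's unicode.decode() implicitly did the equivalent of this
# before decoding; the two non-ASCII characters make it fail.
try:
    half_decoded.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Shows the same complaint as the traceback above.
print(ascii_error)
```

The failure is about encoding, not decoding, which is why the exception is a UnicodeEncodeError even though the call was .decode().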
 
Peter Otten

> Ultimately, I think the solution will be to .decode('utf-8') a string
> twice

Try

>>> "Altru\xc3\x83\xc2\xafsme".decode("utf8").encode("latin1").decode("utf8")
u'Altru\xefsme'
Altruïsme

Peter
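
The round trip above works because Latin-1 maps code points 0-255 to the bytes with the same values, so .encode("latin1") turns the half-decoded text back into the bytes it should have been, and those bytes can then be decoded as UTF-8 a second time. Below is a minimal sketch of the same idea, assuming Python 3 (where byte strings are written with a b prefix); the helper name undo_double_utf8 is made up for illustration and is not part of any library.

```python
# Raw bytes as first received in the original post: UTF-8 encoded twice.
raw = b'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden'


def undo_double_utf8(data):
    """Strip one extra UTF-8 layer from doubly encoded bytes (hypothetical helper)."""
    once = data.decode('utf-8')          # text containing U+00C3 U+00AF
    as_bytes = once.encode('latin-1')    # same values, now as bytes \xc3\xaf
    return as_bytes.decode('utf-8')      # the intended text, with U+00EF


fixed = undo_double_utf8(raw)
print(fixed)
```

This only helps when the data really was encoded twice. On text that was encoded correctly, the encode('latin-1') or the second decode may raise an error, so it is probably best applied only to input known to be damaged in this way.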
 
