Interpreting string containing \u000a

Francis Girard

Hi,

I have an ISO-8859-1 file containing things like
"Hello\u000d\u000aWorld", i.e. the character '\', followed by the
character 'u' and then '0', etc.

What is the easiest way to automatically translate these codes into
Unicode characters?

Thank you

Francis Girard
 
Peter Otten

Francis said:
I have an ISO-8859-1 file containing things like
"Hello\u000d\u000aWorld", i.e. the character '\', followed by the
character 'u' and then '0', etc.

What is the easiest way to automatically translate these codes into
Unicode characters?

If the file really contains the escape sequences, use "unicode-escape" as the
encoding. Decoding your example text then gives:
u'Hello\r\nWorld'

If it contains the raw bytes (an actual CR followed by an LF), use
"iso-8859-1", which gives the same result:
u'Hello\r\nWorld'
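
To make the two cases concrete, here is a minimal sketch in Python 3 syntax (the thread itself predates Python 3, hence the u'...' results above). The names escaped and raw are made up for illustration; they stand for the bytes the file would hold in each case.

```python
# Case 1: the file holds literal escape sequences (backslash, 'u', four hex digits).
escaped = b"Hello\\u000d\\u000aWorld"

# Case 2: the file holds a real carriage return and line feed.
raw = b"Hello\r\nWorld"

# "unicode_escape" interprets the \uXXXX sequences; any other byte is read as Latin-1.
from_escaped = escaped.decode("unicode_escape")

# "iso-8859-1" maps every byte directly to the code point with the same value.
from_raw = raw.decode("iso-8859-1")

print(repr(from_escaped))
print(from_escaped == from_raw)
```

Both calls should end up with the same 12-character string, even though the first input is 22 bytes long and the second only 12.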

Open the file with

codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().
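
For reading a whole file, a rough sketch follows. It assumes Python 3, where the built-in open() accepts an encoding argument itself, so it is used here in place of codecs.open. The helper read_decoded and the temporary sample file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile


def read_decoded(path, encoding):
    # newline="" keeps CR and LF exactly as decoded; the default text mode
    # would fold "\r\n" into "\n" after decoding.
    with open(path, encoding=encoding, newline="") as f:
        return f.read()


# Create a small sample file that contains the literal escape sequences.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Hello\\u000d\\u000aWorld")

text = read_decoded(sample_path, "unicode_escape")
os.remove(sample_path)

print(repr(text))
```

Pass "iso-8859-1" instead of "unicode_escape" if the file turns out to contain the raw bytes.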

Peter
 
