Interpreting string containing \u000a

Francis Girard · Jun 18, 2008

Hi,

I have an ISO-8859-1 file containing things like
"Hello\u000d\u000aWorld", i.e. the character '\', followed by the
character 'u' and then '0', etc.

What is the easiest way to automatically translate these codes into
Unicode characters?

Thank you

Francis Girard

Peter Otten · Jun 18, 2008

Francis said:
I have an ISO-8859-1 file containing things like
"Hello\u000d\u000aWorld", i.e. the character '\', followed by the
character 'u' and then '0', etc.

What is the easiest way to automatically translate these codes into
unicode characters ?

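If the file really contains the escape sequences, use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes, use "iso-8859-1":

>>> "Hello\r\nWorld".decode("iso-8859-1")
u'Hello\r\nWorld'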

Open the file with

codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().
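
For anyone reading this on Python 3, here is a minimal sketch of the same idea. It is an illustration under assumptions, not part of the original answer: the helper name read_escaped_text and the temporary demo file are made up for the example. It reads the file as bytes and decodes with the "unicode_escape" codec, which treats ordinary bytes as Latin-1 (ISO-8859-1) and expands backslash escapes.

```python
import os
import tempfile


def read_escaped_text(path):
    """Read an ISO-8859-1 file and expand literal \\uXXXX escapes.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    # "unicode_escape" maps plain bytes as Latin-1 and interprets
    # backslash sequences such as \u000d and \u000a.
    return raw.decode("unicode_escape")


# Build a small demo file: literal backslash-u sequences plus one
# Latin-1 byte (0xE9, e with acute accent).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Hello\\u000d\\u000aWorld caf\xe9")
    demo_path = tmp.name

text = read_escaped_text(demo_path)
os.remove(demo_path)
print(repr(text))
```

Note that this codec also interprets other backslash sequences (for example a literal \n or \t in the file), so it only fits if every backslash in the data is meant as an escape.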

Peter
