Convert unicode escape sequences to unicode in a file

Jeremy · Jan 11, 2011

I have a file that contains Unicode escape sequences, e.g.,

J\u00e9r\u00f4me

and I want to convert all of them to the characters they represent and write the results to a new file. The simple script I've written is copied below. However, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 947: ordinal not in range(128)

It appears that the data isn't being converted when writing to the file. Can someone please help?
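If the file really does hold the literal six-character sequence `\u00e9`, Python's `unicode_escape` codec (also accepted as `'unicode-escape'`, the spelling the script uses) turns it into the character it names. A minimal sketch in Python 3, with the bytes written inline instead of read from a file:

```python
# Bytes as they would sit on disk if the file contains the literal
# characters backslash, 'u', '0', '0', 'e', '9' (and likewise for ô).
raw = b"J\\u00e9r\\u00f4me"

# unicode_escape interprets the \uXXXX sequences and returns real characters.
text = raw.decode("unicode_escape")

print(text == "J\u00e9r\u00f4me")  # True
print(len(raw), len(text))         # 16 6
```

One caveat: `unicode_escape` decodes every byte that is not part of an escape as Latin-1 and also interprets other backslash sequences such as `\n` and `\\`, so it garbles a file that already contains real UTF-8 characters.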

Thanks,
Jeremy

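import codecs
import re
import sys

if __name__ == "__main__":
    filename = sys.argv[1]  # input path; assumed to come from the command line
    f = codecs.open(filename, 'r', 'unicode-escape')
    lines = f.readlines()
    line = ''.join(lines)
    f.close()

    utFound = re.sub(r'STRINGDECODE\((.+?)\)', r'\1', line)
    print(utFound[:1000])

    o = open('newDice.sql', 'w')
    # utFound is already a unicode string here; in Python 2, calling
    # .decode() on it first encodes it as ASCII, which can raise the error.
    o.write(utFound.decode('utf-8'))
    o.close()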
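One likely contributor in the script above is the final `o.write(utFound.decode('utf-8'))` call. `utFound` is already a unicode string (codecs.open decoded it), and in Python 2 calling `.decode()` on a unicode string makes Python first encode it with the default ASCII codec, which is why a *decode* call can raise UnicodeEncodeError. Python 3 removed `str.decode`, but the underlying failure is the same; a minimal Python 3 reproduction:

```python
text = "J\u00e9r\u00f4me"  # the characters J, é, r, ô, m, e

# Encoding with the ASCII codec fails at the first non-ASCII character,
# producing the same kind of error the question reports.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)

# Naming an encoding that can represent é and ô works; UTF-8 uses two
# bytes for each of them.
data = text.encode("utf-8")
print(data)  # b'J\xc3\xa9r\xc3\xb4me'
```

The fix in either version is to encode explicitly when writing (or open the output file with an encoding) rather than calling `.decode()` on text that is already decoded.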

Alex Willmer · Jan 11, 2011


Are you _sure_ that your file contains the characters '\', 'u', '0',
'0', 'e' and '9'? I suspect your file actually contains a byte
with value 0xe9 and you have inspected the file using Python, which
has printed the byte using a Unicode escape sequence. Open the file
in a text editor or hex editor and look at the value at offset 947
to be sure.
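The same check can be done from Python by reading the file in binary mode, which bypasses every codec. A minimal sketch; `show_bytes_around` and `has_literal_escapes` are hypothetical helper names, and the demo files are throwaway examples:

```python
import os
import tempfile


def show_bytes_around(path, offset, width=8):
    """Return the raw bytes of `path` in a window around `offset`.

    Binary mode ("rb") applies no codec, so this is what is on disk.
    """
    with open(path, "rb") as f:
        data = f.read()
    start = max(0, offset - width)
    return data[start:offset + width]


def has_literal_escapes(path):
    """True if the file contains a literal backslash followed by 'u'."""
    with open(path, "rb") as f:
        return b"\\u" in f.read()


# Demo with two throwaway files: one holding literal escape text,
# one holding é and ô as the Latin-1 bytes 0xe9 and 0xf4.
with tempfile.TemporaryDirectory() as tmp:
    escaped = os.path.join(tmp, "escaped.txt")
    latin1 = os.path.join(tmp, "latin1.txt")
    with open(escaped, "wb") as f:
        f.write(b"J\\u00e9r\\u00f4me")
    with open(latin1, "wb") as f:
        f.write(b"J\xe9r\xf4me")

    print(has_literal_escapes(escaped), has_literal_escapes(latin1))  # True False
    print(show_bytes_around(latin1, 1, width=2))  # b'J\xe9r'
```

The position in the error message counts characters of the string being encoded, so it matches a byte offset in the file only while every earlier character was stored as one byte and nothing before it was removed by the regular expression.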

If the file does contain raw bytes like 0xe9, you need to replace
'unicode-escape' with the actual encoding of the file (for example
'latin-1', in which 0xe9 is é).
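A sketch of the script along those lines, reading with the real input encoding and writing through an output file opened with an explicit encoding. It assumes the input is Latin-1 (consistent with é stored as the single byte 0xe9, but only an assumption); `convert` and the file names are illustrative. The sketch is Python 3; `io.open` is the built-in `open` there and is also available in Python 2.6 and later:

```python
import io
import os
import re
import tempfile

# Same pattern as the original script, written as a raw string.
STRINGDECODE = re.compile(r"STRINGDECODE\((.+?)\)")


def convert(src_path, dst_path, src_encoding="latin-1"):
    """Copy src_path to dst_path as UTF-8, unwrapping STRINGDECODE(...).

    src_encoding is an assumption: set it to the encoding the input
    file really uses.
    """
    # Decoding with the file's real encoding yields text, not bytes.
    with io.open(src_path, "r", encoding=src_encoding) as f:
        text = f.read()

    result = STRINGDECODE.sub(r"\1", text)

    # The output file does the encoding, so no .decode() call is needed.
    with io.open(dst_path, "w", encoding="utf-8") as o:
        o.write(result)
    return result


# Demo on a throwaway file holding Latin-1 bytes (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dice.sql")
    dst = os.path.join(tmp, "newDice.sql")
    with open(src, "wb") as f:
        f.write(b"INSERT STRINGDECODE(J\xe9r\xf4me);")
    convert(src, dst)
    with open(dst, "rb") as f:
        print(f.read())  # b'INSERT J\xc3\xa9r\xc3\xb4me;'
```

If the input turns out to be UTF-8 instead, only `src_encoding` changes; the output side stays the same.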

