How to dump a Python 2.6 dictionary with UTF-8 strings?


W. Martin Borgert

Hi,

naively, I thought the following code:

#!/usr/bin/env python2.6
# -*- coding: utf-8 -*-
import codecs
d = { u'key': u'我爱中国人' }
if __name__ == "__main__":
    with codecs.open("ilike.txt", "w", "utf-8") as f:
        print >>f, d

would produce a file ilike.txt like this:

{u'key': u'我爱中国人'}

But unfortunately, it results in:

{u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}

What's the right way to get the strings in UTF-8?

Thanks in advance!
 

Martin v. Loewis

> What's the right way to get the strings in UTF-8?

This will work. I doubt you can get it much simpler
in 2.x; in 3.x, your code will work out of the box
(with proper syntactical adjustments).

import pprint, cStringIO

class UniPrinter(pprint.PrettyPrinter):
    def format(self, obj, context, maxlevels, level):
        # Anything that is not a unicode string gets the default formatting
        if not isinstance(obj, unicode):
            return pprint.PrettyPrinter.format(self, obj,
                                               context,
                                               maxlevels,
                                               level)
        # Build a u"..." literal by hand
        out = cStringIO.StringIO()
        out.write('u"')
        for c in obj:
            # Control characters, '"' and '\' are escaped as \xNN
            if ord(c)<32 or c in u'"\\':
                out.write('\\x%.2x' % ord(c))
            else:
                # Everything else is written as raw UTF-8 bytes
                out.write(c.encode("utf-8"))
        out.write('"')
        # result, readable, recursive
        return out.getvalue(), True, False

UniPrinter().pprint({ u'k"e\\y': u'我爱中国人' })
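
For comparison, the 3.x route Martin mentions (the original script with the syntax adjusted) might look like the following minimal sketch. It assumes Python 3, and the dump helper is a hypothetical name, not something from the thread. Because str is Unicode in Python 3 and its repr leaves printable non-ASCII characters unescaped, the file should end up containing {'key': '我爱中国人'}, without the u prefix.

```python
# Python 3 sketch (assumption: run under Python 3, source saved as UTF-8).
# In Python 3, str is Unicode, so the u'' prefix is not needed.
d = {'key': '我爱中国人'}


def dump(obj, path):
    """Hypothetical helper: write str(obj) plus a newline to a UTF-8 file."""
    # open() accepts an encoding directly, so codecs.open() is not needed
    with open(path, "w", encoding="utf-8") as f:
        # print is a function in Python 3; file= replaces "print >>f"
        print(obj, file=f)


if __name__ == "__main__":
    dump(d, "ilike.txt")
```

The only real changes are print becoming a function and open() taking the encoding itself; the dictionary's repr does the rest.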
 

Alex Willmer

It has worked; you're just seeing how Python represents unicode
characters (via repr), for example in the interactive interpreter:

Python 2.7.1+ (r271:86832, Dec 24 2010, 10:04:43)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
>>> x
{u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
>>> print x
{u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
>>> print x['key']
我爱中国人

That last line only works if your terminal uses a suitable encoding
(e.g. UTF-8).
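
Alex's point, that the \uXXXX escapes and the characters are the same string and only the repr differs, can be checked with a small sketch. It assumes Python 3 (the thread's code is 2.x). There, repr() keeps printable non-ASCII characters, while the built-in ascii() escapes them in the same \uXXXX style that Python 2's repr used (minus the u prefix).

```python
# Python 3 sketch (assumption: Python 3, not the 2.x used in the thread).
x = {'key': '\u6211\u7231\u4e2d\u56fd\u4eba'}  # escapes in the literal...

same = x['key'] == '我爱中国人'  # ...denote exactly these five characters
shown = repr(x)                  # repr keeps printable non-ASCII as-is
escaped = ascii(x)               # ascii() escapes non-ASCII as \uXXXX
utf8 = x['key'].encode('utf-8')  # bytes a UTF-8 file would contain

print(same)       # True
print(shown)      # {'key': '我爱中国人'}
print(escaped)    # {'key': '\u6211\u7231\u4e2d\u56fd\u4eba'}
print(len(utf8))  # 15 (three bytes per character)
```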

Regards, Alex
 

W. Martin Borgert

> This will work. I doubt you can get it much simpler
> in 2.x; in 3.x, your code will work out of the box
> (with proper syntactical adjustments).

Thanks, this works like a charm. I tried pprint before for this
task and failed. Now I know why :~)
 
