How to dump a Python 2.6 dictionary with UTF-8 strings?

Discussion in 'Python' started by W. Martin Borgert, Jan 11, 2011.

  1. Hi,

    naively, I thought the following code:

    #!/usr/bin/env python2.6
    # -*- coding: utf-8 -*-
    import codecs
    d = { u'key': u'我爱中国人' }
    if __name__ == "__main__":
        with codecs.open("ilike.txt", "w", "utf-8") as f:
            print >>f, d

    would produce a file ilike.txt like this:

    {u'key': u'我爱中国人'}

    But unfortunately, it results in:

    {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}

    What's the right way to get the strings in UTF-8?

    Thanks in advance!
     
    W. Martin Borgert, Jan 11, 2011
    #1

  2. > What's the right way to get the strings in UTF-8?

    This will work. I doubt you can get it much simpler
    in 2.x; in 3.x, your code will work out of the box
    (with proper syntactical adjustments).

    import pprint, cStringIO

    class UniPrinter(pprint.PrettyPrinter):
        def format(self, obj, context, maxlevels, level):
            if not isinstance(obj, unicode):
                return pprint.PrettyPrinter.format(self, obj,
                                                   context,
                                                   maxlevels,
                                                   level)
            out = cStringIO.StringIO()
            out.write('u"')
            for c in obj:
                if ord(c) < 32 or c in u'"\\':
                    out.write('\\x%.2x' % ord(c))
                else:
                    out.write(c.encode("utf-8"))
            out.write('"')
            # result, readable, recursive
            return out.getvalue(), True, False

    UniPrinter().pprint({ u'k"e\\y': u'我爱中国人' })
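
    For comparison, here is a minimal sketch of what the 3.x variant of the original script might look like, assuming Python 3 and a UTF-8 source file. The helper names dump_dict and read_back are made up for illustration; only the file name ilike.txt comes from the original post.

```python
import os
import tempfile

d = {'key': '我爱中国人'}


def dump_dict(obj, path):
    """Write repr(obj) plus a newline to path, encoded as UTF-8."""
    # In 3.x the built-in open() accepts an encoding, so codecs.open is not needed.
    with open(path, "w", encoding="utf-8") as f:
        # print(..., file=f) replaces the 2.x statement "print >>f, d".
        print(obj, file=f)


def read_back(path):
    """Return the file contents decoded as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Write into a scratch directory rather than the current one.
    path = os.path.join(tempfile.mkdtemp(), "ilike.txt")
    dump_dict(d, path)
    print("wrote", path)
```

    In 3.x the repr() of a str keeps printable non-ASCII characters, so the file should contain the Chinese text itself rather than \u escapes.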
     
    Martin v. Loewis, Jan 11, 2011
    #2

  3. On Jan 11, 10:40 pm, "W. Martin Borgert" <> wrote:
    > Hi,
    >
    > naively, I thought the following code:
    >
    > #!/usr/bin/env python2.6
    > # -*- coding: utf-8 -*-
    > import codecs
    > d = { u'key': u'我爱中国人' }
    > if __name__ == "__main__":
    >     with codecs.open("ilike.txt", "w", "utf-8") as f:
    >         print >>f, d
    >
    > would produce a file ilike.txt like this:
    >
    > {u'key': u'我爱中国人'}
    >
    > But unfortunately, it results in:
    >
    > {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
    >
    > What's the right way to get the strings in UTF-8?
    >
    > Thanks in advance!


    It has worked; what you're seeing is how Python 2 presents unicode
    strings nested in a container. Printing a dict uses the repr() of its
    keys and values, just as the interactive interpreter does:

    Python 2.7.1+ (r271:86832, Dec 24 2010, 10:04:43)
    [GCC 4.5.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
    >>> x
    {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
    >>> print x
    {u'key': u'\u6211\u7231\u4e2d\u56fd\u4eba'}
    >>> print x['key']
    我爱中国人

    That last line only works if your terminal uses a suitable encoding
    (e.g. utf-8).
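
    The same distinction can be sketched in Python 3, where repr() no longer escapes printable non-ASCII characters and the built-in ascii() gives the escaped form instead (similar to the 2.x output above, minus the u prefix). This is only an illustration under the assumption of Python 3; the variable names are made up.

```python
d = {'key': '我爱中国人'}

# repr() of a container uses repr() of its items; in 3.x that keeps
# printable non-ASCII characters as they are.
as_repr = repr(d)

# ascii() is like repr() but escapes every non-ASCII character.
as_ascii = ascii(d)

# The string itself, with no quoting or escaping applied.
value_only = str(d['key'])
```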

    Regards, Alex
     
    Alex Willmer, Jan 11, 2011
    #3
  4. On 2011-01-12 00:27, Martin v. Loewis wrote:
    > This will work. I doubt you can get it much simpler
    > in 2.x; in 3.x, your code will work out of the box
    > (with proper syntactical adjustments).


    Thanks, this works like a charm. I tried pprint before for this
    task and failed. Now I know why :~)
     
    W. Martin Borgert, Jan 12, 2011
    #4
