unicode in exception traceback

Discussion in 'Python' started by WaterWalk, Apr 3, 2008.

  1. WaterWalk

    WaterWalk Guest

    Hello. I just found that on Windows, when an exception is raised and
    the traceback is printed to STDERR, all the characters printed are
    plain ASCII. Take the unicode character u'\u4e00' for example. If
    I write:

    print u'\u4e00'

    If the system locale is "PRC China", then this statement will print
    this character as a single Chinese character.

    But if I write: assert u'\u4e00' == 1

    An AssertionError will be raised and the traceback will be written to
    STDERR, but this time u'\u4e00' is printed literally as u'\u4e00',
    several ASCII characters instead of one single Chinese character.
    I use the coding directive comment (# -*- coding: utf-8 -*-)
    on the first line of the Python source file and also save the file in
    utf-8 format, but the problem remains.

    What's worse, if I write Chinese characters directly in a unicode
    string, they appear unreadable when the traceback is printed, that is,
    they show up as something else. It's like printing DBCS characters
    when the locale is incorrect.

    I don't think this problem is unique to Chinese. The same thing may
    happen with other East Asian characters.

    Is there any workaround to it?
    WaterWalk, Apr 3, 2008

  2. WaterWalk

    Peter Otten Guest

    WaterWalk wrote:

    > Is there any workaround to it?


    Pass a byte string but make some effort to use the right encoding:

    >>> import sys
    >>> assert False, u"\u4e00".encode(sys.stdout.encoding or "ascii", "xmlcharrefreplace")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError: 一
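
    Why the choice of encoding matters can be seen in a small sketch. It
    assumes the source bytes are UTF-8 while the console expects GBK, which
    is only a guess at the setup described in the question; the variable
    names are made up for illustration.

```python
# -*- coding: utf-8 -*-
char = u"\u4e00"

# The same character has different byte sequences in different encodings.
utf8_bytes = char.encode("utf-8")  # three bytes: e4 b8 80
gbk_bytes = char.encode("gbk")     # two bytes in the GBK code page

# Reading UTF-8 bytes as if they were GBK does not give the character back.
misread = utf8_bytes.decode("gbk", "replace")

# Reading the bytes with the codec that produced them does.
roundtrip = gbk_bytes.decode("gbk")
```

    So bytes written to a stream only look right if they were produced with
    the encoding that stream expects.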

    You might be able to do this in the except hook:

    $ cat unicode_exception_message.py
    import sys

    def eh(etype, exc, tb, original_excepthook=sys.excepthook):
        message = exc.args[0]
        if isinstance(message, unicode):
            exc.args = (message.encode(sys.stderr.encoding or "ascii", "xmlcharrefreplace"),) + exc.args[1:]
        return original_excepthook(etype, exc, tb)

    sys.excepthook = eh

    assert False, u"\u4e00"

    $ python unicode_exception_message.py
    Traceback (most recent call last):
      File "unicode_exception_message.py", line 11, in <module>
        assert False, u"\u4e00"
    AssertionError: 一

    If Python cannot figure out the encoding (for example when stderr is
    redirected to a file), this falls back to ASCII with XML character
    references:

    $ python unicode_exception_message.py 2>tmp.txt
    $ cat tmp.txt
    Traceback (most recent call last):
      File "unicode_exception_message.py", line 11, in <module>
        assert False, u"\u4e00"
    AssertionError: &#19968;
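
    For comparison, here is a short sketch of what the standard codec error
    handlers emit when the target encoding is ASCII. The variable names are
    only illustrative.

```python
char = u"\u4e00"

# Each handler decides what to emit for a character ASCII cannot hold.
as_charref = char.encode("ascii", "xmlcharrefreplace")  # &#19968;
as_escape = char.encode("ascii", "backslashreplace")    # \u4e00
as_question = char.encode("ascii", "replace")           # ?

# The default handler, "strict", raises instead of substituting.
try:
    char.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

    "xmlcharrefreplace" keeps the code point recoverable (0x4e00 is 19968 in
    decimal), while "replace" loses it.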

    Note that I've not done any tests; e.g. if there are exceptions with
    immutable .args the except hook itself will fail.
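
    A more defensive variant of the hook might look like the sketch below.
    It is untested against real-world exception classes, and the names
    make_safe_excepthook and text_type are made up for illustration. It
    leaves the exception untouched when .args is empty or cannot be
    replaced, and it is written so that it also runs on Python 3, where the
    message stays a text string.

```python
import sys

text_type = type(u"")  # unicode on Python 2, str on Python 3


def make_safe_excepthook(original_excepthook, stream=None):
    """Return an excepthook that makes a text first argument printable.

    Hypothetical helper, not part of the standard library.
    """
    def hook(etype, exc, tb):
        try:
            encoding = getattr(stream or sys.stderr, "encoding", None) or "ascii"
            args = exc.args
            if args and isinstance(args[0], text_type):
                safe = args[0].encode(encoding, "xmlcharrefreplace")
                if str is not bytes:
                    # Python 3: keep the message as text, now limited to
                    # characters the stream can represent.
                    safe = safe.decode(encoding)
                exc.args = (safe,) + tuple(args[1:])
        except Exception:
            # Empty, odd or read-only .args: leave the exception untouched.
            pass
        return original_excepthook(etype, exc, tb)
    return hook


# Installing it would look like this:
# sys.excepthook = make_safe_excepthook(sys.excepthook)
```

    Swallowing every error inside the hook is a deliberate trade-off here: a
    failing except hook would hide the original traceback, which is worse
    than printing an escaped message.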

    Peter
    Peter Otten, Apr 3, 2008

  3. WaterWalk

    WaterWalk Guest

    On Apr 3, 5:56 pm, Peter Otten wrote:
    > You might be able to do this in the except hook:

    Thanks. My brief test indicates that it works. I'll try it in more
    situations.
    WaterWalk, Apr 3, 2008