unicode in exception traceback

WaterWalk

Hello. I just found that on Windows, when an exception is raised and
the traceback is printed to STDERR, all the characters printed are
plain ASCII. Take the unicode character u'\u4e00' for example. If
I write:

print u'\u4e00'

If the system locale is "PRC China", this statement prints the
character as a single Chinese character.

But if I write: assert u'\u4e00' == 1

An AssertionError is raised and the traceback is written to
STDERR, but this time u'\u4e00' is printed literally as
u'\u4e00', several ASCII characters instead of one single Chinese
character. I use the coding directive comment (# -*- coding: utf-8 -*-)
on the first line of the Python source file and also save it in UTF-8
format, but the problem remains.

What's worse, if I write Chinese characters directly in a unicode
string, they appear unreadable when the traceback is printed, that is,
they show up as something else. It's like printing DBCS characters
when the locale is incorrect.

I think this problem isn't unique to Chinese. The same problem may
occur with other East Asian characters.

Is there any workaround for it?
 
Peter Otten

Pass a byte string but make some effort to use the right encoding:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: 一
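
The encoding step behind that output could look roughly like the sketch below. The helper name encode_for_stream is hypothetical (it does not appear in this thread), and the code assumes the Python 2 behaviour discussed here, where a byte string message is written to the terminal unchanged.

```python
import sys

def encode_for_stream(message, stream=None):
    """Encode a unicode message for a stream (hypothetical helper)."""
    if stream is None:
        stream = sys.stderr
    # Streams that are redirected or lack an encoding fall back to ASCII.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Characters the encoding cannot represent become XML character
    # references instead of raising UnicodeEncodeError.
    return message.encode(encoding, "xmlcharrefreplace")

# Intended use on Python 2 (not executed here):
#     assert False, encode_for_stream(u"\u4e00")
```

Whether the terminal then shows the character correctly still depends on the console code page matching the encoding the stream reports.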

You might be able to do this in the except hook:

$ cat unicode_exception_message.py
import sys

def eh(etype, exc, tb, original_excepthook=sys.excepthook):
    message = exc.args[0]
    if isinstance(message, unicode):
        exc.args = (message.encode(sys.stderr.encoding or "ascii", "xmlcharrefreplace"),) + exc.args[1:]
    return original_excepthook(etype, exc, tb)

sys.excepthook = eh

assert False, u"\u4e00"

$ python unicode_exception_message.py
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: 一

If Python cannot figure out the encoding, this falls back to ASCII with
XML character references:

$ python unicode_exception_message.py 2>tmp.txt
$ cat tmp.txt
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: &#19968;

Note that I've not done any tests; e.g. if there are exceptions with
immutable .args the except hook itself will fail.
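
A more defensive variant of the hook might guard against those cases, along with exceptions raised with no arguments at all. This is only a sketch: the names text_type, encode_first_arg and make_hook are made up for illustration, and the behaviour targeted is still the Python 2 one from this thread.

```python
import sys

text_type = type(u"")  # unicode on Python 2, str on Python 3

def encode_first_arg(exc, encoding):
    """Encode a text first argument in place; leave exc alone on failure."""
    try:
        args = exc.args
        if args and isinstance(args[0], text_type):
            encoded = args[0].encode(encoding or "ascii", "xmlcharrefreplace")
            exc.args = (encoded,) + tuple(args[1:])
    except Exception:
        # Empty, odd or read-only .args: report the exception unchanged.
        pass
    return exc

def make_hook(original=sys.excepthook):
    """Build an except hook that encodes the message, then delegates."""
    def hook(etype, exc, tb):
        encode_first_arg(exc, getattr(sys.stderr, "encoding", None))
        return original(etype, exc, tb)
    return hook

# Intended use (not executed here):
#     sys.excepthook = make_hook()
```

Swallowing errors inside the hook is a deliberate choice here: a failure while rewriting the message should not hide the original traceback.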

Peter
 
WaterWalk

Thanks. My brief test indicates that it works. I'll try it in more
situations.