Unicode

Thomas Heller

I thought I understood Unicode (somewhat, at least), but this seems
not to be the case.

I expected the following code to print 'µm' two times to the console:

<code>
# -*- coding: cp850 -*-

a = u"µm"
b = u"\u03bcm"

print(a)
print(b)
</code>

But what I get is this:

<output>
µm
Traceback (most recent call last):
  File "x.py", line 7, in <module>
    print(b)
  File "C:\Python33-64\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03bc' in position 0: character maps to <undefined>
</output>

Using (German) Windows, command prompt, codepage 850.

The same happens with Python 2.7. What am I doing wrong?

Thanks,
Thomas
 
Steven D'Aprano

That's because the two strings are not the same.

You can isolate the error by noting that the second one only raises an
exception when you try to print it. That suggests that the problem is
that it contains a character which is not defined in your terminal's
codepage. So let's inspect the strings more carefully:


<code>
py> a = u"µm"
py> b = u"\u03bcm"
py> a == b
False
py> ord(a[0]), ord(b[0])
(181, 956)
py> import unicodedata
py> unicodedata.name(a[0])
'MICRO SIGN'
py> unicodedata.name(b[0])
'GREEK SMALL LETTER MU'
</code>

Does codepage 850 include Greek Small Letter Mu? The evidence suggests it
does not.
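
One way to test that directly, without involving the console at all, is to ask the codec itself. This is only a rough sketch: the helper name can_encode is made up for illustration, and it assumes the cp850 codec that ships with Python.

```python
# -*- coding: utf-8 -*-
MICRO_SIGN = u"\u00b5"  # first character of the literal u"µm"
GREEK_MU = u"\u03bc"    # first character of u"\u03bcm"


def can_encode(char, encoding="cp850"):
    """Return True if `char` has a mapping in the given codec (hypothetical helper)."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


micro_ok = can_encode(MICRO_SIGN)
mu_ok = can_encode(GREEK_MU)
print(micro_ok, mu_ok)
```

If the first result is True and the second is False, the codepage has a slot for the micro sign but none for the Greek letter, which matches the traceback.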

If you can, you should set the terminal's encoding to UTF-8. That will
avoid this sort of problem.
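
To see why UTF-8 helps here, the following sketch encodes both characters with it and then reports which encoding print() is currently using. The variable names are mine, and what sys.stdout.encoding reports depends on your console setup, so treat the last line as a diagnostic rather than a guaranteed result.

```python
import sys

MICRO_SIGN = u"\u00b5"
GREEK_MU = u"\u03bc"

# UTF-8 has a byte sequence for both characters, so neither call raises.
micro_bytes = MICRO_SIGN.encode("utf-8")
mu_bytes = GREEK_MU.encode("utf-8")

# The encoding print() uses when writing to the console.
# It may differ from cp850 (or be None) when output is redirected.
console_encoding = sys.stdout.encoding
print(console_encoding)
```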
 
Thomas Heller

On 15.03.2013 at 11:58, Steven D'Aprano wrote (Windows: Problems with unicode output to console):
Thanks for the clarification.

For the archives: setting the console codepage to 65001 (UTF-8) and the
font to Lucida Console helps.

Thomas
 
