Problem with Unicode char in Python 3.3.0

Franck Ditter

Hi!
I work on Mac OS X Lion with IDLE/Python 3.3.0.
I can't get the treble clef (U+1D11E)!
SyntaxError: (unicode error) 'unicodeescape' codec can't
decode bytes in position 0-6: end of string in escape sequence

How can I display musical clefs?

Thanks,

franck
 
Peter Otten

Franck said:
I work on Mac OS X Lion with IDLE/Python 3.3.0.
I can't get the treble clef (U+1D11E)!

SyntaxError: (unicode error) 'unicodeescape' codec can't
decode bytes in position 0-6: end of string in escape sequence

How can I display musical clefs?
Try
'𝄞'
 
Franck Ditter

marduk said:
You probably meant:



For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
specify either 4 or 8 hex digits).

http://docs.python.org/2/howto/unicode#unicode-literals-in-python-source-code
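For reference, a small sketch of the two escape forms described above. The eighth-note example, the chr() and \N{...} spellings, and the short '\U1D11E' literal are illustrative assumptions, not something quoted from the thread; the short form is only a guess at the kind of literal that produces the SyntaxError.

```python
# \u takes exactly 4 hex digits, \U exactly 8 (pad with leading zeros).
note = '\u266a'        # U+266A EIGHTH NOTE (illustrative, not from the thread)
clef = '\U0001d11e'    # U+1D11E, the character asked about

# Equivalent spellings of the same one-character string.
same_by_number = chr(0x1D11E)
same_by_name = '\N{MUSICAL SYMBOL G CLEF}'

# A \U followed by fewer than 8 hex digits is rejected at compile time.
# compile() is used here so the bad literal does not break this script.
try:
    compile(r"'\U1D11E'", '<demo>', 'eval')
    short_escape_ok = True
except SyntaxError:
    short_escape_ok = False
```

Nothing is printed here on purpose: building the string and displaying it are separate steps, and only the first is covered by this sketch.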

>>> print('\U0001d11e')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
in position 0: Non-BMP character not supported in Tk
 
Chris Angelico

>>> print('\U0001d11e')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
in position 0: Non-BMP character not supported in Tk

That's a different issue; IDLE can't handle non-BMP characters. Try it
from the terminal if you can - on my Linux systems (Debians and
Ubuntus with GNOME and gnome-terminal), the terminal is set to UTF-8
and quite happily accepts the full Unicode range. On Windows, that may
well not be the case, though.

ChrisA
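A rough way to check this before printing, as a sketch only: the helper names encodable and console_safe are made up for illustration, and the fallback to 'ascii' when stdout reports no encoding is an assumption.

```python
import sys

def encodable(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def console_safe(text, encoding):
    """Swap characters the encoding lacks for backslash escapes."""
    return text.encode(encoding, errors='backslashreplace').decode(encoding)

clef = '\U0001d11e'
# stdout may have no usable encoding attribute when it has been replaced.
encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(console_safe(clef, encoding))
```

On a UTF-8 terminal this should print the clef itself; where the output encoding lacks the character, it prints the escape text instead of raising UnicodeEncodeError.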
 
Terry Reedy

>>> print('\U0001d11e')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
in position 0: Non-BMP character not supported in Tk

The message comes from printing to a tk text widget (the IDLE shell),
not from creating the 1 char string. c = '\U0001d11e' works fine. When
you have problems with creating and printing unicode, *separate*
creating from printing to see where the problem is. (I do not know if
the brand new tcl/tk 8.6 is any better.)
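A minimal sketch of that separation, assuming Python 3.3 or later (where the string has length 1). The inspection step uses ascii(), so the output is plain ASCII and should display on any console, including the IDLE shell.

```python
# Step 1: create the string. No output device is involved yet.
c = '\U0001d11e'

# Step 2: inspect it with ASCII-only output.
# ascii() escapes every non-ASCII character, so this print cannot hit
# an encoding error caused by the clef itself.
summary = (len(c), hex(ord(c[0])), ascii(c))
print(summary)
```

If step 1 fails, the problem is in the literal; if only a later print(c) fails, the problem is in the output device.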

The Windows console also chokes, but with a different message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
in position 0: character maps to <undefined>

Yes, this is very annoying, especially in Win 7.
 
The above is in 3.3, in which '\U0001d11e' is actually translated to a
length 1 string. In 3.2 and earlier, on narrow builds (as on Windows),
that literal is translated to a length 2 string, a surrogate pair (both
halves in the BMP).
On printing, the pair of surrogates got translated to a square box used
for all characters for which the font does not have a glyph. 𝄞 When cut
and pasted, it shows in this mail composer as a weird music sign with
peculiar behavior.
3 -s, 3 spaces, paste, 3 spaces, 3 -s, but it may disappear.
---   𝄞   ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on
printing.
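To make the surrogate pair concrete, here is a sketch of the standard UTF-16 arithmetic. The function name surrogate_pair is hypothetical; it only reproduces the two code units a narrow build would have stored, and runs on any Python 3.

```python
def surrogate_pair(ch):
    """Return the (high, low) UTF-16 code units for a non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError('BMP characters need no surrogate pair')
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)      # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits of the offset
    return high, low

high, low = surrogate_pair('\U0001d11e')
print(hex(high), hex(low))
```

The same two units appear when the character is encoded as UTF-16, which is a quick cross-check of the arithmetic.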
 
