Problem with Unicode char in Python 3.3.0

Discussion in 'Python' started by Franck Ditter, Jan 6, 2013.

  1. Hi!
    I work on Mac OS X Lion with IDLE/Python 3.3.0.
    I can't get the treble clef (U+1D11E)!

    >>> "\U1D11E"

    SyntaxError: (unicode error) 'unicodeescape' codec can't
    decode bytes in position 0-6: end of string in escape sequence

    How can I display musical clefs?

    Thanks,

    franck
    Franck Ditter, Jan 6, 2013
    #1

  2. Peter Otten Guest

    Franck Ditter wrote:

    > I work on MacOS-X Lion and IDLE/Python 3.3.0
    > I can't get the treble key (U1D11E) !
    >
    >>>> "\U1D11E"

    > SyntaxError: (unicode error) 'unicodeescape' codec can't
    > decode bytes in position 0-6: end of string in escape sequence
    >
    > How can I display musical keys ?


    Try
    >>> "\U0001D11E"
    '𝄞'
    Peter Otten, Jan 6, 2013
    #2

  3. marduk Guest

    On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
    > Hi !
    > I work on MacOS-X Lion and IDLE/Python 3.3.0
    > I can't get the treble key (U1D11E) !
    >
    > >>> "\U1D11E"

    > SyntaxError: (unicode error) 'unicodeescape' codec can't
    > decode bytes in position 0-6: end of string in escape sequence
    >


    You probably meant:

    >>> '\U0001d11e'



    For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
    specify either 4 or 8 hex digits).

    http://docs.python.org/2/howto/unicode#unicode-literals-in-python-source-code
    marduk, Jan 6, 2013
    #3
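A minimal sketch of the escape rules marduk describes, assuming Python 3.3 or later. `\U` needs exactly eight hex digits and `\u` exactly four. The same character can also be written by name with `\N{...}` or built with `chr()`. The variable names are illustrative only.

```python
import unicodedata

# Three equivalent spellings of U+1D11E in Python 3.
by_long_escape = "\U0001D11E"            # \U takes exactly 8 hex digits
by_name = "\N{MUSICAL SYMBOL G CLEF}"    # lookup by the official Unicode name
by_code_point = chr(0x1D11E)             # built from the integer code point

# \u takes exactly 4 hex digits, so it only reaches U+0000..U+FFFF (the BMP).
beamed_notes = "\u266B"

# The 5-digit form from the original post is rejected when the literal is compiled.
try:
    eval(r'"\U1D11E"')
    short_form_ok = True
except SyntaxError:
    short_form_ok = False

assert by_long_escape == by_name == by_code_point
print(unicodedata.name(by_long_escape))  # MUSICAL SYMBOL G CLEF
print(unicodedata.name(beamed_notes))    # BEAMED EIGHTH NOTES
print(short_form_ok)                     # False
```

The `\N{...}` form is often the most readable choice in source code, because the name documents which symbol is meant.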
  4. In article <>,
    marduk <> wrote:

    > On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
    > > Hi !
    > > I work on MacOS-X Lion and IDLE/Python 3.3.0
    > > I can't get the treble key (U1D11E) !
    > >
    > > >>> "\U1D11E"

    > > SyntaxError: (unicode error) 'unicodeescape' codec can't
    > > decode bytes in position 0-6: end of string in escape sequence
    > >

    >
    > You probably meant:
    >
    > >>> '\U0001d11e'

    >
    >
    > For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
    > specify either 4 or 8 hex digits).
    >
    > http://docs.python.org/2/howto/unicode#unicode-literals-in-python-source-code


    >>> print('\U0001d11e')
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        print('\U0001d11e')
    UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e' in position 0: Non-BMP character not supported in Tk
    Franck Ditter, Jan 7, 2013
    #4
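According to this traceback, the Tk widget behind the IDLE shell rejects characters outside the Basic Multilingual Plane (above U+FFFF). The following is a minimal workaround sketch, assuming you only need a readable representation inside IDLE. It replaces those characters with `\U` escapes before printing. `escape_non_bmp` is a hypothetical helper and is not part of IDLE or the standard library.

```python
def escape_non_bmp(text: str) -> str:
    """Return text with every character above U+FFFF replaced by a \\U escape."""
    return "".join(
        # BMP characters pass through; others become \U followed by 8 hex digits.
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )


line = "treble clef: \U0001D11E"
print(escape_non_bmp(line))  # treble clef: \U0001d11e
```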
  5. On Mon, Jan 7, 2013 at 11:57 PM, Franck Ditter <> wrote:
    > >>> print('\U0001d11e')
    > Traceback (most recent call last):
    > File "<pyshell#1>", line 1, in <module>
    > print('\U0001d11e')
    > UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
    > in position 0: Non-BMP character not supported in Tk


    That's a different issue; IDLE can't handle non-BMP characters. Try it
    from the terminal if you can - on my Linux systems (Debians and
    Ubuntus with GNOME and gnome-terminal), the terminal is set to UTF-8
    and quite happily accepts the full Unicode range. On Windows, that may
    well not be the case, though.

    ChrisA
    Chris Angelico, Jan 7, 2013
    #5
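Chris's point is that whether `print()` succeeds depends on the encoding of the stream it writes to. Below is a sketch of a defensive print. It assumes the stream exposes an `encoding` attribute. Regular console streams do; `io.StringIO` reports `None`, so the sketch falls back to ASCII in that case. Characters the encoding cannot represent are written as backslash escapes instead of raising `UnicodeEncodeError`. `safe_print` is a hypothetical name. It probably does not help inside IDLE, where the failure comes from Tk rather than from an encoding.

```python
import io
import sys


def safe_print(text: str, stream=None) -> None:
    """Print text, writing characters the stream cannot encode as backslash escapes."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # If the encoding cannot represent U+1D11E, backslashreplace writes \U0001d11e.
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(printable, file=stream)


buffer = io.StringIO()  # reports encoding None, so ASCII is used
safe_print("treble clef: \U0001D11E", buffer)
print(buffer.getvalue(), end="")  # treble clef: \U0001d11e
```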
  6. Terry Reedy Guest

    On 1/7/2013 7:57 AM, Franck Ditter wrote:

    > >>> print('\U0001d11e')
    > Traceback (most recent call last):
    > File "<pyshell#1>", line 1, in <module>
    > print('\U0001d11e')
    > UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
    > in position 0: Non-BMP character not supported in Tk


    The message comes from printing to a Tk text widget (the IDLE shell),
    not from creating the one-character string: c = '\U0001d11e' works fine.
    When you have problems with creating and printing Unicode, *separate*
    creating from printing to see where the problem is. (I do not know if
    the brand-new Tcl/Tk 8.6 is any better.)

    The Windows console also chokes, but with a different message.

    >>> c='\U0001d11e'
    >>> print(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e' in position 0: character maps to <undefined>

    Yes, this is very annoying, especially in Win 7.

    --
    Terry Jan Reedy
    Terry Reedy, Jan 7, 2013
    #6
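Terry's advice to separate creating from printing can be made concrete by doing the encoding step by hand. This is a minimal sketch, assuming Python 3.3 or later. `cp437` is the code page that appears in his Windows traceback.

```python
clef = "\U0001D11E"

# Step 1: creating the string succeeds; on 3.3+ it is a single code point.
print(len(clef), hex(ord(clef)))  # 1 0x1d11e

# Step 2: printing means encoding, and that is where it can fail.
utf8_bytes = clef.encode("utf-8")
print(utf8_bytes)  # b'\xf0\x9d\x84\x9e'

try:
    clef.encode("cp437")
    cp437_ok = True
except UnicodeEncodeError as exc:
    cp437_ok = False
    print("cp437:", exc.reason)
```

If step 1 works but step 2 fails, the problem lies with the output device rather than with the string.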
  7. Terry Reedy Guest

    On 1/7/2013 8:12 AM, Terry Reedy wrote:
    > On 1/7/2013 7:57 AM, Franck Ditter wrote:
    >
    >> >>> print('\U0001d11e')
    >> Traceback (most recent call last):
    >> File "<pyshell#1>", line 1, in <module>
    >> print('\U0001d11e')
    >> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
    >> in position 0: Non-BMP character not supported in Tk

    >
    > The message comes from printing to a tk text widget (the IDLE shell),
    > not from creating the 1 char string. c = '\U0001d11e' works fine. When
    > you have problems with creating and printing unicode, *separate*
    > creating from printing to see where the problem is. (I do not know if
    > the brand new tcl/tk 8.6 is any better.)
    >
    > The windows console also chokes, but with a different message.
    >
    > >>> c='\U0001d11e'
    > >>> print(c)

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
    > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
    > in position 0: character maps to <undefined>
    >
    > Yes, this is very annoying, especially in Win 7.


    The above is in 3.3, in which '\U0001d11e' is actually translated to a
    length-1 string. In 3.2 and earlier, on narrow builds (as on Windows),
    that literal is translated to a length-2 string: a surrogate pair (two
    surrogate code points, which lie in the BMP). On printing, the pair of
    surrogates was shown as the square box used for all characters for
    which the font has no glyph. 𝄞 When cut and pasted, it shows in this
    mail composer as a weird music sign with peculiar behavior. Here it is
    with 3 hyphens, 3 spaces, the paste, 3 spaces and 3 hyphens, though it
    may disappear:
    ---   𝄞   ---
    So 3.3 is the first Windows version to raise UnicodeEncodeError on
    printing.

    --
    Terry Jan Reedy
    Terry Reedy, Jan 8, 2013
    #7
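The surrogate pair Terry mentions for 3.2-and-earlier narrow builds can be computed with the standard UTF-16 formula. Below is a minimal sketch; `to_surrogate_pair` is a hypothetical helper. The result is cross-checked against Python's own `utf-16-be` codec. Both halves fall in U+D800..U+DFFF, which is why the pair counts as "in the BMP".

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))  # 0xd834 0xdd1e

# Cross-check against Python's own UTF-16 codec.
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert "\U0001D11E".encode("utf-16-be") == expected
```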