cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN

Discussion in 'Python' started by John Machin, Oct 10, 2010.

  1. John Machin

    |>>> '\x80'.decode('cp936')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence
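The session above is Python 2, where `'\x80'` is a byte string. Below is a minimal Python 3 sketch of the same check. It also shows that CPython's `encodings` package treats `cp936` as an alias of the `gbk` codec, which is why the error message names `'gbk'`. The helper `try_decode` is a hypothetical name used only for this illustration. What it returns for `b"\x80"` depends on the Python version that runs it.

```python
import codecs

# In CPython's encodings package 'cp936' is an alias, not a separate codec,
# so the lookup resolves to the gbk codec named in the traceback.
print(codecs.lookup("cp936").name)  # gbk

def try_decode(data: bytes, encoding: str = "cp936") -> str:
    """Decode data, or describe the UnicodeDecodeError instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: %s at position %d" % (exc.reason, exc.start)

# Python 3 spelling of the call in the session above; the result depends
# on which Python version runs it, so it is printed rather than assumed.
print(ascii(try_decode(b"\x80")))
```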

    However:

    Retrieved 2010-10-10 from
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

    # Name: cp936 to Unicode table
    # Unicode version: 2.0
    # Table version: 2.01
    # Table format: Format A
    # Date: 1/7/2000
    #
    # Contact:
    ...
    0x7F 0x007F #DELETE
    0x80 0x20AC #EURO SIGN
    0x81 #DBCS LEAD BYTE
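The quoted lines use the table's Format A. Each line gives a byte value, then the Unicode value, then a `#` comment. Lead bytes of double-byte characters appear without a Unicode value. The following is a minimal parsing sketch. It assumes every data line has that shape, and the function `parse_format_a` is a hypothetical helper, not part of any library.

```python
import re
from typing import Dict, Set, Tuple

# "0xBB 0xUUUU #COMMENT"; the Unicode column is absent for lead bytes.
LINE_RE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(?:(0x[0-9A-Fa-f]+)\s*)?(?:#(.*))?$")

def parse_format_a(text: str) -> Tuple[Dict[int, int], Set[int]]:
    """Return (byte value -> code point, set of DBCS lead bytes)."""
    mapping: Dict[int, int] = {}
    lead_bytes: Set[int] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and header comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that does not fit the assumed shape
        code, uni, comment = m.groups()
        if uni is not None:
            mapping[int(code, 16)] = int(uni, 16)
        elif comment and "LEAD BYTE" in comment:
            lead_bytes.add(int(code, 16))
    return mapping, lead_bytes

SAMPLE = """\
# Name: cp936 to Unicode table
0x7F 0x007F #DELETE
0x80 0x20AC #EURO SIGN
0x81 #DBCS LEAD BYTE
"""
mapping, lead_bytes = parse_format_a(SAMPLE)
print(hex(mapping[0x80]))  # 0x20ac
print(sorted(lead_bytes))  # [129]
```

Run against the full downloaded file, the same function would also pick up the double-byte rows, whose keys are two-byte values such as 0x8140.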

    Retrieved 2010-10-10 from
    http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx

    Windows Codepage 936
    [pictorial mapping; shows 80 mapping to 20AC]

    Retrieved 2010-10-10 from
    http://demo.icu-project.org/icu-bin/convexp?conv=windows-936-2000&s=ALL

    [pictorial mapping for converter
    "windows-936-2000" with
    aliases including GBK, CP936, MS936;
    shows 80 mapping to 20AC]

    So Microsoft appears to think that cp936 includes the euro,
    and the ICU project seem to think that GBK and cp936 both
    include the euro.
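If the installed codec rejects 0x80, one possible workaround is a custom decode error handler registered with `codecs.register_error`. The handler substitutes U+20AC only when the rejected byte is 0x80 and lets every other error propagate. This sketch assumes that a lone 0x80 in cp936 data is always meant as the euro sign. The handler name `cp936-euro` and both functions are hypothetical. On a Python whose codec already decodes 0x80, the handler never fires.

```python
import codecs

EURO_SIGN = "\u20ac"

def cp936_euro_handler(exc):
    """Decode error handler (sketch): turn a rejected 0x80 byte into U+20AC,
    as Microsoft's CP936 table maps it; re-raise every other error."""
    if isinstance(exc, UnicodeDecodeError) and exc.object[exc.start] == 0x80:
        # Replacement text, plus the position at which decoding resumes.
        return EURO_SIGN, exc.start + 1
    raise exc

# "cp936-euro" is a hypothetical handler name chosen for this sketch.
codecs.register_error("cp936-euro", cp936_euro_handler)

def decode_cp936(data: bytes) -> str:
    """Decode cp936 bytes, mapping a lone 0x80 to the euro sign."""
    return data.decode("cp936", errors="cp936-euro")

print(ascii(decode_cp936(b"Price: 5\x80")))  # 'Price: 5\u20ac'
```

A stray lead byte such as a lone 0x81 still raises `UnicodeDecodeError`, so genuinely malformed input is not silently accepted.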

    A couple of questions:

    Is this a bug or a shrug?

    Where can one find the mapping tables from which the
    various CJK codecs are derived?
    John Machin, Oct 10, 2010
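One way to check a codec against a vendor table is to decode each table entry and compare the results. The sketch below does that for the two single-byte rows quoted above from CP936.TXT. A real audit would load the full table. `audit_codec` is a hypothetical helper, and it reports what the installed codec actually does rather than assuming either answer.

```python
from typing import Dict, List, Tuple

# Single-byte rows copied from the CP936.TXT excerpt quoted in the thread.
VENDOR_SINGLE_BYTES: Dict[int, int] = {0x7F: 0x007F, 0x80: 0x20AC}

def audit_codec(encoding: str, table: Dict[int, int]) -> List[Tuple[int, int, str]]:
    """Return (byte, expected code point, codec outcome) for every
    single-byte table entry the codec disagrees with."""
    mismatches: List[Tuple[int, int, str]] = []
    for byte, expected in sorted(table.items()):
        try:
            got = bytes([byte]).decode(encoding)
            outcome = "U+%04X" % ord(got) if len(got) == 1 else repr(got)
        except UnicodeDecodeError as exc:
            got, outcome = None, "error: " + exc.reason
        if got != chr(expected):
            mismatches.append((byte, expected, outcome))
    return mismatches

for byte, expected, outcome in audit_codec("cp936", VENDOR_SINGLE_BYTES):
    print("0x%02X: table says U+%04X, codec gives %s" % (byte, expected, outcome))
```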

  2. Re: cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN

    John Machin wrote:
    > |>>> '\x80'.decode('cp936')
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > UnicodeDecodeError: 'gbk' codec can't decode byte 0x80
    > in position 0: incomplete multibyte sequence

    [...]
    > So Microsoft appears to think that
    > cp936 includes the euro,
    > and the ICU project seem to think that GBK and cp936
    > both include the euro.
    >
    > A couple of questions:
    >
    > Is this a bug or a shrug?


    Bug, IMHO.

    Uli

    --
    Sator Laser GmbH
    Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Oct 11, 2010

Similar Threads
  1. Euro sign in .Net
     rphil, Apr 26, 2005, in forum: ASP .Net. Replies: 4, Views: 3,126. Last post: Joerg Jooss, Apr 28, 2005.
  2. Problem: Euro sign in sending email !
     kingski, Mar 3, 2006, in forum: ASP .Net. Replies: 7, Views: 709. Last post: Juan T. Llibre, Mar 4, 2006.
  3. Problem: Euro sign in send mail.
     kingski, Mar 3, 2006, in forum: ASP .Net. Replies: 0, Views: 432. Last post: kingski, Mar 3, 2006.
  4. Marco W
     Replies: 1, Views: 610. Last post: David Carlisle, Jun 8, 2005.
  5. Yohan N. Leder
     Replies: 11, Views: 1,016. Last post: Jukka K. Korpela, May 20, 2006.