cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN

Discussion in 'Python' started by John Machin, Oct 10, 2010.

  1. John Machin


    >>> '\x80'.decode('cp936')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence
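
The session above is Python 2. A rough Python 3 equivalent, where `bytes.decode` takes the place of `str.decode`, might look like the sketch below. The helper name `try_decode` is hypothetical, and no output is shown for the `\x80` case because it depends on the interpreter in use. The codec lookup shows why the error message names 'gbk' even though 'cp936' was requested.

```python
import codecs


def try_decode(data, encoding):
    """Return (True, text) on success, or (False, reason) on a decode error."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.encoding names the codec that actually ran, not the alias used.
        return False, "%s: %s" % (exc.encoding, exc.reason)


if __name__ == "__main__":
    # 'cp936' is registered as an alias, so the lookup reports the real codec.
    print(codecs.lookup("cp936").name)
    print(try_decode(b"\x80", "cp936"))
```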


    Retrieved 2010-10-10 from

    # Name: cp936 to Unicode table
    # Unicode version: 2.0
    # Table version: 2.01
    # Table format: Format A
    # Date: 1/7/2000
    # Contact:
    0x7F 0x007F #DELETE
    0x80 0x20AC #EURO SIGN
    0x81 #DBCS LEAD BYTE
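
Lines in this layout are easy to read mechanically. The following is a minimal sketch in Python 3, based only on the excerpt above: it assumes that everything after `#` is a comment, that the first column is the byte value, and that the second column, when present, is the Unicode code point. The names `parse_format_a` and `SAMPLE` are hypothetical.

```python
def parse_format_a(lines):
    """Parse table lines into (mapping, unmapped).

    mapping  : dict of byte value -> Unicode code point
    unmapped : set of byte values listed without a code point
               (in the excerpt, the DBCS lead byte 0x81)
    """
    mapping = {}
    unmapped = set()
    for raw in lines:
        # Drop the comment part; header lines become empty and are skipped.
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        code = int(fields[0], 16)
        if len(fields) >= 2:
            mapping[code] = int(fields[1], 16)
        else:
            unmapped.add(code)
    return mapping, unmapped


SAMPLE = """\
# Name: cp936 to Unicode table
0x7F 0x007F #DELETE
0x80 0x20AC #EURO SIGN
0x81 #DBCS LEAD BYTE
"""

mapping, unmapped = parse_format_a(SAMPLE.splitlines())
print(hex(mapping[0x80]))  # 0x20ac
print(sorted(unmapped))    # [129]
```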

    Retrieved 2010-10-10 from

    Windows Codepage 936
    [pictorial mapping; shows 80 mapping to 20AC]

    Retrieved 2010-10-10 from

    [pictorial mapping for converter
    "windows-936-2000" with
    aliases including GBK, CP936, MS936;
    shows 80 mapping to 20AC]

    So Microsoft appears to think that
    cp936 includes the euro,
    and the ICU project seems to think that GBK and cp936
    both include the euro.
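
A caller who wants the Microsoft/ICU behaviour can approximate it without changing the codec by registering a decode error handler. The following is a minimal sketch for Python 3, not a fix to the codec itself. The names `euro_for_0x80`, `cp936_euro`, and `decode_cp936` are hypothetical, and the sketch assumes that the codec reports a stray `0x80` as an error starting at that byte. A blanket `bytes.replace` would not be safe here, because `0x80` can also occur as the second byte of a two-byte character.

```python
import codecs


def euro_for_0x80(exc):
    """Error handler: treat an undecodable 0x80 byte as U+20AC EURO SIGN."""
    if (isinstance(exc, UnicodeDecodeError)
            and exc.object[exc.start:exc.start + 1] == b"\x80"):
        # Emit the euro sign and resume decoding right after the 0x80 byte.
        return ("\u20ac", exc.start + 1)
    # Anything else is still a genuine error.
    raise exc


# Register the handler under a hypothetical name for use with errors=...
codecs.register_error("cp936_euro", euro_for_0x80)


def decode_cp936(data):
    """Decode bytes as cp936, mapping a lone 0x80 to the euro sign."""
    return data.decode("cp936", errors="cp936_euro")


if __name__ == "__main__":
    print(decode_cp936(b"A\x80B") == "A\u20acB")
```

Because the handler only fires when the codec itself rejects the byte, valid two-byte sequences are decoded as usual, and the result stays the same if the codec ever starts accepting `0x80` on its own.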

    A couple of questions:

    Is this a bug or a shrug?

    Where can one find the mapping tables
    from which the various CJK codecs are derived?
    John Machin, Oct 10, 2010

  2. Bug, IMHO.

    Ulrich Eckhardt, Oct 11, 2010
