John Machin
>>> '\x80'.decode('cp936')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence
However:
Retrieved 2010-10-10 from
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
# Name: cp936 to Unicode table
# Unicode version: 2.0
# Table version: 2.01
# Table format: Format A
# Date: 1/7/2000
#
# Contact: (e-mail address removed)
...
0x7F 0x007F #DELETE
0x80 0x20AC #EURO SIGN
0x81 #DBCS LEAD BYTE
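For anyone who wants to check such a table programmatically, here is a minimal Python 3 sketch that reads lines in the layout quoted above. The function name parse_format_a and the sample list are hypothetical, made up for illustration; the real file may contain details (tab separators, other comment forms) that this sketch only handles to the extent that whitespace splitting covers them.

```python
def parse_format_a(lines):
    """Map each source byte value to a Unicode code point, or None if no target is listed."""
    table = {}
    for raw in lines:
        body = raw.split("#", 1)[0].strip()  # drop the trailing "#..." comment
        if not body:
            continue  # blank or comment-only line
        fields = body.split()
        if not fields[0].lower().startswith("0x"):
            continue  # skip anything that is not a hex entry, e.g. an elision marker
        source = int(fields[0], 16)
        # Lines such as "0x81 #DBCS LEAD BYTE" carry no target column.
        target = int(fields[1], 16) if len(fields) > 1 else None
        table[source] = target
    return table


# Sample lines copied from the excerpt quoted above (hypothetical variable name).
sample = [
    "# Name: cp936 to Unicode table",
    "0x7F 0x007F #DELETE",
    "0x80 0x20AC #EURO SIGN",
    "0x81 #DBCS LEAD BYTE",
]
mapping = parse_format_a(sample)
print(hex(mapping[0x80]))
```

With the full file fed in line by line, the same dictionary could be compared against what a codec actually produces for each byte.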
Retrieved 2010-10-10 from
http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx
Windows Codepage 936
[pictorial mapping; shows 80 mapping to 20AC]
Retrieved 2010-10-10 from
http://demo.icu-project.org/icu-bin/convexp?conv=windows-936-2000&s=ALL
[pictorial mapping for converter
"windows-936-2000" with
aliases including GBK, CP936, MS936;
shows 80 mapping to 20AC]
So Microsoft appears to think that
cp936 includes the euro,
and the ICU project seem to think that GBK and cp936
both include the euro.
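If one takes those tables at face value, a possible workaround is a custom error handler that supplies the euro sign whenever the codec rejects a 0x80 byte. This is only a sketch in Python 3 syntax (the session above is Python 2); the handler name "cp936_euro" and the helper decode_cp936_with_euro are hypothetical, and it assumes the codec reports the error at the offending 0x80 byte, which is not a valid GBK lead byte.

```python
import codecs


def _euro_fallback(exc):
    """Substitute U+20AC when decoding stops at a 0x80 byte; re-raise anything else."""
    if isinstance(exc, UnicodeDecodeError) and exc.object[exc.start] == 0x80:
        # Consume exactly one byte and resume decoding right after it.
        return ("\u20ac", exc.start + 1)
    raise exc


# "cp936_euro" is an arbitrary name chosen for this sketch.
codecs.register_error("cp936_euro", _euro_fallback)


def decode_cp936_with_euro(data):
    """Decode bytes as cp936, treating an otherwise undecodable 0x80 as the euro sign."""
    return data.decode("cp936", errors="cp936_euro")


print(decode_cp936_with_euro(b"\x80"))
```

The handler is used rather than a blanket bytes.replace because 0x80 can also occur as the second byte of a two-byte GBK sequence, where it must be left alone.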
A couple of questions:
Is this a bug or a shrug?
Where can one find the mapping tables
from which the various CJK codecs are derived?