harrismh777
Hi folks,
I am puzzled by Unicode generally, and within the context of Python
specifically. For one thing, what do we mean when we say that Unicode is
used in Python 3.x by default? (I know what default means; I mean, what
changed?)
I think part of my problem is that I'm spoiled (American, ASCII
heritage) and have been either stuck in ASCII knowingly, or using UTF-8
without knowing it (just because the code points lined up). I am confused
by the implications of using 3.x, because I am reading that there are
significant things to be aware of... what are they?
On my installation, 2.6 reports sys.maxunicode as 1114111, and my 2.7
and 3.2 installs each report 65535. So I am assuming that 2.6 was
compiled with the UCS-4 (UTF-32) option for 4-byte Unicode(?) and that
the default compile option for 2.7 and 3.2 (I didn't change anything) is
UCS-2 (UTF-16), or 2-byte Unicode(?). Do I understand this much
correctly?
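For reference, here is a minimal sketch of the check I ran on each install. The helper name unicode_build is my own invention for illustration, and the "narrow"/"wide" labels are just my reading of the two values I saw, not anything the interpreter prints.

```python
import sys

def unicode_build(maxunicode=None):
    """Classify a build from its sys.maxunicode value.

    unicode_build is a hypothetical helper, not part of the standard library.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF (65535) is the largest code point that fits in 2 bytes;
    # 0x10FFFF (1114111) is the top of the Unicode range.
    return "wide (UCS-4)" if maxunicode > 0xFFFF else "narrow (UCS-2)"

# Pythons from 3.3 on always report 1114111 here; the narrow/wide split
# only shows up on older builds such as the 2.6, 2.7, and 3.2 installs above.
print(sys.maxunicode, unicode_build())
```

Running this under each interpreter is how I got the numbers above.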
The books say that .py sources are UTF-8 by default in 3.x... and that
the 3.x interpreter itself is built as either UCS-2 or UCS-4. If I use
the file handling capabilities of Python 3.x without specifying an
encoding, what encoding will be used, and how will that affect the
output?
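To make the file question concrete, here is a rough sketch of what I think is going on, assuming that open() in 3.x falls back to the locale's preferred encoding when no encoding argument is given. The helper names default_text_encoding and roundtrip are made up for this example.

```python
import locale
import os
import tempfile

def default_text_encoding():
    """Return the locale's preferred encoding.

    Assumption: this is what open() uses in text mode when no
    encoding argument is passed, so the answer is platform dependent.
    """
    return locale.getpreferredencoding(False)

def roundtrip(text, encoding="utf-8"):
    """Write text with an explicit encoding and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(default_text_encoding())
# The same string produces different bytes depending on the encoding chosen.
print(roundtrip("caf\u00e9", "utf-8"))
print(roundtrip("caf\u00e9", "latin-1"))
```

If that assumption holds, passing encoding explicitly to open() would be the way to get the same bytes on every machine.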
If I do not use any code points above the ASCII range (0x00-0x7F), does
any of this matter anyway?
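My guess, as a small sketch: ASCII-only text (code points up to 0x7F) should come out as the same bytes under ASCII, UTF-8, and Latin-1, and the encodings only start to disagree once a character above that range shows up. The helper same_bytes is a made-up name for illustration.

```python
def same_bytes(text, encodings=("ascii", "utf-8", "latin-1")):
    """Return True if text encodes to identical bytes in every listed encoding.

    same_bytes is a hypothetical helper, not a standard library function.
    """
    encoded = {text.encode(enc) for enc in encodings}
    return len(encoded) == 1

# Pure ASCII text: every encoding agrees.
print(same_bytes("hello"))
# U+00E9 is above 0x7F: UTF-8 uses two bytes for it, Latin-1 uses one.
# ("ascii" is left out here because it cannot encode this character at all.)
print(same_bytes("caf\u00e9", ("utf-8", "latin-1")))
```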
Thanks.
kind regards,
m harris