Using Unicode scripts

yzzzzz · Jul 18, 2003

Hi,

I am writing my Python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252), neither of which is compatible with UTF-8.

For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".

How can I solve this problem?

Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.
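
The symptom described above can be reproduced without any particular editor. The following is a minimal sketch, written so that it also runs on a current Python 3 interpreter; the variable names are illustrative only. It encodes "é" as UTF-8, misreads those bytes as Latin 1, and encodes the result again.

```python
# -*- coding: utf-8 -*-
# Reproduce the "Ã©" symptom: UTF-8 bytes misread as Latin 1.

original = u"\u00e9"                        # the character é
utf8_bytes = original.encode("utf-8")       # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")      # two characters: Ã and ©
double_encoded = misread.encode("utf-8")    # four bytes: C3 83 C2 A9

print(len(utf8_bytes))      # 2
print(len(misread))         # 2
print(len(double_encoded))  # 4
```

The four bytes at the end correspond to the "4 Latin 1 characters" mentioned in the question: the interpreter has effectively been given u"Ã©" instead of u"é".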

Thomas Heller · Jul 18, 2003

Use Python 2.3, and read PEP 263.
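
For readers who have not opened the PEP yet, here is a minimal sketch of the encoding declaration it describes. It assumes the file really is saved as UTF-8; the variable name is illustrative only.

```python
# -*- coding: utf-8 -*-
# The comment on the first line is a PEP 263 encoding declaration.
# It has to appear on the first or second line of the source file.

greeting = u"é"  # decoded from the file's UTF-8 bytes into one character

print(len(greeting))  # 1
```

Without the declaration (or a UTF-8 BOM), an interpreter that assumes Latin 1 would see two characters here instead of one.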

Thomas

Gerhard Häring · Jul 18, 2003

yzzzzz said:
Hi,

Hi "yzzzzz",

I am writing my python programs using a Unicode text editor. The files are
encoded in UTF-8. Python's default encoding seems to be Latin 1 (ISO-8859-1)
or maybe Windows-1252 (CP1252) which aren't compatible with UTF-8.

For example, if I type print "é", it prints Ã©. If I use a unicode string:
a=u"é" and if I choose to encode it in UTF-8, I get 4 Latin 1 characters,
which makes sense if the interpreter thinks I typed in u"Ã©".

How can I solve this problem?

You might want to read the thread I started on this list/newsgroup
yesterday, called "Unicode problem".

Is it feasible for you to upgrade to Python 2.3? If so, I'd recommend
doing it now. 2.3 is pretty close to release and it supports
Unicode-encoded source files. If your Unicode editor saves the text
file with a BOM (it should), then under Python 2.3 your scripts will work
as expected.
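
Whether an editor actually wrote the BOM can be checked from Python itself. The sketch below is only an illustration: has_utf8_bom is a hypothetical helper name, the byte strings stand in for real file contents, and the "utf-8-sig" codec used at the end is not available in Python 2.3 (it arrived in a later release).

```python
import codecs

def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)

# Simulated file contents: the signature followed by the UTF-8 bytes for é.
with_bom = codecs.BOM_UTF8 + u"\u00e9".encode("utf-8")
without_bom = u"\u00e9".encode("utf-8")

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False

# The signature is just U+FEFF (ZWNBSP) encoded as UTF-8: EF BB BF.
same_bytes = u"\ufeff".encode("utf-8") == codecs.BOM_UTF8

# Decoding with "utf-8-sig" drops the signature if it is present.
decoded = with_bom.decode("utf-8-sig")
```

To test a real script, the same helper can be applied to the bytes returned by reading the file in binary mode.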

Thank you

PS. I have no problem using Unicode strings in Python, I know how to
manipulate and convert them, I'm just looking for how to specify the default
encoding for the scripts I write.

See http://www.python.org/peps/pep-0263.html for details; this is how it
is implemented in Python 2.3.

-- Gerhard

yzzzzz · Jul 18, 2003

OK, problem solved!
I got the new Python and it all works. I just had to add the UTF-8 BOM
myself (UltraEdit doesn't do it by default), but that wasn't too difficult
to do (copy and paste a ZWNBSP).

One last question: I'm using Windows, so the console's encoding is CP437. If
I print a unicode string, the string is converted to CP437 and printed, and
that works fine. However, if I print a normal (non-unicode) string from a
UTF-8 encoded file with BOM, for example print "é", Python sends out the two
UTF-8 bytes (Ã© when read as Latin 1), which appear as lines in the CP437
charset. If I print the exact same character from a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é", which shows up as a theta in CP437.

This means that Python doesn't take the specified encoding (Latin 1 or
UTF-8) into account and prints the raw bytes as they appear in the source
file. Is this normal? (This isn't really a problem for me, as I am only
going to use unicode strings now.)
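
The behaviour described above is what one would expect if non-unicode string literals simply keep the bytes of the source file. The sketch below simulates that without a console: it takes the bytes each source encoding would produce for "é" and decodes them as CP437 to see what the console would display. The variable names are illustrative, and the code is written to run on a current Python 3 interpreter as well.

```python
# Bytes that a non-unicode literal "é" would hold, per source encoding.
utf8_bytes = u"\u00e9".encode("utf-8")      # C3 A9
latin1_bytes = u"\u00e9".encode("latin-1")  # E9

# What a CP437 console displays for those raw bytes.
shown_utf8 = utf8_bytes.decode("cp437")      # U+251C U+2310: two line-like symbols
shown_latin1 = latin1_bytes.decode("cp437")  # U+0398: capital theta

# A unicode string is encoded for the console instead, giving the right byte.
console_bytes = u"\u00e9".encode("cp437")    # 0x82, which CP437 displays as é
```

Only the unicode string goes through an encoding step on output, which is why sticking to unicode strings avoids the problem.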
