A question about Unicode in Python

hzqij

I have a Python source file, test.py:

# -*- coding: UTF-8 -*-

# s is a unicode string that includes Chinese characters
s = u'张三'

Then I run:

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

But in the Python interactive interpreter, it works.

Why?
 
Marc 'BlackJack' Rintsch

hzqij said:
I have a Python source file, test.py:

# -*- coding: UTF-8 -*-

# s is a unicode string that includes Chinese characters
s = u'张三'

Then I run:

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

But in the Python interactive interpreter, it works.

Why?

Does the "coding comment" match the actual encoding of the source file?
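
A minimal sketch of what such a mismatch looks like. It assumes, purely for illustration, that the file was saved in GBK (the thread never confirms the real encoding); the helper name decodes_as_utf8 is made up for this example, and the syntax is chosen to run on a current Python 3 interpreter.

```python
# -*- coding: utf-8 -*-
# Assumption: the editor saved the file as GBK while the coding comment says UTF-8.
text = u'\u5f20\u4e09'               # the two Chinese characters, written as escapes

gbk_bytes = text.encode('gbk')       # what a GBK-saving editor would put on disk
utf8_bytes = text.encode('utf-8')    # what the coding comment promises

def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

print(decodes_as_utf8(gbk_bytes))    # the mismatched case
print(decodes_as_utf8(utf8_bytes))   # the matching case
```

Under that assumption, the first check fails at the very first two bytes, which is consistent with "position 0-1" in the reported error, while the second succeeds.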

Ciao,
Marc 'BlackJack' Rintsch
 
WolfgangZ

hzqij said:
I have a Python source file, test.py:

# -*- coding: UTF-8 -*-

# s is a unicode string that includes Chinese characters
s = u'张三'

Then I run:

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

But in the Python interactive interpreter, it works.

Why?

Just an idea: does your text editor really support UTF-8? In the mail
the string is displayed only as '??', which looks to me as if the mail
editor did not send the mail as UTF-8. Try attaching a correctly encoded
text file.
 
Evan Klitzke

hzqij said:
I have a Python source file, test.py:

# -*- coding: UTF-8 -*-

As Marc pointed out, you should test the actual file encoding of the
program to check that it is, in fact, UTF-8 encoded. If you're on a
Unix/Linux system you should be able to test for a UTF-8 encoded file
using the "file" command, e.g.

evan@dhcp-10-10-7-101 ~ $ file ~/uni.py
/home/evan/uni.py: UTF-8 Unicode text
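
Where the "file" command is not available, a rough equivalent can be sketched in Python. The function name file_is_utf8 is hypothetical, and the check only says whether the bytes are valid UTF-8; it does not identify which other encoding was used.

```python
import io

def file_is_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    # Read raw bytes so no implicit decoding happens while opening the file.
    with io.open(path, 'rb') as f:
        data = f.read()
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True
```

Calling file_is_utf8('test.py') on the script from the question would show whether its bytes match the coding comment. Note that a pure-ASCII file also passes, since ASCII is a subset of UTF-8.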
 
kyosohma

Evan Klitzke said:
As Marc pointed out, you should test the actual file encoding of the
program to check that it is, in fact, UTF-8 encoded. If you're on a
Unix/Linux system you should be able to test for a UTF-8 encoded file
using the "file" command, e.g.

evan@dhcp-10-10-7-101 ~ $ file ~/uni.py
/home/evan/uni.py: UTF-8 Unicode text

If you're using IDLE to edit the source, you can set IDLE to
encode in UTF-8 by going to Options, Configure IDLE, General tab, and
changing the Default Source Encoding to utf-8.

Mike
 
Andre Engels

2007/6/12, WolfgangZ said:
Just an idea: does your text editor really support UTF-8? In the mail
the string is displayed only as '??', which looks to me as if the mail
editor did not send the mail as UTF-8. Try attaching a correctly encoded
text file.

That must be your mail client, not his text editor or mail client. I
do see two Chinese characters in the message.
 
