How to pass Chinese characters as command-line arguments?

kj

I want to pass Chinese characters as command-line arguments to a
Python script. My terminal has no problem displaying these
characters, and passing them to the script, but I can't get Python
to understand them properly.

E.g. if I pass one such character to the simple script

import sys
print sys.argv[1]
print type(sys.argv[1])

the first line of the output looks fine (identical to the input),
but the second line says "<type 'str'>". If I add the line

arg = unicode(sys.argv[1])

I get the error

Traceback (most recent call last):
  File "kgrep.py", line 4, in <module>
    arg = unicode(sys.argv[1])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
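Why the call fails: in Python 2, unicode(s) without an encoding argument decodes s with the default codec, which is ASCII, so any byte above 0x7f is rejected. The sketch below reproduces the same failure in Python 3 syntax. It assumes the argument was the UTF-8 encoding of a Chinese character whose first byte is 0xe8; U+8BCD is used here only as an illustrative input, not the character from the original post.

```python
# Python 3 sketch: decoding UTF-8 bytes with the ASCII codec fails,
# just like Python 2's unicode(sys.argv[1]) did.
# Illustrative input: the UTF-8 bytes of U+8BCD, which start with 0xe8.
raw = "\u8bcd".encode("utf-8")
print(raw)  # b'\xe8\xaf\x8d'

try:
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec rejected
    failed_at = exc.start
    print(exc.reason, hex(raw[exc.start]), exc.start)
    # prints: ordinal not in range(128) 0xe8 0
```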

What must I do to get Python to recognize command-line arguments
as utf-8 Unicode?

FWIW, my OS is Darwin, my shell (zsh) runs in Terminal, and none
of my locale (LC_*) variables is set.

TIA!

kynn
 
Diez B. Roggisch

On 31.01.10 16:52, kj wrote:
> What must I do to get Python to recognize command-line arguments
> as utf-8 Unicode?

The last sentence reveals your problem: UTF-8 is *not* Unicode. It's an
encoding of Unicode, which is a crucial difference.
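A short illustration of that distinction, written as a Python 3 sketch: one Unicode character is a single code point, but the bytes that represent it depend on the encoding chosen. U+8BCD is an arbitrary example character, and UTF-16-BE is included only for contrast.

```python
# One code point, several possible byte encodings.
ch = "\u8bcd"                 # a single Chinese character (illustrative choice)
print(len(ch), hex(ord(ch)))  # 1 0x8bcd  -> one code point

utf8 = ch.encode("utf-8")       # 3 bytes: b'\xe8\xaf\x8d'
utf16 = ch.encode("utf-16-be")  # 2 bytes: b'\x8b\xcd'
print(utf8, utf16)

# Decoding each byte string with its matching codec recovers the same text.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == ch
```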

From the outside you get byte streams, and if these happen to be
UTF-8 encoded, you can simply decode them:

arg = unicode(sys.argv[1], "utf-8")
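The line above is Python 2 code; the unicode() builtin no longer exists in Python 3. As a hedged sketch for Python 3 readers: sys.argv already holds str objects, which the interpreter decodes with the filesystem encoding (UTF-8 on macOS) and the surrogateescape error handler, so usually no explicit decode is needed. If you want to redo the decoding strictly with a codec you name, os.fsencode() recovers the original argument bytes on POSIX systems. The helper name arg_as_text is hypothetical.

```python
import os
import sys


def arg_as_text(arg, encoding="utf-8"):
    """Return a command-line argument as text (hypothetical helper).

    Re-encoding with os.fsencode() and decoding with an explicit codec
    mirrors Python 2's unicode(sys.argv[1], "utf-8"): bytes that are not
    valid in `encoding` raise UnicodeDecodeError instead of surviving
    as surrogate escapes.
    """
    raw = os.fsencode(arg)       # original argv bytes on POSIX
    return raw.decode(encoding)  # strict decoding: invalid bytes raise


if __name__ == "__main__":
    if len(sys.argv) > 1:
        text = arg_as_text(sys.argv[1])
        print(text, type(text), len(text))
```

Run as `python3 kgrep.py 词`, this would print the character, `<class 'str'>`, and a length of 1, since a single Chinese character is one code point.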

Diez
 
kj

Thanks!

kynn