Unicode problem

Rehceb Rotkiv

Please have a look at this little script:

#!/usr/bin/python
import sys
import codecs
fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8')
fileString = fileHandle.read()
print fileString

If I call it from a Bash shell like this

$ ./test.py testfile.utf8.txt

it works just fine, but when I try to pipe the output to another process
("|") or into a file (">"), e.g. like this

$ ./test.py testfile.utf8.txt | cat

I get an error:

Traceback (most recent call last):
  File "./test.py", line 6, in ?
    print fileString
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 538: ordinal not in range(128)

I absolutely don't know what's the problem here, can you help?

Thanks,
Rehceb
 
Gabriel Genellina

Using codecs.open, when you read the file you get Unicode. When you
print the Unicode object, it is encoded using your terminal's
encoding (utf8, I presume?).
But when you redirect the output, stdout is no longer connected to your
terminal, so no encoding can be assumed, and the default encoding
(ascii) is used.
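
To make the failure concrete, here is a minimal sketch that encodes the character from the traceback with both codecs. The helper name try_encode is an illustration only and does not come from the thread; the snippet is written to run on Python 2 and Python 3 alike.

```python
text = u"\xe4"  # the character named in the traceback


def try_encode(s, encoding):
    """Return the encoded bytes, or a short message if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return "failed: %s" % exc


utf8_result = try_encode(text, "utf-8")   # succeeds: UTF-8 can represent the character
ascii_result = try_encode(text, "ascii")  # fails: the character is outside range(128)
```

The second call fails for the same reason the redirected print does: the ascii codec has no byte for that character.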

Try this line at the top:

print "stdout:", sys.stdout.encoding, "default:", sys.getdefaultencoding()

I get "stdout: ANSI_X3.4-1968 default: ascii" normally and
"stdout: None default: ascii" when redirected.
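
The same check can be wrapped in a small function, sketched below. The name describe_stream and the in-memory buffer are assumptions for illustration; the buffer merely stands in for a stream that reports no encoding, as the redirected stdout does in the output above. Note that Python 3 usually reports an encoding for sys.stdout even when redirected, so the None result described here is specific to Python 2.

```python
import io
import sys


def describe_stream(stream):
    """Return (encoding, is_terminal) for a file-like object."""
    encoding = getattr(stream, "encoding", None)
    is_terminal = hasattr(stream, "isatty") and stream.isatty()
    return encoding, is_terminal


# A byte buffer has no encoding and is not a terminal.
redirected_like = describe_stream(io.BytesIO())

# Report on the real stdout via stderr, so the line stays visible under "|" or ">".
sys.stderr.write("stdout: %r default: %r\n"
                 % (describe_stream(sys.stdout), sys.getdefaultencoding()))
```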

You have to encode the Unicode object explicitly:

print fileString.encode("utf-8")

(or use any other suitable encoding; I said utf-8 only because you read
the input file using that)
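
As a rough sketch of the explicit approach, the helper below encodes the text itself and hands bytes to the stream, so the result no longer depends on what the stream is connected to. The name write_encoded and the sample string are hypothetical, and an in-memory buffer stands in for the pipe or file.

```python
import io


def write_encoded(stream, text, encoding="utf-8"):
    """Encode text explicitly and write the resulting bytes to a byte stream."""
    stream.write(text.encode(encoding))


buf = io.BytesIO()                    # stands in for a pipe or a redirected file
write_encoded(buf, u"Gr\xfc\xdfe\n")  # non-ASCII text goes in as Unicode
```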
 
Rehceb Rotkiv

Thanks! That's a nice little stumbling block for a newbie like me ;) Is
there a way to make utf-8 the default encoding for every string, so that
I do not have to encode each string explicitly?
 
Guest

You can make sys.stdout encode each string with UTF-8, with

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

Make sure then that *all* strings that you print
are Unicode strings.
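
A minimal sketch of what the wrapper does, using an in-memory byte buffer in place of sys.stdout so the encoded bytes can be inspected (the buffer and the sample string are assumptions for illustration):

```python
import codecs
import io

raw = io.BytesIO()                       # stands in for a byte-oriented stdout
writer = codecs.getwriter("utf-8")(raw)  # StreamWriter that encodes on every write

writer.write(u"sch\xf6n\n")              # Unicode goes in, UTF-8 bytes come out
encoded = raw.getvalue()
```

In Python 2, handing such a writer a byte string that contains non-ASCII bytes triggers an implicit ASCII decode first and fails with UnicodeDecodeError, which is why everything printed through it should be Unicode. On Python 3 the byte stream to wrap would be sys.stdout.buffer rather than sys.stdout.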

HTH,
Martin
 
Georg Brandl

BTW, any reason why an EncodedFile can't act like a Unicode writer/reader object
if one of its encodings is explicitly set to None?

IMO the docs don't make it clear that getwriter() is the correct API to use
here. I've wanted to write "sys.stdout = codecs.EncodedFile(sys.stdout,
'utf-8')" more than once.

Georg
 
Martin v. Löwis

AFAIU, that's not the intention of EncodedFile: instead, it is
meant to do recoding. I find it a pretty useless API, and would
rather see it go away than be enhanced.
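
For comparison, a short sketch of the recoding that EncodedFile performs: bytes in one encoding go in, bytes in another reach the underlying file, and Unicode objects are never involved on the caller's side. The Latin-1 input and the in-memory buffer are assumptions chosen for illustration.

```python
import codecs
import io

raw = io.BytesIO()  # the underlying byte-oriented file

# The caller writes Latin-1 bytes; the underlying file receives UTF-8 bytes.
recoder = codecs.EncodedFile(raw, data_encoding="latin-1", file_encoding="utf-8")
recoder.write(b"sch\xf6n\n")

recoded = raw.getvalue()
```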

Regards,
Martin
 
