UTF-8 output problems

Michael B. Trausch

I am having a slight problem with UTF-8 output with Python. I have the
following program:

x = 0

while x < 0x4000:
    print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    x += 1

This program works perfectly when run directly:

mbt@pepper:~/tmp$ python test.py
This is Unicode code point 0 (0x0):
This is Unicode code point 1 (0x1):
This is Unicode code point 2 (0x2):
This is Unicode code point 3 (0x3):
This is Unicode code point 4 (0x4):
This is Unicode code point 5 (0x5):
This is Unicode code point 6 (0x6):
This is Unicode code point 7 (0x7):
This is Unicode code point 8 (0x8):
This is Unicode code point 9 (0x9):
This is Unicode code point 10 (0xa):
(... continued)

However, when I attempt to redirect the output to a file:

mbt@pepper:~/tmp$ python test.py >f
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)

This is slightly confusing to me. The output goes all the way to the
end of the program when it is not redirected. Why is Python treating
the situation differently when the output is redirected? This failure
occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.

Any ideas?

— Mike

 
Marc 'BlackJack' Rintsch

Michael B. said:
However, when I attempt to redirect the output to a file:

mbt@pepper:~/tmp$ python test.py >f
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)

This is slightly confusing to me. The output goes all the way to the
end of the program when it is not redirected. Why is Python treating
the situation differently when the output is redirected?

If you print to a terminal, `sys.stdout` is connected to that terminal,
and there are ways to figure out that it is a terminal (`os.isatty()`)
and which encoding the terminal expects, at least in most cases. But
there is no way to tell what encoding a file or pipe should have, so
Python refuses to guess.
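A small sketch of the check described above, using only the standard library. `describe_stream` is a hypothetical helper name, not a standard function. Under Python 2, as in this thread, a redirected stdout would report `(False, None)`; note that Python 3 sets an encoding even for redirected output, so the second value differs there.

```python
import os
import sys


def describe_stream(stream=None):
    """Return (is_terminal, detected_encoding) for a text stream.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if stream is None:
        stream = sys.stdout
    try:
        # os.isatty() needs a file descriptor; in-memory streams have none
        # and raise AttributeError or ValueError (io.UnsupportedOperation).
        is_terminal = os.isatty(stream.fileno())
    except (AttributeError, ValueError):
        is_terminal = False
    return is_terminal, getattr(stream, "encoding", None)


if __name__ == "__main__":
    # Report on stderr so the result stays visible when stdout is redirected.
    sys.stderr.write("stdout: terminal=%r, encoding=%r\n" % describe_stream())
```

Running the script once directly and once with `>f` shows the difference Marc describes.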

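If an encoding could be determined, the `sys.stdout.encoding` attribute is
set to its name; otherwise it is `None`. In that case printing a unicode
string falls back to the default encoding (`sys.getdefaultencoding()`,
normally ASCII), which is why the traceback mentions the 'ascii' codec.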
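To make the fallback concrete, here is a minimal sketch assuming Python 2's rule for printing unicode (use the stream's encoding, else ASCII). The helpers `print_encoding`, `can_encode` and `utf8_writer` are hypothetical names; `codecs.getwriter()` is the real standard-library function, and wrapping a byte stream with it is one common way to choose an encoding explicitly instead of relying on the fallback.

```python
import codecs
import io


def print_encoding(stream):
    # Hypothetical helper mirroring Python 2's print rule for unicode:
    # use the stream's detected encoding, or fall back to ASCII if None.
    return getattr(stream, "encoding", None) or "ascii"


def can_encode(text, encoding):
    # True if `text` can be encoded with `encoding` without an error.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def utf8_writer(byte_stream):
    # Wrap a byte-oriented stream so unicode text written to it is
    # encoded as UTF-8 rather than via the ASCII fallback.
    return codecs.getwriter("utf-8")(byte_stream)


# io.BytesIO stands in for a redirected stdout with no detected encoding.
redirected = io.BytesIO()
out = utf8_writer(redirected)
out.write(u"This is Unicode code point 128 (0x80): \x80\n")
```

U+0080 cannot be encoded as ASCII, which reproduces the error at position 39, while the UTF-8 writer emits the bytes `\xc2\x80`. In the original Python 2 script the equivalent would be `sys.stdout = codecs.getwriter("utf-8")(sys.stdout)`; on Python 3 the byte stream to wrap is `sys.stdout.buffer`.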

Ciao,
Marc 'BlackJack' Rintsch
 
Laurent Pointal

Michael said:
This is slightly confusing to me. The output goes all the way to the
end of the program when it is not redirected. Why is Python treating
the situation differently when the output is redirected? This failure
occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.

Any ideas?

In addition to Marc's reply: you can open a file with a specific encoding
(see the `codecs.open()` function) and use `print >> f, ...` to write to that file.
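A minimal sketch of this suggestion, assuming UTF-8 is the wanted encoding. `dump_code_points` and the example path are hypothetical. It uses `f.write()` instead of `print >> f` so that the same code also runs on Python 3, where `unichr` is aliased to `chr`.

```python
import codecs

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3: chr() returns a text character


def dump_code_points(path, limit=0x4000):
    # Hypothetical helper: codecs.open() returns a writer that encodes
    # unicode text as UTF-8 before it reaches the file, so no ASCII
    # fallback is involved, whether or not stdout is redirected.
    with codecs.open(path, "w", encoding="utf-8") as f:
        for x in range(limit):
            f.write(u"This is Unicode code point %d (0x%x): %s\n" % (x, x, unichr(x)))
```

For example, `dump_code_points("codepoints.txt")` writes the same lines as the original program, encoded as UTF-8.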

See you,

Laurent.
 
