UTF-8 output problems

Discussion in 'Python' started by Michael B. Trausch, Mar 10, 2007.

  1. I am having a slight problem with UTF-8 output with Python. I have the
    following program:

    x = 0

    while x < 0x4000:
        print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
        x += 1

    This program works perfectly when run directly:

    mbt@pepper:~/tmp$ python test.py
    This is Unicode code point 0 (0x0):
    This is Unicode code point 1 (0x1):
    This is Unicode code point 2 (0x2):
    This is Unicode code point 3 (0x3):
    This is Unicode code point 4 (0x4):
    This is Unicode code point 5 (0x5):
    This is Unicode code point 6 (0x6):
    This is Unicode code point 7 (0x7):
    This is Unicode code point 8 (0x8):
    This is Unicode code point 9 (0x9):
    This is Unicode code point 10 (0xa):
    (... continued)

    However, when I attempt to redirect the output to a file:

    mbt@pepper:~/tmp$ python test.py >f
    Traceback (most recent call last):
      File "test.py", line 6, in <module>
        print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)

    This is slightly confusing to me. The output goes all the way to the
    end of the program when it is not redirected. Why is Python treating
    the situation differently when the output is redirected? This failure
    occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.

    Any ideas?

    — Mike

    Michael B. Trausch, Mar 10, 2007
    #1

  2. Michael B. Trausch wrote:

    > However, when I attempt to redirect the output to a file:
    >
    > mbt@pepper:~/tmp$ python test.py >f
    > Traceback (most recent call last):
    >   File "test.py", line 6, in <module>
    >     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)
    >
    > This is slightly confusing to me. The output goes all the way to the
    > end of the program when it is not redirected. Why is Python treating
    > the situation differently when the output is redirected?


    If you print to a terminal, `sys.stdout` is connected to that terminal, and
    there are ways to figure out that it is a terminal (`os.isatty()`) and
    which encoding the terminal expects, at least in most cases. But there
    is no way to tell what encoding a file or pipe should have, so Python
    refuses to guess and falls back to the default ASCII codec, which fails
    on the first character outside the ASCII range (here u'\x80').

    If an encoding could be determined, the `sys.stdout.encoding` attribute is
    set to its name; otherwise it's `None`.
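
    The check described above can be sketched roughly as follows. This is
    only an illustration: the helper names `pick_encoding` and `encode_line`,
    the `FakePipe` stand-in, and the UTF-8 fallback are assumptions made for
    the example, not something Python or this thread prescribes.

```python
import sys


def pick_encoding(stream, fallback="utf-8"):
    """Return the stream's declared encoding, or `fallback` if it has none.

    Hypothetical helper: the UTF-8 fallback is an assumption of this sketch.
    """
    return getattr(stream, "encoding", None) or fallback


def encode_line(text, stream, fallback="utf-8"):
    """Encode one line explicitly instead of relying on the implicit codec."""
    return (text + u"\n").encode(pick_encoding(stream, fallback))


class FakePipe(object):
    """Stand-in for a redirected stdout that reports no encoding."""
    encoding = None


class FakeLatin1Terminal(object):
    """Stand-in for a terminal that reports a Latin-1 encoding."""
    encoding = "latin-1"


# With no declared encoding the fallback is used; otherwise the stream wins.
pipe_bytes = encode_line(u"\x80", FakePipe())
terminal_bytes = encode_line(u"\xe9", FakeLatin1Terminal())
```

    Under Python 2, the resulting byte string could then be passed to
    `sys.stdout.write()`, which sidesteps the implicit ASCII conversion
    whether or not the output is redirected.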

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Mar 10, 2007
    #2

  3. Michael B. Trausch wrote:

    > I am having a slight problem with UTF-8 output with Python. I have the
    > following program:
    >
    > x = 0
    >
    > while x < 0x4000:
    >     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    >     x += 1
    >
    > This program works perfectly when run directly:
    >
    > mbt@pepper:~/tmp$ python test.py
    > This is Unicode code point 0 (0x0):
    > This is Unicode code point 1 (0x1):
    > This is Unicode code point 2 (0x2):
    > This is Unicode code point 3 (0x3):
    > This is Unicode code point 4 (0x4):
    > This is Unicode code point 5 (0x5):
    > This is Unicode code point 6 (0x6):
    > This is Unicode code point 7 (0x7):
    > This is Unicode code point 8 (0x8):
    > This is Unicode code point 9 (0x9):
    > This is Unicode code point 10 (0xa):
    > (... continued)
    >
    > However, when I attempt to redirect the output to a file:
    >
    > mbt@pepper:~/tmp$ python test.py >f
    > Traceback (most recent call last):
    >   File "test.py", line 6, in <module>
    >     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)
    >
    > This is slightly confusing to me. The output goes all the way to the
    > end of the program when it is not redirected. Why is Python treating
    > the situation differently when the output is redirected? This failure
    > occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.
    >
    > Any ideas?


    To complement Marc's reply: you can open a file with a specific encoding
    (see the codecs.open() function) and use print >> f, ... to fill that file.
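
    A minimal sketch of that suggestion, assuming UTF-8 as the target
    encoding. The function name `dump_code_points` and its parameters are
    made up for the example; only codecs.open() comes from the reply. It
    calls f.write() rather than the Python 2-only print >> f form so that
    the same code also runs on Python 3.

```python
import codecs

try:
    unichr  # Python 2 built-in
except NameError:
    unichr = chr  # Python 3: chr() already returns text


def dump_code_points(path, stop=0x4000, encoding="utf-8"):
    """Write one line per code point below `stop` to the file at `path`.

    Hypothetical helper for illustration; the defaults mirror the loop
    bound from the original program and an assumed UTF-8 target.
    """
    f = codecs.open(path, "w", encoding=encoding)
    try:
        for x in range(stop):
            # The codec wrapper encodes each unicode string on write,
            # so the implicit ASCII conversion never takes place.
            f.write(u"This is Unicode code point %d (0x%x): %s\n"
                    % (x, x, unichr(x)))
    finally:
        f.close()
```

    Calling dump_code_points("f") would take the place of running
    python test.py >f from the shell, with the encoding decided by the
    program rather than guessed from the output stream.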

    A+

    Laurent.
    Laurent Pointal, Mar 10, 2007
    #3