codecs / latin-1 / unicode: standard output vs. file


Marko Faldix

Hello,

with Python 2.3 I can write umlauts (ä, ö, ü) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")


This works fine, but it is not exactly what I want. I would like to
write this to standard output, so that I can use the same code to produce
output lines on the console or to pipe them into a file. This was possible
before Python 2.3. Isn't it possible anymore with the same code?


--
Marko Faldix
M+R Infosysteme
Hubert-Wienen-Str. 24 52070 Aachen
Tel.: 0241-93878-16 Fax.:0241-875095
E-Mail: markopointfaldix@mplusrpointde

Michael Hudson

Marko Faldix said:
Hello,

with Python 2.3 I can write umlauts (ä, ö, ü) to a file with this piece
of code:

import codecs

f = codecs.open("klotentest.txt", "w", "latin-1")
print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")


This works fine, but it is not exactly what I want. I would like to
write this to standard output, so that I can use the same code to produce
output lines on the console or to pipe them into a file. This was possible
before Python 2.3. Isn't it possible anymore with the same code?

If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh

Marko Faldix

Hi,

Michael Hudson said:
If your locale is set up in an appropriate way, you should be able
to print latin-1 characters to stdout without any intervention at all.

If that doesn't work, we need more details.

Cheers,
mwh


I'll try to describe. It's a Windows machine with Python 2.3.2 installed, using the
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, the result of the first print is as desired; the result of the second
print delivers strange letters, but no error.
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
  File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line 3, in ?
    print unicode("My umlauts are õ, ÷, ³", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)


(By the way: the error is the same if I call it this way: python
klotentest1.py > klotentest1.txt)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.

Marko Faldix

Fredrik Lundh

Marko said:
I'll try to describe. It's a Windows machine with Python 2.3.2 installed, using the
command line (cmd). Put these lines of code in a file called klotentest1.py:

# -*- coding: iso-8859-1 -*-

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü"

Calling this on command line:

klotentest1.py

Indeed, the result of the first print is as desired; the result of the second
print delivers strange letters, but no error.

your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
Now I call this on command line:

klotentest1.py > klotentest1.txt

This fails:
Traceback (most recent call last):
  File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line 3, in ?
    print unicode("My umlauts are õ, ÷, ³", "latin-1")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

From my point of view, Python shouldn't act differently depending on whether
the result is piped to a file or not.

when you print to a console with a known encoding, Python 2.3 automagically
converts Unicode strings to 8-bit strings using the console encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.
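The failure Fredrik describes is easy to reproduce directly (a sketch in modern Python; in 2.3 the same contrast appeared between a cp850 console and a redirected stdout defaulting to ASCII):

```python
# The ASCII codec (Python 2's default for a redirected stdout) cannot
# represent u'ä' (U+00E4), while the cp850 console codepage can.
try:
    u"\xe4".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

cp850_bytes = u"\xe4".encode("cp850")  # the console's own encoding
```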

</F>

Marko Faldix

Hi,

Fredrik Lundh said:
your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...


when you print to a console with a known encoding, Python 2.3 automagically
converts Unicode strings to 8-bit strings using the console encoding.

files don't have an encoding, which is why the second case fails.

also note that in 2.2 and earlier, your example always failed.

</F>

So I just have to use only this:

print "My umlauts are ä, ö, ü"

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case it looks nice in e.g. Notepad, just
strange on the console, so it's the console settings that need adjusting, not
the Python code. Right?


Marko Faldix

Martin v. Löwis

Marko Faldix said:
print "My umlauts are ä, ö, ü"

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case it looks nice in e.g. Notepad, just
strange on the console, so it's the console settings that need adjusting, not
the Python code. Right?

Wrong. On your operating system, notepad.exe and the console use
*different* encodings. If you think this is stupid, please complain to
Microsoft. If you print byte strings, it will come out wrong either in
the terminal, or in notepad - there is *no way* to have the same byte
string show correctly in both encodings.

If you want to output to a file, you should open the file with the encoding
returned by locale.getpreferredencoding(). If you want to output to a terminal,
Python should automatically find out what the terminal's encoding is
(to make things worse, the user can override the terminal encoding
on Windows, on a per-terminal basis, using chcp.exe).
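A sketch of Martin's advice (the encoding is forced to latin-1 here so the example runs under any locale; in real code you would pass `locale.getpreferredencoding()` instead, and the file path is a temporary one made up for the example):

```python
import codecs
import locale
import os
import tempfile

# In real code: enc = locale.getpreferredencoding()
# (e.g. 'cp1252' on a Western-European Windows). Forced to latin-1 here
# so the example is reproducible everywhere.
enc = "latin-1"
path = os.path.join(tempfile.mkdtemp(), "umlauts.txt")
with codecs.open(path, "w", enc) as f:
    f.write(u"My umlauts are \xe4, \xf6, \xfc\n")
with open(path, "rb") as f:
    raw = f.read()
```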

Regards,
Martin

Bengt Richter

Marko said:
I try to describe. It's a Windows machine with Python 2.3.2 installed. Using
command line (cmd). Put these lines of code in a file called klotentest1.py: ^^^^[1]

# -*- coding: iso-8859-1 -*- ^^^^^^^^^^[2]

print unicode("My umlauts are ä, ö, ü", "latin-1")
print "My umlauts are ä, ö, ü" ^^^^^^^^^^^^^^^^^^^^^^^^[3]
[...]
Calling this on command line:

klotentest1.py

Indeed, the result of the first print is as desired; the result of the second
print delivers strange letters, but no error.

your console device doesn't use iso-8859-1; it probably uses cp850.
if you print an 8-bit string to the console, Python assumes that you
know what you're doing...
I think the OP is suggesting that given [1] & [2], [3] should implicitly carry the [2] info
and be converted for output just like the result of unicode(...) is.

(I know that's not the way it works now, and I know it's not an easy problem ;-)
when you print to a console with a known encoding, Python 2.3 automagically
converts Unicode strings to 8-bit strings using the console encoding.

files don't have an encoding, which is why the second case fails.
I think the OP is thinking that files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence. (I have to get back to a previous thread with Martin, where
I owe a reply; this same issue is key there.) (I realize that's not the way it works now,
and that it's a hard problem, to repeat myself ;-)

Regards,
Bengt Richter

Serge Orlov

Marko Faldix said:
So I just have to use only this:

print "My umlauts are ä, ö, ü"

without any encoding assignment, for standard output on the console AND for
redirecting to a file. In the latter case it looks nice in e.g. Notepad, just
strange on the console, so it's the console settings that need adjusting, not
the Python code. Right?

No, the right code is
=============================
# -*- coding: iso-8859-1 -*-
import locale, codecs, sys

if not sys.stdout.isatty():
    sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

print u"My umlauts are ä, ö, ü"
=============================
The difference between console and file output is that while
there's only one way to output ä on a cp850 console, there
are many ways to output the same character to a file (latin-1,
utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
So Python refuses to guess.
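The ambiguity is easy to see by encoding the single character ä under a few of the encodings Serge lists:

```python
# One character, several different byte representations -- this is why
# Python won't silently pick an encoding for plain file output.
ch = u"\xe4"  # LATIN SMALL LETTER A WITH DIAERESIS
samples = {enc: ch.encode(enc)
           for enc in ("latin-1", "utf-8", "cp850", "utf-16-le")}
```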
Another rule to follow is to store non-ASCII characters in
unicode strings. Otherwise you will either have to track
the encodings yourself or assume that all 8-bit strings
in your program have the same encoding. That's not
a good idea. I'm also not sure whether you get proper .upper()
and .lower() methods on 8-bit strings. (I don't have Python
here to check.)

-- Serge.

Martin v. Löwis

Bengt said:
I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
_do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
not just a byte sequence

The OP could easily overcome this aspect of the problem with a Unicode
literal (and in fact, he originally did convert the string literal to
a Unicode object before further processing).

This does not solve the problem, though: Writing the Unicode object to
a file still gives an encoding error, since he did not specify the
encoding of the file.

Regards,
Martin