codecs latin1 unicode standard output file

Discussion in 'Python' started by Marko Faldix, Dec 15, 2003.

  1. Marko Faldix

    Marko Faldix Guest

    Hello,

    with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
    of code:

    import codecs

    f = codecs.open("klotentest.txt", "w", "latin-1")
    print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")


    This works fine, but it is not exactly what I want. I would like to
    write this to standard output so that the same code can produce output
    lines on the console or be piped into a file. This was possible before
    Python 2.3. Is it no longer possible with the same code?


    --
    Marko Faldix
    M+R Infosysteme
    Hubert-Wienen-Str. 24 52070 Aachen
    Tel.: 0241-93878-16 Fax.:0241-875095
    E-Mail: markopointfaldix@mplusrpointde
    Marko Faldix, Dec 15, 2003
    #1

  2. "Marko Faldix" <> writes:

    > Hello,
    >
    > with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this piece
    > of code:
    >
    > import codecs
    >
    > f = codecs.open("klotentest.txt", "w", "latin-1")
    > print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
    >
    >
    > This works fine. This is not exactly what I wanted to have. I would like to
    > write this to standard output so that I can use same code to produce output
    > lines on console or to use this to pipe into file. It was possible before
    > Python 2.3. Isn't possible anymore with same code?


    If your locale is set up appropriately, you should be able
    to print latin-1 characters to stdout without any intervention at all.

    If that doesn't work, we need more details.

    Cheers,
    mwh

    --
    Also, remember to put the galaxy back when you've finished, or an
    angry mob of astronomers will come round and kneecap you with a
    small telescope for littering.
    -- Simon Tatham, ucam.chat, from Owen Dunn's review of the year
    Michael Hudson, Dec 15, 2003
    #2

  3. Marko Faldix

    Marko Faldix Guest

    Hi,

    "Michael Hudson" <> wrote in message
    news:...
    > "Marko Faldix" <> writes:
    >
    > > Hello,
    > >
    > > with Python 2.3 I can write umlauts (a,o,u umlaut) to a file with this
    > > piece of code:
    > >
    > > import codecs
    > >
    > > f = codecs.open("klotentest.txt", "w", "latin-1")
    > > print >>f, unicode("My umlauts are ä, ö, ü", "latin-1")
    > >
    > >
    > > This works fine. This is not exactly what I wanted to have. I would like
    > > to write this to standard output so that I can use same code to produce
    > > output lines on console or to use this to pipe into file. It was possible
    > > before Python 2.3. Isn't possible anymore with same code?

    >
    > If your locale is setup up in an appropriate way, you should be able
    > to print latin-1 characters to stdout without any intervention at all.
    >
    > If that doesn't work, we need more details.
    >
    > Cheers,
    > mwh



    I'll try to describe it. It's a Windows machine with Python 2.3.2 installed,
    using the command line (cmd). Put these lines of code in a file called klotentest1.py:

    # -*- coding: iso-8859-1 -*-

    print unicode("My umlauts are ä, ö, ü", "latin-1")
    print "My umlauts are ä, ö, ü"

    Calling this on command line:

    klotentest1.py

    Indeed, result of first print is as desired, result of second print delivers
    strange letters but no error.
    Now I call this on command line:

    klotentest1.py > klotentest1.txt

    This fails:
    Traceback (most recent call last):
      File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line 3, in ?
        print unicode("My umlauts are õ, ÷, ³", "latin-1")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)


    ( By the way: error result is same if I call it this way: python
    klotentest1.py > klotentest1.txt )

    In my view, Python shouldn't behave differently depending on whether the
    output is redirected to a file or not.

    Marko Faldix
    Marko Faldix, Dec 15, 2003
    #3
  4. Marko Faldix wrote:

    > I try to describe. It's a Window machine with Python 2.3.2 installed. Using
    > command line (cmd). Put these lines of code in a file called klotentest1.py:
    >
    > # -*- coding: iso-8859-1 -*-
    >
    > print unicode("My umlauts are ä, ö, ü", "latin-1")
    > print "My umlauts are ä, ö, ü"
    >
    > Calling this on command line:
    >
    > klotentest1.py
    >
    > Indeed, result of first print is as desired, result of second print delivers
    > strange letters but no error.


    your console device doesn't use iso-8859-1; it probably uses cp850.
    if you print an 8-bit string to the console, Python assumes that you
    know what you're doing...
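    The "strange letters" can be reproduced away from the console. This is only a sketch in Python 3 syntax (the thread itself is about Python 2.3), and the helper name as_seen_on_console is made up for illustration. It encodes the text as latin-1 and then decodes those same bytes the way a cp850 console would:

```python
def as_seen_on_console(text, source_encoding, console_encoding):
    """Show what a console displays when it receives bytes in another encoding.

    Hypothetical helper: it mimics printing an 8-bit string unchanged.
    """
    raw_bytes = text.encode(source_encoding)   # bytes stored in the 8-bit string
    return raw_bytes.decode(console_encoding)  # how the console interprets them


garbled = as_seen_on_console("My umlauts are ä, ö, ü", "latin-1", "cp850")
```

    Under these assumptions, garbled holds the same õ, ÷, ³ characters that appear in the traceback quoted in this thread.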

    > Now I call this on command line:
    >
    > klotentest1.py > klotentest1.txt
    >
    > This fails:
    > Traceback (most recent call last):
    > File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line
    > 3, in ?
    > print unicode("My umlauts are õ, ÷, ³", "latin-1")
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
    > 15: ordinal not in range(128)
    >
    > In my point of view python shouldn't act in different ways whether result is
    > piped to file or not.


    when you print to a console with a known encoding, Python 2.3 auto-
    magically converts Unicode strings to 8-bit strings using the console
    encoding.

    files don't have an encoding, which is why the second case fails.

    also note that in 2.2 and earlier, your example always failed.
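    To see which of the two cases a script is in, the stream itself can be inspected. A minimal sketch in Python 3 syntax, with describe_stream as a hypothetical helper name. Note that Python 3 normally assigns an encoding to a redirected stdout as well, so the None result described for files here is the Python 2.3 behaviour, not something current versions will show for sys.stdout:

```python
import io
import sys


def describe_stream(stream):
    """Return (is_terminal, encoding) for a file-like object.

    Hypothetical helper; missing attributes are reported as False / None.
    """
    is_terminal = bool(getattr(stream, "isatty", lambda: False)())
    encoding = getattr(stream, "encoding", None)
    return is_terminal, encoding


# Example: inspect the real standard output and an in-memory byte stream.
stdout_info = describe_stream(sys.stdout)
bytes_info = describe_stream(io.BytesIO())
```

    A byte stream such as io.BytesIO carries no encoding at all, which is the situation the redirected file was in.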

    </F>
    Fredrik Lundh, Dec 15, 2003
    #4
  5. Marko Faldix

    Marko Faldix Guest

    Hi,

    "Fredrik Lundh" <> wrote in message
    news:...
    > Marko Faldix wrote:
    >
    > > I try to describe. It's a Window machine with Python 2.3.2 installed.
    > > Using command line (cmd). Put these lines of code in a file called
    > > klotentest1.py:
    > >
    > > # -*- coding: iso-8859-1 -*-
    > >
    > > print unicode("My umlauts are ä, ö, ü", "latin-1")
    > > print "My umlauts are ä, ö, ü"
    > >
    > > Calling this on command line:
    > >
    > > klotentest1.py
    > >
    > > Indeed, result of first print is as desired, result of second print
    > > delivers strange letters but no error.

    >
    > your console device doesn't use iso-8859-1; it probably uses cp850.
    > if you print an 8-bit string to the console, Python assumes that you
    > know what you're doing...
    >
    > > Now I call this on command line:
    > >
    > > klotentest1.py > klotentest1.txt
    > >
    > > This fails:
    > > Traceback (most recent call last):
    > > File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py",
    > > line 3, in ?
    > > print unicode("My umlauts are õ, ÷, ³", "latin-1")
    > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
    > > position 15: ordinal not in range(128)
    > >
    > > In my point of view python shouldn't act in different ways whether
    > > result is piped to file or not.

    >
    > when you print to a console with a known encoding, Python 2.3 auto-
    > magically converts Unicode strings to 8-bit strings using the console
    > encoding.
    >
    > files don't have an encoding, which is why the second case fails.
    >
    > also note that in 2.2 and earlier, you example always failed.
    >
    > </F>


    So I just have to use only this:

    print "My umlauts are ä, ö, ü"

    without any encoding-assignment to use for standard output on console AND
    redirecting to file. In latter case, it looks nice with e.g. notepad, just
    strange on console, so settings for console are to adjust and not python
    code. Right?


    Marko Faldix
    Marko Faldix, Dec 15, 2003
    #5
  6. "Marko Faldix" <> writes:

    > print "My umlauts are ä, ö, ü"
    >
    > without any encoding-assignment to use for standard output on console AND
    > redirecting to file. In latter case, it looks nice with e.g. notepad, just
    > strange on console, so settings for console are to adjust and not python
    > code. Right?


    Wrong. On your operating system, notepad.exe and the console use
    *different* encodings. If you think this is stupid, please complain to
    Microsoft. If you print byte strings, it will come out wrong either in
    the terminal, or in notepad - there is *no way* to have the same byte
    string show correctly in both encodings.

    If you want to output to a file, you should open the file with the
    encoding returned by locale.getpreferredencoding(). If you want to output to a terminal,
    Python should automatically find out what the terminal's encoding is
    (to make things worse, the user can override the terminal encoding
    on Windows, on a per-terminal basis, using chcp.exe).
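    A rough sketch of the file case, written in Python 3 syntax rather than the Python 2.3 of this thread. The function name write_text and its encoding parameter are illustrative assumptions; only locale.getpreferredencoding() comes from the advice above:

```python
import locale
import os
import tempfile


def write_text(path, text, encoding=None):
    """Write text to path, defaulting to the locale's preferred encoding.

    Hypothetical helper; returns the encoding that was actually used.
    """
    if encoding is None:
        # False: query the preferred encoding without changing the locale.
        encoding = locale.getpreferredencoding(False)
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)
    return encoding


# Example with an explicit encoding so the result is predictable everywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "klotentest.txt")
used = write_text(demo_path, "ä", encoding="latin-1")
```

    Passing encoding=None picks whatever the machine's locale prefers, so the bytes on disk then depend on the system the script runs on.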

    Regards,
    Martin
    Martin v. Löwis, Dec 15, 2003
    #6
  7. On Mon, 15 Dec 2003 12:38:50 +0100, "Fredrik Lundh" <> wrote:

    >Marko Faldix wrote:
    >
    >> I try to describe. It's a Window machine with Python 2.3.2 installed. Using
    >> command line (cmd). Put these lines of code in a file called klotentest1.py:

    ^^^^[1]
    >>
    >> # -*- coding: iso-8859-1 -*-

    ^^^^^^^^^^[2]
    >>
    >> print unicode("My umlauts are ä, ö, ü", "latin-1")
    >> print "My umlauts are ä, ö, ü"

    ^^^^^^^^^^^^^^^^^^^^^^^^[3]
    >>

    [...]
    >> Calling this on command line:
    >>
    >> klotentest1.py
    >>
    >> Indeed, result of first print is as desired, result of second print delivers
    >> strange letters but no error.

    >
    >your console device doesn't use iso-8859-1; it probably uses cp850.
    >if you print an 8-bit string to the console, Python assumes that you
    >know what you're doing...

    I think the OP is suggesting that given [1] & [2], [3] should implicitly carry the [2] info
    and be converted for output just like the result of unicode(...) is.

    (I know that's not the way it works now, and I know it's not an easy problem ;-)
    >
    >> Now I call this on command line:
    >>
    >> klotentest1.py > klotentest1.txt
    >>
    >> This fails:
    >> Traceback (most recent call last):
    >> File "C:\home\marko\moeller_port\moeller_port_exec_svn\klotentest1.py", line
    >> 3, in ?
    >> print unicode("My umlauts are õ, ÷, ³", "latin-1")
    >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
    >> 15: ordinal not in range(128)
    >>
    >> In my point of view python shouldn't act in different ways whether result is
    >> piped to file or not.

    >
    >when you print to a console with a known encoding, Python 2.3 auto-
    >magically converts Unicode strings to 8-bit strings using the console
    >encoding.
    >
    >files don't have an encoding, which is why the second case fails.

    I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
    _do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
    not just a byte sequence (I have to get back to a previous thread with Martin, where
    I owe a reply. This same issue is key there). (I realize that's not the way it works now,
    and that it's a hard problem, to repeat myself ;-)

    Regards,
    Bengt Richter
    Bengt Richter, Dec 15, 2003
    #7
  8. Serge Orlov

    Serge Orlov Guest

    "Marko Faldix" <> wrote in message news:brkddv$4evj2$-berlin.de...
    > > > In my point of view python shouldn't act in different ways whether
    > > > result is piped to file or not.

    > >
    > > when you print to a console with a known encoding, Python 2.3 auto-
    > > magically converts Unicode strings to 8-bit strings using the console
    > > encoding.
    > >
    > > files don't have an encoding, which is why the second case fails.
    > >
    > > also note that in 2.2 and earlier, you example always failed.
    > >
    > > </F>

    >
    > So I just have to use only this:
    >
    > print "My umlauts are ä, ö, ü"
    >
    > without any encoding-assignment to use for standard output on console AND
    > redirecting to file. In latter case, it looks nice with e.g. notepad, just
    > strange on console, so settings for console are to adjust and not python
    > code. Right?


    No, the right code is
    =============================
    # -*- coding: iso-8859-1 -*-
    import locale, codecs, sys

    if not sys.stdout.isatty():
        sys.stdout = codecs.lookup(locale.getpreferredencoding())[3](sys.stdout)

    print u"My umlauts are ä, ö, ü"
    =============================
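    The wrapping step can also be tried on its own against an in-memory stream. This is a sketch in Python 3 syntax, not the code posted above: it uses codecs.getwriter, which returns the same stream-writer class as index 3 of the codecs.lookup result, and make_encoding_writer is a made-up name:

```python
import codecs
import io


def make_encoding_writer(byte_stream, encoding):
    """Wrap a byte stream so that text written to it is encoded on the way out.

    Hypothetical helper around codecs.getwriter().
    """
    return codecs.getwriter(encoding)(byte_stream)


# Example: an in-memory buffer stands in for a redirected standard output.
buffer = io.BytesIO()
writer = make_encoding_writer(buffer, "latin-1")
writer.write("ä, ö, ü")
written = buffer.getvalue()
```

    In Python 3 the byte stream behind standard output is sys.stdout.buffer, so that would be the object to wrap instead of sys.stdout itself.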
    The difference between console and file output is that while
    there's only one way to output ä on cp850 console, there
    are many ways to output the same character to file (latin-1,
    utf-8, utf-7, utf-16le, utf-16be, cp850 and maybe more).
    So python refuses to guess.
    Another rule to follow is to store non-ascii character in
    unicode strings. Otherwise either you will have to track
    the encodings yourself or assume that all 8-bits strings
    in your program have the same encoding. That's not
    a good idea. I'm not sure if you will have proper .upper()
    and .lower() methods on 8-bit strings. (don't have python
    here to check)
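    To make the "many ways" point concrete, here is a small sketch (Python 3 syntax, helper name encodings_of invented for illustration) that encodes the same character with several of the codecs listed above and collects the resulting bytes:

```python
def encodings_of(char, codec_names):
    """Map each codec name to the bytes it produces for the given text."""
    return {name: char.encode(name) for name in codec_names}


table = encodings_of("ä", ["latin-1", "utf-8", "cp850", "utf-16le", "utf-16be"])
# table["latin-1"]  is b"\xe4"
# table["utf-8"]    is b"\xc3\xa4"
# table["cp850"]    is b"\x84"
# table["utf-16le"] is b"\xe4\x00"
# table["utf-16be"] is b"\x00\xe4"
```

    All five byte sequences differ, so a program reading the file cannot know which character was meant unless it also knows the encoding.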

    -- Serge.
    Serge Orlov, Dec 15, 2003
    #8
  9. Bengt Richter wrote:

    > I think the OP is thinking files [1] with # -*- coding: iso-8859-1 -*- [2]
    > _do_ have an encoding, so in some way [3] should be an unambiguous character sequence,
    > not just a byte sequence


    The OP could easily overcome this aspect of the problem with a Unicode
    literal (and in fact, he originally did convert the string literal to
    a Unicode object before further processing).

    This does not solve the problem, though: Writing the Unicode object to
    a file still gives an encoding error, since he did not specify the
    encoding of the file.
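    The failure can be reproduced without any file or console involved. The sketch below uses Python 3 syntax and a hypothetical helper, try_encode. It performs the same conversion that the redirected print attempted with the 'ascii' codec:

```python
def try_encode(text, encoding):
    """Return (encoded_bytes, None) on success or (None, error_position) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that cannot be encoded.
        return None, exc.start


ascii_result = try_encode("My umlauts are ä, ö, ü", "ascii")
latin1_result = try_encode("My umlauts are ä, ö, ü", "latin-1")
```

    With ascii the helper reports position 15, matching the traceback earlier in the thread, while naming latin-1 explicitly succeeds.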

    Regards,
    Martin
    Martin v. Löwis, Dec 15, 2003
    #9