Unicode conversion in 'print'

Discussion in 'Python' started by Ricardo Bugalho, Jan 13, 2005.

  1. Hello,
    I'm using Python 2.3.4 and I noticed that, when stdout is a terminal, the
    'print' statement converts Unicode strings into the encoding defined by
    the locales instead of the one returned by sys.getdefaultencoding().
    However, I can't find any references to it. Does anyone know where it's
    described?

    Example:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import sys, locale

    print 'Python encoding:', sys.getdefaultencoding()
    print 'System encoding:', locale.getpreferredencoding()
    print 'Test string: ', u'Olá mundo'


    If stdout is a terminal, it works fine:
    $ python x.py
    Python encoding: ascii
    System encoding: UTF-8
    Test string: Olá mundo

    If I redirect the output to a file, it raises a UnicodeEncodeError exception:
    $ python x.py > x.txt
    Traceback (most recent call last):
      File "x.py", line 8, in ?
        print 'Test string: ', u'Olá mundo'
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)


    --
    Ricardo
    Ricardo Bugalho, Jan 13, 2005
    #1

  2. Ricardo Bugalho wrote:
    > Hello,
    > I'm using Python 2.3.4 and I noticed that, when stdout is a terminal,
    > the 'print' statement converts Unicode strings into the encoding
    > defined by the locales instead of the one returned by
    > sys.getdefaultencoding().


    Sure. It uses the encoding of your console. Here is an explanation of why it
    uses the locale to get the console's encoding:
    http://www.python.org/moin/PrintFails

    > However, I can't find any references to it. Does anyone know where it's
    > described?


    I've just written about it here:
    http://www.python.org/moin/DefaultEncoding

    >
    > Example:
    >
    > #!/usr/bin/env python
    > # -*- coding: utf-8 -*-
    >
    > import sys, locale
    >
    > print 'Python encoding:', sys.getdefaultencoding()
    > print 'System encoding:', locale.getpreferredencoding()
    > print 'Test string: ', u'Olá mundo'
    >
    >
    > If stdout is a terminal, it works fine:
    > $ python x.py
    > Python encoding: ascii
    > System encoding: UTF-8
    > Test string: Olá mundo
    >
    > If I redirect the output to a file, it raises a UnicodeEncodeError exception:
    > $ python x.py > x.txt
    > Traceback (most recent call last):
    >   File "x.py", line 8, in ?
    >     print 'Test string: ', u'Olá mundo'
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)
    >


    http://www.python.org/moin/ShellRedirectionFails

    Feel free to reply here if something is not clear, corrections in wiki
    are also welcome.
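
For the redirection case, one way to avoid depending on the default codec is to encode the Unicode string yourself before writing it. The following is only a minimal sketch, not something taken from the wiki pages above: the helper names `choose_output_encoding` and `encode_for_output` are made up for illustration, and the fallback to `locale.getpreferredencoding()` and the `'replace'` error handler are assumptions that may not suit every program.

```python
import locale
import sys


def choose_output_encoding(stream):
    """Pick an encoding for writing Unicode text to `stream`.

    Hypothetical helper: prefer the encoding the stream itself reports
    (normally set when it is a terminal) and fall back to the locale's
    preferred encoding when the stream reports none.
    """
    encoding = getattr(stream, 'encoding', None)
    if not encoding:
        encoding = locale.getpreferredencoding()
    return encoding


def encode_for_output(text, stream):
    """Encode `text` explicitly instead of relying on the default codec.

    'replace' substitutes '?' for characters the target encoding cannot
    represent, so this never raises UnicodeEncodeError.
    """
    return text.encode(choose_output_encoding(stream), 'replace')


# The test string from the example, with the accented letter as an escape.
data = encode_for_output(u'Ol\xe1 mundo', sys.stdout)
```

The result is a byte string that can be written as is, with `sys.stdout.write(data)` on Python 2 or `sys.stdout.buffer.write(data)` on Python 3.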

    Serge.
    Serge Orlov, Jan 13, 2005
    #2

  3. Hi,
    thanks for the information. But what I was really looking for was
    information on when and why Python started doing it (previously, it always
    used sys.getdefaultencoding()) and why it was done only for 'print' when
    stdout is a terminal instead of always.

    On Thu, 13 Jan 2005 14:33:20 -0800, Serge Orlov wrote:

    > Sure. It uses the encoding of your console. Here is an explanation of why it
    > uses the locale to get the console's encoding:
    > http://www.python.org/moin/PrintFails
    >

    --
    Ricardo
    Ricardo Bugalho, Jan 14, 2005
    #3
  4. Ricardo Bugalho wrote:
    > Hi,
    > thanks for the information. But what I was really looking for was
    > information on when and why Python started doing it (previously, it
    > always used sys.getdefaultencoding())


    I don't have access to any version other than 2.2 at the moment, but I
    believe it happened between 2.2 and 2.3 for Windows and UNIX terminals.
    On other, unsupported terminals I suspect sys.getdefaultencoding() is
    still used. The reason for the change is proper support of Unicode
    input/output.


    > and why it was done only for 'print' when
    > stdout is a terminal instead of always.


    The real question is why sys.getdefaultencoding() isn't *never* used
    for printing. If you leave sys.getdefaultencoding() at Python's default
    value ('ascii'), you won't need to worry about it <wink>.
    A non-default sys.getdefaultencoding() is a temporary measure for big
    projects to use within one Python version.

    Serge.
    Serge Orlov, Jan 14, 2005
    #4
  5. Ricardo Bugalho wrote:
    > thanks for the information. But what I was really looking for was
    > information on when and why Python started doing it (previously, it always
    > used sys.getdefaultencoding()) and why it was done only for 'print' when
    > stdout is a terminal instead of always.


    It has done that since 2.2, in response to many complaints that you cannot
    print a Unicode string in interactive mode unless the Unicode string
    contains only ASCII characters. It does that only if sys.stdout is
    a real terminal, because otherwise it is not possible to determine
    what the encoding of sys.stdout is.
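
The rule described above can be modelled roughly as follows. This is a sketch of the behaviour as stated in this thread, not the interpreter's actual code, and the function names `print_encoding` and `can_encode` are invented for the illustration. The encoding values are the ones from the original example ('UTF-8' from the locale, 'ascii' as the Python default); in a real program they would come from `locale.getpreferredencoding()`, `sys.getdefaultencoding()` and `sys.stdout.isatty()`.

```python
def print_encoding(is_terminal, locale_encoding, default_encoding):
    """Return the encoding 'print' is described as using for Unicode strings.

    Terminal: the locale's encoding. Anything else (file, pipe): the
    default encoding, because the target's encoding cannot be determined.
    """
    if is_terminal:
        return locale_encoding
    return default_encoding


def can_encode(text, encoding):
    """Report whether `text` can be encoded without an error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


text = u'Ol\xe1 mundo'  # the test string, with the accented letter escaped

# stdout is a terminal: the locale encoding is used and the print succeeds.
on_terminal = can_encode(text, print_encoding(True, 'UTF-8', 'ascii'))

# stdout is redirected: 'ascii' is used and the accented letter fails.
redirected = can_encode(text, print_encoding(False, 'UTF-8', 'ascii'))
```

Here `on_terminal` ends up true and `redirected` false, which matches the two runs shown in the first post.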

    Regards,
    Martin
    Martin v. Löwis, Jan 14, 2005
    #5