How does Python get the value for sys.stdin.encoding?

Discussion in 'Python' started by RG, Aug 12, 2010.

  1. RG

    RG Guest

    I thought it was hard-coded into the Python executable at compile time,
    but that is apparently not the case:

    [ron@mickey:~]$ python
    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys;print sys.stdin.encoding
    UTF-8
    >>> ^D

    [ron@mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
    None
    [ron@mickey:~]$

    And indeed, trying to pipe unicode into Python doesn't work, even though
    it works fine when Python runs interactively. So how can I make this
    work?

    Thanks,
    rg
    RG, Aug 12, 2010

  2. On Wed, Aug 11, 2010 at 6:21 PM, RG wrote:
    > I thought it was hard-coded into the Python executable at compile time,
    > but that is apparently not the case:
    >
    > [ron@mickey:~]$ python
    > Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> import sys;print sys.stdin.encoding
    > UTF-8
    > >>> ^D
    > [ron@mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
    > None
    > [ron@mickey:~]$
    >
    > And indeed, trying to pipe unicode into Python doesn't work, even though
    > it works fine when Python runs interactively.  So how can I make this
    > work?
    >


    sys.stdin and sys.stdout are files, just like any other. There's nothing
    special about them at compile time. When the interpreter starts, it
    checks whether they are ttys. If they are, it tries to work out the
    terminal's encoding from the environment. The code for this is in
    pythonrun.c if you want to see exactly what it's doing. If stdin and
    stdout aren't ttys, their encoding stays None and the interpreter
    falls back to sys.getdefaultencoding() when you print Unicode strings.
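
    A minimal sketch of the check described above, written for
    illustration and not taken from pythonrun.c. The helper name
    describe_stream is hypothetical. It reports whether a stream is a tty
    and which encoding attribute it carries. Note that Python 3 fills in
    an encoding from the locale even for pipes, so the None result for
    piped stdin is specific to Python 2 as used in this thread.

```python
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object.

    Hypothetical helper for illustration only.
    """
    # Not every file-like object implements isatty(), so guard the call.
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # Streams without an encoding attribute are reported as None.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Compare the results when run interactively and when run through a pipe.
    for name in ("stdin", "stdout"):
        print(name, describe_stream(getattr(sys, name)))
```

    Running the script once from a terminal and once with its input
    piped in shows the difference between the two cases.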

    By the way, there is no such thing as piping Unicode into Python.
    Unicode is an abstract standard in which each character maps to a
    code point. Pipes can only carry bytes. You may be using one of the
    encodings that can represent the entire range of Unicode characters
    (such as UTF-8, UTF-16 LE, UTF-16 BE, UTF-32 LE, and UTF-32 BE), but
    that's not the same thing as Unicode. You really have to watch your
    encodings when you pass data around between programs. There's no way
    to avoid it.
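
    To make the distinction concrete, here is a small sketch that turns
    the same text into three different byte sequences. The sample string
    and variable names are chosen for illustration only.

```python
# One piece of text: four code points, the last one is U+00E9.
text = u"caf\u00e9"

# The same text as bytes in three different encodings.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")
as_utf32 = text.encode("utf-32-be")

for label, data in (("UTF-8", as_utf8),
                    ("UTF-16 LE", as_utf16),
                    ("UTF-32 BE", as_utf32)):
    print(label, len(data), repr(data))

# Decoding with the encoding that produced the bytes recovers the text.
round_trip = as_utf8.decode("utf-8")

# Decoding with a different encoding silently yields the wrong characters.
mojibake = as_utf8.decode("latin-1")
```

    The receiving program only sees the bytes, so both sides of a pipe
    have to agree on the encoding out of band.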
    Benjamin Kaplan, Aug 12, 2010

  3. RG

    RG Guest

    Benjamin Kaplan wrote:

    >
    > sys.stdin and sys.stdout are files, just like any other. There's nothing
    > special about them at compile time. When the interpreter starts, it
    > checks to see if they are ttys. If they are, then it tries to figure
    > out the terminal's encoding based on the environment. The code for
    > this is in pythonrun.c if you want to see exactly what it's doing.


    Thanks. Looks like the magic incantation is:

    export PYTHONIOENCODING='utf-8'
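
    If setting the environment variable is not an option, a script can
    also decode its input explicitly. The sketch below is one possible
    approach, not something proposed in this thread: the helper name
    open_as_text is hypothetical, and an in-memory BytesIO stands in for
    the pipe so the example runs on its own.

```python
import io


def open_as_text(byte_stream, encoding="utf-8"):
    """Wrap a binary stream so that reads return decoded text.

    Hypothetical helper; the default encoding is an assumption about
    what the sending program writes.
    """
    return io.TextIOWrapper(byte_stream, encoding=encoding)


# Stand-in for a pipe that delivers UTF-8 encoded bytes.
fake_pipe = io.BytesIO(u"na\u00efve caf\u00e9\n".encode("utf-8"))

reader = open_as_text(fake_pipe)
line = reader.readline()
print(repr(line))
```

    In a real script on Python 3 the binary stream would be
    sys.stdin.buffer. On Python 2, codecs.getreader("utf-8")(sys.stdin)
    plays the same role.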

    > By the way, there is no such thing as piping Unicode into Python.


    Yeah, I know. I should have said "piping UTF-8 encoded unicode" or
    something like that.

    > You really have to watch your encodings
    > when you pass data around between programs. There's no way to avoid
    > it.


    Yeah, I keep re-learning that lesson again and again.

    rg
    RG, Aug 12, 2010
  4. Anssi Saari

    Anssi Saari Guest

    Benjamin Kaplan writes:

    > sys.stdin and sys.stdout are files, just like any other. There's nothing
    > special about them at compile time. When the interpreter starts, it
    > checks to see if they are ttys. If they are, then it tries to figure
    > out the terminal's encoding based on the environment.


    Just a related question: is looking at sys.stdin.encoding the proper
    way of doing things? I've been working on a script to display some
    email headers, some of which are MIME-encoded in various charsets.

    Until now I have used the encoding that locale.getdefaultlocale()
    returns as the target encoding, since "it seemed to work". On one
    computer, though, the call returns ISO-8859-15, and I don't quite
    understand why.
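
    For reference, a rough sketch of the kind of script in question. It
    is an assumption about the approach, not the actual script: the
    helper names decode_mime_header and output_encoding are hypothetical.
    It decodes a MIME-encoded header with the standard email.header
    module and picks a target encoding from the output stream, falling
    back to locale.getpreferredencoding() when the stream reports none.

```python
import locale
import sys
from email.header import decode_header


def decode_mime_header(raw):
    """Decode a MIME-encoded header value into text (hypothetical helper)."""
    parts = []
    for value, charset in decode_header(raw):
        # Encoded words come back as bytes together with their charset.
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", "replace")
        parts.append(value)
    return u"".join(parts)


def output_encoding(stream=None):
    """Pick a target encoding: the stream's own, else the locale's."""
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or locale.getpreferredencoding()


if __name__ == "__main__":
    name = decode_mime_header("=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=")
    target = output_encoding()
    # Characters the target encoding cannot represent are replaced.
    print(name.encode(target, "replace"))
```

    Preferring the stream's encoding keeps the output matched to the
    terminal when there is one, while the locale fallback covers pipes
    and redirected output.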
    Anssi Saari, Aug 12, 2010
