How does Python get the value for sys.stdin.encoding?

RG · Aug 12, 2010

I thought it was hard-coded into the Python executable at compile time,
but that is apparently not the case:

[ron@mickey:~]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;print sys.stdin.encoding
UTF-8
>>> ^D
[ron@mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
None
[ron@mickey:~]$

And indeed, trying to pipe unicode into Python doesn't work, even though
it works fine when Python runs interactively. So how can I make this
work?

Thanks,
rg

Benjamin Kaplan · Aug 12, 2010

sys.stdin and sys.stdout are files, just like any other. There's nothing
special about them at compile time. When the interpreter starts, it
checks to see if they are ttys. If they are, then it tries to figure
out the terminal's encoding based on the environment. The code for
this is in pythonrun.c if you want to see exactly what it's doing. If
stdout and stdin aren't ttys, then their encoding stays as None and
the interpreter will use sys.getdefaultencoding() if you try printing
Unicode strings.
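
A minimal sketch of that check, written in Python 3 syntax rather than the 2.6 used in this thread. The helper name describe_stream is made up for illustration. Note that Python 3 behaves differently from what is described above: it assigns a piped stdin an encoding from the locale instead of leaving it as None.

```python
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object.

    describe_stream is a hypothetical helper, not part of the standard library.
    """
    # Objects without isatty() are treated as "not a terminal".
    is_tty = bool(getattr(stream, "isatty", lambda: False)())
    # Objects without an encoding attribute are reported as None.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Run once interactively and once as `echo | python script.py` to compare.
    print(describe_stream(sys.stdin))
```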

By the way, there is no such thing as piping Unicode into Python.
Unicode is an abstract concept where each character maps to a
code point. Pipes can only deal with bytes. You may be using one of the
encodings capable of representing the entire range of Unicode characters
(such as UTF-8, UTF-16 LE, UTF-16 BE, UTF-32 LE, or UTF-32 BE), but that's
not the same thing as Unicode. You really have to watch your encodings
when you pass data around between programs. There's no way to avoid
it.
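
To make the distinction concrete, here is a small sketch (Python 3 syntax, sample string chosen arbitrarily) showing that the same text becomes different byte sequences depending on the encoding, and that decoding with the wrong one silently gives garbage:

```python
# The same four code points, serialized three different ways.
text = "caf\u00e9"

utf8 = text.encode("utf-8")        # 5 bytes: the accented letter takes two
utf16le = text.encode("utf-16-le") # 8 bytes: two per code point here
utf32be = text.encode("utf-32-be") # 16 bytes: four per code point

# Decoding with the encoding that produced the bytes restores the text.
round_trip = utf8.decode("utf-8")

# Decoding with a different encoding raises no error but yields mojibake.
mojibake = utf8.decode("latin-1")
```

The receiving program only ever sees the bytes, so both ends of a pipe have to agree on the encoding out of band.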

RG · Aug 12, 2010

Benjamin Kaplan said:
I thought it was hard-coded into the Python executable at compile time,
but that is apparently not the case:

[ron@mickey:~]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;print sys.stdin.encoding
UTF-8
>>> ^D
[ron@mickey:~]$ echo 'import sys;print sys.stdin.encoding' | python
None
[ron@mickey:~]$

And indeed, trying to pipe unicode into Python doesn't work, even though
it works fine when Python runs interactively. So how can I make this
work?

sys.stdin and sys.stdout are files, just like any other. There's nothing
special about them at compile time. When the interpreter starts, it
checks to see if they are ttys. If they are, then it tries to figure
out the terminal's encoding based on the environment. The code for
this is in pythonrun.c if you want to see exactly what it's doing.

Thanks. Looks like the magic incantation is:

export PYTHONIOENCODING='utf-8'
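
A rough way to see the effect of that variable from a script, sketched in Python 3 syntax: start a child interpreter whose stdin is a pipe and ask it what it thinks sys.stdin.encoding is. The helper name child_stdin_encoding is hypothetical, and the exact spelling of the reported name may vary between Python versions.

```python
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding=None):
    """Report sys.stdin.encoding as seen by a child interpreter with piped stdin.

    child_stdin_encoding is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not leak in.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        input=b"",               # stdin is a pipe, not a tty
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return proc.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdin_encoding("utf-8"))
```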

Benjamin Kaplan said:
By the way, there is no such thing as piping Unicode into Python.

Yeah, I know. I should have said "piping UTF-8 encoded unicode" or
something like that.

Benjamin Kaplan said:
You really have to watch your encodings
when you pass data around between programs. There's no way to avoid
it.

Yeah, I keep re-learning that lesson again and again.

rg

Anssi Saari · Aug 12, 2010

Benjamin Kaplan said:
sys.stdin and sys.stdout are files, just like any other. There's nothing
special about them at compile time. When the interpreter starts, it
checks to see if they are ttys. If they are, then it tries to figure
out the terminal's encoding based on the environment.

Just a related question: is looking at sys.stdin.encoding the proper
way of doing things? I've been working on a script to display some
email headers, some of which are encoded in MIME to various charsets.

Until now I have used whatever locale.getdefaultlocale() returns as
the target encoding, since "it seemed to work", although on one
computer the call returns ISO-8859-15 and I don't quite understand
why.
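
For comparison, one possible shape for such a script, in Python 3 syntax. This is only a sketch under assumptions: the function names are invented, the stream encoding is tried first with the locale as a fallback, and locale.getpreferredencoding() stands in for the locale.getdefaultlocale() call mentioned above. Whether that order is the "proper" one is exactly the open question.

```python
import locale
from email.header import decode_header


def pick_target_encoding(stream):
    """Prefer the stream's own encoding, then the locale's, then ASCII.

    Hypothetical helper; the fallback order is an assumption.
    """
    return (
        getattr(stream, "encoding", None)
        or locale.getpreferredencoding(False)
        or "ascii"
    )


def header_to_bytes(raw_header, target_encoding):
    """Decode a MIME-encoded header and re-encode it for output."""
    parts = []
    for value, charset in decode_header(raw_header):
        # decode_header yields bytes for encoded words, str for plain headers.
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", "replace")
        parts.append(value)
    # Characters the target cannot represent become "?" instead of raising.
    return "".join(parts).encode(target_encoding, "replace")
```

A call might look like header_to_bytes(raw, pick_target_encoding(sys.stdout)), with the result written to the underlying byte stream.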
