Reading from stdin

Luis Zarrabeitia · Oct 7, 2008

I have a problem with this piece of code:

====
import sys
for line in sys.stdin:
    print "You said!", line
====

Namely, it seems that stdin buffers the input, so there is no reply until
a huge amount of text has been written. The iterator returned by xreadlines
has the same behavior.

The stdin.readline() function doesn't share that behaviour (it returns as soon
as I hit 'enter').

Is there any way to tell stdin's iterator not to buffer the input? Is it
part of the standard file protocol?

Lawrence D'Oliveiro · Oct 7, 2008

I have a problem with this piece of code:

====
import sys
for line in sys.stdin:
    print "You said!", line
====

Namely, it seems that stdin buffers the input, so there is no reply
until a huge amount of text has been written. The iterator returned by
xreadlines has the same behavior.

The stdin.readline() function doesn't share that behaviour (it returns as
soon as I hit 'enter').

Perhaps line-buffering simply doesn't apply when you use a file object as an
iterator.

George Sakkis · Oct 7, 2008

Luis said:
I have a problem with this piece of code:

====
import sys
for line in sys.stdin:
    print "You said!", line
====

Namely, it seems that stdin buffers the input, so there is no reply until
a huge amount of text has been written. The iterator returned by xreadlines
has the same behavior.

The stdin.readline() function doesn't share that behaviour (it returns as soon
as I hit 'enter').

Is there any way to tell stdin's iterator not to buffer the input? Is it
part of the standard file protocol?

Not an answer to your actual question, but you can keep the 'for' loop
instead of rewriting it with 'while' using the iter(function,
sentinel) idiom:

for line in iter(sys.stdin.readline, ""):
    print "You said!", line

George

Luis Zarrabeitia · Oct 8, 2008

Not an answer to your actual question, but you can keep the 'for' loop
instead of rewriting it with 'while' using the iter(function,
sentinel) idiom:

for line in iter(sys.stdin.readline, ""):
    print "You said!", line

You're right, it's not an answer to my actual question, but I had completely
forgotten about the 'sentinel' idiom. Many thanks... I was trying to do it
with 'itertools', obviously with no luck.

The question still stands (how to turn off the buffering), but this is a nice
workaround until it gets answered.

Luis Zarrabeitia · Oct 8, 2008

Perhaps line-buffering simply doesn't apply when you use a file object as
an iterator.

You cut out the question you replied to, but left the rest. I got a bit
confused until I remembered that *I* wrote the email.

Anyway, I changed the program to:

===
buff = file("test")
for line in buff:
    print "you said", line
===

where 'test' is a named pipe (mkfifo test), to see whether the buffering also
happens with a plain file object, and it does. As with stdin, nothing gets
printed until the end of the file or until a huge number of lines has been
received, but '.readline()' returns immediately. So the buffering behavior
seems to be the default for both stdin and file, which makes sense, as
type(stdin) is 'file'. I can't test it now, but I think sockets also do input
buffering. I guess one doesn't notice it in the general case because disk
reads happen too fast for the delay to be visible.
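
The pipe experiment can also be reproduced inside a single process. What follows is only a minimal sketch, assuming a POSIX system and a current Python 3 rather than the Python 2.5 used in this thread; the helper name first_line_without_eof is made up for illustration. It writes one line into an os.pipe(), keeps the write end open, and reads the line back with readline() through the iter(function, sentinel) idiom.

```python
import os

def first_line_without_eof(text="hello\n"):
    """Write one line to a pipe and read it back while the writer stays open."""
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "r")
    try:
        os.write(write_fd, text.encode("ascii"))
        # readline() returns once a complete line is available;
        # it does not wait for the writer to close the pipe.
        for line in iter(reader.readline, ""):
            return line  # stop after the first line, the writer is still open
    finally:
        reader.close()
        os.close(write_fd)
```

The write end is closed only after the line has been read, so the call returning at all shows that readline() did not need EOF.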

That raises a related question: is there any use case where it is better to
block the input until a lot of data is received, even when the requested data
is already available? Output buffering is understandable and desired (how do I
turn it off, by the way?), and even that one won't block unless asked to
(flush), but I can't find examples where input buffering helps.
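
On the side question about output buffering: one common approach is to flush explicitly after each write (starting the interpreter with the -u option is another). The following is a small sketch, not a definitive recipe; the function name say is hypothetical, and the out parameter exists only so the behavior can be tried on any file-like object.

```python
import io
import sys

def say(message, out=None):
    """Write one line and flush at once so it is not held in the output buffer."""
    if out is None:
        out = sys.stdout
    out.write(message + "\n")
    out.flush()  # push the data out now instead of waiting for the buffer to fill

# An in-memory stream stands in for stdout here.
demo = io.StringIO()
say("You said! hi", demo)
```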

(full example with pipes)
$ mkfifo test
$ cat > test
[write data here]

on another console, just execute the script.

Oh, I forgot:
Linux 2.6.24, Python 2.5.2, Debian's standard build. I don't have Windows at
hand to try it.

George Sakkis · Oct 8, 2008

You're right, it's not an answer to my actual question, but I had completely
forgotten about the 'sentinel' idiom. Many thanks... I was trying to do it
with 'itertools', obviously with no luck.

The question still stands (how to turn off the buffering), but this is a nice
workaround until it gets answered.

The closest answer I found comes from the docs
(http://docs.python.org/library/stdtypes.html#file-objects):

"""
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right.
"""

I guess the phrasing "hidden read-ahead buffer" implies that buffering
cannot be turned off (or at least it is not intended to even if it's
somehow possible).

George

Luis Zarrabeitia · Oct 8, 2008

"""
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right.
"""

I guess the phrasing "hidden read-ahead buffer" implies that buffering
cannot be turned off (or at least it is not intended to even if it's
somehow possible).

Hmm. I wonder what those optimizations look like. Apparently, readline() cannot
read from that read-ahead buffer, and that by itself sounds bad. Currently,
if you loop a few times with next(), you cannot use readline() afterwards until
you seek() to an absolute position.

Actually, I think I may be replying to myself here. I imagine that 'next' will
read a block instead of a character, and look for lines in there, and as the
underlying OS likely blocks until the whole block is read, 'next' cannot
avoid it. That doesn't explain, though, why readline() can't use next's
buffer, why next doesn't have a sensible timeout for interactive sessions
(unless the OS doesn't support it), and why the readahead cannot be turned
off.
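
The guess above can be made concrete with a toy iterator. This is only an illustration of the idea of reading blocks and splitting them into lines; it is not CPython's actual implementation, and the name lines_from_blocks and the block_size parameter are invented for this sketch (Python 3 syntax).

```python
import io

def lines_from_blocks(stream, block_size=8192):
    """Yield lines by reading fixed-size blocks and splitting them on newlines."""
    pending = ""
    while True:
        # On a pipe or terminal, this read is where the caller may end up
        # waiting for more input than the single line it asked for.
        block = stream.read(block_size)
        if not block:
            break
        pending += block
        while "\n" in pending:
            line, pending = pending.split("\n", 1)
            yield line + "\n"
    if pending:
        yield pending  # last line without a trailing newline

demo = io.StringIO("one\ntwo\nthree")
result = list(lines_from_blocks(demo, block_size=4))
```

With an in-memory stream nothing ever blocks, so the sketch shows only the line-splitting logic; the waiting described in this thread would happen inside stream.read() on a slow source.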

I think I'll have to stick with the iter(function, sentinel) solution for now.

And I may try to find next()'s implementation... I guess I'll be downloading
Python's source when my bandwidth allows it (or find it on a browseable
repository).

On a related note, help(file.read) shows:

=====
read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
=====

But it doesn't say how to put the file object in non-blocking mode. (I was
trying to put the file object in non-blocking mode to test next()'s
behavior). Ideas?
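
One possible way to get non-blocking mode on a POSIX system such as the Linux box mentioned earlier is to set O_NONBLOCK on the underlying file descriptor with the fcntl module. The sketch below assumes that platform; the helper names set_nonblocking and read_available are hypothetical, and it reads through os.read() on the raw descriptor rather than through the file object.

```python
import errno
import fcntl
import os

def set_nonblocking(fd):
    """Add O_NONBLOCK to the status flags of an open file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, size=4096):
    """Return the bytes that are ready on fd, or None if nothing is ready yet.

    An empty result means end of file.
    """
    try:
        return os.read(fd, size)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
```

For stdin the descriptor would be sys.stdin.fileno(). How the file object's own next() behaves once the descriptor is non-blocking is a separate matter that this sketch does not settle.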

Gabriel Genellina · Oct 14, 2008

And I may try to find next()'s implementation... I guess I'll be downloading
python's source when my bandwidth allows it (or find it on a browseable
repository)

Try http://svn.python.org - in particular,
http://svn.python.org/view/python/trunk/Objects/fileobject.c
