Reading from stdin

  • Thread starter Luis Zarrabeitia

Luis Zarrabeitia

I have a problem with this piece of code:

====
import sys
for line in sys.stdin:
    print "You said!", line
====

Namely, it seems that stdin buffers the input, so there is no reply until
a huge amount of text has been written. The iterator returned by xreadlines
has the same behavior.

The stdin.readline() function doesn't share that behaviour (it returns as soon
as I hit 'enter').

Is there any way to tell stdin's iterator not to buffer the input? Is it
part of the standard file protocol?
 

Lawrence D'Oliveiro

> I have a problem with this piece of code:
>
> ====
> import sys
> for line in sys.stdin:
>     print "You said!", line
> ====
>
> Namely, it seems that stdin buffers the input, so there is no reply
> until a huge amount of text has been written. The iterator returned by
> xreadlines has the same behavior.
>
> The stdin.readline() function doesn't share that behaviour (it returns as
> soon as I hit 'enter').

Perhaps line-buffering simply doesn't apply when you use a file object as an
iterator.
 

George Sakkis

Luis said:
> I have a problem with this piece of code:
>
> ====
> import sys
> for line in sys.stdin:
>     print "You said!", line
> ====
>
> Namely, it seems that stdin buffers the input, so there is no reply until
> a huge amount of text has been written. The iterator returned by xreadlines
> has the same behavior.
>
> The stdin.readline() function doesn't share that behaviour (it returns as soon
> as I hit 'enter').
>
> Is there any way to tell stdin's iterator not to buffer the input? Is it
> part of the standard file protocol?

Not an answer to your actual question, but you can keep the 'for' loop
instead of rewriting it with 'while' using the iter(function,
sentinel) idiom:

for line in iter(sys.stdin.readline, ""):
    print "You said!", line

George
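The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel. Since readline() returns "" only at end of file (an empty line comes back as "\n"), "" is the right sentinel. A minimal sketch in Python 3 syntax (print is a function there); the helper name echo_lines and the io.StringIO input standing in for sys.stdin are illustrative assumptions:

```python
import io
import sys


def echo_lines(stream, out=None):
    """Echo each line of `stream` as soon as readline() returns it.

    iter(stream.readline, "") keeps calling readline() until it returns
    the sentinel "", which only happens at EOF (a blank line is "\n").
    Returns the number of lines echoed.
    """
    out = sys.stdout if out is None else out
    count = 0
    for line in iter(stream.readline, ""):
        out.write("You said! " + line)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for sys.stdin; on a real terminal, pass sys.stdin instead.
    echo_lines(io.StringIO("hello\n\nworld\n"))
```

Because each readline() call returns as soon as one line is available, the loop body runs per line instead of per buffer-full.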
 

Luis Zarrabeitia

> Not an answer to your actual question, but you can keep the 'for' loop
> instead of rewriting it with 'while' using the iter(function,
> sentinel) idiom:
>
> for line in iter(sys.stdin.readline, ""):
>     print "You said!", line

You're right, it's not an answer to my actual question, but I had completely
forgotten about the 'sentinel' idiom. Many thanks... I was trying to do it
with 'itertools', obviously with no luck.

The question still stands (how to turn off the buffering), but this is a nice
workaround until it gets answered.
 

Luis Zarrabeitia

> Perhaps line-buffering simply doesn't apply when you use a file object as
> an iterator.

You cut out the question you replied to, but left the rest. I got a bit
confused until I remembered that *I* wrote the email :D.

Anyway, I changed the program to:

===
buff = file("test")
for line in buff:
    print "you said", line
===

where 'test' is a named pipe (mkfifo test), to see whether the line-buffering
also happens with a file object, and it does. As with stdin, nothing gets
printed until the end of the file is reached or a huge number of lines
arrives, but using '.readline()' works immediately. So it seems that the
buffering behavior happens by default on both stdin and file objects. That
makes sense, as type(stdin) is 'file'. I can't test it now, but I think
sockets also do input buffering. I guess one doesn't notice it in the general
case because disk reading happens too fast to see the delay.

That raises a related question: is there any use case where it is better to
block the input until a lot of data is received, even when the requested data
is already available? Output buffering is understandable and desired (how do I
turn it off, by the way?), and even that one won't block unless asked to
(flush), but I can't find examples where input buffering helps.
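On the output side, the usual options are to flush explicitly after each write or to start the interpreter with -u. (The Python 2 documentation for -u notes that it does not affect the internal buffering used when iterating over a file object such as sys.stdin, which is why it would not help with the original problem.) A minimal sketch of the explicit-flush approach, assuming Python 3.3+ for print's flush argument; the helper name say is hypothetical:

```python
import sys


def say(text, out=None):
    """Write one reply and flush it right away.

    flush=True empties the output buffer after this print, so the reply
    shows up immediately even when stdout is a pipe (where it would
    otherwise be block-buffered). In Python 2 the equivalent is calling
    sys.stdout.flush() after the print statement.
    """
    out = sys.stdout if out is None else out
    print("You said!", text, end="", file=out, flush=True)
```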

(full example with pipes)
$ mkfifo test
$ cat > test
[write data here]

On another console, just run the script.

Oh, I forgot:
Linux 2.6.24, Python 2.5.2, Debian's standard build. I don't have Windows at
hand to try it.
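The same experiment can be repeated without a named pipe by using os.pipe() and a writer thread. This sketch assumes Python 3, not the 2.5.2 used here: in Python 3's io module, iterating over a text file object is built on readline(), so each line should arrive as soon as it is written. The writer sends one line, keeps the pipe open, and waits (with a timeout, so a reader that blocks cannot hang the demo) for the reader to acknowledge it before writing a second line. The function name demo_pipe_iteration is hypothetical.

```python
import os
import threading


def demo_pipe_iteration(timeout=5.0):
    """Iterate over the read end of a pipe while another thread writes.

    Returns (acked, lines). acked is True if the reader got the first line
    while the writer was still holding the pipe open and waiting, i.e. the
    iteration did not wait for more data or for EOF.
    """
    r_fd, w_fd = os.pipe()
    got_line = threading.Event()
    result = {}

    def writer():
        with os.fdopen(w_fd, "w") as w:
            w.write("first\n")
            w.flush()  # the line is now in the pipe; the pipe stays open
            # Bounded wait: if the reader's iteration blocked, this times
            # out and returns False.
            result["acked"] = got_line.wait(timeout)
            w.write("second\n")
        # Leaving the with-block closes the write end, so the reader sees EOF.

    t = threading.Thread(target=writer)
    t.start()
    lines = []
    with os.fdopen(r_fd, "r") as r:
        for line in r:
            lines.append(line)
            got_line.set()  # tell the writer that a line has arrived
    t.join()
    return result["acked"], lines


if __name__ == "__main__":
    print(demo_pipe_iteration())
```

Under the Python 2 file object's read-ahead behavior described in this thread, the same loop would be expected to sit waiting instead, and acked would come back False after the timeout.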
 

George Sakkis

> You're right, it's not an answer to my actual question, but I had completely
> forgotten about the 'sentinel' idiom. Many thanks... I was trying to do it
> with 'itertools', obviously with no luck.
>
> The question still stands (how to turn off the buffering), but this is a nice
> workaround until it gets answered.

The closest answer I found comes from the docs
(http://docs.python.org/library/stdtypes.html#file-objects):

"""
In order to make a for loop the most efficient way of looping over the
lines of a file (a very common operation), the next() method uses a
hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline())
does not work right.
"""

I guess the phrasing "hidden read-ahead buffer" implies that buffering
cannot be turned off (or at least it is not intended to even if it's
somehow possible).

George
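The caveat quoted above concerns the Python 2 file object. In Python 3's io module, iteration over a text file is implemented on top of readline() rather than a separate read-ahead buffer, so the two can be mixed. A quick check with a temporary file, under that Python 3 assumption (next_then_readline is a hypothetical helper):

```python
import os
import tempfile


def next_then_readline():
    """Return the lines obtained by mixing next() and readline() on one file."""
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.write("a\nb\nc\n")
        with open(path) as f:
            first = next(f)        # iterator protocol
            second = f.readline()  # continues right after the first line
            rest = list(f)         # and iteration resumes after that
        return first, second, rest
    finally:
        os.remove(path)
```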
 

Luis Zarrabeitia

> """
> In order to make a for loop the most efficient way of looping over the
> lines of a file (a very common operation), the next() method uses a
> hidden read-ahead buffer. As a consequence of using a read-ahead
> buffer, combining next() with other file methods (like readline())
> does not work right.
> """
>
> I guess the phrasing "hidden read-ahead buffer" implies that buffering
> cannot be turned off (or at least it is not intended to even if it's
> somehow possible).

Hmm. I wonder what those optimizations look like. Apparently, readline() cannot
read from that read-ahead buffer, and that by itself sounds bad. Currently,
if you loop a few times with next, you cannot use readline afterwards until
you seek() to an absolute position.

Actually, I think I may be replying to myself here. I imagine that 'next' will
read a block instead of a character, and look for lines in there, and as the
underlying OS likely blocks until the whole block is read, 'next' cannot
avoid it. That doesn't explain, though, why readline() can't use next's
buffer, why next doesn't have a sensible timeout for interactive sessions
(unless the OS doesn't support it), and why the readahead cannot be turned
off.
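The guess about block reads can be made concrete. A read-ahead reader only has to block until *some* data is available, not until a whole block is full, if it uses a call such as os.read(), which returns whatever the pipe currently holds (up to the requested size). The generator below is a sketch of that idea, not CPython's actual implementation; iter_lines_fd is a hypothetical name, written for Python 3. It keeps one buffer and yields each complete line as soon as that line is in the buffer:

```python
import os


def iter_lines_fd(fd, chunk_size=4096):
    """Yield lines (as bytes) from a raw file descriptor.

    os.read() returns as soon as some data is available (up to chunk_size
    bytes), so a complete line is yielded without waiting for a full chunk.
    Lines are split on b"\n"; an unterminated tail is yielded at EOF.
    """
    pending = b""
    while True:
        data = os.read(fd, chunk_size)
        if not data:  # b"" means EOF
            break
        pending += data
        # Hand out every complete line currently buffered.
        while True:
            nl = pending.find(b"\n")
            if nl < 0:
                break
            yield pending[: nl + 1]
            pending = pending[nl + 1 :]
    if pending:
        yield pending
```

For sys.stdin, the descriptor would be sys.stdin.fileno().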

I think I'll have to stick with the iter(function, sentinel) solution for now.

And I may try to find next()'s implementation... I guess I'll be downloading
Python's source when my bandwidth allows it (or find it in a browsable
repository).

On a related note, help(file.read) shows:

=====
read(...)
read([size]) -> read at most size bytes, returned as a string.

If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
=====

But it doesn't say how to put the file object in non-blocking mode. (I was
trying to put the file object in non-blocking mode to test next()'s
behavior.) Ideas?
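On POSIX systems, non-blocking mode is a property of the underlying file descriptor, not something the file object has a method for, so it is usually set with the fcntl module (Python 3.5+ also offers os.set_blocking(fd, False)). A minimal sketch, assuming a POSIX platform and Python 3, where os.read() on an empty non-blocking pipe raises BlockingIOError; for stdin the descriptor would be sys.stdin.fileno(). The helper names set_nonblocking and try_read are hypothetical:

```python
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to a file descriptor's status flags (POSIX only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def try_read(fd, size=4096):
    """Return available bytes, b"" at EOF, or None if nothing is ready yet."""
    try:
        return os.read(fd, size)
    except BlockingIOError:  # EAGAIN/EWOULDBLOCK: no data right now
        return None
```

With a pipe: try_read() returns None while the pipe is empty, the data once something has been written, and b"" after the write end is closed.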
 
