Reading from stdin

Discussion in 'Python' started by Luis Zarrabeitia, Oct 7, 2008.

  1. I have a problem with this piece of code:

    ====
    import sys
    for line in sys.stdin:
        print "You said!", line
    ====

    Namely, it seems that stdin buffers the input, so there is no reply until
    a huge amount of text has been written. The iterator returned by xreadlines
    has the same behavior.

    The stdin.readline() function doesn't share that behaviour (it returns as soon
    as I hit 'enter').

    Is there any way to tell stdin's iterator not to buffer the input? Is it
    part of the standard file protocol?
     
    Luis Zarrabeitia, Oct 7, 2008
    #1

  2. Perhaps line-buffering simply doesn't apply when you use a file object as an
    iterator.
     
    Lawrence D'Oliveiro, Oct 7, 2008
    #2

  3. Not an answer to your actual question, but you can keep the 'for' loop
    instead of rewriting it with 'while' using the iter(function,
    sentinel) idiom:

    for line in iter(sys.stdin.readline, ""):
        print "You said!", line

    George
     
    George Sakkis, Oct 7, 2008
    #3
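
For readers who have not met the two-argument form of iter(): it calls the given function repeatedly and stops as soon as the return value equals the sentinel. The following is a minimal sketch of that idiom, not code from the thread. It assumes an in-memory io.StringIO as a stand-in for sys.stdin so that it runs without a terminal, and the helper name echo_lines is hypothetical.

```python
import io


def echo_lines(stream):
    """Build one reply per input line, reading through readline() only."""
    replies = []
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns "", which is what readline() returns at end of file.
    # An empty input line arrives as "\n", so it does not end the loop.
    for line in iter(stream.readline, ""):
        replies.append("You said! " + line.rstrip("\n"))
    return replies


fake_stdin = io.StringIO(u"hello\n\nworld\n")  # assumed stand-in for sys.stdin
replies = echo_lines(fake_stdin)
```

Passing sys.stdin instead of fake_stdin should give the interactive behaviour described in post 3, since every line is fetched with readline() rather than through the file iterator.
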
  4. You're right, it's not an answer to my actual question, but I had completely
    forgotten about the 'sentinel' idiom. Many thanks... I was trying to do it
    with 'itertools', obviously with no luck.

    The question still stands (how to turn off the buffering), but this is a nice
    workaround until it gets answered.
     
    Luis Zarrabeitia, Oct 8, 2008
    #4
  5. You cut out the question you replied to, but left the rest. I got a bit
    confused until I remembered that *I* wrote the email :D.

    Anyway, I changed the program to:

    ===
    buff = file("test")
    for line in buff:
        print "you said", line
    ===

    where 'test' is a named pipe (mkfifo test) to see if the line-buffering also
    happened with a file object, and it does. As with stdin, nothing gets printed
    until the end of the file or it receives a huge amount of lines, but
    using '.readline()' works immediately. So it seems that the buffering
    behavior happens by default on stdin and file. It makes sense, as type(stdin)
    is 'file'. I can't test it now, but I think sockets also do input
    buffering. I guess one doesn't notice it in the general case because disk
    reading happens too fast to see the delay.

    That raises a related question: is there any use case where it is better to
    lock the input until a lot of data is received, even when the requested data
    is already available? Output buffering is understandable and desired (how do
    I turn it off, by the way?), and even that one won't lock unless requested to
    lock (flush), but I can't find examples where input buffering helps.

    (full example with pipes)
    $ mkfifo test
    $ cat > test
    [write data here]

    on another console, just execute the script.

    Oh, I forgot:
    Linux 2.6.24, python 2.5.2, Debian's standard build. I don't have windows at
    hand to try it.
     
    Luis Zarrabeitia, Oct 8, 2008
    #5
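
The readline() half of the experiment in post 5 can be reproduced in a single process. This is a rough sketch under stated assumptions: an anonymous os.pipe() stands in for the named pipe created with mkfifo, and the write end is deliberately left open to mimic a 'cat > test' session that has sent only one line so far. It shows only that readline() returns once a full line is available; how the file iterator behaves depends on the Python version and is not demonstrated here.

```python
import os

# An anonymous pipe stands in for the named pipe ('mkfifo test') above.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "r")

# Send one complete line but keep the write end open, as 'cat > test'
# would after the first 'enter'.
os.write(write_fd, b"first line\n")

# readline() hands the line back even though the writer has not finished.
first = reader.readline()

# Clean up both ends of the pipe.
os.close(write_fd)
reader.close()
```
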
  6. The closest answer I found comes from the docs
    (http://docs.python.org/library/stdtypes.html#file-objects):

    """
    In order to make a for loop the most efficient way of looping over the
    lines of a file (a very common operation), the next() method uses a
    hidden read-ahead buffer. As a consequence of using a read-ahead
    buffer, combining next() with other file methods (like readline())
    does not work right.
    """

    I guess the phrasing "hidden read-ahead buffer" implies that buffering
    cannot be turned off (or at least it is not intended to even if it's
    somehow possible).

    George
     
    George Sakkis, Oct 8, 2008
    #6
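
To see why a read-ahead buffer and readline() do not mix, here is a toy model. It is not CPython's implementation: the class name ReadAheadLines, the pending attribute, and the deliberately tiny chunk_size of 8 are all assumptions chosen to make the effect visible on a short input.

```python
import io


class ReadAheadLines(object):
    """Toy line iterator that pulls data from a stream in fixed-size chunks."""

    def __init__(self, stream, chunk_size=8):
        self.stream = stream
        self.chunk_size = chunk_size
        self.pending = ""  # data already read from the stream, not yet returned

    def __iter__(self):
        return self

    def __next__(self):
        # Keep reading whole chunks until the buffer holds a complete line.
        while "\n" not in self.pending:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break  # end of input
            self.pending += chunk
        if not self.pending:
            raise StopIteration
        line, sep, self.pending = self.pending.partition("\n")
        return line + sep

    next = __next__  # Python 2 spelling of the same method


stream = io.StringIO(u"one\ntwo\nthree\n")
lines = ReadAheadLines(stream)
first = next(lines)             # reads a whole chunk, not just "one\n"
hidden = lines.pending          # left over in the read-ahead buffer
skipped_to = stream.readline()  # readline() cannot see the hidden data
```

In this model the first next() call pulls "one\ntwo\n" out of the stream, so a later readline() on the underlying stream resumes at "three\n" and the line "two\n" is reachable only through the iterator. On a terminal or pipe, the same chunked read is also what would make the iterator wait for more input than a single line.
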
  7. Hmm. I wonder what those optimizations look like. Apparently, readline() cannot
    read from that read-ahead buffer, and that by itself sounds bad. Currently,
    if you loop a few times with next, you cannot use readline afterwards until
    you seek() to an absolute position.

    Actually, I think I may be replying to myself here. I imagine that 'next' will
    read a block instead of a character, and look for lines in there, and as the
    underlying OS likely blocks until the whole block is read, 'next' cannot
    avoid it. That doesn't explain, though, why readline() can't use next's
    buffer, why next doesn't have a sensible timeout for interactive sessions
    (unless the OS doesn't support it), and why the readahead cannot be turned
    off.

    I think I'll have to stick for now with the iter(function,sentinel) solution.

    And I may try to find next()'s implementation... I guess I'll be downloading
    python's source when my bandwidth allows it (or find it on a browseable
    repository)

    On a related note, help(file.read) shows:

    =====
    read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
    =====

    But it doesn't say how to put the file object in non-blocking mode. (I was
    trying to put the file object in non-blocking mode to test next()'s
    behavior). Ideas?
     
    Luis Zarrabeitia, Oct 8, 2008
    #7
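
On the non-blocking question in post 7: one possible approach on Unix-like systems is to set O_NONBLOCK on the underlying file descriptor with the fcntl module. The sketch below is an illustration, not an answer given in the thread. The helper name set_nonblocking is hypothetical, an anonymous pipe again stands in for stdin or the named pipe, and the fcntl module is not available on Windows.

```python
import errno
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to the status flags of an open file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


read_fd, write_fd = os.pipe()  # assumed stand-in for stdin or a named pipe
set_nonblocking(read_fd)

try:
    os.read(read_fd, 1024)     # nothing has been written yet
    would_block = False
except OSError as exc:
    # An empty non-blocking pipe reports EAGAIN instead of waiting for data.
    would_block = exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)

os.close(read_fd)
os.close(write_fd)
```

For a file object the descriptor would come from its fileno() method, for example set_nonblocking(sys.stdin.fileno()). How next() on a Python 2 file object reacts to a non-blocking descriptor is not something this sketch establishes.
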
  8. Gabriel Genellina, Oct 14, 2008
    #8