Checking for EOF in stream

Discussion in 'Python' started by GiBo, Feb 19, 2007.

  1. GiBo

    GiBo Guest

    Hi!

    Classic situation - I have to process an input stream of unknown length
    until I reach its end (EOF, End Of File). How do I check for EOF? The
    input stream can be anything from an opened file through sys.stdin to a
    network socket. And it's binary and potentially huge (gigabytes), thus
    "for line in stream.readlines()" isn't really the way to go.

    For now I have roughly:

    stream = sys.stdin
    while True:
        data = stream.read(1024)
        process_data(data)
        if len(data) < 1024: ## (*)
            break

    I smell a fragile point at (*) because, as far as I know, network
    socket streams, for example, may return less data than requested even
    when the socket is still open.

    I'd rather have something like:

    while not stream.eof():
        ...

    but there is no eof() method :-(

    This is probably a trivial problem but I haven't found a decent solution.

    Any hints?

    Thanks!

    GiBo
     
    GiBo, Feb 19, 2007
    #1

  2. if len(data) == 0:
        break  # EOF
     
    Grant Edwards, Feb 20, 2007
    #2

  3. GiBo

    GiBo Guest

    Right, not a big difference though. Isn't there a cleaner / more
    intuitive way? Like using some wrapper objects around the streams or
    something?

    GiBo
     
    GiBo, Feb 20, 2007
    #3
  4. Read the documentation... For a true file object:
    read([size]) ... An empty string is returned when EOF is encountered
    immediately.
    All the other "file-like" objects (like StringIO, socket.makefile, etc)
    maintain this behavior.
    So this is the way to check for EOF. If you don't like how it was spelled,
    try this:

    if data=="": break

    If your data is made of lines of text, you can use the file as its own
    iterator, yielding lines:

    for line in stream:
        process_line(line)
     
    Gabriel Genellina, Feb 20, 2007
    #4
  5. A file is at EOF when read() returns ''. The above is the
    cleanest, simplest, most direct way to do what you specified.
    Everybody does it that way, and everybody recognizes what's
    being done.

    It's also the "standard, Pythonic" way to do it.
    You can do that, but then you're mostly just obfuscating
    things.
     
    Grant Edwards, Feb 20, 2007
    #5
  6. Jon Ribbens

    Jon Ribbens Guest

    How about:

    if not data: break

    ? ;-)
     
    Jon Ribbens, Feb 20, 2007
    #6
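A minimal sketch of the read loop the replies above converge on, assuming Python 3, where a binary stream returns b"" at EOF and a text stream returns "". Testing `not data` covers both cases. The helper name count_bytes is hypothetical and only stands in for real processing.

```python
import io


def count_bytes(stream, bufsize=1024):
    """Read a file-like object to EOF in chunks and return the total length."""
    total = 0
    while True:
        data = stream.read(bufsize)
        if not data:  # empty bytes or empty str: read() hit EOF
            break
        total += len(data)  # stand-in for process_data(data)
    return total


stream = io.BytesIO(b"x" * 2500)
print(count_bytes(stream))  # 2500
print(stream.read(1024))    # b'' (further reads at EOF stay empty)
```

The last chunk here is shorter than bufsize, but the loop stops only on the empty read that follows it, not on the short one.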
  7. Nathan

    Nathan Guest

    Not to beat a dead horse, but I often do this:

    data = f.read(bufsize)
    while data:
        # ... process data.
        data = f.read(bufsize)


    The only annoying bit is the duplicated line. I find I often follow
    this pattern, and I realize Python doesn't plan to have any sort of
    do-while construct, but even so I prefer this idiom. What's the
    consensus here?

    What about creating a standard binary-file iterator:

    def blocks_of(infile, bufsize = 1024):
        data = infile.read(bufsize)
        if data:
            yield data


    The use would look like this:

    for block in blocks_of(myfile, bufsize = 2**16):
        process_data(block) # len(block) <= bufsize...
     
    Nathan, Feb 20, 2007
    #7
  8. Nathan

    Nathan Guest


    (ahem), make that iterator something that works, like:

    def blocks_of(infile, bufsize = 1024):
        data = infile.read(bufsize)
        while data:
            yield data
            data = infile.read(bufsize)
     
    Nathan, Feb 20, 2007
    #8
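One possible way to avoid the duplicated read() line is the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value. The sketch below is a variant of blocks_of, not the version posted above, and it assumes a binary stream in Python 3, so the sentinel is b"".

```python
import io
from functools import partial


def blocks_of(infile, bufsize=1024):
    """Return an iterator over blocks read from a binary file-like object."""
    # iter(callable, sentinel) calls infile.read(bufsize) repeatedly and
    # stops as soon as the result equals the sentinel b"" (EOF).
    return iter(partial(infile.read, bufsize), b"")


blocks = list(blocks_of(io.BytesIO(b"abcdefghij"), bufsize=4))
print(blocks)  # [b'abcd', b'efgh', b'ij']
```

For a text stream the sentinel would have to be "" instead, which is one reason the explicit `while data:` loop may still be the more general choice.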
  9. kousue

    kousue Guest

    Could you use xreadlines()? It's a lazily-evaluated stream reader.
    Well it depends on a lot of things. Is the stream blocking or non-
    blocking (on sockets and some other sorts of streams, you can pick
    this yourself)? What are the underlying semantics (reliable-and-
    blocking TCP or dropping-and-unordered-UDP)? Unfortunately, you really
    need to just know what you're working with (and there's really no
    better solution; trying to hide the underlying semantics under a
    prescribed overlaid set of semantics can only lead to badness in the
    long run).
    For your case, it's not so hard:
    http://pyref.infogami.com/EOFError says "read() and readline() methods
    of file objects return an empty string when they hit EOF." so you
    should assume that if something is claiming to be a file-like object
    that it will work this way.
    So:
    stream = sys.stdin
    while True:
        data = stream.read(1024)
        if data=="":
            break
        process_data(data)
     
    kousue, Feb 27, 2007
    #9
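The original worry about sockets can be illustrated with a small sketch, assuming Python 3 and a blocking stream socket. Here recv() may return fewer bytes than requested while the connection is still open, and it returns b"" only once the peer has closed its end. The helper name read_all is hypothetical, and socket.socketpair() is used only to get a connected pair without a real network.

```python
import socket


def read_all(sock, bufsize=1024):
    """Collect everything from a blocking stream socket until the peer closes."""
    chunks = []
    while True:
        data = sock.recv(bufsize)  # may be shorter than bufsize; that is not EOF
        if not data:               # b"" means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


a, b = socket.socketpair()
a.sendall(b"hello")
a.sendall(b" world")
a.close()                          # closing the sender is what produces EOF
received = read_all(b, bufsize=4)
b.close()
print(received)  # b'hello world'
```

On a non-blocking socket the picture differs: recv() raises BlockingIOError when no data is ready, so an empty result still means the peer closed.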