Iteration on file reading

Paul Watson

for line in sys.stdin:

Does this statement cause all of stdin to be read before the loop begins?

I may need to read several GB and I do not want to swamp the machine's
memory.
 
Andrew Dalke

Paul Watson:
for line in sys.stdin:

Does this statement cause all of stdin to be read before the loop begins?

No. It reads a block of text at a time and breaks that block
into lines. This gives great performance and scales to
large files (so long as you can afford to keep that extra
block around). However, it's lousy for interactive work.
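
A minimal sketch of that behavior, assuming Python 3 and using an in-memory `io.StringIO` as a stand-in for `sys.stdin` (the helper name `count_lines` is hypothetical). Iterating over the stream hands back one line at a time, so the whole input is never held in memory at once.

```python
import io

def count_lines(stream):
    """Count lines by iterating over the stream, one line at a time."""
    total = 0
    for _ in stream:  # each step fetches only the next line
        total += 1
    return total

# StringIO stands in for sys.stdin here (an assumption for the demo)
fake_stdin = io.StringIO("alpha\nbeta\ngamma\n")
print(count_lines(fake_stdin))
```

In a real script the same function could be called as `count_lines(sys.stdin)`.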

Andrew
 
Paul McGuire

Try a generator. This will just read a line at a time.
-- Paul

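<code>
from sys import stdin

def lineReader( strm ):
    while 1:  # never exits: readline() keeps returning "" at end of input
        # yield the next line with its trailing newline removed
        yield strm.readline().rstrip("\n")

for f in lineReader( stdin ):
    print(">>> " + f)
</code>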
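
Because `readline()` returns an empty string at end of input, a `while 1` loop around it keeps yielding empty strings instead of finishing. A sketch of a variant that stops at end of input, assuming Python 3 and `io.StringIO` standing in for stdin (the name `line_reader` is hypothetical):

```python
import io

def line_reader(strm):
    """Yield lines without their trailing newline, stopping at end of input."""
    while True:
        line = strm.readline()
        if not line:  # "" means end of input; a blank line is "\n", not ""
            return
        yield line.rstrip("\n")

# StringIO stands in for sys.stdin here (an assumption for the demo)
for f in line_reader(io.StringIO("one\ntwo\n")):
    print(">>> " + f)
```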
 
Andrew Dalke

Paul McGuire:
def lineReader( strm ):
    while 1:
        yield strm.readline().rstrip("\n")

for f in lineReader( stdin ):
    print(">>> " + f)

You can simplify that with the iter builtin.

for f in iter(stdin.readline, ""):
    print(">>> " + f)

(Hmm... maybe I should test it? Naaaaahhh.)

Andrew
 
Alex Martelli

Andrew said (replying to Paul McGuire):

You can simplify that with the iter builtin.

for f in iter(stdin.readline, ""):
    print(">>> " + f)

(Hmm... maybe I should test it? Naaaaahhh.)

There is a difference in behavior: the readline method
returns a line WITH a trailing \n, which then gets
printed, giving a "double-spaced" effect. Sure, you
can strip the \n in the loop body, but if you always
want a sequence of newline-stripped lines, that is
somewhat repetitious. If the use of readline is
mandated (i.e., no direct looping on the file for one
reason or another), my favourite way of expressing it is:

def linesof(somefile):
    for line in iter(somefile.readline, ''):
        yield line.rstrip('\n')

not as concise as either of the above, but, I think,
a wee little bit clearer.
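
To make the difference concrete, here is a small sketch, assuming Python 3 and an in-memory `io.StringIO` in place of a real file. It compares what plain `iter(readline, '')` produces with what `linesof` produces.

```python
import io

def linesof(somefile):
    # iter() calls readline() repeatedly until it returns '' (end of input)
    for line in iter(somefile.readline, ''):
        yield line.rstrip('\n')

text = "first\nsecond\n"  # sample input, made up for the demo
raw = list(iter(io.StringIO(text).readline, ''))
stripped = list(linesof(io.StringIO(text)))
print(raw)       # lines still carry their trailing newline
print(stripped)  # trailing newlines removed
```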


Alex
 
Jeremy Fincher

Paul Watson said:
for line in sys.stdin:

Does this statement cause all of stdin to be read before the loop begins?

I may need to read several GB and I do not want to swamp the machine's
memory.

Have you considered simply inputting this into an interactive
interpreter and seeing if it swamps the machine's memory?

Jeremy
 
Andrew Dalke

Alex:
There is a difference in behavior: the readline method
returns a line WITH a trailing \n, which then gets
printed, giving a "double-spaced" effect. Sure, you
can strip the \n in the loop body, ....

Quite true.

As it turns out, the OP wanted to know about

for line in sys.stdin:

The post to which I replied changed the spec to
remove the newline, but the main point was to
use a generator ... which could, if desired, do extra
work to get rid of the "\n". It could just as
easily have converted everything to uppercase or done
rot13 conversion on the text.

My reply meant to point out that the iter builtin
can be used to turn a function that "returns the next
object each time it's called and a sentinel when
it's done" into an iterable. I just left out the extra
work his code did since it wasn't needed by the OP.
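
The same two-argument form of `iter` works with any such function, not just `readline`. As a sketch, assuming Python 3, here it wraps `read(size)` to walk a stream in fixed-size blocks. The helper name `chunks_of` is hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
from functools import partial

def chunks_of(stream, size):
    """Return an iterator over blocks of up to `size` characters.

    stream.read(size) returns "" at end of input, which serves as the sentinel.
    """
    return iter(partial(stream.read, size), "")

for block in chunks_of(io.StringIO("abcdefgh"), 3):
    print(block)
```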

Andrew
 