Difference between readlines() and iterating on a file object?

Richard

Hi,

Can anyone tell me what the difference is between

for line in file.readlines( ):

and

for line in file:

where file is a file object returned from an open( ) call?

I thought that they did the same thing, but the code I am using it in has
this line called more than once on the same file object, and the second time
it is run gives different results for each.

What is the difference in implementation?

Cheers

Rich
 
Christopher T King

Richard said:
Can anyone tell me what the difference is between

for line in file.readlines( ):

and

for line in file:

where file is a file object returned from an open( ) call?

The first form slurps every line in the file into a list, and then goes
through each item in the list in turn.

The second form skips the middleman, and simply goes through each line of
the file in turn (no interim list is created). In this context, file is
acting as an iterator. Because a list isn't created, this form uses far
less memory and is generally at least as fast, which makes it more
efficient than .readlines() for large files.
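
A minimal sketch of the two forms side by side, assuming a current Python 3 interpreter and a small throwaway file (the sample data and variable names are made up for illustration):

```python
import os
import tempfile

# Create a small throwaway file to read from (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path) as f:
    all_lines = f.readlines()        # the whole file is read into a list up front
    is_list = isinstance(all_lines, list)

with open(path) as f:
    is_own_iterator = iter(f) is f   # the file object serves as its own iterator
    streamed = [line for line in f]  # lines are handed out one at a time

os.remove(path)
print(is_list, is_own_iterator, all_lines == streamed)
```

Both forms yield the same lines here; the difference is only in when the data is read and how much of it is held in memory at once.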

Richard said:
I thought that they did the same thing, but the code I am using it in has
this line called more than once on the same file object, and the second time
it is run gives different results for each.

Assuming you don't prematurely exit the for loop or access the file in
another manner while looping, both forms should give identical results.
Otherwise...

Richard said:
What is the difference in implementation?

Because the first form slurps everything in at once, any further call to
it (with no intervening seek()s) returns an empty list, whether the for
loop was stopped prematurely or not.

On the other hand, since the second form only reads one line at a time
(using file.next()), if the for loop is stopped prematurely (e.g. via
break), subsequent invocations will pick up right where the previous one
left off.
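
The behaviour described above can be sketched as follows. This assumes Python 3 and a made-up four-line file; as noted elsewhere in this thread, older interpreters may not resume cleanly after a break.

```python
import os
import tempfile

# Hypothetical sample file with four lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\nfour\n")
    path = tmp.name

with open(path) as f:
    first_call = f.readlines()    # consumes the whole file
    second_call = f.readlines()   # nothing left to read, so this is []
    f.seek(0)                     # rewind to the start
    after_seek = f.readlines()    # the full list again

with open(path) as f:
    seen = []
    for line in f:
        seen.append(line)
        if line.startswith("two"):
            break                 # stop early, leaving lines unread
    rest = [line for line in f]   # a second loop resumes after "two"

os.remove(path)
print(second_call, seen, rest)
```

Calling seek(0) between passes is the usual way to make either form start over on the same file object.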

Hope this helps.
 
Duncan Booth

Hi,

Can anyone tell me what the difference is between

for line in file.readlines( ):

reads the entire file into memory and splits it up into a list of lines,
then iterates over the list. If you break from the loop, tough: you've lost
any lines that were read but you didn't handle.

and

for line in file:

reads part of the file and strips off one line at a time. Never creates a
list. Reads more only when it runs out of the block it read. If you break
from the loop you can do another 'for line in file' and get the remaining
lines.
 
Michael Hudson

Duncan Booth said:
reads the entire file into memory and splits it up into a list of lines
then iterates over the list. If you break from the loop, tough you've lost
any lines that were read but you didn't handle.


reads part of the file and strips off one line at a time. Never creates a
list. Reads more only when it runs out of the block it read. If you break
from the loop you can do another 'for line in file' and get the remaining
lines.

But this last part only works the way you expect in 2.3, I think.

Cheers,
mwh
 
Roy Smith

Christopher T King said:
Assuming you don't prematurely exit the for loop or access the file in
another manner while looping, both forms should give identical results.
Otherwise...

Well, there is a corner case if some external process is writing to the
file while you're reading it. The "in file.readlines():" version will
get a snapshot of the file at the time you read it, while the "in file:"
version will do a sequence of reads over time.

Not that I think this is what's going on in the OP's case, but it's
something to be aware of.
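
A rough sketch of that corner case, assuming Python 3 on a local filesystem. The append_line helper is hypothetical and stands in for the external writer; whether a late write is actually seen in real use depends on timing and buffering.

```python
import os
import tempfile

# Hypothetical sample file with three lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    path = tmp.name

def append_line(text):
    """Stand-in for an external process appending to the file."""
    with open(path, "a") as writer:
        writer.write(text)

with open(path) as f:
    snapshot = f.readlines()      # everything present at this moment

with open(path) as f:
    streamed = []
    for line in f:
        if not streamed:
            append_line("late\n")  # written while the loop is still running
        streamed.append(line)      # later reads can pick up the new line

os.remove(path)
print(len(snapshot), len(streamed))
```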
 
Hari Pulapaka

Duncan said:
reads the entire file into memory and splits it up into a list of lines
then iterates over the list. If you break from the loop, tough you've lost
any lines that were read but you didn't handle.
reads part of the file and strips off one line at a time. Never creates a
list. Reads more only when it runs out of the block it read. If you break
from the loop you can do another 'for line in file' and get the remaining
lines.

However, one thing that bit me was that you can't use f.tell() to get the
current position of the line in the file. If you use "for line in
fileobject:" and the first statement in the loop is fileobject.tell(), it
returns the position the read-ahead buffer has reached (the end of the
file, for a small file) and not the position of the next line.
Might be a bit counter-intuitive.
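
A minimal sketch of the workaround, assuming Python 3 and a made-up three-line file: when positions matter, read with readline() and call tell() between reads. In Python 3 text mode, tell() inside a for loop over the file raises OSError rather than returning a misleading value, which the second part of the sketch checks for.

```python
import os
import tempfile

# Hypothetical sample file; newline="" keeps line endings untranslated.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 newline="") as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

offsets = []
with open(path, newline="") as f:
    while True:
        pos = f.tell()            # position before the next line is read
        line = f.readline()
        if line == "":
            break                 # an empty string means end of file
        offsets.append((pos, line))
    f.seek(offsets[1][0])         # jump back to the start of the second line
    reread = f.readline()

with open(path) as f:
    try:
        for line in f:
            f.tell()              # CPython 3 text mode raises OSError here
            break
        tell_in_loop_ok = True
    except OSError:
        tell_in_loop_ok = False

os.remove(path)
print(reread, tell_in_loop_ok)
```

Treat the values returned by tell() in text mode as opaque markers that are only meaningful when passed back to seek().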

I am learning to be a better Python programmer and I have written this
small program to parse mailbox files and display emails which match the
specified text. Any comments on this would be appreciated. I know I can
read the whole file using readlines(), but I am not sure if that is a good
idea.


Batigol:~/pgrep hari$ cat pgrep.py
import sys

hits = {}    # email number -> [start offset, end offset]
lines = {}   # email number -> list of matching lines
count = 0    # number of emails with at least one match
emailstart = "From -"

def build(f, str):
    # Scan the mailbox once and record every email that contains str.
    global count, hits, lines

    f.seek(0)
    start_email = 0
    end_email = 0
    pointers = []
    str_matched = []
    found = 0

    line = f.readline()

    while line != '':
        if line.find(emailstart) != -1:
            # Start of Mail
            start_email = f.tell()
            if found == 1:
                #print "From - inside found "
                pointers.append(end_email)
                found = 0
                hits[count] = pointers
                lines[count] = str_matched
                count += 1
                pointers = []
                str_matched = []

        if line.find(str) != -1:
            # Found string
            #print "Found string: "
            #print "count", count
            if len(pointers) == 0:
                pointers.append(start_email)
            found = 1
            str_matched.append(line)
            #lines[count] = line

        end_email = f.tell()
        line = f.readline()

    if found == 1:
        # A match in the last email has no following "From -" line to close it
        pointers.append(end_email)
        hits[count] = pointers
        lines[count] = str_matched
        count += 1

def display(f):
    # List the matching lines, then print the email the user picks.
    global count, hits, lines

    if count == 0:
        sys.stdout.write("Not found! \n")
        sys.stdout.flush()
        sys.exit(0)

    sys.stdout.write("#: Line Contents\n")
    for i in range(count):
        for j in range(len(lines[i])):
            choice = "%s: %s" % (i, lines[i][j])
            sys.stdout.write(choice)

    sys.stdout.write("Enter # of email to display: ")
    sys.stdout.flush()
    input = sys.stdin.readline()
    try:
        i = int(input.strip())
        f.seek(hits[i][0])
        while f.tell() != hits[i][1]:
            sys.stdout.write(f.readline())
    except (ValueError, KeyError):
        # Not a number, or not one of the listed emails
        sys.stderr.write("Invalid choice\n")

    sys.stdout.flush()

if __name__ == "__main__":
    try:
        f = open(sys.argv[1], "r")
    except (IOError, IndexError):
        sys.stdout.write("Error opening file\n")
        sys.exit(1)

    build(f, sys.argv[2])
    response = 'n'
    #print response
    while response == 'n':
        display(f)
        sys.stdout.write("Do you want to quit, y or n? ")
        sys.stdout.flush()
        response = sys.stdin.readline().strip()

    f.close()
    sys.exit(0)



Thanks,

Hari
 
