first, second, etc line of text file

Daniel Nogradi

A very simple question: I currently use a cumbersome-looking way of
getting the first, second, etc. line of a text file:

for i, line in enumerate( open( textfile ) ):
    if i == 0:
        print 'First line is: ' + line
    elif i == 1:
        print 'Second line is: ' + line
    .......
    .......

I thought about f = open( textfile ) and then f[0], f[1], etc., but that
throws a TypeError: 'file' object is unsubscriptable.

Is there a simpler way?
 
Jeff

Files should be iterable on their own:

filehandle = open('/path/to/foo.txt')
for line in filehandle:
    pass  # do something with each line...

You could also call lines = filehandle.readlines(), which returns a
list of all the lines in the file, but that is memory-hungry for large
files.
 
George Sakkis

If all you need is sequential access, you can use the next() method of
the file object:

nextline = open(textfile).next
print 'First line is: %r' % nextline()
print 'Second line is: %r' % nextline()
....

For random access, the easiest way is to slurp the whole file into a
list using file.readlines().
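
On Python 3, file objects no longer have a .next() method, so the same sequential approach would go through the built-in next() instead. A minimal sketch, assuming Python 3; the helper name first_lines and the sample text are made up for illustration, and io.StringIO stands in for open(textfile):

```python
import io

def first_lines(fileobj, count):
    """Return up to `count` lines from an open file-like object.

    Hypothetical helper for illustration; not from the thread.
    """
    lines = []
    for _ in range(count):
        # The second argument to next() is returned instead of raising
        # StopIteration when the file has fewer lines than requested.
        line = next(fileobj, None)
        if line is None:
            break
        lines.append(line)
    return lines

# io.StringIO stands in for open(textfile) so the sketch is self-contained.
sample = io.StringIO("alpha\nbeta\ngamma\n")
first, second = first_lines(sample, 2)
print('First line is: %r' % first)
print('Second line is: %r' % second)
```

Only the lines actually requested are read; the rest of the file is left untouched.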

HTH,
George
 
Daniel Nogradi

Thanks all! I think I will stick with my original method: the files
can be quite large, and without reading the whole file into memory,
enumerate( open( textfile ) ) is probably the only way to access an
arbitrary Nth line.
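
One way to fetch a single Nth line without holding the file in memory is itertools.islice from the standard library. It still has to read lines 0 through N, but it keeps none of the earlier ones. A rough sketch, assuming Python 3; nth_line is a hypothetical helper name and io.StringIO stands in for a real file:

```python
import io
from itertools import islice

def nth_line(fileobj, n):
    """Return line number `n` (0-based) of an open file, or None if the
    file has fewer than n + 1 lines. Hypothetical helper for illustration."""
    # islice skips the first n lines and yields at most one, so reading
    # stops right after line n and nothing earlier is kept in memory.
    return next(islice(fileobj, n, n + 1), None)

# io.StringIO stands in for open(textfile) so the sketch is self-contained.
sample = io.StringIO("line 0\nline 1\nline 2\nline 3\n")
print(repr(nth_line(sample, 2)))
```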
 
James Stroud

This is the same logic but less cumbersome, if that's what you mean:

to_get = [0, 3, 7, 11, 13]
got = dict((i,s) for (i,s) in enumerate(open(textfile)) if i in to_get)
print got[3]

This would probably be the best way for really big files and if you know
all of the lines you want ahead of time. If you need to access the file
multiple times at arbitrary positions, you may need to seek(0), cache
lines already read, or slurp the whole thing, which has already been
suggested.

James

 
Paul Rubin

from itertools import islice
first_five_lines = list(islice(open(textfile), 5))

print 'first line is', first_five_lines[0]
print 'second line is', first_five_lines[1]
....
 
Gabriel Genellina

James Stroud said:
to_get = [0, 3, 7, 11, 13]
got = dict((i,s) for (i,s) in enumerate(open(textfile)) if i in to_get)
print got[3]

This would probably be the best way for really big files and if you know
all of the lines you want ahead of time.

But it still has to read the complete file (although it does not keep the
unwanted lines).
Combining this with Paul Rubin's suggestion of itertools.islice, I think we
get the best solution:

from itertools import islice
got = dict((i,s) for (i,s) in
           enumerate(islice(open(textfile), max(to_get)+1)) if i in to_get)
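
The same idea could look roughly like this on Python 3, using a dict comprehension and a set for the membership test. This is only a sketch: pick_lines is a hypothetical name, and io.StringIO stands in for open(textfile).

```python
import io
from itertools import islice

def pick_lines(fileobj, wanted):
    """Map each wanted 0-based line number to its line, reading no
    further than the largest wanted number. Hypothetical helper."""
    wanted = set(wanted)  # set membership tests are cheap
    if not wanted:
        return {}
    stop = max(wanted) + 1  # islice stops after this many lines
    return {i: line
            for i, line in enumerate(islice(fileobj, stop))
            if i in wanted}

# Twenty sample lines stand in for a large file.
sample = io.StringIO("".join("line %d\n" % i for i in range(20)))
got = pick_lines(sample, [0, 3, 7, 11, 13])
print(got[3])
```

Line numbers beyond the end of the file are simply absent from the result rather than raising an error.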
 
Daniel Nogradi

George Sakkis said:
If all you need is sequential access, you can use the next() method of
the file object:

nextline = open(textfile).next
print 'First line is: %r' % nextline()
print 'Second line is: %r' % nextline()
...

For random access, the easiest way is to slurp all the file in a list
using file.readlines().

Thanks! This looks best: I only need the first couple of lines
sequentially, so I never need to read in the whole file.
 
Neil Cerutti

George Sakkis said:
For random access, the easiest way is to slurp all the file in
a list using file.readlines().

A lazy evaluation scheme might be useful for random access that
only slurps as much as you need.

class LazySlurper(object):
    r""" Lazily read a file using readline, allowing random access to the
    results with __getitem__.

    >>> import StringIO
    >>> infile = StringIO.StringIO(
    ... "Line 0\n"
    ... "Line 1\n"
    ... "Line 2\n"
    ... "Line 3\n"
    ... "Line 4\n"
    ... "Line 5\n"
    ... "Line 6\n"
    ... "Line 7\n")
    >>> slurper = LazySlurper(infile)
    >>> print slurper[0],
    Line 0
    >>> print slurper[5],
    Line 5
    >>> print slurper[1],
    Line 1
    >>> infile.close()
    """
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.upto = 0      # number of lines read and cached so far
        self.lines = []
        self._readupto(0)

    def _readupto(self, n):
        # Read and cache lines until line n is available or EOF is hit.
        while self.upto <= n:
            line = self.fileobj.readline()
            if line == "":
                break
            self.lines.append(line)
            self.upto += 1

    def __getitem__(self, n):
        self._readupto(n)
        return self.lines[n]
 
Mike

Daniel Nogradi said:
Thanks! This looks the best, I only need the first couple of lines
sequentially so don't need to read in the whole file ever.

If you only ever need the first few lines of a file, why not keep it
simple and do something like this?

mylines = open("c:\\myfile.txt","r").readlines()[:5]

That will give you the first five lines of the file. Replace 5 with
whatever number you need. next will work too, obviously, but won't
that use of next hold the file open until you are done with it? Or,
more specifically, since you do not keep a reference to the file object
at all, won't you have to wait until the function goes out of scope for
the file to be released? Would that be a problem? Or am I just being
paranoid?
 
Steve Holden

Mike said:
if you only ever need the first few lines of a file, why not keep it
simple and do something like this?

mylines = open("c:\\myfile.txt","r").readlines()[:5]

that will give you the first five lines of the file. Replace 5 with
whatever number you need. next will work, too, obviously, but won't
that use of next hold the file open until you are done with it? Or,
more specifically, since you do not have a file object at all, won't
you have to wait until the function goes out of scope to release the
file? Would that be a problem? Or am I just being paranoid?

Unfortunately the expression

f.readlines()[:5]

reads the whole file in and generates a list of the lines just so it can
slice the first five off. Compare that, on a large file, with something like

[f.next() for _ in range(5)]

and I think you will see that the latter is significantly better in
almost all respects.
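
As a sketch of how the two concerns above (reading too much, and leaving the file open) might both be handled, a with block combined with itertools.islice reads at most five lines and closes the file as soon as they are read. It assumes Python 3; head is a hypothetical helper, and the temporary file exists only to make the example runnable.

```python
import os
import tempfile
from itertools import islice

def head(path, count=5):
    """Return up to `count` lines from the start of the file at `path`.
    Hypothetical helper for illustration."""
    # The with block closes the file as soon as the lines are read,
    # and islice stops after `count` lines instead of reading them all.
    with open(path) as f:
        return list(islice(f, count))

# Build a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("".join("line %d\n" % i for i in range(1000)))
    tmp_path = tmp.name

mylines = head(tmp_path, 5)
print(mylines)
os.remove(tmp_path)
```

Unlike the f.next() list comprehension, which raises StopIteration when the file has fewer than five lines, this simply returns however many lines there are.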

regards
Steve
 
Gabriel Genellina

On Thu, 23 Apr 2009 18:50:06 -0300, Scott David Daniels wrote:

[nice recipe to retrieve only certain lines of a file]

I think your time machine needs an adjustment; it spits things out almost
two years later :)
 
