reading specific lines of a file

Yi Xing

Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing
 
Pierre Quentel

If the first line is numbered 0:

# line_num is the 0-based number of the line you want
source = open('afile.txt')
for i, line in enumerate(source):
    if i == line_num:
        break
print line

Pierre
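
A variant of the loop above, as a minimal sketch for current Python 3 (the thread predates it). The name `read_line` is made up for illustration. It uses `itertools.islice` to skip the preceding lines and returns None when the file is shorter than requested, where the plain loop would end up printing the last line instead.

```python
from itertools import islice


def read_line(path, line_num):
    """Return line `line_num` (0-based) of `path`, or None if the file is too short."""
    with open(path) as source:
        # islice still reads every preceding line, but discards them one by one,
        # so memory use stays small even for a huge file.
        return next(islice(source, line_num, line_num + 1), None)
```

The returned string keeps its trailing newline, as in the original loop.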
 
Piet van Oostrum

Yi Xing said:
YX> Hi All,
YX> I want to read specific lines of a huge txt file (I know the line #). Each
YX> line might have different sizes. Is there a convenient and fast way of
YX> doing this in Python? Thanks.

Not fast. You have to read all preceding lines.
If you have to do this many times while the file does not change, you could
build an index into the file.
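
One way such an index might look, as a rough Python 3 sketch rather than anything posted in this thread: scan the file once, remember the byte offset where each line starts, then seek() directly to the wanted line. The names `build_index` and `get_line` are hypothetical.

```python
def build_index(path):
    """Return a list where entry i is the byte offset of line i (0-based)."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode, so offsets are exact byte counts
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def get_line(path, offsets, line_num):
    """Jump straight to line `line_num` using a previously built index."""
    with open(path, "rb") as f:
        f.seek(offsets[line_num])
        return f.readline()  # bytes, including the line terminator
```

The index is only valid while the file does not change, and building it still costs one full pass over the file.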
 
Bill Pursell

Yi said:
I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

#!/usr/bin/env python

import os,sys
line = int(sys.argv[1])
path = sys.argv[2]
os.system("sed -n %dp %s"%(line,path))


Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.
 
Simon Forman

Yi said:
Hi All,

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

Yi Xing

I once had to do a lot of random access of lines in a multi gigabyte
log file. I found that a very fast way to do this was to build an
index file containing the int offset in bytes of each line in the log
file.

I could post the code if you're interested.

Peace,
~Simon
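
Simon's code was not posted in this thread, so the following is only a guess at the general shape of an index file, written for Python 3. The 8-byte little-endian entry format and the names `write_index_file` and `read_indexed_line` are assumptions made for illustration. Because every index entry has the same size, entry N can itself be found with a single seek().

```python
import struct

# Assumed entry format: one unsigned 64-bit little-endian offset per line.
OFFSET = struct.Struct("<Q")


def write_index_file(log_path, index_path):
    """Scan the log once and store the starting byte offset of every line."""
    position = 0
    with open(log_path, "rb") as log, open(index_path, "wb") as index:
        for line in log:
            index.write(OFFSET.pack(position))
            position += len(line)


def read_indexed_line(log_path, index_path, line_num):
    """Return line `line_num` (0-based) of the log as bytes, using the index file."""
    with open(index_path, "rb") as index:
        index.seek(line_num * OFFSET.size)  # fixed-size entries: direct jump
        raw = index.read(OFFSET.size)
    if len(raw) < OFFSET.size:
        raise IndexError("line number out of range")
    (position,) = OFFSET.unpack(raw)
    with open(log_path, "rb") as log:
        log.seek(position)
        return log.readline()
```

Keeping the index on disk means a multi-gigabyte log only has to be scanned once, and the index never has to be held in memory as a whole.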
 
Marc 'BlackJack' Rintsch

I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

I don't know how efficiently the `linecache` module in the standard
library is implemented, but you might have a look at it.

Ciao,
Marc 'BlackJack' Rintsch
 
Nick Vatamaniuc

Yi,
Use the linecache module. The documentation states that:
"""
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where
many lines are read from a single file.

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\012'
"""

Please note that you cannot really skip over lines unless each has a
fixed, known size. (If all lines have a fixed, known size, they can be
treated as 'records' and you can use seek() and other random-access
magic. That is why it is sometimes a lot faster to use fixed-length
rows in a database: searches get faster at the expense of wasted
space. But that is another topic for another discussion...)
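
To make the fixed-size case concrete, here is a small Python 3 sketch. The function name `read_record` and the `record_size` parameter are made up for illustration, and it assumes every record, terminator included, occupies exactly `record_size` bytes.

```python
def read_record(path, record_num, record_size):
    """Return record `record_num` (0-based) from a file of fixed-size records."""
    with open(path, "rb") as f:
        # The offset is simple arithmetic, so no earlier record has to be read.
        f.seek(record_num * record_size)
        return f.read(record_size)
```

A request past the end of the file simply returns an empty bytes object.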

So the point is that you won't be able to jump to line 15000 without
reading lines 0-14999. You can either iterate over the lines yourself
or simply use the 'linecache' module as shown above. If I were you I
would use linecache, but of course you don't mention anything about
the context of your project, so it is hard to say.

Hope this helps,
Nick Vatamaniuc
 
John Machin

Nick said:
Yi,
Use the linecache module.

Yi, *don't* use the linecache module without carefully comparing the
documentation and the implementation with your requirements.

You will find that you have the source code on your computer -- mine
(Windows box) is at c:\Python24\Lib\linecache.py. When you read right
down to the end (it's not a large file, only 108 lines), you'll find this:

    try:
        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    except IOError, msg:
##      print '*** Cannot open', fullname, ':', msg
        return []
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname

Looks like it's caching the *whole* of *each* file. Not unreasonable
given it appears to have been written to get source lines to include in
tracebacks.

It might just not be what you want if as you say you have "a huge txt
file". How many megabytes is "huge"?

Cheers,
John
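
For anyone who does decide linecache fits, a minimal Python 3 sketch of calling it follows. The wrapper name `line_via_linecache` is hypothetical. Two things are worth knowing: `linecache.getline` counts lines from 1, not 0, and it returns an empty string rather than raising when the line or file is missing.

```python
import linecache


def line_via_linecache(path, line_num):
    """Return line `line_num` (1-based, as linecache counts), or '' if out of range."""
    line = linecache.getline(path, line_num)
    # The module keeps the file's lines in memory after the call;
    # clearcache() drops them again once they are no longer needed.
    linecache.clearcache()
    return line
```

Clearing the cache after every call gives up the speed benefit for repeated lookups, so for many reads from the same file it would make more sense to call `linecache.clearcache()` once at the end, memory permitting.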

 
Fredrik Lundh

Bill said:
Some might argue that this is not really doing
it in Python. In fact, I would argue that! But if
you're at a command prompt and you want to
see line 7358, it's much easier to type
% sed -n 7358p
than it is to write the python one-liner.

'sed' is not recognized as an internal or external command,
operable program or batch file.

</F>
 
Lawrence D'Oliveiro

Yi Xing said:
I want to read specific lines of a huge txt file (I know the line #).
Each line might have different sizes. Is there a convenient and fast
way of doing this in Python? Thanks.

file("myfile.txt").readlines()[LineNr]

Convenient, yes. Fast, no. :)
 
