Efficient text file search.

noro

Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?

BTW:
does "for line in f:" read a block of lines into memory, or does it
simply call f.readline() many times?

thanks
amit
 
Luuk

noro said:
:)

via python...

ok, a more serious answer:

Some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/

The speed of line-oriented file I/O has been improved because people
often complain about its lack of speed, and because it's often been used as
a naïve benchmark. The readline() method of file objects has therefore been
rewritten to be much faster. The exact amount of the speedup will vary from
platform to platform depending on how slow the C library's getc() was, but
is around 66%, and potentially much faster on some particular operating
systems. Tim Peters did much of the benchmarking and coding for this change,
motivated by a discussion in comp.lang.python.
A new module and method for file objects was also added, contributed by Jeff
Epler. The new method, xreadlines(), is similar to the existing xrange()
built-in. xreadlines() returns an opaque sequence object that only supports
being iterated over, reading a line on every iteration but not reading the
entire file into memory as the existing readlines() method does. You'd use
it like this:


for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...
For a fuller discussion of the line I/O changes, see the python-dev summary
for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.
 
Kent Johnson

noro said:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

Probably better to read the whole file at once if it isn't too big:
f = file('somefile')
data = f.read()
if 'string' in data:
    print 'FOUND'
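
If the file *is* too big to slurp in one go, here is a rough sketch of the same idea done in fixed-size chunks. The chunk size, the len(needle) - 1 overlap (which keeps a match straddling a chunk boundary from being missed), and the found_in_file name are just assumptions for illustration; 'somefile' and 'string' are placeholders.

def found_in_file(path, needle, chunk_size=1 << 20):
    # Scan the file one chunk at a time so the whole file never sits in memory.
    overlap = len(needle) - 1
    tail = ''
    f = open(path, 'rb')
    try:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            # 'tail' carries the last few bytes of the previous chunk so a
            # match split across two chunks is still seen.
            if needle in tail + chunk:
                return True
            tail = chunk[-overlap:] if overlap else ''
    finally:
        f.close()

if found_in_file('somefile', 'string'):
    print 'FOUND'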
 
Ant

noro said:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'
        break
        ^^^
Add a 'break' after the print statement - that way you won't have to
read the entire file unless the string isn't there. That's probably not
the sort of advice you're after though :)

I can't see why reading the entire file in, as the other poster suggested,
would help; and seeing as "for line in f:" is now regarded as the
Pythonic way of working with lines of text in a file, I'd assume that its
implementation is at least as fast as "for line in f.xreadlines():".
 
Luuk

John Machin said:
Luuk wrote:
[snip]
some googling turned up the following.
Second paragraph of chapter 14 of http://www.amk.ca/python/2.1/ [snip]
For a fuller discussion of the line I/O changes, see the python-dev summary
for January 1-15, 2001 at http://www.amk.ca/python/dev/2001-01-1.html.

That is *HISTORY*. That is Python 2.1. That is the year 2001.
xreadlines is as dead as a dodo.

That's why I started my reply with:
"some googling turned up the following."
I did not state that further googling was unneeded ;-)
 
George Sakkis

noro said:
Is there a more efficient method to find a string in a text file than:

f = file('somefile')
for line in f:
    if 'string' in line:
        print 'FOUND'

?

Is this something you want to do only once for a given file? The
replies so far seem to imply so, and in that case I doubt that you can
do anything more efficient. OTOH, if the same file is to be searched
repeatedly for different strings, an appropriate indexing scheme can
speed things up considerably on average.

George
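
A minimal sketch of the kind of indexing scheme George is describing, assuming whole-word queries and a file small enough to index once up front; 'somefile', the query words and build_index are placeholders, not anything from the thread.

from collections import defaultdict

def build_index(path):
    # Map each whitespace-separated word to the set of line numbers it appears on.
    index = defaultdict(set)
    f = open(path)
    try:
        for lineno, line in enumerate(f):
            for word in line.split():
                index[word].add(lineno + 1)
    finally:
        f.close()
    return index

index = build_index('somefile')           # the file is scanned once
for query in ('string', 'another', 'word'):
    if query in index:                     # each lookup is then a cheap dict hit
        print 'FOUND', query, 'on lines', sorted(index[query])

Substring (as opposed to whole-word) queries would need something heavier, such as a suffix-array or trigram index.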
 
noro

OK, I am not sure why, but

fList = file('somefile').read()
if fList.find('string') != -1:
    print 'FOUND'

works much, much faster.

It is strange, since I thought 'for line in file('somefile')' was
optimized and read pages into memory.
I guess not..
 
Sion Arrowsmith

noro said:
OK, I am not sure why, but

fList = file('somefile').read()
if fList.find('string') != -1:
    print 'FOUND'

works much, much faster.

It is strange, since I thought 'for line in file('somefile')' was
optimized and read pages into memory.

Step back and think about what each is doing at a high level of
description: file.read() reads the contents of the file into memory
in one go, end of story. file.[x]readlines() reads (some or all of)
the contents of the file into memory, does a linear search on it
for end-of-line characters, and copies out the line(s) into some
new bits of memory. Line-by-line processing has a *lot* more work
to do (unless you're read()ing a really big file, which is going to
make heavy demands on memory allocation), and it should be no
surprise that it's slower.
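
A rough way to see the difference Sion describes is to time both approaches on the same file. This sketch assumes a file called 'somefile' exists and fits comfortably in memory; the exact numbers will vary with platform and file size.

import timeit

def search_whole():
    # One big read, then one fast substring scan.
    return 'string' in open('somefile').read()

def search_lines():
    # Line by line: split out a string object per line, then scan each one.
    for line in open('somefile'):
        if 'string' in line:
            return True
    return False

print 'read() then in: %.3fs' % timeit.timeit(search_whole, number=100)
print 'line by line:   %.3fs' % timeit.timeit(search_lines, number=100)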
 
