"Faster" I/O in a script


kalakouentin

I use Python to analyze my data, which are in text form. The script is
fairly simple. It reads a line from the input file, computes what it
must compute, and then writes the result to a buffer/list. When the
whole input file has been processed (essentially all lines), the
algorithm goes ahead and writes the results one by one to the output
file. It works fine, but because of the continuous I/O it takes a lot
of time to execute.
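
Roughly, the script does something like this (simplified; the filenames
and the per-line computation here are made up):

solutions = []
with open("input.txt") as infile:            # made-up filename
    for line in infile:                      # read one line at a time
        solutions.append(line.strip())       # stand-in for the real computation

with open("output.txt", "w") as outfile:     # made-up filename
    for s in solutions:                      # then write the results one by one
        outfile.write(s + "\n")
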
I think that the output phase is more or less optimized (a loop that
reads the solutions list sequentially and puts "\n" at the appropriate
intervals). Do you know a way to load my data in a more "batch-like"
way, so that I avoid the constant line-by-line reading?
I guess I could read and store the whole text in a list, with each cell
being a line, and then process each line one by one again, but I don't
really think that would offer a significant time gain.
Thanks in advance for taking the time to read this.
Pantelis
 

miller.paul.w

Do you know a way to load my data in a more "batch-like" way, so that I
avoid the constant line-by-line reading?

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named
'text,' where each item in the list corresponds to one 'line' of the
file.
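
For example, a minimal sketch (the filename here is just a placeholder):

with open("data.txt") as f:       # placeholder filename
    text = f.readlines()          # the whole file as a list of line strings

# each item still ends with its newline; strip it before processing
lines = [line.rstrip("\n") for line in text]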
 

Gary Herron

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named
'text,' where each item in the list corresponds to one 'line' of the
file.

No, that won't help. That has to do *all* the same work (reading blocks
and finding line endings) as the iterator, PLUS allocate and build a list.

Better to just use the iterator.

for line in file:
    ...

Gary Herron
 

Kris Kennaway

Gary said:
No, that won't help. That has to do *all* the same work (reading blocks
and finding line endings) as the iterator, PLUS allocate and build a list.

Better to just use the iterator.

for line in file:
    ...

Actually this *can* be much slower. Suppose I want to search a file to
see if a substring is present.

st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--> 0.472416 seconds

Method 2:

Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--> 0.098834 seconds

"st in rl" test --> 0.037251 (total: .136 seconds)

Method 3:

mmap the file:

import mmap
fh = file(f)      # reopen the file; it was closed above
mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
"st in mm" test --> 3.589938 (<-- see my post the other day)

mm.find(st) --> 0.186895

Summary:

If you can afford the memory, it can be more efficient (more than 3
times faster in this example) to read the file into memory and process
it at once (if possible).

Mmapping the file and processing it at once is roughly as fast (I didn't
measure the difference carefully), but has the advantage that if there
are parts of the file you do not touch, you don't fault them into memory.
You could also play more games and mmap chunks at a time to limit the
memory use (but you'd have to be careful with chunk boundaries that
don't line up with record boundaries).
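
A minimal sketch of the chunked idea, using plain read() rather than
mmap for simplicity (the chunk size is arbitrary, and the needle must
be a byte string since the file is opened in binary mode):

def contains(path, needle, chunk_size=1 << 20):
    # Scan the file chunk by chunk, carrying an overlap of len(needle) - 1
    # bytes so a match straddling a chunk boundary is not missed.
    overlap = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            if needle in overlap + chunk:
                return True
            overlap = chunk[-(len(needle) - 1):] if len(needle) > 1 else b""

e.g. contains(f, b"some substring that is not actually in the file")
would return False without ever holding the whole file in memory.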

Kris
 
