Multiprocessing and file I/O

Infinity77

Hi All,

I am trying to speed up some code which reads a bunch of data from
a disk file. Just for the fun of it, I thought I would try parallel
I/O, splitting the reading of the file between multiple processes.
Although I have been warned that concurrent access by multiple
processes to the same file may actually slow down the reading, I was
curious to try some timings while varying the number of processes
reading the file. I know almost nothing about multiprocessing, so I
was wondering if anyone had a very simple snippet of code that
demonstrates how to read a file using multiprocessing.

My idea was to create a "big" file by doing:

fid = open("somefile.txt", "wb")
fid.write("HELLO\n" * 10**7)
fid.close()

and then use fid.seek() to point every process I start at a
different position inside the file and have it read from there. For
example, with 4 processes and a 10 MB file, I would tell the first
process to read from byte 0 to byte 2.5 million, the second one from
2.5 million to 5 million, and so on. This is just an academic
curiosity :-D
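
Something along these lines (a rough, untested sketch; the chunk
boundaries would of course fall in the middle of lines) is roughly
what I have in mind:

import os
import time
import multiprocessing

def read_chunk(path, start, size):
    # Each process opens its own handle, seeks to its offset and
    # reads its slice of the file.
    fid = open(path, "rb")
    fid.seek(start)
    fid.read(size)
    fid.close()

def parallel_read(path, nprocs):
    total = os.path.getsize(path)
    chunk = total // nprocs
    processes = []
    for i in range(nprocs):
        start = i * chunk
        # The last process also picks up any leftover bytes.
        size = (total - start) if i == nprocs - 1 else chunk
        p = multiprocessing.Process(target=read_chunk,
                                    args=(path, start, size))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

if __name__ == "__main__":
    start = time.time()
    parallel_read("somefile.txt", 4)
    print("4 processes: %.2f seconds" % (time.time() - start))

I would then simply vary the number of processes and compare the
timings against a single plain read of the whole file.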

Any suggestion is very welcome, either on the approach or on the
actual implementation. Thank you for your help.

Andrea.
 
Igor Katson

Infinity77 said:
I am trying to speed up some code which reads a bunch of data from
a disk file. [...] I know almost nothing about multiprocessing, so I
was wondering if anyone had a very simple snippet of code that
demonstrates how to read a file using multiprocessing.
If what you want to speed up is the processing of the file (and not
the I/O), I would have one process actually read the file and feed
the data to the other processes through a queue.
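
Something like this untested sketch, with the actual processing left
as a placeholder:

import multiprocessing

def worker(queue):
    # Consume lines until the reader sends the None sentinel.
    while True:
        line = queue.get()
        if line is None:
            break
        # ... do the actual processing of the line here ...

def main(path, nworkers=4):
    queue = multiprocessing.Queue(maxsize=10000)
    workers = [multiprocessing.Process(target=worker, args=(queue,))
               for i in range(nworkers)]
    for w in workers:
        w.start()
    # A single process (this one) reads the file sequentially and
    # feeds the lines to the workers.
    fid = open(path, "rb")
    for line in fid:
        queue.put(line)
    fid.close()
    # One sentinel per worker tells them to shut down.
    for w in workers:
        queue.put(None)
    for w in workers:
        w.join()

if __name__ == "__main__":
    main("somefile.txt")

That way the disk is still read sequentially, but the work on the
data is spread over several processes.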
 
Infinity77

Hi Igor,

If what you want to speed up is the processing of the file (and not
the I/O), I would have one process actually read the file and feed
the data to the other processes through a queue.

No, the processing of the data is fast enough, as it is very simple.
What I was asking was whether anyone could share an example of using
multiprocessing to read a file, along the lines I described above.

Andrea.
 
Infinity77

Hi Paul & All,

Take a look at this section in an article about multi-threaded
processing of large files:

http://effbot.org/zone/wide-finder.htm#a-multi-threaded-python-solution

Thank you for the pointer, I have read the article and the follow-ups
with much interest... it's unfortunate that Python is no longer in
first place, though :-D
I'll see if I can come up with a faster implementation of my
(f2py/Fortran-based) Python module using multiprocessing.
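
If I get that far, I imagine something along these lines, with
process_chunk standing in for a call into the f2py-wrapped routine
(purely hypothetical at this point):

import os
import multiprocessing

def process_chunk(args):
    path, start, size = args
    # Read one slice of the file; the real code would hand the data
    # to the Fortran routine instead of just counting bytes.
    fid = open(path, "rb")
    fid.seek(start)
    data = fid.read(size)
    fid.close()
    return len(data)

def run(path, nprocs=4):
    total = os.path.getsize(path)
    chunk = total // nprocs
    tasks = []
    for i in range(nprocs):
        start = i * chunk
        size = (total - start) if i == nprocs - 1 else chunk
        tasks.append((path, start, size))
    pool = multiprocessing.Pool(nprocs)
    results = pool.map(process_chunk, tasks)
    pool.close()
    pool.join()
    return sum(results)

if __name__ == "__main__":
    print(run("somefile.txt"))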

Thank you.

Andrea.
 
