multiprocessing eats memory

Max Ivanov

I'm playing with the pyprocessing module and found that it eats a lot
of memory. I've made a small test case to show it. I pass ~45 MB of
data to the worker processes and then get it back slightly modified.
At any time there should be no more than two copies of the data in the
main process (the original data and one result). I ran it on an 8-core
server, and top shows that the main process eats ~220 MB and the worker
processes eat 90-150 MB. Isn't that too much?

A small test case is uploaded to pastebin: http://pastebin.ca/1210523
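
The paste above may no longer be available; here is a rough sketch of
the kind of test case described, assuming Pool.apply_async with a
polled set of AsyncResult objects (the modify function and data size
are placeholders, not the original code):

import multiprocessing
import time

def modify(data):
    # Hypothetical worker: gets a pickled copy of the data and
    # returns another full copy, slightly modified.
    return [x + 1 for x in data]

if __name__ == '__main__':
    data = list(range(6 * 1000 * 1000))  # stand-in for the ~45 MB payload
    pool = multiprocessing.Pool(processes=8)
    # Each apply_async call pickles its own copy of data, and the
    # asyncs set keeps every pending result (and its data) alive.
    asyncs = set(pool.apply_async(modify, (data,)) for _ in range(8))
    while asyncs:
        for a in list(asyncs):       # iterate over a copy of the set
            if a.ready():
                result = a.get()     # another full copy comes back
                asyncs.remove(a)
        time.sleep(0.1)
    pool.close()
    pool.join()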
 
Istvan Albert

At any time there should be no more than two copies of the data in the
main process (the original data and one result).

From the looks of it, you are storing a lot of references to various
copies of your data via the asyncs set.
 
redbaron

From the looks of it, you are storing a lot of references to various
copies of your data via the asyncs set.

How could I avoid storing them? I need something to check whether each
one is ready or not, and to retrieve the results when it is. I can't
see a way to achieve the same result without storing the asyncs set.
 
MRAB

How could I avoid storing them? I need something to check whether each
one is ready or not, and to retrieve the results when it is. I can't
see a way to achieve the same result without storing the asyncs set.

You could give each worker process an ID and then have it put that ID
into a queue to signal to the main process when it has finished.
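
For example, a minimal sketch of that signaling scheme, assuming a
shared multiprocessing.Queue (the worker function and the work itself
are illustrative):

import multiprocessing

def worker(worker_id, data, done_queue):
    result = [x + 1 for x in data]    # stand-in for the real work
    done_queue.put(worker_id)         # signal the main process by ID

if __name__ == '__main__':
    done_queue = multiprocessing.Queue()
    data = list(range(1000))
    procs = [multiprocessing.Process(target=worker,
                                     args=(i, data, done_queue))
             for i in range(8)]
    for p in procs:
        p.start()
    for _ in procs:
        # Blocks until some worker reports in; no asyncs set needed.
        print('worker %d finished' % done_queue.get())
    for p in procs:
        p.join()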

BTW, your test-case modifies the asyncs set while iterating over it,
which is a bad idea.
 
redbaron

You could give each worker process an ID and then have it put that ID
into a queue to signal to the main process when it has finished.
And how could I then retrieve the result from a worker process without
the async?
BTW, your test-case modifies the asyncs set while iterating over it,
which is a bad idea.
My fault; it was list(asyncs) originally.
 
Istvan Albert

How could I avoid storing them? I need something to check whether each
one is ready or not, and to retrieve the results when it is. I can't
see a way to achieve the same result without storing the asyncs set.

It all depends on what you are trying to do. The issue that you
originally brought up is that of memory consumption.

When processing data in parallel, you will use up as much memory as
there are datasets being processed at any given time. If you need to
reduce memory use, then you need to start fewer processes and use some
mechanism to distribute the work to them as they become free (see the
recommendation above that uses queues).
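
A sketch of that arrangement, assuming a fixed number of worker
processes fed from a shared task queue (all names and sizes are
illustrative); note that results can travel back through a queue as
well:

import multiprocessing

NUM_WORKERS = 2   # fewer processes means fewer live copies of the data

def worker(task_queue, result_queue):
    # Pull chunks until the None sentinel arrives; push results back.
    for chunk in iter(task_queue.get, None):
        result_queue.put([x + 1 for x in chunk])

if __name__ == '__main__':
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker,
                                       args=(task_queue, result_queue))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    chunks = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
    for chunk in chunks:
        task_queue.put(chunk)
    results = [result_queue.get() for _ in chunks]  # order not guaranteed
    for _ in workers:
        task_queue.put(None)     # one sentinel per worker shuts it down
    for w in workers:
        w.join()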
 
R

redbaron

When processing data in parallel, you will use up as much memory as
there are datasets being processed at any given time.
The worker processes eat 2-4 times more memory than I pass to them.

If you need to reduce memory use, then you need to start fewer
processes and use some mechanism to distribute the work to them as
they become free (see the recommendation above that uses queues).
I don't understand how I could use a Queue here. If a worker process
finishes computing, it puts its ID into the Queue, and in the main
process I retrieve that ID; but how do I then retrieve the result from
the worker process?
 
