running functions in parallel on multiple processors

Michael Schmitt

Hello.

What is the usual way to run functions in parallel on a
multiple-processor machine? Specifically, I want to run a single
computationally expensive function with different parameter sets.
Running the functions in different threads doesn't seem to work, because of
the global interpreter lock.
Would it help to fork processes that each run the function with a given
parameter set? And is there a simple way for such a forked worker process to
report its result back to the controlling process?

Thanks.
Best regards,
Michael
 
Alex Martelli

Michael said:
What is the usual way to run functions in parallel on a
multiple-processor machine? Specifically, I want to run a single
computationally expensive function with different parameter sets.
Running the functions in different threads doesn't seem to work, because
of the global interpreter lock.
Would it help to fork processes that each run the function with a given
parameter set? And is there a simple way for such a forked worker process
to report its result back to the controlling process?

Forked processes could indeed perform whatever computations you
need, and then report their results by writing them to a socket
which the controlling process reads (there are many other IPC
mechanisms, but sockets are often simplest where applicable).
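
A minimal sketch of that pattern, using only the standard library (a pipe
stands in for the socket here; a socket version is the same idea with
connect()/accept() in place of os.pipe()). The name expensive_function and
the parameter values are placeholders:

import os, pickle

def expensive_function(x):
    # stand-in for the real, computationally expensive function
    return x * x

parameter_sets = [1, 2, 3, 4]
children = []

for params in parameter_sets:
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: compute, pickle the result, send it back, exit
        os.close(read_fd)
        os.write(write_fd, pickle.dumps((params, expensive_function(params))))
        os.close(write_fd)
        os._exit(0)
    # parent: keep only the read end of this child's pipe
    os.close(write_fd)
    children.append((pid, read_fd))

for pid, read_fd in children:
    # read until the child closes its end of the pipe, then unpickle
    data = os.fdopen(read_fd, 'rb').read()
    params, result = pickle.loads(data)
    os.waitpid(pid, 0)
    print "f(%s) = %s" % (params, result)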


Alex
 
Jon Franz

Michael,
I may have something lying around that would be useful for you -
it's a module I wrote that makes forked multi-process programming
very easy, since each process accesses a shared data store
automatically. I haven't released it due to a lack of time to write
documentation, but it sounds like it may be the sort of thing you could use.
It's called remoteD, and it works like this:

import remoteD, time

SharedD = remoteD.initShare()

def child_function(Shared, arg1, arg2):
    # the first arg will be the Shared
    # dictionary-like object
    # put shared data into the dictionary whenever you want
    Shared["myresult"] = 5

# placeholder arguments, just so the example runs
arg1, arg2 = "foo", "bar"
SharedD.newProc(child_function, [arg1, arg2])

while not SharedD.has_key("myresult"):
    time.sleep(0.2)

print "The other process got " + str(SharedD["myresult"]) + " as the answer"

-------------------
stubShare objects, which are created by initShare() or newProc() (which
puts the newly created share stub as the first arg, ahead of your own in
the argument list for your function), act like dictionaries. .has_key(),
.keys() and del all work fine. You can also lock the whole share temporarily
by simply calling .Lock(), and later .UnLock(), on any stubShare object.
Any Python object that can be pickled can be stored in a share.
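
Going by that description alone (not verified against the module's source), a
read-modify-write bracketed by the lock might look like this:

SharedD.Lock()
try:
    # nothing else can touch the share between Lock() and UnLock()
    if SharedD.has_key("counter"):
        SharedD["counter"] = SharedD["counter"] + 1
    else:
        SharedD["counter"] = 1
finally:
    SharedD.UnLock()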

Behind the scenes, the first call to initShare() forks a server process that
holds the shared data and accepts connections from share stub objects.
initShare() returns a stubShare object in the calling process. The server will
commit suicide after a couple of seconds without any connected stubShares,
so you don't need to clean it up explicitly. (You can also force the
server to stay alive, but that's a different topic.)
Fork is required.
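
To picture that lifecycle (a generic illustration, not remoteD's actual
source; the socket path and timeout below are made-up values), a forked
server can sit in a select() loop and exit once it has had no connected
clients for a few seconds:

import os, socket, select

def start_server(path='/tmp/share.sock', idle_timeout=3.0):
    pid = os.fork()
    if pid != 0:
        return pid                      # parent just returns; the child serves
    if os.path.exists(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(5)
    clients = []
    while 1:
        ready, _, _ = select.select([server] + clients, [], [], idle_timeout)
        if not ready and not clients:
            break                       # idle and no clients: shut down
        for s in ready:
            if s is server:
                conn, addr = server.accept()
                clients.append(conn)
            else:
                data = s.recv(4096)
                if not data:
                    clients.remove(s)   # a client disconnected
                else:
                    s.send(data)        # echo stands in for the real protocol
    os.unlink(path)
    os._exit(0)
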
By default, initShare() uses IP sockets, but you can easily tell it to use
Unix sockets, which are much faster:

SharedD = remoteD.initShare(sType=remoteD.UNIXSOCK)

The 'port' argument is overridden for use with Unix sockets - so you can
choose to name your socket yourself, instead of using the default
'7450':

SharedD = remoteD.initShare(port='myfile', sType=remoteD.UNIXSOCK)

You can also use the createShareServer function and the stubShare class
directly to share data across machines.

As for scalability - I've had hundreds of child processes running
and sharing data with this (Unix sockets), but I have no hard numbers
on whether the overhead involved with the stubShare objects slowed
things down greatly. I will say this:
avoid repeated references to the shared data - assigning to a local variable
performs a deepcopy and will be faster. So do things like the following
to avoid hitting the shared data on every operation:

myValue = SharedD['remoteValue']
myValue += 5
# other manipulations of myValue here
# much later, when you are done:
SharedD['remoteValue'] = myValue

Anyway, I'll end up writing better documentation and doing an official
release on SourceForge later this week - but for now you can download it at:
http://www.neurokode.com/remoteD.tar
I hope this helps, feel free to bug me with questions.

~Jon Franz
NeuroKode Labs, LLC

 
Ulrich Petri

Jon Franz said:
Michael,
I may have something lying around that would be useful for you -
it's a module I wrote that makes forked multi-process programming
very easy, since each process accesses a shared data store
automatically. I haven't released it due to a lack of time to write
documentation, but it sounds like it may be the sort of thing you could use.
It's called remoteD, and it works like this:
[...]
Anyway, I'll end up writing better documentation and doing an official
release on SourceForge later this week - but for now you can download it at:
http://www.neurokode.com/remoteD.tar
I hope this helps, feel free to bug me with questions.

Whoa, this is great stuff!
I was looking for something like this for a long time...
Thanks.
 
