multiprocessing vs thread performance

mk

Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under the
impression that the performance of the multiprocessing package is similar
to that of the thread / threading modules. However, to familiarize myself
with both packages I wrote my own test: spawn and complete 100,000 empty
threads or processes, respectively, while maintaining at most 100
threads / processes active at any one time.

The results I got are very different from the benchmark quoted in PEP
371. On a twin Xeon machine the threaded version executed in 5.54 secs,
while the multiprocessing version took over 222 secs to complete!

Am I doing something wrong in the code below? Or do I have to use
multiprocessing.Pool to get any decent results?

# multithreaded version


#!/usr/local/python2.6/bin/python

import thread
import time

class TCalc(object):

    def __init__(self):
        self.tactivnum = 0
        self.reslist = []
        self.tid = 0
        self.tlock = thread.allocate_lock()

    def testth(self, tid):
        if tid % 1000 == 0:
            print "== Thread %d working ==" % tid
        self.tlock.acquire()
        self.reslist.append(tid)
        self.tactivnum -= 1
        self.tlock.release()

    def calc_100thousand(self):
        tid = 1
        while tid <= 100000:
            while self.tactivnum > 99:
                time.sleep(0.01)
            self.tlock.acquire()
            self.tactivnum += 1
            self.tlock.release()
            t = thread.start_new_thread(self.testth, (tid,))
            tid += 1
        while self.tactivnum > 0:
            time.sleep(0.01)


if __name__ == "__main__":
    tc = TCalc()
    tstart = time.time()
    tc.calc_100thousand()
    tend = time.time()
    print "Total time: ", tend - tstart



# multiprocessing version

#!/usr/local/python2.6/bin/python

import multiprocessing
import time


def testp(pid):
    if pid % 1000 == 0:
        print "== Process %d working ==" % pid

def palivelistlen(plist):
    pll = 0
    for p in plist:
        if p.is_alive():
            pll += 1
        else:
            plist.remove(p)
            p.join()
    return pll

def testp_100thousand():
    pid = 1
    proclist = []
    while pid <= 100000:
        while palivelistlen(proclist) > 99:
            time.sleep(0.01)
        p = multiprocessing.Process(target=testp, args=(pid,))
        p.start()
        proclist.append(p)
        pid += 1
    print "=== Main thread waiting for all processes to finish ==="
    for p in proclist:
        p.join()

if __name__ == "__main__":
    tstart = time.time()
    testp_100thousand()
    tend = time.time()
    print "Total time:", tend - tstart
 
janislaw

mk said:
Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under the
impression that the performance of the multiprocessing package is similar
to that of the thread / threading modules. However, to familiarize myself
with both packages I wrote my own test: spawn and complete 100,000 empty
threads or processes, respectively, while maintaining at most 100
threads / processes active at any one time.

The results I got are very different from the benchmark quoted in PEP
371. On a twin Xeon machine the threaded version executed in 5.54 secs,
while the multiprocessing version took over 222 secs to complete!

Am I doing something wrong in the code below? Or do I have to use
multiprocessing.Pool to get any decent results?

Oooh, 100,000 processes! You're fortunate that your OS handled them in
finite time.

[quick browsing through the code]

Ah, so there are 100 processes at a time. 200 secs still doesn't sound
strange.

JW
 
mk

janislaw said:
Ah, so there are 100 processes at a time. 200 secs still doesn't sound
strange.

I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py empty_func.py

Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000005 seconds
threaded (1 threads) 0.000235 seconds
processes (1 procs) 0.002607 seconds

non_threaded (2 iters) 0.000006 seconds
threaded (2 threads) 0.000461 seconds
processes (2 procs) 0.004514 seconds

non_threaded (4 iters) 0.000008 seconds
threaded (4 threads) 0.000897 seconds
processes (4 procs) 0.008557 seconds

non_threaded (8 iters) 0.000010 seconds
threaded (8 threads) 0.001821 seconds
processes (8 procs) 0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code
was written on Mac OS X. The conclusion I get from comparing the above
costs is that OS X must have a very low cost of creating a process, at
least compared to Linux, not that multiprocessing is a viable alternative
to the thread / threading module. :-(
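One way to sanity-check the raw cost of process creation on a given box is to time bare fork()s directly. A rough sketch (Unix-only, not from the original thread):

import os
import time

N = 1000
t0 = time.time()
for i in xrange(N):
    pid = os.fork()
    if pid == 0:
        os._exit(0)        # child exits immediately
    os.waitpid(pid, 0)     # parent reaps the child
print "avg fork+exit: %.6f s" % ((time.time() - t0) / N)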
 
Aaron Brady

mk said:
Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under the
impression that the performance of the multiprocessing package is similar
to that of the thread / threading modules. However, to familiarize myself
with both packages I wrote my own test: spawn and complete 100,000 empty
threads or processes, respectively, while maintaining at most 100
threads / processes active at any one time.

The results I got are very different from the benchmark quoted in PEP
371. On a twin Xeon machine the threaded version executed in 5.54 secs,
while the multiprocessing version took over 222 secs to complete!

Am I doing something wrong in the code below? Or do I have to use
multiprocessing.Pool to get any decent results?

I'm running a 1.6 GHz machine. I only ran 10,000 empty threads and 10,000
empty processes. The threads were the ones you wrote. The processes were
empty executables written in a lower-level language, also run 100 at a
time, and started with 'subprocess', not 'multiprocessing'. The threads
took 1.2 seconds. The processes took 24 seconds.

The processes you wrote had only finished 3,000 after several minutes.
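That subprocess-based timing could look roughly like the sketch below; "/bin/true" stands in for the empty executable (an assumption, not the actual binary used above):

import subprocess
import time

N = 10000
t0 = time.time()
procs = []
for i in xrange(N):
    procs.append(subprocess.Popen(["/bin/true"]))
    if len(procs) >= 100:         # keep at most 100 alive at once
        procs.pop(0).wait()
for p in procs:                   # reap the stragglers
    p.wait()
print "total: %.2f s" % (time.time() - t0)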
 
Jarkko Torppa

mk said:
janislaw said:
Ah, so there are 100 processes at a time. 200 secs still doesn't sound
strange.

I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py empty_func.py

Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000005 seconds
threaded (1 threads) 0.000235 seconds
processes (1 procs) 0.002607 seconds

non_threaded (2 iters) 0.000006 seconds
threaded (2 threads) 0.000461 seconds
processes (2 procs) 0.004514 seconds

non_threaded (4 iters) 0.000008 seconds
threaded (4 threads) 0.000897 seconds
processes (4 procs) 0.008557 seconds

non_threaded (8 iters) 0.000010 seconds
threaded (8 threads) 0.001821 seconds
processes (8 procs) 0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code
was written on Mac OS X.

PEP 371 itself says: "All benchmarks were run using the following:
Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

On my 2.3 GHz dual-core iMac, with Python 2.6:

iTaulu:src torppa$ python run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000002 seconds
threaded (1 threads) 0.000227 seconds
processes (1 procs) 0.002367 seconds

non_threaded (2 iters) 0.000003 seconds
threaded (2 threads) 0.000406 seconds
processes (2 procs) 0.003465 seconds

non_threaded (4 iters) 0.000004 seconds
threaded (4 threads) 0.000786 seconds
processes (4 procs) 0.006430 seconds

non_threaded (8 iters) 0.000006 seconds
threaded (8 threads) 0.001618 seconds
processes (8 procs) 0.012841 seconds

With Python 2.5 and pyProcessing-0.52:

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000003 seconds
threaded (1 threads) 0.000143 seconds
processes (1 procs) 0.002794 seconds

non_threaded (2 iters) 0.000004 seconds
threaded (2 threads) 0.000277 seconds
processes (2 procs) 0.004046 seconds

non_threaded (4 iters) 0.000005 seconds
threaded (4 threads) 0.000598 seconds
processes (4 procs) 0.007816 seconds

non_threaded (8 iters) 0.000008 seconds
threaded (8 threads) 0.001173 seconds
processes (8 procs) 0.015504 seconds
 
mk

Jarkko said:
PEP 371 itself says: "All benchmarks were run using the following:
Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

Right... I overlooked that. The tests I quoted above were done on SLES
10, kernel 2.6.5.

With Python 2.5 and pyProcessing-0.52:

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000003 seconds
threaded (1 threads) 0.000143 seconds
processes (1 procs) 0.002794 seconds

non_threaded (2 iters) 0.000004 seconds
threaded (2 threads) 0.000277 seconds
processes (2 procs) 0.004046 seconds

non_threaded (4 iters) 0.000005 seconds
threaded (4 threads) 0.000598 seconds
processes (4 procs) 0.007816 seconds

non_threaded (8 iters) 0.000008 seconds
threaded (8 threads) 0.001173 seconds
processes (8 procs) 0.015504 seconds

There's something wrong with the numbers posted in the PEP. This is what
I got on a 4-socket Xeon (+ HT) with Python 2.6.1 on Debian (Etch), with
the kernel upgraded to 2.6.22.14:


non_threaded (1 iters) 0.000004 seconds
threaded (1 threads) 0.000159 seconds
processes (1 procs) 0.001067 seconds

non_threaded (2 iters) 0.000005 seconds
threaded (2 threads) 0.000301 seconds
processes (2 procs) 0.001754 seconds

non_threaded (4 iters) 0.000006 seconds
threaded (4 threads) 0.000581 seconds
processes (4 procs) 0.003906 seconds

non_threaded (8 iters) 0.000009 seconds
threaded (8 threads) 0.001148 seconds
processes (8 procs) 0.008178 seconds
 
Gabriel Genellina

Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example. Just check ps while it
is running and you will see.

Oh, very good analysis! Those results were worrying me a little.
 
James Mills


Hi :)
Does anybody know of a tutorial for Python 2.6 multiprocessing? Or a
bunch of good examples for it? I am trying to break up a loop to run it
over multiple cores in a system, and I need to return an integer value
as the result of each process and accumulate all of them. In the
examples that I found there is no return from the process.

You communicate with the process in one of several ways:
* Semaphores
* Locks
* Pipes

I prefer to use Pipes, which act much like sockets (in fact they are).
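For returning an integer from a worker over a Pipe, a minimal sketch (the names here are illustrative, not from the thread):

import multiprocessing

def worker(conn, n):
    # compute, then send the result back through our end of the pipe
    conn.send(n * n)
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn, 7))
    p.start()
    print parent_conn.recv()      # prints 49
    p.join()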

Read the docs and let us know how you go :)
I'm actually implementing multiprocessing
support into circuits (1) right now...

cheers
James

1. http://trac.softcircuit.com.au/circuits/
 
Gabriel Genellina

On Wed, 07 Jan 2009 23:05:53 -0200, James Mills wrote:
You communicate with the process in one of several ways:
* Semaphores
* Locks
* Pipes

The Pool class provides a more abstract view that may be better suited in
this case. Just create a pool, and use map_async to collect and summarize
the results.

import string
import multiprocessing

def count(args):
    (lineno, line) = args
    print "This is %s, processing line %d\n" % (
        multiprocessing.current_process().name, lineno),
    result = dict(letters=0, digits=0, other=0)
    for c in line:
        if c in string.letters: result['letters'] += 1
        elif c in string.digits: result['digits'] += 1
        else: result['other'] += 1
    # just to make some "random" delay
    import time; time.sleep(len(line)/100.0)
    return result

if __name__ == '__main__':

    summary = dict(letters=0, digits=0, other=0)

    def summary_add(results):
        # this is called with a list of results
        for result in results:
            summary['letters'] += result['letters']
            summary['digits'] += result['digits']
            summary['other'] += result['other']

    # count letters in this same script
    f = open(__file__, 'r')

    pool = multiprocessing.Pool(processes=6)
    # invoke count((lineno, line)) for each line in the file
    pool.map_async(count, enumerate(f), 10, summary_add)
    pool.close()  # no more jobs
    pool.join()   # wait until done
    print summary
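A note on the example above: map_async returns immediately, and the callback (summary_add) later runs in the parent process, once, with the complete list of results, which is why pool.join() must happen before printing the summary. If blocking is acceptable, pool.map(count, enumerate(f)) would return the same list directly and the summing could be done inline.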
 
