Adding a Par construct to Python?

Terry Reedy · May 18, 2009

I can understand you having a preference, but you may have to choose
between fighting over that method or achieving results. I agree with ...

Good luck with that. The GIL is not going away any time soon (or
probably ever) and as long as CPython is the "official"
implementation, there are almost zero chances of adding syntax support
for this. Besides, Guido and other py-devs are not particularly keen
on threads as a parallelization mechanism.

Parallel processes can run on multiple processors as well as multiple
cores within a processor. Some problems, like massive search, require
multiple disks (or memories, or IO ports) as well as multiple processing
units. There is debate over how useful massively multicore processors
will actually be and for which types of problems.

tjr

jeremy · May 18, 2009

Good luck with that. The GIL is not going away any time soon (or
probably ever) and as long as CPython is the "official"
implementation, there are almost zero chances of adding syntax support
for this. Besides, Guido and other py-devs are not particularly keen
on threads as a parallelization mechanism.

George

Hi George,

The GIL is not going away any time soon (or probably ever) and as long as CPython is
the "official" implementation, there are almost zero chances of adding syntax support
for this.

What concerns me about this statement is that, if it is true, Python
risks falling behind when other languages which can exploit multicore
effectively start to come to the fore. I know that Microsoft is
actively researching in this area and they are hoping that F# will
offer good ways to exploit multi-core architectures.

As I understand it the reason for the GIL is to prevent problems with
garbage collection in multi-threaded applications. Without it the
reference count method is prone to race conditions. However it seems
like a fairly crude mechanism to solve this problem. Individual
semaphores could be used for each object reference counter, as in
Java.

Jeremy

jeremy · May 18, 2009

I can understand you having a preference, but you may have to choose
between fighting over that method or achieving results. I agree with ....

Parallel processes can run on multiple processors as well as multiple
cores within a processor. Some problems, like massive search, require
multiple disks (or memories, or IO ports) as well as multiple processing
units. There is debate over how useful massively multicore processors
will actually be and for which types of problems.

tjr

Hi Terry,

Parallel processes can run on multiple processors as well as multiple
cores within a processor. Some problems, like massive search, require
multiple disks (or memories, or IO ports) as well as multiple processing
units. There is debate over how useful massively multicore processors
will actually be and for which types of problems.

I agree with this. My approach is in the same space as OpenMP - a
simple way for users to define shared memory parallelism. There is no
reason why it would not work with multiple disks or IO ports on the
same shared memory server. However for distributed memory hardware it
would be a non-starter. In that case we would need something like the
Message Passing Interface.

Jeremy

Paul Boddie · May 18, 2009

Thanks for your responses to my original questions.

Thanks for your interesting response!

Paul, thanks for explaining about the pprocess module which appears
very useful. I presume that this is using multiple operating system
processes rather than threads which would probably imply that it is
suitable for coarse grained parallel programming rather than fine-
grained because of overhead in starting up new processes and sharing
objects. (How is that done, by the way?). It probably has advantages
and disadvantages compared with thread based parallelism.

Communication via interprocess channels is done using pickled objects.
For large data volumes, the most efficient approach can actually
involve the filesystem. My opinion on the threads vs. processes debate
centres on the relative efficiency of creating processes on systems
which have evolved to minimise the cost of doing so: people seem to
believe that forking a process creates an entirely new copy, but this
is unlikely to be the case on most modern, popular general-purpose
operating systems.

My suggestion is primarily about using multiple threads and sharing
memory - something akin to the OpenMP directives that one of you has
mentioned. To do this efficiently would involve removing the Global
Interpreter Lock, or switching to Jython or Iron Python as you
mentioned.

One could always share memory using processes: I think the ease of
doing so is the only argument for using threads, and the caveats
involved obviously lead us back to the global interpreter lock (in the
threaded situation).

However I *do* actually want to add syntax to the language. I think
that 'par' makes sense as an official Python construct - we already
have had this in the Occam programming language for twenty-five years.
The reason for this is ease of use. I would like to make it easy for
amateur programmers to exploit natural parallelism in their
algorithms. For instance somebody who wishes to calculate a property
of each member from a list of chemical structures using the Python
Daylight interface: with my suggestion they could potentially get a
massive speed up just by changing 'for' to 'par' or 'map' to 'pmap'.
(Or map with a parallel keyword argument set as suggested). At present
they would have to manually chop up their work and run it as multiple
processes in order to achieve the same - fine for expert programmers
but not reasonable for people working in other domains who wish to use
Python as a utility because of its fantastic productivity and ease of
use.

Well, you have to know that the components actually lend themselves to
parallelisation, and the "chopping up" of the work would be similar to
that involved with one of the parallel processing libraries today: you
call something in each iteration that actually goes off with its own
piece of the data and runs in parallel.

Let me clarify what I think par, pmap, pfilter and preduce would mean
and how they would be implemented. A par loop is like a for loop,
however the programmer is saying that the order in which the
iterations are performed doesn't matter and they might be performed in
parallel.

Actually, with pprocess's abstractions, the iteration ordering doesn't
really matter, either, although the processes are dispatched in order.
It's when the results are collected that the ordering matters: the
difference between maps and queues. (I suppose it's similar with other
libraries.)

The python system then has the option to allocate a number
of threads to the task and share out the iterations accordingly
between the threads. (It might be that the programmer should be
allowed to explictly define the number of threads to use or can
delegate that decision to the system). Parallel pmap and pfilter would
be implemented in much the same way, although the resultant list might
have to be reassembled from the partial results returned from each
thread. As people have pointed out, parallel reduce is a tricky option
because it requires the binary operation to be associative in which
case it can be parallelised by calculating the result using a tree-
based evaluation strategy.

The sharing out of tasks to threads or execution contexts is done
using various abstractions in pprocess which probably resemble the
"pool" abstraction in the multiprocessing library: only as tasks are
completed are the free resources allocated to new tasks. A parallel
reduce abstraction doesn't exist in pprocess, but you could roll your
own.

I have used all of OpenMP, MPI, and Occam in the past. OpenMP adds
parallelism to programs by the use of special comment strings, MPI by
explicit calls to library routines, and Occam by explicit syntactical
structures. Each has its advantages. I like the simplicity of OpenMP,
the cross-language portability of MPI and the fact the concurrency is
built in to the Occam language. What I am proposing here is a hybrid
of the OpenMP and Occam approaches - a change to the language which is
very natural and yet is easy for programmers to understand.
Concurrency is generally regarded as the hardest concept for
programmers to grasp.

It's interesting to hear your experiences on this topic. I still don't
see the need for special syntax; if you were to propose more pervasive
changes, like arbitrary evaluation ordering for function arguments or
the ability to prevent/detect side-effects, the motivation would be
clearer, I think.

Paul

Steven D'Aprano · May 19, 2009

However I *do* actually want to add syntax to the language. I think that
'par' makes sense as an official Python construct - we already have had
this in the Occam programming language for twenty-five years. The reason
for this is ease of use. I would like to make it easy for amateur
programmers to exploit natural parallelism in their algorithms. For
instance somebody who wishes to calculate a property of each member from
a list of chemical structures using the Python Daylight interface: with
my suggestion they could potentially get a massive speed up just by
changing 'for' to 'par' or 'map' to 'pmap'. (Or map with a parallel
keyword argument set as suggested). At present they would have to
manually chop up their work and run it as multiple processes in order to
achieve the same - fine for expert programmers but not reasonable for
people working in other domains who wish to use Python as a utility
because of its fantastic productivity and ease of use.

There seems to be some discrepancy between this, and what you wrote in
your first post:

"There would be no locking and it would be the programmer's
responsibility to ensure that the loop was truly parallel and correct."

So on the one hand, you want par to be utterly simple-minded and to do no
locking. On the other hand you want it so simple to use that amateurs can
mechanically replace 'for' with 'par' in their code and everything will
Just Work, no effort or thought required. But those two desires are
inconsistent.

Concurrency is an inherently complicated problem: deadlocks and race
conditions abound, and are notoriously hard to reproduce, let alone
debug. If par is simple, and does no locking, then the programmer needs
to worry about those complications. If you want programmers to ignore
those complications, either (1) par needs to be very complicated and
smart, to do the Right Thing in every case, or (2) you're satisfied if
par produces buggy code when used in the fashion you recommend.

The third option is, make par really simple and put responsibility on the
user to write code which is concurrent. I think that's the right
solution, but it means a simplistic "replace `for` with `par` and your
code will run faster" will not work. It might run faster three times out
of five, but the other two times it will hang in a deadlock, or produce
incorrect results, or both.

Carl Banks · May 19, 2009

As I understand it the reason for the GIL is to prevent problems with
garbage collection in multi-threaded applications.

Not really. It's main purpose to prevent context switches from
happening in the middle of some C code (essentially it makes all C
code a "critical" section, except when the C code explicitly releases
the GIL). This tremendously simplifies the task of writing extension
modules, not to mention the Python core.

Without it the
reference count method is prone to race conditions. However it seems
like a fairly crude mechanism to solve this problem. Individual
semaphores could be used for each object reference counter, as in
Java.

Java doesn't use reference counting.

Individual locks on the refcounts would be prohibitively expensive in
Python, a cost that Java doesn't have.

Even if you decided to accept the penalty and add locking to
refcounts, you still have to be prepared for context switching at any
time when writing C code, which means in practice you have to lock any
object that's being accessed--that's in addition to the refcount lock.

I am not disagreeing with your assessment in general, it would be
great if Python were better able to take advantage of multiple cores,
but it's not as simple a thing as you're making it out to be.

Carl Banks

Steven D'Aprano · May 19, 2009

Let me clarify what I think par, pmap, pfilter and preduce would mean
and how they would be implemented.

[...]

Just for fun, I've implemented a parallel-map function, and done a couple
of tests. Comments, criticism and improvements welcome!

import threading
import Queue
import random
import time

def f(arg): # Simulate a slow function.
time.sleep(0.5)
return 3*arg-2

class PMapThread(threading.Thread):
def __init__(self, clients):
super(PMapThread, self).__init__()
self._clients = clients
def start(self):
super(PMapThread, self).start()
def run(self):
while True:
try:
data = self._clients.get_nowait()
except Queue.Empty:
break
target, where, func, arg = data
result = func(arg)
target[where] = result

class VerbosePMapThread(threading.Thread):
def __init__(self, clients):
super(VerbosePMapThread, self).__init__()
print "Thread %s created at %s" % (self.getName(), time.ctime())
def start(self):
super(VerbosePMapThread, self).start()
print "Thread %s starting at %s" % (self.getName(), time.ctime())
def run(self):
super(VerbosePMapThread, self).run()
print "Thread %s finished at %s" % (self.getName(), time.ctime())

def pmap(func, seq, verbose=False, numthreads=4):
size = len(seq)
results = [None]*size
if verbose:
print "Initiating threads"
thread = VerbosePMapThread
else:
thread = PMapThread
datapool = Queue.Queue(size)
for i in xrange(size):
datapool.put( (results, i, f, seq) )
threads = [PMapThread(datapool) for i in xrange(numthreads)]
if verbose:
print "All threads created."
for t in threads:
t.start()
# Block until all threads are done.
while any([t.isAlive() for t in threads]):
if verbose:
time.sleep(0.25)
print results
return results

And here's the timing results:
20.490942001342773

jeremy · May 19, 2009

There seems to be some discrepancy between this, and what you wrote in
your first post:

"There would be no locking and it would be the programmer's
responsibility to ensure that the loop was truly parallel and correct."

So on the one hand, you want par to be utterly simple-minded and to do no
locking. On the other hand you want it so simple to use that amateurs can
mechanically replace 'for' with 'par' in their code and everything will
Just Work, no effort or thought required. But those two desires are
inconsistent.

Concurrency is an inherently complicated problem: deadlocks and race
conditions abound, and are notoriously hard to reproduce, let alone
debug. If par is simple, and does no locking, then the programmer needs
to worry about those complications. If you want programmers to ignore
those complications, either (1) par needs to be very complicated and
smart, to do the Right Thing in every case, or (2) you're satisfied if
par produces buggy code when used in the fashion you recommend.

The third option is, make par really simple and put responsibility on the
user to write code which is concurrent. I think that's the right
solution, but it means a simplistic "replace `for` with `par` and your
code will run faster" will not work. It might run faster three times out
of five, but the other two times it will hang in a deadlock, or produce
incorrect results, or both.

Hi Steven,

you want it so simple to use that amateurs can mechanically replace 'for' with 'par' in their
code and everything will Just Work, no effort or thought required.

Yes I do want the par construction to be simple, but of course you
can't just replace a for loop with a par loop in the general case.
This issue arises when people use OpenMP: you can take a correct piece
of code, add a comment to indicate that a loop is 'parallel', and if
you get it wrong the code with no longer work correctly. With my 'par'
construct the programmer's intention is made explicit in the code,
rather than by a compiler directive and so I think that is clearer
than OpenMP.

As I wrote before, concurrency is one of the hardest things for
professional programmers to grasp. For 'amateur' programmers we need
to make it as simple as possible, and I think that a parallel loop
construction and the dangers that lurk within would be reasonably
straightforward to explain: there are no locks to worry about, no
message passing. The only advanced concept is the 'sync' keyword,
which would be used to rendezvous all the threads. That would only be
used to speed up certain codes in order to avoid having to repeatedly
shut down and start up gangs of threads.

Jeremy

jeremy · May 19, 2009

Not really. It's main purpose to prevent context switches from
happening in the middle of some C code (essentially it makes all C
code a "critical" section, except when the C code explicitly releases
the GIL). This tremendously simplifies the task of writing extension
modules, not to mention the Python core.

Java doesn't use reference counting.

Individual locks on the refcounts would be prohibitively expensive in
Python, a cost that Java doesn't have.

Even if you decided to accept the penalty and add locking to
refcounts, you still have to be prepared for context switching at any
time when writing C code, which means in practice you have to lock any
object that's being accessed--that's in addition to the refcount lock.

I am not disagreeing with your assessment in general, it would be
great if Python were better able to take advantage of multiple cores,
but it's not as simple a thing as you're making it out to be.

Carl Banks

Hi Carl,

I am not disagreeing with your assessment in general, it would be
great if Python were better able to take advantage of multiple cores,
but it's not as simple a thing as you're making it out to be.

Thanks for explaining a few things to me. So it would seem that
replacing the GIL with something which allows better scalability of
multi-threaded applications, would be very complicated. The paper by
Jesse Nolle which I referenced in my original posting includes the
following:

"In 1999 Greg Stein created a patch set for the interpreter that
removed the GIL, but added granular locking around sensitive
interpreter operations. This patch set had the direct effect of
speeding up threaded execution, but made single threaded execution two
times slower."

Source: http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/

That was ten years ago - do you have any idea as to how things have
been progressing in this area since then?

Jeremy

Terry Reedy · May 19, 2009

"In 1999 Greg Stein created a patch set for the interpreter that
removed the GIL, but added granular locking around sensitive
interpreter operations. This patch set had the direct effect of
speeding up threaded execution, but made single threaded execution two
times slower."

Source: http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/

That was ten years ago - do you have any idea as to how things have
been progressing in this area since then?

The slowdown may be less severe, but known attempts to replace, rather
than merely remove, GIL slow down single thread execution. Otherwise,
GIL wouild be gone. No one has been willing to maintain a multi-thread
fork/branch.

jeremy · May 19, 2009

Let me clarify what I think par, pmap, pfilter and preduce would mean
and how they would be implemented.

Click to expand...

[...]

Just for fun, I've implemented a parallel-map function, and done a couple
of tests. Comments, criticism and improvements welcome!

import threading
import Queue
import random
import time

def f(arg): # Simulate a slow function.
time.sleep(0.5)
return 3*arg-2

class PMapThread(threading.Thread):
def __init__(self, clients):
super(PMapThread, self).__init__()
self._clients = clients
def start(self):
super(PMapThread, self).start()
def run(self):
while True:
try:
data = self._clients.get_nowait()
except Queue.Empty:
break
target, where, func, arg = data
result = func(arg)
target[where] = result

class VerbosePMapThread(threading.Thread):
def __init__(self, clients):
super(VerbosePMapThread, self).__init__()
print "Thread %s created at %s" % (self.getName(), time.ctime())
def start(self):
super(VerbosePMapThread, self).start()
print "Thread %s starting at %s" % (self.getName(), time.ctime())
def run(self):
super(VerbosePMapThread, self).run()
print "Thread %s finished at %s" % (self.getName(), time.ctime())

def pmap(func, seq, verbose=False, numthreads=4):
size = len(seq)
results = [None]*size
if verbose:
print "Initiating threads"
thread = VerbosePMapThread
else:
thread = PMapThread
datapool = Queue.Queue(size)
for i in xrange(size):
datapool.put( (results, i, f, seq) )
threads = [PMapThread(datapool) for i in xrange(numthreads)]
if verbose:
print "All threads created."
for t in threads:
t.start()
# Block until all threads are done.
while any([t.isAlive() for t in threads]):
if verbose:
time.sleep(0.25)
print results
return results

And here's the timing results:

20.490942001342773

Hi Steven,

I am impressed by this - it shows the potential speedup that pmap
could give. Although the GIL would be a problem as things for speed up
of pure Python code. Do Jython and Iron Python include the threading
module?

Jeremy

jeremy · May 19, 2009

On said:
On 17 May, 13:05, (e-mail address removed) wrote:> From a user point of view I think that adding a 'par' construct to

...actually, thinking about this further, I think it would be good to
add a 'sync' keyword which causes a thread rendezvous within a
parallel loop. This would allow parallel loops to run for longer in
certain circumstances without having the overhead of stopping and
restarting all the threads, e.g.

par i in list:
for j in iterations:
updatePartion(i)
sync
commitBoundaryValues(i)
sync

This example is a typical iteration over a grid, e.g. finite elements,
calculation, where the boundary values need to be read by neighbouring
partitions before they are updated. It assumes that the new values of
the boundary values are stored in temporary variables until they can
be safely updated.

Jeremy

I have coded up a (slightly) more realistic example. Here is a code to
implement the numerical solution to Laplace's equation, which can
calculate the values of a potential field across a rectangular region
given fixed boundary values:

xmax = 200
ymax = 200
niterations = 200

# Initialisation
old=[[0.0 for y in range(ymax)] for x in range(xmax)]
for x in range(xmax):
old[x][0] = 1.0
for y in range(ymax):
old[0][y] = 1.0
new=[[old[x][y] for y in range(ymax)] for x in range(xmax)]

# Iterations
for i in range(1,100):
print "Iteration: ", i
for x in range(1,ymax-1):
for y in range(1, xmax-1):
new[x][y] = \
0.25*(old[x-1][y] + old[x+1][y] + old[x][y-1] + old[x-1][y])
# Swap over the new and old storage arrays
tmp = old
old = new
new = tmp

# Print results
for y in range(ymax):
for x in range(xmax):
print str(old[x][y]).rjust(7),
print

In order to parallelise this with the par construct would require a
single alteration to the iteration section:

for i in range(1,100):
print "Iteration: ", i
par x in range(1,ymax-1):
for y in range(1, xmax-1):
new[x][y] = \
0.25*(old[x-1][y] + old[x+1][y] + old[x][y-1] + old[x-1][y])
# Swap over the new and old storage arrays
tmp = old
old = new
new = tmp

The par command tells python that it may choose to fire up multiple
threads and automatically partition the data between them. So, for
instance, if there were ten threads created each one would work on a
sub-range of x values: thread 1 takes x from 1 to 100, thread 2 takes
x from 101 to 200, etc.

However with this approach there is an overhead in each iteration of
starting up the threads and shutting them down again. Depending on the
impact of this overhead, it might be better to keep the threads
running between iterations by modifying the code like this, adding a
'sync' command to synchronise the threads at the end of each iteration
and also making sure that only one of the threads performs the swap of
the data arrays.

par x in range(1,ymax-1):
for i in range(1,100):
if __thread__ == 0: print "Iteration: ", i
for y in range(1, xmax-1):
new[x][y] = \
0.25*(old[x-1][y] + old[x+1][y] + old[x][y-1] + old[x-1][y])
# Swap over the new and old storage arrays
sync
if __thread__ == 0:
tmp = old
old = new
new = tmp
sync

Jeremy

Diez B. Roggisch · May 19, 2009

Hi Steven,

I am impressed by this - it shows the potential speedup that pmap
could give. Although the GIL would be a problem as things for speed up
of pure Python code. Do Jython and Iron Python include the threading
module?

Jython does, and AFAIK IronPython also. Jython also has no GIL I think,
for IronPython I don't know that.

Diez

Adam Olsen · May 19, 2009

Thanks for explaining a few things to me. So it would seem that
replacing the GIL with something which allows better scalability of
multi-threaded applications, would be very complicated. The paper by
Jesse Nolle which I referenced in my original posting includes the
following:

"In 1999 Greg Stein created a patch set for the interpreter that
removed the GIL, but added granular locking around sensitive
interpreter operations. This patch set had the direct effect of
speeding up threaded execution, but made single threaded execution two
times slower."

Source:http://jessenoller.com/2009/02/01/python-threads-and-the-global-inter...

That was ten years ago - do you have any idea as to how things have
been progressing in this area since then?

https://launchpad.net/python-safethread

Aaron Brady · May 20, 2009

From a user point of view I think that adding a 'par' construct to
Python for parallel loops would add a lot of power and simplicity,
e.g.

par i in list:
updatePartition(i)

There would be no locking and it would be the programmer's
responsibility to ensure that the loop was truly parallel and correct.

The intention of this would be to speed up Python execution on multi-
core platforms. Within a few years we will see 100+ core processors as
standard and we need to be ready for that.

There could also be parallel versions of map, filter and reduce
provided.

BUT...none of this would be possible with the current implementation
of Python with its Global Interpreter Lock, which effectively rules
out true parallel processing.

See:http://jessenoller.com/2009/02/01/python-threads-and-the-global-inter....

What do others think?

Jeremy Martin

Hi, joining late.

My first guess is that you haven't considered the existing syntactic
variants.

for i in list:
fireandforget( updatePartition )( i )

for i in list:
with fireandforget( ) as x:
x.call( updatePartition )( i )

@fireandforget
def fun( i ):
updatePartition( i )
for i in list:
fun( i )

Are you suggesting only syntactic sugar, or a true semantic change to
make parallel programming easier?

Steven D'Aprano · May 20, 2009

Yes I do want the par construction to be simple, but of course you can't
just replace a for loop with a par loop in the general case.

But that's exactly what you said you wanted people to be able to do:

"with my suggestion they could potentially get a massive speed up just by
changing 'for' to 'par' or 'map' to 'pmap'."

I am finding this conversation difficult because it seems to me you don't
have a consistent set of requirements.

This issue
arises when people use OpenMP: you can take a correct piece of code, add
a comment to indicate that a loop is 'parallel', and if you get it wrong
the code with no longer work correctly.

How will 'par' be any different? It won't magically turn code with
deadlocks into bug-free code.

With my 'par' construct the
programmer's intention is made explicit in the code, rather than by a
compiler directive and so I think that is clearer than OpenMP.

A compiler directive is just as clear about the programmer's intention as
a keyword. Possibly even more so.

#$ PARALLEL-LOOP
for x in seq:
do(x)

Seems pretty obvious to me. (Not that I'm suggesting compiler directives
is a good solution to this problem.)

As I wrote before, concurrency is one of the hardest things for
professional programmers to grasp. For 'amateur' programmers we need to
make it as simple as possible,

The problem is that "as simple as possible" is Not Very Simple. There's
no getting around the fact that concurrency is inherently complex. In
some special cases, you can keep it simple, e.g. parallel-map with a
function that has no side-effects. But in the general case, no, you can't
avoid dealing with the complexity, at least a little bit.

and I think that a parallel loop
construction and the dangers that lurk within would be reasonably
straightforward to explain: there are no locks to worry about, no
message passing.

It's *already* easy to explain. And having explained it, you still need
to do something about it. You can't just say "Oh well, I've had all the
pitfalls explained to me, so now I don't have to actually do anything
about avoiding those pitfalls". You still need to actually avoid them.
For example, you can choose one of four tactics:

(1) the loop construct deals with locking;

(2) the caller deals with locking;

(3) nobody deals with locking, therefore the code is buggy and risks
deadlocks; or

(4) the caller is responsible for making sure he never shares data while
looping over it.

I don't think I've missed any possibilities. You have to pick one of
those four.

The only advanced concept is the 'sync' keyword, which
would be used to rendezvous all the threads. That would only be used to
speed up certain codes in order to avoid having to repeatedly shut down
and start up gangs of threads.

So now you want a second keyword as well.

Steven D'Aprano · May 20, 2009

Let me clarify what I think par, pmap, pfilter and preduce would mean
and how they would be implemented.

Click to expand...

[...]

Just for fun, I've implemented a parallel-map function, and done a
couple of tests. Comments, criticism and improvements welcome!

Click to expand...

My only comment would be that your "slow function" might not be a very
simulation for the general-case, since it uses time.sleep() which
releases the GIL:

I didn't expect my code to magically overcome fundamental limitations of
the CPython interpreter

Any Python function that isn't calling a library function written in C
that releases the GIL won't show any speedup will it?

Not necessarily. Here's another function, that uses a loop instead of
sleep.

def g(arg, SIZE=8*10**6):
# Default SIZE is chosen so that on my machine, the loop
# takes approximately 0.5 second.
for x in xrange(SIZE):
pass
return 3*arg-2

20.268381118774414

However, if the function is fast enough that the GIL doesn't get a chance
to be released often enough, then pmap() is a pessimation:
.... return 3*arg+2
....19.960626125335693

Concurrency doesn't come for free. There are setup and teardown costs,
and management costs. Presumably if pmap() was written more cleverly, in
C, those costs could be reduced, but not eliminated.

I don't have a
multi-core machine to try it on, but what happens when you replace your
"slow function" code with something that actually burns CPU using
pure-Python code instead of blocking on a timer in the OS?

Two simple work-arounds are:

* use Jython or IronPython; or

* insert time.sleep(0.000001) into your function at various points.

Paul Rubin · May 20, 2009

Steven D'Aprano said:
(4) the caller is responsible for making sure he never shares data while
looping over it.

I don't think I've missed any possibilities. You have to pick one of
those four.

I wonder if the compiler can check that a given function doesn't
change any data. Then:

@pure
def f(x):
return x*sqrt(x) + 3 # does not mutate any data

@pure
def g(x): ... # likewise

s = parallel_dot_product(parallel_map(f, vec), parallel_map(g,vec))

Aaron Brady · May 20, 2009

I wonder if the compiler can check that a given function doesn't
change any data. Then:

@pure
def f(x):
return x*sqrt(x) + 3 # does not mutate any data

@pure
def g(x): ... # likewise

s = parallel_dot_product(parallel_map(f, vec), parallel_map(g,vec))

You can do the basics of this using the 'ast' module. Just check that
no nodes in the ast tree are Assign nodes, including augmented
assign. Then 'f' is defined as:

f= pure( '''
return x*sqrt(x) + 3 # does not mutate any data
''' )

Untested. However, you do need to check that the subchildren don't
make mutations. This adds hooks to the AST, since names can change at
run-time. Also, it's only defined for functions that call functions
that are all pure Python and defined using 'pure' as well, without
inspecting the bytecode. At this point one might look into
metaprogramming and script autogeneration.

The 'pure' wrapper doesn't fundamentally change the semantics of the
function; it is more like a pre- and post-condition check.

Steven D'Aprano · May 20, 2009

You can do the basics of this using the 'ast' module. Just check that
no nodes in the ast tree are Assign nodes, including augmented assign.
Then 'f' is defined as:

f= pure( '''
return x*sqrt(x) + 3 # does not mutate any data
''' )

Untested.

Can you explain how you can tell that there are no side-effects from
x*sqrt(x)+3 ? What if I have this?

class Funny(object):
def __add__(self, other):
global parrot
parrot += 1
return 5 + other

x = Funny()

First steps in setting up VSCode to work with Python.	2	Mar 13, 2023
Possibly better loop construct, also labels+goto important and on the fly compiler idea.	82	Oct 17, 2013
Processing in Python help	0	Aug 31, 2022
Trying to modify a list in Python, receiving "variable not defined" error	3	Feb 6, 2023
how to construct a binary-tree using python?	4	May 6, 2006
subprocess.Popen and multiprocessing fails to execute external program	0	Jan 10, 2013
Python Front-end to GCC	108	Oct 20, 2013
Would a Python 2.8 help you port to Python 3?	1	Jun 3, 2014

Adding a Par construct to Python?

Terry Reedy

jeremy

jeremy

Paul Boddie

Steven D'Aprano

Carl Banks

Steven D'Aprano

jeremy

jeremy

Terry Reedy

jeremy

jeremy

Diez B. Roggisch

Adam Olsen

Aaron Brady

Steven D'Aprano

Steven D'Aprano

Paul Rubin

Aaron Brady

Steven D'Aprano

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads