Parallelization on multi-CPU hardware?

Guest

Reading the above, one might be tempted to conclude that Linux is
aiming at the future when everyone has 4 or 8 cores/hyperthreads, and
I think that is the right decision. Fine-grained locking will come to
Python one day, I'm sure.

One of the biggest problems with the GIL for me is when Python is
embedded inside some multi-threaded program, for example inside
Apache2 with the "worker MPM" (a multithreaded Apache process model).

But since some operations release the GIL, maybe it's not a big issue
even then.
 
Corey Coughlin

Speaking of multiprocessor architectures, here's a link I ran across
last month:

http://blogs.sun.com/roller/page/jonathan/20040910#the_difference_between_humans_and

It describes Sun's next chip, the Niagara project, a single chip with 8
processor cores each capable of running 4 threads, for a total of 32
threads. And you can bet they're coming up with multichip servers.
This is actually technology from a company that Sun bought, and now
they're kind of staking the future of the company on it (especially
given the Millennium chip fiasco). But they're not alone by any
stretch: AMD says it'll be shipping a multicore Opteron next year, and
Intel already has its HT technology out, and they'll be doing
multicores soon too. The future is here; we should be prepared.

------ Corey
 
Aahz

I wonder if Python could be changed to use thread local storage? That
might allow for multiple interpreter instances without the GIL (I've
never looked at the actual code so I'm just hypothesizing). I took a
quick look at Lua today and it has no problems with creating multiple
instances of the interpreter, so it definitely is a solvable problem.

The problem is that CPython doesn't have thread-local storage. All
objects are created and stored on the heap, and any CPython code can
access any object at any time, thanks to the glories of introspection.
I'm not sure what compromises Jython and IronPython make.
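One quick illustration of those "glories of introspection" (class name
hypothetical, modern syntax): the gc module will hand any code a
reference to any tracked heap object, so nothing is private to a thread.

```python
import gc

class Secret:
    def __init__(self, value):
        self.value = value

s = Secret(42)

# Any code in the process can rediscover the object without ever
# having been handed a reference to it.
found = [o for o in gc.get_objects() if isinstance(o, Secret)]
print(found[0].value)
```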
 
Aahz

But if only one thread per interpreter can run at a time, then on a
multicore machine two (or more) threads can't run at the same time.
Performance-wise the result is almost the same: if the server is doing
nothing but running Python (not realistic, I know, just to keep it
simple), only one core works...

Not very good for Zope, as multicore machines are the near future...

Depends where it bogs down. If it's CPU, it may be an issue. If it's
I/O, CPython already releases the GIL and Zope benefits.
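And if it does bog down on CPU, here is a sketch of why threads don't
help a CPU-bound job under the GIL (modern syntax; the timings printed
are illustrative, not a rigorous benchmark):

```python
import threading
import time

def count(n):
    # Pure bytecode work: the GIL is held the whole time.
    while n:
        n -= 1

N = 2000000

# Two counts run back to back...
t0 = time.time()
count(N)
count(N)
seq = time.time() - t0

# ...versus two counts in "parallel" threads: the GIL lets only one
# thread execute bytecode at a time, so there is no speedup.
t0 = time.time()
a = threading.Thread(target=count, args=(N,))
b = threading.Thread(target=count, args=(N,))
a.start()
b.start()
a.join()
b.join()
par = time.time() - t0

print('sequential %.2fs, threaded %.2fs' % (seq, par))
```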
 
Steve Holden

Neil said:
Steve Holden:
It is more likely you had a machine that featured 'hyperthreading' which
is much less than multiple cores. Somewhere between two sets of registers
and two processors.
Aah, the penny drops and I realize you are indeed correct. It's using
hyperthreading.
Did you measure a real performance increase, that is, elapsed time to
completion of job? Many benchmarks show minimal or even negative performance
improvements for hyperthreading. Relying on secondary indicators such as CPU
busyness can be misleading.

No, I *was* actually measuring elapsed time to completion, and I was
surprised that the speedup was indeed just about linear. Not often you
come across a task that can be cleanly partitioned in that way.

regards
Steve
 
Thomas Bellman

Andreas Kostyrka said:
Well, "GarbageCollection" isn't really an option for a portable ANSI C
program, is it?

Depends on what you mean. It is perfectly possible to write a
garbage collection system in portable ANSI C. It will of course
only handle memory that has been properly registered with the
garbage collector, not any plain memory just received from
malloc(). For an example of such a system, look at the Lisp
interpreter in GNU Emacs. Or why not the CPython interpreter,
which has featured garbage collection since version 2.0?
 
Daniel Dittmar

P.M. said:
I wonder if Python could be changed to use thread local storage? That
might allow for multiple interpreter instances without the GIL (I've
never looked at the actual code so I'm just hypothesizing). I took a
quick look at Lua today and it has no problems with creating multiple
instances of the interpreter, so it definitely is a solvable problem.

Most VMs don't use thread local storage, but rather keep all state in
one object that is passed through all the functions. This is probably
solvable in Python.

But this would only allow running multiple Python VMs in the same
process. These VMs couldn't communicate through Python objects, only
through IPC. Each VM would still require an ILL (Interpreter-Local Lock).
It wouldn't be that different from actually having multiple processes.

Zope wouldn't benefit at all.
Apache with threads + mod_python could benefit, but caching session
state in Python variables becomes much harder.
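A toy sketch of that design in Python itself (the real change would of
course have to be made in the C core): all interpreter state lives in
one explicitly-passed object, so two "VMs" coexist without sharing
anything.

```python
class ToyVM:
    """All interpreter state is held on the instance, not in
    module-level globals -- the Lua-style design described above."""
    def __init__(self):
        self.namespace = {}

    def run(self, source):
        # Execute code against this VM's private namespace only.
        exec(source, self.namespace)

    def get(self, name):
        return self.namespace[name]

a = ToyVM()
b = ToyVM()
a.run('x = 1')
b.run('x = 2 * 3')
# Independent namespaces: the two VMs never see each other's objects.
print(a.get('x'), b.get('x'))
```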

Daniel
 
Aahz

My question is, how can I best parallelize the running of separate
autonomous Python scripts within this app? Can I run multiple
interpreters in separate threads within a single process? In past
newsgroup messages I've seen advice that the only way to get
scalability, due to the GIL, is to use an IPC mechanism between
multiple distinct processes running separate interpreters. Is this
still true or are there better ways to accomplish this?

I'm not sure where in the thread to hang this, so I went back to the
root to post this reminder:

One critical reason for the GIL is to support CPython's ability to call
random C libraries with little effort. Too many C libraries are not
thread-safe, let alone thread-hot. Forcing libraries that wish to
participate in threading to use Python's GIL-release mechanism is the
only safe approach.
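For the curious, ctypes illustrates both sides of this mechanism:
functions loaded through ctypes.CDLL release the GIL around each
foreign call, while PyDLL holds it. A sketch, assuming a Unix libc
that provides usleep():

```python
import ctypes
import ctypes.util
import threading
import time

# CDLL releases the GIL around each call into C; PyDLL would hold it.
libc = ctypes.CDLL(ctypes.util.find_library('c'))

def sleep_in_c():
    libc.usleep(200000)  # block for 0.2 s inside C code

start = time.time()
threads = [threading.Thread(target=sleep_in_c) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Because the GIL is released during the C call, the four sleeps
# overlap instead of serializing to ~0.8 s.
print('elapsed: %.2fs' % elapsed)
```

The flip side is exactly Aahz's point: this is only safe because the
wrapped C function touches no Python objects while the GIL is down.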
 
Daniel Dittmar

Aahz said:
One critical reason for the GIL is to support CPython's ability to call
random C libraries with little effort. Too many C libraries are not
thread-safe, let alone thread-hot. Forcing libraries that wish to
participate in threading to use Python's GIL-release mechanism is the
only safe approach.

Most (or many) wrappers around C libs are generated by SWIG, Boost,
SIP, and what not. It can't be that difficult to generate code so that
entering extension code acquires a lock and leaving the code releases
it. And writing an extension by hand is verbose enough that having to
add the locking code wouldn't really multiply the effort.

One could even add a flag to definitions of native methods that this
method is reentrant. If this flag isn't set, then the interpreter would
acquire and release the lock.
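The flag idea can be sketched in pure Python (all names hypothetical):
a wrapper generator would emit something like this around each native
entry point.

```python
import threading

_interp_lock = threading.Lock()  # stand-in for the interpreter lock

def native(reentrant=False):
    """Mark a 'native' entry point. Unless flagged reentrant, the
    interpreter lock is held for the duration of the call."""
    def decorate(fn):
        if reentrant:
            return fn  # thread-safe code runs without the lock
        def locked(*args, **kwargs):
            with _interp_lock:
                return fn(*args, **kwargs)
        return locked
    return decorate

@native()
def legacy_op(x):        # wraps a non-thread-safe library
    return x * 2

@native(reentrant=True)
def safe_op(x):          # library known to be thread-safe
    return x + 1

print(legacy_op(3), safe_op(3))
```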

Daniel
 
Daniel Dittmar

Aahz said:
The problem is that CPython doesn't have thread-local storage. All

I'm sure P.M. meant that the Python C API uses thread local storage
instead of global/static variables.

Daniel
 
Aahz

In 2.4 it does -- see threading.local documentation at
<http://www.python.org/dev/doc/devel/lib/module-threading.html> (and
even better, the docstring of the new _threading_local module).

IIUC, that's not thread-local storage in the sense that I'm using the
term (and which I believe is standard usage). Values created with
thread-local storage module are still allocated on the heap, and it's
still possible to use introspection to access thread-local data in
another thread.

Don't get me wrong; I think it's a brilliant addition to Python.
Unfortunately, it doesn't help with the real issues with making the
Python core free-threaded (or anything more fine-grained than the GIL).
 
Alex Martelli

Aahz said:
IIUC, that's not thread-local storage in the sense that I'm using the
term (and which I believe is standard usage). Values created with
thread-local storage module are still allocated on the heap, and it's
still possible to use introspection to access thread-local data in
another thread.

Is it...? Maybe I'm missing something...:

import threading

made_in_main = threading.local()
made_in_main.foo = 23
def f():
    print 'foo in made_in_main is', getattr(made_in_main, 'foo', None)

print 'in main thread:',
f()
t = threading.Thread(target=f)
print 'in subthread:',
t.start()
t.join()
print 'back in main thread:',
f()

What I see is:

kallisti:~/cb/little_neat_things alex$ python2.4 lots.py
in main thread: foo in made_in_main is 23
in subthread: foo in made_in_main is None
back in main thread: foo in made_in_main is 23

so how does a subthread introspect to 'break the rules'...? (preferably
in a platform-independent way rather than by taking advantage of quirks
of implementation or one or another platform)
Don't get me wrong; I think it's a brilliant addition to Python.
Unfortunately, it doesn't help with the real issues with making the
Python core free-threaded (or anything more fine-grained than the GIL).

I'm not claiming it does, mind you! It's just that it DOES seem to me
to be a "true" implementation of the "thread-specific storage" design
pattern, and I don't know of any distinction between that DP and the
synonym "thread-local storage" (Schmidt et al used both interchangeably,
if I'm not mistaken).


Alex
 
Aahz

Is it...? Maybe I'm missing something...:

If you look at the code for _threading_local, you'll see that it depends
on __getattribute__() to acquire a lock and patch in the current
thread's local state. IIRC from the various threads about securing
Python, there are ways to break __getattribute__(), but I don't remember
any off-hand (and don't have time to research).
 
Alex Martelli

Aahz said:
If you look at the code for _threading_local, you'll see that it depends
on __getattribute__() to acquire a lock and patch in the current
thread's local state. IIRC from the various threads about securing
Python, there are ways to break __getattribute__(), but I don't remember
any off-hand (and don't have time to research).

I believe you're thinking of the minimal portable version: I believe
that for such systems as Windows and Linux (together probably 99% of
Python users, and I speak as a Mac fan...!-) there are specific
implementations that use faster and probably unbreakable approaches.


Alex
 
Bryan Olson

Alex said:
> Aahz said:
>>If you look at the code for _threading_local, you'll see that it depends
>>on __getattribute__() to acquire a lock and patch in the current
>>thread's local state. IIRC from the various threads about securing
>>Python, there are ways to break __getattribute__() [...]
>
> I believe you're thinking of the minimal portable version: I believe
> that for such systems as Windows and Linux (together probably 99% of
> Python users, and I speak as a Mac fan...!-) there are specific
> implementations that use faster and probably unbreakable approaches.

Are we looking at the same question? A language that makes safe
threading logically possible is easy; C does that. 'Unbreakable'
is a high standard, but one that a high-level language should
meet. No matter how badly the Python programmer blows it,
Python itself should not crash, nor otherwise punt to arbitrary
behavior.

Now look at implementing Python without a global interpreter
lock (GIL). To work safely, before we rely on any condition,
we must check that the condition is true; for example we do not
access an array element until we've checked that the subscript is
within the current array size. Python does a reasonable job of
safety checking. In the presence of OS-level multi-tasking,
Python could still fail catastrophically. We might be suspended
after the check; then another thread might change the size of
the array; then the former thread might be restored before
accessing the element.
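That check-then-act window can be written down directly (a sketch;
whether the race actually fires on a given run depends on thread
scheduling):

```python
import threading

items = list(range(100))
index_errors = 0

def reader():
    global index_errors
    for _ in range(20000):
        i = len(items) - 1     # the check...
        try:
            if i >= 0:
                items[i]       # ...and the act: the list may have
                               # shrunk in between
        except IndexError:
            # Python fails safely with an exception rather than
            # crashing -- exactly the guarantee at issue here.
            index_errors += 1

def churner():
    for _ in range(20000):
        if items:
            items.pop()
        items.append(0)

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=churner)
t1.start()
t2.start()
t1.join()
t2.join()
print('IndexErrors seen:', index_errors)
```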

So what's the solution? A global-interpreter-lock works; that's
what Python has now; but we lose much of the power of multi-
processing systems. We could give every object its own lock,
but the locking overhead would defeat the purpose (and inter-
object conditions can be trickier than they might look).

Maybe every object, at any particular time, belongs to just one
thread. That thread can do what it wants with the object, but
any other has to coordinate before accessing the object. I
don't know the current results on that idea.

Functional programming languages point out a possible facility:
threads can safely share non-updatable values. In Python,
ensuring that immutable objects never contain references to
mutable objects is a bigger change than we might reasonably
expect to impose.


Guido did not impose the GIL lightly. This is a hard problem.
 
Paul Rubin

Bryan Olson said:
So what's the solution? A global-interpreter-lock works; that's
what Python has now; but we lose much of the power of multi-
processing systems. We could give every object its own lock,
but the locking overhead would defeat the purpose (and inter-
object conditions can be trickier than they might look).

Giving every object its own lock is basically what Java does. Is it
really so bad, if done right? Acquiring or releasing the lock can be
just one CPU instruction on most reasonable processors. I've been
wondering about this for a while.
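Uncontended acquisition may be cheap at the hardware level, but in a
bytecode interpreter the bookkeeping still shows up. A rough sketch of
measuring the per-operation cost of always taking a lock (timings
illustrative only):

```python
import threading
import time

N = 200000
lock = threading.Lock()

def bump_plain():
    n = 0
    for _ in range(N):
        n += 1
    return n

def bump_locked():
    # Per-object locking in the Java style: every single update pays
    # an uncontended acquire/release.
    n = 0
    for _ in range(N):
        with lock:
            n += 1
    return n

t0 = time.time()
plain_result = bump_plain()
plain = time.time() - t0

t0 = time.time()
locked_result = bump_locked()
locked = time.time() - t0

print('plain %.3fs, locked %.3fs' % (plain, locked))
```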
 
Alex Martelli

Bryan Olson said:
Are we looking at the same question? A language that makes safe

Probably not: I'm focusing on (and answering, I hope) the specific
assertion "CPython doesn't have thread-local storage", an assertion
which I think is wrong for 2.4. Even if we do agree it does, that
doesn't necessarily mean such TLS (which CPython exposes to Python
programs) is in the least helpful in implementing CPython itself, of
course; nevertheless it seems reasonable to me to try and answer
assertions which I believe are wrong, even when those assertions may not
be directly relevant to a thread's "Subject".
Guido did not impose the GIL lightly. This is a hard problem.

Sure. I wonder what (e.g.) IronPython or Ruby do about it -- never
studied the internals of either, yet.


Alex
 
Bryan Olson

Paul said:
> Giving every object its own lock is basically what Java does. Is it
> really so bad, if done right? Acquiring or releasing the lock can be
> just one CPU instruction on most reasonable processors. I've been
> wondering about this for a while.

I'm not really up-to-date on modern multi-processor support.
Back in grad school I read some papers on cache coherence, and I
don't know how well the problems have been solved. The issue
was that a single processor can support a one-instruction lock
(in the usual no-contention case) simply by supplying an
uninterruptible read-and-update instruction, but on a
multi-processor, all the processors have to respect the lock.
 
