The Future of Python Threading

Justin T.

There may be something to this. On the other hand, there's no _guarantee_
that code written with greenlets will work with pre-emptive threading instead
of cooperative threading. There might be a tendency on the part of developers
to try to write code which will work with pre-emptive threading, but it's just
that - a mild pressure towards a particular behavior. That's not sufficient
to successfully write correct software (where "correct" in this context means
"works when used with pre-emptive threads", of course).
Agreed. Stackless does include a preemptive mode, but if you don't use
it, then you don't need to worry about locking at all. It would be
quite tricky to get around this, but I don't think it's impossible.
For instance, you could just automatically lock anything that was not
a local variable. Or, if you required all tasklets in one object to
run in one thread, then you would only have to auto-lock globals.
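The auto-locking idea above can be sketched in pure Python. This is a hypothetical illustration, not anything Stackless actually does: a wrapper class that routes every attribute read and write through one per-object lock, so shared state survives pre-emptive scheduling (note that compound operations like `obj.x += 1` would still need a wider lock, since the read and write are locked separately).

```python
import threading

class AutoLocked:
    """Hypothetical sketch: serialise every attribute access on an
    object through one re-entrant lock."""

    def __init__(self):
        # object.__setattr__ avoids recursing into our own __setattr__
        object.__setattr__(self, "_lock", threading.RLock())
        object.__setattr__(self, "_data", {})

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for our managed attrs
        with object.__getattribute__(self, "_lock"):
            try:
                return object.__getattribute__(self, "_data")[name]
            except KeyError:
                raise AttributeError(name)

    def __setattr__(self, name, value):
        with object.__getattribute__(self, "_lock"):
            object.__getattribute__(self, "_data")[name] = value
```

Usage is transparent: `obj = AutoLocked(); obj.x = 1` takes and releases the lock on each access, which also hints at the overhead such a scheme would add.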
One also needs to consider the tasks necessary to really get this integration
done. It won't change very much if you just add greenlets to the standard
library. For there to be real consequences for real programmers, you'd
probably want to replace all of the modules which do I/O (and maybe some
that do computationally intensive things) with versions implemented using
greenlets. Otherwise you end up with a pretty hard barrier between greenlets
and all existing software that will probably prevent most people from changing
how they program.

If the framework exists to efficiently multi-thread Python, I assume
that the module maintainers will slowly migrate over if there is a
performance benefit.
Then you have to worry about the other issues greenlets introduce, like
invisible context switches, which can break code that _doesn't_ use
pre-emptive threading.

Not breaking standard python code would definitely be priority #1 in
an experiment like this. I think that by making the changes at the
core we could achieve it. A standard program, after all, is just 1
giant tasklet.
All in all, it seems like a wash to me. There probably isn't sufficient
evidence to answer the question definitively either way, though. And trying
to make it work is certainly one way to come up with such evidence. :)

::Sigh:: I honestly don't see myself having time to really do anything
more than experiment with this. Perhaps I will try to do that though.
Sometimes I do grow bored of my other projects. :)

Justin
 
Beorn

Btw, although overly simple (single CPU system!), this benchmark is
pretty interesting:

http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-first-benchmark/


About the GIL:

I think I've heard Guido say the last attempt at removing the Global
Interpreter Lock (GIL) resulted in a Python that was much slower...
which kind of defeats the purpose. I don't think it's feasible to
remove the GIL in CPython; the best hope of a GIL-free Python might
be PyPy.

The general trend seems to be that it's hard enough to write single-
threaded programs correctly; add to that the extreme concurrency
awareness required when programming with threads, and it becomes
practically impossible for most programmers to get their programs
right. The shared-nothing model seems to be a very workable way to
scale for many programs (at least web apps/services).
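The shared-nothing model described here is roughly what the standard `multiprocessing` module (added to Python after this thread was written) provides: each worker owns its memory, and the only sharing is message passing over queues. A minimal sketch:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Each process owns its memory; the only "sharing" is messages.
    for item in iter(inbox.get, None):   # None is the shutdown sentinel
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in range(5):
        inbox.put(n)
    inbox.put(None)                      # tell the worker to stop
    results = sorted(outbox.get() for _ in range(5))
    p.join()
    print(results)                       # [0, 1, 4, 9, 16]
```

Because there is no shared state, no user-level locking is needed, and the GIL stops mattering: each process has its own interpreter.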
 
Seun Osewa

I think I've heard Guido say the last attempt at removing the Global
Interpreter Lock (GIL) resulted in a Python that was much slower...

What is it about Python that makes a thread-safe CPython version much
slower? Why doesn't true threading slow down other languages like Perl
and Java?

I'm thinking it might be the reference counting approach to memory
management... but what do you guys think is the reason?
 
Paul Rubin

Seun Osewa said:
What is it about Python that makes a thread-safe CPython version much
slower?...
I'm thinking it might be the reference counting approach to memory
management...

Yes. In the implementation that was benchmarked, if I understand
correctly, every refcount had its own lock, that was acquired and
released every time the refcount was modified.
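A toy model (not CPython's actual code) makes the cost visible: if every refcount has its own lock, then every INCREF/DECREF becomes a lock acquire/release pair, and CPython does an INCREF/DECREF pair on nearly every variable access.

```python
import threading

class LockedRefcount:
    """Toy model of a per-object refcount guarded by its own lock,
    roughly the scheme used by the old free-threading patch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 1

    def incref(self):
        with self._lock:              # one acquire/release per INCREF
            self._count += 1

    def decref(self):
        with self._lock:              # and another per DECREF
            self._count -= 1
            return self._count == 0   # True means "deallocate now"
```

Multiply that acquire/release overhead by every name lookup and argument pass in the interpreter, and it is easy to see why the free-threaded build was much slower even on one core.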
 
fdu.xiaojf

Justin said:
Uh oh, my ulterior motives have been discovered!

I'm aware of Erlang, but I don't think it's there yet. For one thing,
it's not pretty enough. It also doesn't have the community support
that a mainstream language needs. I'm not saying it'll never be
adequate, but I think that making python into an Erlang competitor
while maintaining backwards compatibility with the huge amount of
already written python software will make python a very formidable
choice as languages add more and more multi-core support. Python is
in a unique position, as it's actually a flexible enough language to
adapt to a multi-threaded environment without resorting to terrible
hacks.

Justin
Multi-core or multi-CPU computers are more and more popular, especially in
scientific computation. It would be a great thing if multi-core support
could be added to Python.
 
Bryan Olson

Justin said:
True, but Python seems to be the *best* place to tackle this problem,
at least to me. It has a large pool of developers, a large standard
library, it's evolving, and it's a language I like :). Languages that
seamlessly support multi-threaded programming are coming, as are
extensions that make it easier on every existent platform. Python has
the opportunity to lead that change.

I have to disagree. A dynamic scripting language, even a great
one such as Python, is not the vehicle to lead an advance of
pervasive threading. The features that draw people to Python
do not play nicely with optimizing multi-core efficiency; they
didn't play all that well with single-core efficiency either.
Python is about making good use of our time, not our machines'
time.

Finer-grain locking sounds good, but realize how many items
need concurrency control. Python offers excellent facilities
for abstracting away complexity, so we programmers do not
mentally track all the possible object interactions at once.
In a real application, objects are more intertwingled than
we realize. Whatever mistakes a Python programmer makes,
the interpreter must protect its own data structures
against all possible race errors.

Run-time flexibility is a key Python feature, and it
necessarily implies that most bindings are subject to
change. Looking at a Python function, we tend to focus on
the named data objects, and forget how many names Python
is looking up at run time. How often do we take and release
locks on all the namespaces that might affect execution?

The GIL both sucks and rocks. Obviously cores are becoming
plentiful and the GIL limits how we can exploit them. On
the other hand, correctness must dominate efficiency. We
lack exact reasoning on what we must lock; with the GIL,
we err on the side of caution and correctness. Instead of
trying to figure out what we need to lock, we default to
locking everything, and try to figure out what we can
safely release.

[Steve Holden:]
I knew somebody was going to say that! I'm pretty busy, but I'll see
if I can find some time to look into it.

If you want to lead fixing Python's threading, consider
first delivering a not-so-grand but definite
improvement. Have you seen how the 'threading' module
in Python's standard library implements timeouts?
 
Cameron Laird

What is it about Python that makes a thread-safe CPython version much
slower? Why doesn't true threading slow down other languages like Perl
and Java?

I'm thinking it might be the reference counting approach to memory
management... but what do you guys think is the reason?

Crudely, Perl threading is fragile, and Java requires
coding at a lower level.

Memory management is indeed important, and arguably
deserves at least as much attention as multi-core
accommodations. I know of no easy gains from here
on, just engineering trade-offs.
 
Cameron Laird

[...]
There's nothing "undocumented" about IPC. It's been around as a
technique for decades. Message passing is as old as the hills.
[...]
.... and has significant successes to boast, including
the highly-reliable and high-performing QNX real-time
operating system, and the already-mentioned language
Erlang.
 
Nick Craig-Wood

Bjoern Schliessmann said:
Nick Craig-Wood wrote:
[GIL]
That is certainly true. However, the point is that running
on 2 CPUs at once at 95% efficiency is much better than running on
only 1 at 99%...

How do you define this percent efficiency?

Those are hypothetical numbers. I guess that a finely locked Python
will spend a lot more time locking and unlocking individual objects
than it currently does locking and unlocking the GIL, for two
reasons:

1) the GIL will be in cache at all times and therefore "hot" and quick
to access

2) much more locking and unlocking of each object will need to be
done.
Strange, in my programs, I don't need any "real" concurrency (they
are network servers and scripts). Or do you mean "the future of
computing hardware is multi-core"? That indeed may be true.

I meant the latter. I agree with you though that not all programs
need to be multi-threaded. Hence my proposal for two python binaries.
So, how much performance gain would you get? Again, managing
fine-grained locking can be much more work than one simple lock.

Assuming that you are not IO bound, but compute bound and that compute
is being done in python then you'll get a speed up proportional to how
many processors you have, minus a small amount for locking overhead.
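The claimed scaling is essentially Amdahl's law, with the locking overhead playing the role of the serial fraction. A quick worked sketch (the 5% figure is an illustrative assumption, not a measurement):

```python
def amdahl_speedup(n_cores, serial_fraction):
    """Amdahl's law: speedup on n_cores when serial_fraction of the
    work (here, locking overhead) cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With an assumed 5% locking overhead, 4 cores give about 3.5x, not 4x:
print(round(amdahl_speedup(4, 0.05), 2))   # 3.48
```

So "proportional minus a small amount" holds only while the locking overhead stays small; as core counts grow, that serial fraction increasingly dominates.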
How do you compare a byte code interpreter to a monolithic OS
kernel?

In this (locking) respect they are quite similar, actually. You can
think of the kernel code as the Python interpreter (BKL vs. GIL), and
user space as C extensions running with the GIL released and
calling back into the Python interpreter / kernel.
From where do you take this certainty? For example, if the program
in question involves mostly IO access, there will be virtually no
gain. Multithreading is not Performance.

Yes you are right of course. IO bound tasks don't benefit from
multi-threading. In fact usually the reverse. Twisted covers this
ground extremely well in my experience. However IO bound tasks
probably aren't taxing your quad core chip either...
Also, C extensions can release the GIL for long-running
computations.

Provided they stay in C. If they call any python stuff then they need
to take it again.
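This release-then-reacquire pattern is visible from pure Python. `hashlib`, for example, releases the GIL while hashing large buffers, so two Python threads hashing big inputs can genuinely overlap on separate cores, even though each must reacquire the GIL the moment it returns to Python code:

```python
import hashlib
import threading

def checksum(data, results, i):
    # hashlib releases the GIL while hashing a large buffer, so these
    # threads can overlap on multiple cores -- until they return to
    # Python code and must take the GIL again.
    results[i] = hashlib.sha256(data).hexdigest()

data = b"x" * (1 << 22)          # 4 MiB buffer
results = [None, None]
threads = [threading.Thread(target=checksum, args=(data, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[0] == results[1])  # True: same input, same digest
```

The same structure with a pure-Python hash function would serialise on the GIL and run no faster with two threads than with one.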
 
Bjoern Schliessmann

Nick said:

Assuming that you are not IO bound, but compute bound and that
compute is being done in python then you'll get a speed up
proportional to how many processors you have, minus a small amount
for locking overhead.

Assuming this: Agreed.

Regards,


Björn
 
Ben Sizer

This is simply not true. Firstly, there's a well-defined difference
between a 'process' and a 'thread', and that is that processes have
private memory spaces. Nobody says "process" when they mean threads of
execution within a shared memory space, and if they do, they're wrong.

I'm afraid that a lot of what students will be taught does exactly
this, because the typical study of concurrency is in relation to
contention for shared resources, whether that be memory, a file, a
peripheral, a queue, etc. One example I have close to hand is
'Principles of Concurrent and Distributed Programming', which has no
mention of the term 'thread'. It does have many examples of several
processes accessing shared objects, which is typically the focus of
most concurrent programming considerations.

The idea that processes have memory space completely isolated from
other processes is both relatively recent and not universal across all
platforms. It also requires you to start treating memory as
arbitrarily different from other resources which are typically
shared.
And no, "most" academic study isn't limited to shared memory spaces.
In fact, almost every improvement in concurrency has been moving
*away* from simple shared memory - the closest thing to it is
transactional memory, which is like shared memory but with
transactional semantics instead of simple sharing.

I think I wasn't sufficiently clear; research may well be moving in
that direction, but you can bet that the typical student with their
computer science or software engineering degree will have been taught
far more about how to use synchronisation primitives within a program
than how to communicate between arbitrary processes.
There's nothing "undocumented" about IPC. It's been around as a
technique for decades. Message passing is as old as the hills.

I didn't say undocumented, I said underdocumented. The typical
programmer these days comes educated in at least how to use a mutex or
semaphore, and will probably look for that capability in any language
they use. They won't be thinking about creating an arbitrary message
passing system and separating their project out into separate
programs, even if that has been what UNIX programmers have chosen to
do since 1969. There are a multitude of different ways to fit IPC into
a system, but only a few approaches to threading, which also happen to
coincide quite closely to how low-level OS functionality handles
processes meaning you tend to get taught the latter. That's why it's
useful for Python to have good support for it.
There's nothing that Python does to make IPC hard, either. There's
nothing in the standard library yet, but you may be interested in Pyro
(http://pyro.sf.net) or Parallel Python
(http://www.parallelpython.com/). It's not erlang, but it's not hard
either. At least, it's not any harder than using threads and locks.

Although Pyro is good in what it does, simple RPC alone doesn't solve
most of the problems that typical threading usage does. IPC is useful
for the idea of submitting jobs in the background but it doesn't fit
so well to situations where there are parallel loops both acting on a
shared resource. Say you have a main thread and a network reading
thread - given a shared queue for the data, you can safely do this by
adding just 5 lines of code: 2 locks, 2 unlocks, and a call to start
the networking thread. Implementing that using RPC will be more
complex, or less efficient, or probably both.
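The main-thread-plus-network-reader setup described here can be sketched with the standard `Queue` module. In fact, because `Queue` does its own locking internally, even the explicit lock/unlock calls mentioned above aren't needed; the socket-reading loop is faked here with three hard-coded packets:

```python
import queue
import threading

incoming = queue.Queue()   # Queue handles its own locking internally

def network_reader():
    # Stand-in for a socket-reading loop; here we just fake three packets.
    for packet in (b"one", b"two", b"three"):
        incoming.put(packet)
    incoming.put(None)     # sentinel: no more data

# The one extra line Ben mentions: start the networking thread.
threading.Thread(target=network_reader).start()

# Main loop consumes packets as they arrive.
for packet in iter(incoming.get, None):
    print(packet)
```

Expressing the same producer/consumer loop over RPC would need explicit polling or callbacks on both sides, which is the extra complexity being pointed at.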
 
Ant

What is it about Python that makes a thread-safe CPython version much
slower? Why doesn't true threading slow down other languages like Perl
and Java?

I have no idea about Perl - but there's probably some hideous black
magic going on ;-)

As for Java, making code thread-safe *does* slow it down. That is
the very reason the language designers made the collections API
non-thread-safe by default (you have to wrap the standard collections
in a synchronized wrapper to make them thread-safe).
 
Seun Osewa

Yes, but if you reduce the coupling between threads in Java (by using
the recommended Python approach of communicating with Queues), you get
the full speed of all the cores in your CPU. I wonder why we can't
have this in Python; it would be very good for servers!
 
