Status of Python threading support (GIL removal)?

Kay Schluehr

Maybe, yes. But many different concurrency models are supported by a larger
number of programming languages in one way or another, so the choice of an
appropriate library is often sufficient - and usually a lot easier than
using the 'most appropriate' programming language. Matter of available
skills, mostly. There's usually a lot less code to be written that deals
with concurrency than code that implements what the person paying you makes
money with, so learning a new library may be worth it, while learning a new
language may not.

Stefan

This implies that people stay defensive concerning concurrency ( like
me right now ) and do not embrace it like e.g. Erlang does. Sometimes
there is a radical change in the way we design applications and a
language is the appropriate medium to express it succinctly.
Concurrency is one example, writing GUIs and event driven programs in
a declarative style ( Flex, WPF, JavaFX ) is another one. In
particular the latter group shows that new skills are adopted rather
quickly.

I don't see that a concurrency oriented language has really peaked
though yet.
 
Carl Banks

    Carl> Maybe you don't intend to sound like you're saying "shut up and
    Carl> use C", but to me, that's how you come off.  If you're going to
    Carl> advise someone to use C, at least try to show some understanding
    Carl> for their concerns--it would go a long way.

Then you haven't been listening.  This topic comes up over and over and over
again.  It's a well-known limitation of the implementation.  Poking people
in the eye with it over and over doesn't help.  The reasons for the
limitation are explained every time the topic is raised.  In the absence of
a GIL-less CPython interpreter you are simply going to have to look
elsewhere for performance improvements I'm afraid.  Yes, I'll drag out the
same old saws:

    * code hot spots in C or C++

    * use tools like Pyrex, Cython, Psyco or Shed Skin

    * for array processing, use numpy, preferably on top of a recent enough
      version of Atlas which does transparent multi-threading under the
      covers

    * use multiple processes

    * rewrite your code to use more efficient algorithms

I don't write those out of ignorance of your plight.  It's just that if you
want a faster Python program today you're going to have to look elsewhere
for your speedups.
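
The "use multiple processes" item in the list above could look roughly like the following in current Python 3. This is only a sketch: `count_primes_below` and `count_primes_parallel` are made-up names standing in for whatever CPU-bound work a real program does, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes_below(limit):
    """Hypothetical CPU-bound task: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Run one task per limit in separate processes, each with its own GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes_below, limits))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(count_primes_parallel([10_000, 20_000, 30_000, 40_000]))
```

Arguments and results are pickled and sent between processes, so this pays off mainly when each task does a lot of computation relative to the data it exchanges.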

Just for the record, I am not taking issue with the advice itself
(except that you forgot "use Jython/IronPython which have no GIL").
I'm not even saying that Python was wrong for having the GIL.

All I'm saying is that [this is not aimed specifically at you] this
advice can be delivered with more respect for the complainer's
problem, and less fanboy-like knee-jerk defensiveness of Python.


Carl Banks
 
OdarR

Add:
Carl, Olivier & co. - You guys know exactly what I wanted.
Others: Going back to C++ isn't what I had in mind when I started
initial testing for my project.

Do you think multiprocessing can seriously help you?
Can you benefit from multiple CPUs?

Did you try to enhance your code with numpy?

Olivier
(installed a backported multiprocessing on his 2.5.1 Python, but needs
to install Xcode first)
 
Jure Erznožnik

OdarR said:
Do you think multiprocessing can seriously help you?
Can you benefit from multiple CPUs?

Did you try to enhance your code with numpy?

Olivier
(installed a backported multiprocessing on his 2.5.1 Python, but needs
to install Xcode first)

Multithreading / multiprocessing can help me with my problem. As you
know, database reading is typically I/O bound so it helps to put it in
a separate thread. I might not even notice the GIL if I used SQL
access in the first place. As it is, DBFPY is pretty CPU intensive
since it's a pure Python DBF implementation.
To continue: the second major stage (summary calculations) is
completely CPU bound. Using numpy might or might not help with it.
Those are simple calculations, mostly additions. I try not to put the
entire database in arrays to save memory and so I mostly just add
counters where I can. Some functions simply require arrays, but they
are more rare, so I guess I'm safe with that. You wouldn't believe how
complex some reports can be. Threading + memory saving is a must and
even so, I'll probably have to implement some sort of serialization
later on, so that the stuff can run on more memory constrained
devices.
The third major stage, rendering engine, is again mostly CPU bound,
but at the same time it's I/O bound as well when outputting the
result.

All three major parts are more or less independent from each other and
can run simultaneously, just with a bit of a delay. I can perform
calculations while waiting for the next record and I can also start
rendering immediately after I have all the data for the first group
available.
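
The three-stage layout described above (read, summarize, render, each starting as soon as data arrives) might be sketched with threads and bounded queues as follows. Everything here is an assumption for illustration: the record format with an `"amount"` field, the stage functions, and the queue size are hypothetical and not taken from the actual report generator.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages


def read_records(source, out_q):
    """Stage 1: push records downstream as they are read."""
    for record in source:
        out_q.put(record)
    out_q.put(SENTINEL)


def summarize(in_q, out_q):
    """Stage 2: keep a running counter instead of storing every record."""
    total = 0
    while True:
        record = in_q.get()
        if record is SENTINEL:
            break
        total += record["amount"]
        out_q.put(("running_total", total))
    out_q.put(SENTINEL)


def render(in_q, lines):
    """Stage 3: format each summary item as soon as it is available."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        label, value = item
        lines.append(f"{label}: {value}")


def run_pipeline(source, max_pending=100):
    # Bounded queues cap how many items wait between stages at any time.
    records_q = queue.Queue(maxsize=max_pending)
    totals_q = queue.Queue(maxsize=max_pending)
    lines = []
    stages = [
        threading.Thread(target=read_records, args=(source, records_q)),
        threading.Thread(target=summarize, args=(records_q, totals_q)),
        threading.Thread(target=render, args=(totals_q, lines)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return lines
```

On a GIL-enabled CPython this structure overlaps waiting on I/O with computation, but the CPU-bound parts of the stages still take turns rather than running on separate cores.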

I may use multiprocessing, but I believe it introduces more
communication overhead than threads and am so reluctant to go there.
Threads were perfect, other stuff wasn't. To make things worse, no
particular extension / fork / branch helps me here. So if I wanted to
just do the stuff in Python, I'd have to move to Jython or IronPython
and hope CPython eventually improves in this area. I do actually need
CPython since the other two aren't supported on all platforms my
company intends to support.

The main issue I currently have with the GIL is that execution time is
worse when I use threading. Had it been the same, I wouldn't worry too
much about it. Waiting for a permanent solution would be much easier
then...
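
A comparison of this kind can be reproduced with a small timing script along these lines. It is a sketch, not the benchmark linked in the first post: `count_down` is a stand-in CPU-bound loop, and the loop size and thread count are arbitrary assumptions. No timings are quoted here because they depend on the machine, the interpreter version, and whether the build has a GIL.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n, repeats):
    """Time `repeats` calls made one after another in a single thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Time the same amount of work split across `repeats` threads."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential: {run_sequential(n, 2):.3f}s")
    print(f"threaded:   {run_threaded(n, 2):.3f}s")
```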
 
Hendrik van Rooyen

Kay Schluehr said:
This implies that people stay defensive concerning concurrency ( like
me right now ) and do not embrace it like e.g. Erlang does. Sometimes
there is a radical change in the way we design applications and a
language is the appropriate medium to express it succinctly.
Concurrency is one example, writing GUIs and event driven programs in
a declarative style ( Flex, WPF, JavaFX ) is another one. In
particular the latter group shows that new skills are adopted rather
quickly.

I don't see that a concurrency oriented language has really peaked
though yet.

I think that this is because (like your link has shown) the problem
is really not trivial, and also because the model that can bring
sanity to the party (independent threads/processes that communicate
with queued messages) is seen as inefficient at small scale.

- Hendrik
 
Stefan Behnel

Christian said:
Hard computations gain more speed from carefully crafted C or Fortran
code that utilizes features like the L1 and L2 CPU cache, SIMD etc. or
parallelized algorithms. If you start sharing values between multiple
cores you have a serious problem.

Oh, and use NumPy for the job ;) [...]
It *is* a well known limitation of Python. All the nice 'n shiny syntax
and features are coming with a cost. Python is a powerful language and
good tool for lots of stuff. But Python is not, and never will become, the
übertool that solves every problem perfectly. At some point you need a
different tool to get the raw power of your machine. C (and perhaps
Fortran) are the weapons of choice for number crunching.

Well, and there's always Cython to the rescue when you need it.

Stefan
 
Stefan Behnel

Jure said:
Sorry, no intent to offend anyone here. Flame wars are not my thing.

I have shown my benchmarks. See first post and click on the link.
That's the reason I started this discussion.

All I'm saying is that you can get a threading benefit, but only if the
threading in question is implemented in a C plugin.
I have yet to see pure Python code which does take advantage of
multiple cores. From what I read about the GIL, this is simply impossible
by design.

Well, CPython is written in C. So running Python code in CPython will
necessarily run C code (whatever "plugin" means in your post above). Whether
that C code frees the GIL depends on the parts of CPython or external
packages that you use. There are many parts that free the GIL and will thus
benefit (sometimes heavily) from threading and multiple cores, and there are
also many parts that do not free the GIL and will therefore not (or likely
not) benefit from multiple cores.

Claiming that "pure Python code does not free the GIL" in the context of
CPython when you define "pure Python code" as code that does not depend on
C code is plainly flawed.

Stefan
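
One way to see the point about C code that frees the GIL is to run several threads that each sit in a blocking call. The sketch below uses `time.sleep`, which releases the GIL while it waits, as a stand-in for blocking I/O; `blocking_call` and `timed_threads` are hypothetical helper names.

```python
import threading
import time


def blocking_call(seconds):
    """Stand-in for blocking I/O: time.sleep releases the GIL while waiting."""
    time.sleep(seconds)


def timed_threads(count, seconds):
    """Run `count` blocking calls in parallel threads and return wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Four 0.2 s waits overlap, so this takes far less than 0.8 s in total.
    print(f"{timed_threads(4, 0.2):.2f}s")
```

The waits overlap because no thread holds the GIL while it is blocked. A loop of pure Python bytecode in place of the sleep would not overlap in the same way on a GIL-enabled build.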
 
Jeremy Sanders

Jesse said:
Sorry, you're incorrect. I/O-bound threads do, in fact, take advantage
of multiple cores.

I don't know whether anyone else brought this up, but it looks
like Python has problems with even this form of threading

http://www.dabeaz.com/python/GIL.pdf

It's certainly a very interesting read if you're interested in
this subject.
 
Aahz

So, recently I started writing a part of this new system in Python. A
report generator to be exact. Let's not go into existing offerings,
they are insufficient for our needs.

First I started on a few tests. I wanted to know how the reporting
engine will behave if I do this or that. One of the first tests was,
naturally, threading. The reporting engine itself will have separate,
semi-independent parts that can be threaded well, so I wanted to test
that.

This is not something that I would expect Python threads to provide a
performance boost for. I would expect that if it were a GUI app, it
would improve responsiveness, if properly designed. If performance were a
goal, I would start by profiling it under a single-threaded design and
see where the hotspots were, then either choose one of several options
for improving performance or go multi-process.

Note that I'm generally one of the Python thread boosters (unlike some
people who claim that Python threads are worthless), but I also never
claim that Python threads are good for CPU-intensive operations (which
report generation is), *except* for making GUI applications more
responsive.
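
The "profile it single-threaded first" step can be done with the standard library's `cProfile` and `pstats` modules. The following is a minimal sketch: `slow_sum` and `build_report` are invented placeholders for real report code, and `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hotspot: a tight pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def build_report():
    """Hypothetical single-threaded report step that calls the hotspot."""
    return [slow_sum(20_000) for _ in range(20)]


def profile_call(func, top=5):
    """Profile one call and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_call(build_report))
```

Functions near the top of the cumulative listing are the candidates for a better algorithm, a C or Cython rewrite, or a move to separate processes.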
 
Paul Rubin

Hendrik van Rooyen said:
I think that this is because (like your link has shown) the problem
is really not trivial, and also because the model that can bring
sanity to the party (independent threads/processes that communicate
with queued messages) is seen as inefficient at small scale.

That style works pretty well in Python and other languages. The main
gripe about it for Python is the subject of this thread, i.e. the GIL.
 
Hendrik van Rooyen

Paul Rubin said:
That style works pretty well in Python and other languages. The main
gripe about it for Python is the subject of this thread, i.e. the GIL.

I have found that if you accept it, and sprinkle a few judicious
time.sleep(short_time)'s around, things work well. Sort of choosing
yourself when the thread gives up its turn.

- Hendrik
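
The "sprinkle a few sleeps" approach might look like the sketch below, in which each worker does a chunk of work and then sleeps briefly, giving other threads a chance to run. The chunk sizes, the `pause` value (playing the role of `short_time`), and the function names are assumptions. How much this helps depends on the interpreter version and the workload.

```python
import threading
import time


def busy_worker(name, chunks, chunk_size, pause, results):
    """Do CPU work in chunks, sleeping between chunks to give up the turn."""
    total = 0
    for _ in range(chunks):
        for i in range(chunk_size):
            total += i
        time.sleep(pause)  # releases the GIL so another thread can run
    results[name] = total


def run_workers(names, chunks=3, chunk_size=1000, pause=0.001):
    """Start one cooperating worker thread per name and collect the totals."""
    results = {}
    threads = [
        threading.Thread(target=busy_worker,
                         args=(name, chunks, chunk_size, pause, results))
        for name in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```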
 
