Status of Python threading support (GIL removal)?


Jesse Noller

Incorrect. They take advantage of OS threading support where another
thread can run while one is blocked for I/O.
That is not equal to running on multiple cores (though it actually
does do that, just that the cores are not all well utilized - sum(x) <
100% of one core).
You will get better performance running on a single core because of the
way the GIL is implemented, in all cases.

No. That's simply incorrect. I (and others) have seen significant
increases in threaded, I/O bound code using multiple cores. It's not
perfect, but it is faster.
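
For what it's worth, here is a minimal sketch of the kind of I/O-bound
workload being argued about (the URL is a placeholder, and the code is
written for Python 3, where urllib2 became urllib.request). The win comes
from overlapping the waits; how busy the individual cores are while that
happens is a separate question:

    import threading
    import time
    import urllib.request

    # Placeholder URL; any reasonably slow network resource will do.
    URLS = ["http://example.com/"] * 8

    def fetch(url):
        # urlopen()/read() block inside C-level socket calls, and CPython
        # releases the GIL around those calls, so other threads keep running.
        response = urllib.request.urlopen(url)
        try:
            response.read()
        finally:
            response.close()

    def run_serial():
        for url in URLS:
            fetch(url)

    def run_threaded():
        threads = [threading.Thread(target=fetch, args=(url,)) for url in URLS]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    for label, fn in [("serial", run_serial), ("threaded", run_threaded)]:
        start = time.time()
        fn()
        print("%-9s %.2fs" % (label, time.time() - start))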
 

Aahz

I do agree, though, that threading is important. Regardless of any
studies showing that threads suck, they are here and they offer
relatively simple concurrency. IMHO they should never have been
crippled like this. Even though the GIL solves access violations, it's not
the right approach. It simply kills all threading benefits except for
the situation where you work with multiple I/O-blocking threads.
That's just about the only situation where this problem is not
apparent.

NumPy?
 

Aahz

I wish Pythonistas would be more willing to acknowledge the (few)
drawbacks of the language (or implementation, in this case) instead of
all this rationalization.

Please provide more evidence that Pythonistas are unwilling to
acknowledge the drawbacks of the GIL. I think you're just spreading FUD.
The problem is that discussions about the GIL almost invariably include
false statements about the GIL, and I'm certainly not going to hamper my
writing to always include the caveats about GIL drawbacks while
correcting the wrong information.
 

Aahz

Incorrect. They take advantage of OS threading support where another
thread can run while one is blocked for I/O. That is not equal to
running on multiple cores (though it actually does do that, just that
the cores are not all well utilized - sum(x) < 100% of one core). You will
get better performance running on a single core because of the way the GIL
is implemented, in all cases.

You should put up or shut up -- I've certainly seen multi-core speedup
with threaded software, so show us your benchmarks!
 

Ross Ridge

Jesse Noller said:
Sorry, you're incorrect. I/O bound threads do, in fact, take advantage
of multiple cores.

Incorrect. They take advantage of OS threading support where another
thread can run while one is blocked for I/O. That is not equal to
running on multiple cores (though it actually does do that, just that
the cores are not all well utilized - sum(x) < 100% of one core). You will
get better performance running on a single core because of the way the GIL
is implemented, in all cases.

Aahz said:
You should put up or shut up -- I've certainly seen multi-core speedup
with threaded software, so show us your benchmarks!

By definition an I/O bound thread isn't CPU bound so won't benefit from
improved CPU resources.

Ross Ridge
 

Jure Erznožnik

You should put up or shut up -- I've certainly seen multi-core speedup
with threaded software, so show us your benchmarks!
--

Sorry, no intent to offend anyone here. Flame wars are not my thing.

I have shown my benchmarks. See first post and click on the link.
That's the reason I started this discussion.

All I'm saying is that you can get a threading benefit, but only if the
threading in question is implemented in a C plugin.
I have yet to see pure Python code that takes advantage of
multiple cores. From what I read about the GIL, this is simply impossible
by design.

But I'm not disputing the fact that CPython as a whole can take
advantage of multiple cores. There certainly are built-in objects that
work as they should.
 

Lie Ryan

Jure said:
Sorry, no intent to offend anyone here. Flame wars are not my thing.

I have shown my benchmarks. See first post and click on the link.
That's the reason I started this discussion.

All I'm saying is that you can get a threading benefit, but only if the
threading in question is implemented in a C plugin.
I have yet to see pure Python code that takes advantage of
multiple cores. From what I read about the GIL, this is simply impossible
by design.

But I'm not disputing the fact that CPython as a whole can take
advantage of multiple cores. There certainly are built-in objects that
work as they should.

I have never used threading together with I/O intensively before, but I have
heard that I/O operations release the GIL, so they are similar to
GIL-releasing C extensions, which makes it possible to benefit from
multiple cores in I/O-bound pure Python code.

Perhaps more built-in/stdlib operations that can safely release the GIL
should release it by default? And perhaps some builtin/stdlib functions
should accept an optional argument that instructs them to release the GIL;
by passing this argument, you would be making a contract not to do certain
things that would disturb the operation in question, with the specifics of
what is prohibited noted in its docs.
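
Some C-implemented stdlib routines already behave this way. As far as I
know, the hashlib digest functions in CPython release the GIL while hashing
buffers larger than a couple of kilobytes; that is an implementation detail,
so treat this sketch as illustrative only:

    import hashlib
    import threading
    import time

    # Four independent 32 MB buffers; hashing them is pure CPU work,
    # but the work happens inside C code that can drop the GIL.
    BUFFERS = [b"\x00" * (32 * 1024 * 1024) for _ in range(4)]

    def digest(buf):
        hashlib.sha256(buf).hexdigest()

    start = time.time()
    for buf in BUFFERS:
        digest(buf)
    print("serial:   %.2fs" % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=digest, args=(buf,)) for buf in BUFFERS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threaded: %.2fs" % (time.time() - start))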
 

Piet van Oostrum

Jure Erznožnik said:
JE> Sorry, just a few more thoughts:
JE> Does anybody know why the GIL can't be made more atomic? I mean, use
JE> different locks for different parts of code?
JE> This way there would be way less blocking and the plugin interface
JE> could remain the same (the interpreter would know which lock it used
JE> for the plugin, so the actual function for releasing / reacquiring the
JE> lock could remain the same).
JE> On second thought, forget this. This is probably exactly the cause of
JE> free threading's reduced performance. Fine-graining the locks increases
JE> the lock count, and their implementation is rather slow per se.

The major obstacles are the refcounts. It would mean that each
refcounted object would need a separate lock, including constants like 0,
1, True and None. You would have much more locking and unlocking than with
the GIL. So to get rid of it, the refcounting has to go first. But that
is not sufficient.

When the GIL is removed, I suspect that many concurrent programs
will start to fail subtly because they make assumptions about the
atomicity of operations. In CPython each bytecode is atomic, but
when there is no GIL this is no longer true.
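
A minimal sketch of the kind of implicit assumption involved; which of
these would stay safe without a GIL depends entirely on what guarantees a
new implementation chooses to give:

    import threading

    items = []
    counter = 0

    def worker(n):
        global counter
        for _ in range(n):
            # list.append runs as a single C call while the GIL is held,
            # so today this needs no explicit lock.
            items.append(1)
            # += is several bytecodes (load, add, store); a thread switch
            # can land between them, so this is racy even with the GIL.
            counter += 1

    threads = [threading.Thread(target=worker, args=(100000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(items))  # always 400000 under the GIL
    print(counter)     # not guaranteed to be 400000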

Java has defined this quite rigorously in its memory model (although
its first memory model had subtle bugs): read and write operations on
32-bit quantities are atomic, others are not. This means that on a 64-bit
system, where pointers are 64-bit, even assignments to variables of
object types are not atomic and have to be protected with
synchronized. (My info is a few years old, so in the meantime it may
have changed.) In a GIL-less Python the same would be true. Of course the
hardware that your program runs on could have atomic reads/writes
on 64-bit quantities, but if you rely upon that, your program may no
longer be portable.
 

Piet van Oostrum

Jure Erznožnik said:
JE> I have shown my benchmarks. See first post and click on the link.
JE> That's the reason I started this discussion.
JE> All I'm saying is that you can get threading benefit, but only if the
JE> threading in question is implemented in C plugin.
JE> I have yet to see pure Python code which does take advantage of
JE> multiple cores. From what I read about GIL, this is simply impossible
JE> by design.

In fact, at least theoretically, your original application could benefit
from multiple cores. Take this scenario:

You have one thread that reads from a file. On each read the GIL is
released until the read has completed. In the meantime the other thread
can do some CPU-intensive work. Now, if the read comes entirely from
the OS cache, the reading thread is also CPU-bound, so a second core
comes in handy there. Usually these reads will not consume much CPU, so
the effect will probably be hardly noticeable. But if you had some kind of
CPU-intensive user-space file system, for example with compression
and/or encryption, and the data is in memory, you might notice it. In this
example all your application code is written in Python.
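
A sketch of that scenario (the file path is a placeholder; you would point
it at a large file that is already in the OS cache). Whether the threaded
version beats the serial one depends on how much CPU the read path itself
burns, which is exactly the point above:

    import threading
    import time

    BIG_FILE = "/tmp/bigfile.bin"   # placeholder: a large, cached file
    CHUNK = 1024 * 1024

    def reader():
        # File reads release the GIL while the data is copied in, even
        # when it is served straight from the OS page cache.
        with open(BIG_FILE, "rb") as f:
            while f.read(CHUNK):
                pass

    def cruncher():
        # CPU-bound pure-Python work; holds the GIL while it runs.
        total = 0
        for i in range(5000000):
            total += i * i

    start = time.time()
    reader()
    cruncher()
    print("serial:   %.2fs" % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=reader), threading.Thread(target=cruncher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threaded: %.2fs" % (time.time() - start))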
 

Piet van Oostrum

Ross Ridge said:
RR> By definition an I/O bound thread isn't CPU bound so won't benefit from
RR> improved CPU resources.

But doing I/O is not the same as being I/O bound. And Python allows
multithreading when a thread does I/O even if that thread is not I/O
bound but CPU bound. See my other posting for an example.
 

Carl Banks

http://svn.python.org/view/python/trunk/Modules/posixmodule.c?revisio...

Search for Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS


Hard computations gain more speed from carefully crafted C or Fortran
code that utilizes features like the L1 and L2 CPU caches, SIMD, etc., or
parallelized algorithms. If you start sharing values between multiple
cores you have a serious problem.

Oh, and use NumPy for the job ;)



It *is* a well-known limitation of Python. All the nice 'n shiny syntax
and features come at a cost. Python is a powerful language and a
good tool for lots of stuff. But Python is not, and will never become, the
übertool that solves every problem perfectly. At some point you need a
different tool to get the raw power of your machine. C (and perhaps
Fortran) are the weapons of choice for number crunching.

This is the narrow-minded attitude that irritates me.

Here's the thing: not everyone complaining about the GIL is trying to
get the "raw power of their machines." They just want to take
advantage of multiple cores so that their Python program runs
faster.

It would be rude and presumptuous to tell such a person, "Well, the GIL
isn't that big of a deal because if you want speed you can always
rewrite it in C to take advantage of multiple cores".


Carl Banks
 

Carl Banks

Please provide more evidence that Pythonistas are unwilling to
acknowledge the drawbacks of the GIL.

I will not, since I was not making an assertion but an observation:
some Pythonistas ignore (and even outright dismiss) claims that
Python has a drawback by countering with a suggestion that it should
be done some other, often more obtuse, way anyway. And I was expressing
a wish that I could observe this less often.


Carl Banks
 

OdarR

Here's the thing: not everyone complaining about the GIL is trying to
get the "raw power of their machines."  They just want to take
advantage of multiple cores so that their Python program runs
faster.

It would be rude and presumptuous to tell such a person, "Well, the GIL
isn't that big of a deal because if you want speed you can always
rewrite it in C to take advantage of multiple cores".

Thanks Carl, you expressed what I wanted to say.

Olivier
 

skip

Carl> Here's the thing: not everyone complaining about the GIL is trying
Carl> to get the "raw power of their machines." They just want to take
Carl> advantage of multiple cores so that their Python program runs
Carl> faster.

If their code is CPU-bound, it's likely that rewriting critical parts in C or
using packages like numpy would improve their performance with or without
multi-threading. For people who aren't used to C there are tools like Pyrex
and Cython which provide a middle road.

Skip
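
As an illustration of the kind of rewrite being suggested here: the loop
below is a made-up stand-in for a CPU-bound hot spot, not anyone's actual
code, and numpy is assumed to be installed.

    import time
    import numpy as np

    values = list(range(1000000))

    # Pure-Python reduction: one bytecode at a time, all under the GIL.
    start = time.time()
    total = 0.0
    for v in values:
        total += v * 0.5
    print("pure Python: %.3fs" % (time.time() - start))

    # The same reduction pushed down into numpy's C loops.
    arr = np.array(values, dtype=np.float64)
    start = time.time()
    total = float((arr * 0.5).sum())
    print("numpy:       %.3fs" % (time.time() - start))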
 

Stefan Behnel

Kay said:
and to a programming language that supports it.

Maybe, yes. But many different concurrency models are supported by a larger
number of programming languages in one way or another, so the choice of an
appropriate library is often sufficient - and usually a lot easier than
using the 'most appropriate' programming language. Matter of available
skills, mostly. There's usually a lot less code to be written that deals
with concurrency than code that implements what the person paying you makes
money with, so learning a new library may be worth it, while learning a new
language may not.

Stefan
 

Carl Banks

    Carl> Here's the thing: not everyone complaining about the GIL is trying
    Carl> to get the "raw power of their machines."  They just want to take
    Carl> advantage of multiple cores so that their Python program runs
    Carl> faster.

If their code is CPU-bound, it's likely that rewriting critical parts in C or
using packages like numpy would improve their performance with or without
multi-threading. For people who aren't used to C there are tools like Pyrex
and Cython which provide a middle road.

Once again you miss the point.

I'm sure you think you're trying to be helpful, but you're coming off
as really presumptuous with this casual dismissal of their concerns.

There are many, many valid reasons why people don't want to stray from
Pure Python. Converting to C, Fortran, Pyrex, etc. has a significant
cost (in both implementation and maintenance): much more than the cost
of parallelizing pure Python code. I guess what really bothers me
about this is how easily people throw out "shut up and use C" for some
things, especially things that quite reasonably appear to be a silly
limitation.

Maybe you don't intend to sound like you're saying "shut up and use
C", but to me, that's how you come off. If you're going to advise
someone to use C, at least try to show some understanding for their
concerns--it would go a long way.


Carl Banks
 

Jure Erznožnik

Look, guys, here's the thing:
At the company I work for, we decided to rewrite our MRP system in
Python. I was one of the main proponents of it, since it's nicely
cross-platform and allows for quite rapid application development. The
language and its built-in functions are simply great. The opposition was
quite strong, especially since the owner was cheering for .NET.

So, recently I started writing a part of this new system in Python: a
report generator, to be exact. Let's not go into existing offerings;
they are insufficient for our needs.

First I started on a few tests. I wanted to know how the reporting
engine will behave if I do this or that. One of the first tests was,
naturally, threading. The reporting engine itself will have separate,
semi-independent parts that can be threaded well, so I wanted to test
that.

The rest you know if you read the two threads I started on this group.

Now, the core of the new application is designed so that it can be
clustered so it's no problem if we just start multiple instances on
one server, say one for each available core.

The other day, a coworker of mine said something like: what?!? You've
been using Python for two days "already" and you already say it's got
a major fault?
I kind of agreed with him, especially since this particular coworker has
programmed strictly in Python for the last six months (and I haven't, due
to other commitments). There was no way my puny testing could
reveal such a major drawback. As it turns out, though, I was right. I have
programmed enough threading to have tried enough variations, all of which
run into the GIL, which I later confirmed by searching the web.

My purpose with developing the reporting engine in Python was twofold:
learn Python as I go and create a native solution which will work out-
of-the-box for all systems we decide to support. Making the thing open
source while I'm at it was a side-bonus.

However:
Since the testing revealed this, shall we say, "problem", I am tempted
to just use plain old C++ again. Furthermore, I was also not quite
content with the speed of arithmetic processing in the Python engine.
I created some simple aggregating objects that only performed two
additions per pass. Calling them 200K times took 4 seconds. This is
another reason why I'm beginning to think C++ might be a better
alternative. I must admit, had the GIL issue not popped up, I'd just
take the threading benefits and forget about it.
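
For reference, a hypothetical sketch of the shape of such an aggregator and
one way to time it with timeit; the class below is a guess at the code
described, not the actual code, and the absolute numbers will differ by
machine. Most of the cost in a loop like this is method-call and
attribute-access overhead rather than the additions themselves, which is
why pushing the loop into C or numpy helps so much:

    import timeit

    class Aggregator(object):
        """A guess at the kind of aggregating object described above."""
        def __init__(self):
            self.total = 0.0
            self.count = 0

        def add(self, value):
            # Two additions per pass.
            self.total += value
            self.count += 1

    def run(n=200000):
        agg = Aggregator()
        for i in range(n):
            agg.add(i)
        return agg.total, agg.count

    print("200K add() calls: %.3fs" % timeit.timeit(run, number=1))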

But with both things together, I'm thinking I need to rethink my strategy.
I may at some point decide that learning cross-platform programming is
worth a shot and just write a Python plugin for the code I write. The
final effect will be pretty much the same, only faster. Perhaps I will
even manage to get close to Crystal Reports' speed, though I highly
doubt that. But in the end, my Python skills will suffer. I still have
an entire application (production support) to develop in it.

Thanks for all the information, and please don't flame each other.
I already get the picture that the GIL is a hot subject.
 

Jure Erznožnik

Add:
Carl, Olivier & co. - You guys know exactly what I wanted.
Others: Going back to C++ isn't what I had in mind when I started
initial testing for my project.
 

skip

Carl> I'm sure you think you're trying to be helpful, but you're coming
Carl> off as really presumptuous with this casual dismissal of their
Carl> concerns.

My apologies, but in most cases there is more than one way to skin a cat.

Trust me, if removing the global interpreter lock were easy, or probably even
if it were merely hard, it almost certainly would have been done by now.
Continuing to harp on this particular aspect of the CPython implementation
doesn't help.

Carl> I guess what really bothers me about this is how easily people
Carl> throw out "shut up and use C" for some things, especially things
Carl> that quite reasonably appear to be a silly limitation.

You completely misunderstand, I think. People don't throw out "shut up and
use C" out of ignorance. In fact, I don't believe I've ever read a response
which took that tone. The practical matter is that there has so far been no
acceptable patch to CPython which gets rid of the global interpreter lock.
Extremely smart people have tried. More than once. If Guido had known then
(20 years ago) what he knows now:

* that the chip manufacturers would run out of clock speed
improvements for a few years and resort to multi-core CPUs as a way
to make their computers "faster"

* that garbage collection algorithms would improve as much as they
have in the past twenty years

I suspect he might well have considered garbage collection instead of
reference counting as the way to reclaim unreferenced memory, and we might
have a GIL-less CPython implementation today.

Carl> Maybe you don't intend to sound like you're saying "shut up and
Carl> use C", but to me, that's how you come off. If you're going to
Carl> advise someone to use C, at least try to show some understanding
Carl> for their concerns--it would go a long way.

Then you haven't been listening. This topic comes up over and over and over
again. It's a well-known limitation of the implementation. Poking people
in the eye with it over and over doesn't help. The reasons for the
limitation are explained every time the topic is raised. In the absence of
a GIL-less CPython interpreter you are simply going to have to look
elsewhere for performance improvements I'm afraid. Yes, I'll drag out the
same old saws:

* code hot spots in C or C++

* use tools like Pyrex, Cython, Psyco or Shed Skin

* for array processing, use numpy, preferably on top of a recent enough
version of Atlas, which does transparent multi-threading under the
covers

* use multiple processes (see the sketch at the end of this post)

* rewrite your code to use more efficient algorithms

I don't write those out of ignorance for your plight. It's just that if you
want a faster Python program today you're going to have to look elsewhere
for your speedups.
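
On the "use multiple processes" point, a minimal sketch with the stdlib
multiprocessing module (available since Python 2.6); each worker process
has its own interpreter and its own GIL, so CPU-bound pure-Python work can
use all the cores:

    import multiprocessing
    import time

    def crunch(n):
        # CPU-bound pure-Python work.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        jobs = [2000000] * 8

        start = time.time()
        serial = [crunch(n) for n in jobs]
        print("serial:    %.2fs" % (time.time() - start))

        start = time.time()
        pool = multiprocessing.Pool()   # one worker per core by default
        try:
            parallel = pool.map(crunch, jobs)
        finally:
            pool.close()
            pool.join()
        print("processes: %.2fs" % (time.time() - start))

        assert serial == parallel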
 
