Status of Python threading support (GIL removal)?

J

Jure Erznožnik

See here for introduction:
http://groups.google.si/group/comp.lang.python/browse_thread/thread/370f8a1747f0fb91

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.
Effectively, this causes the interpreter to utilize one core when
threading is not used and .95 of a core when threading is utilized.

Is there any work in progress on core Python modules that will
permanently resolve this issue?
Is there any other way to work around the issue aside from forking new
processes or using something else?
 
B

Ben Charrow

Jure said:
See here for introduction:
http://groups.google.si/group/comp.lang.python/browse_thread/thread/370f8a1747f0fb91

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.
Effectively, this causes the interpreter to utilize one core when
threading is not used and .95 of a core when threading is utilized.

Is there any work in progress on core Python modules that will
permanently resolve this issue?
Is there any other way to work around the issue aside from forking new
processes or using something else?

There is a group of people working on an alternative implementation of Python
that, among other things, will not have a GIL:
http://code.google.com/p/unladen-swallow/

There was even a successful attempt to remove the GIL from CPython, but it
caused single-threaded Python code to be much slower. See more here:
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock

Cheers,
Ben
 
M

Martin von Loewis

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.

It's the opposite: Python is exactly thread safe precisely because it
has the GIL in place.
Is there any other way to work around the issue aside from forking new
processes or using something else?

If you know that your (C) code is thread safe on its own, you can
release the GIL around long-running algorithms, thus using as many
CPUs as you have available, in a single process.

Regards,
Martin
 
O

OdarR

See here for introduction: http://groups.google.si/group/comp.lang.python/browse_thread/thread/370f8a1747f0fb91

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.
Effectively, this causes the interpreter to utilize one core when
threading is not used and .95 of a core when threading is utilized.

Is there any work in progress on core Python modules that will
permanently resolve this issue?
Is there any other way to work around the issue aside from forking new
processes or using something else?

hi,

please read this carefully,
<http://www.ibm.com/developerworks/aix/library/au-multiprocessing/index.html?ca=dgr-lnxw07Python-Multi&S_TACT=105AGX59&S_CMP=grsitelnxw07>

There is a solution for Python on multi-core: the multiprocessing API.
Really nice.
<http://docs.python.org/library/functions.html>
Keep real threads for common tasks like network stuff, for example.
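
Roughly, a minimal sketch of that approach (the worker below is just an
illustrative CPU-bound stand-in, not code from this thread):

from multiprocessing import Pool

def cpu_bound(n):
    # Illustrative pure-Python busy work; each call runs in its own
    # process, with its own interpreter and its own GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    pool = Pool()                      # defaults to one worker per core
    results = pool.map(cpu_bound, [10 ** 7] * 4)
    pool.close()
    pool.join()
    print(results)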

I recently complained too :)
<http://groups.google.com/group/comp.lang.python/browse_frm/thread/dbe0836d9602f322#>


Olivier
 
S

Stefan Behnel

Jure said:
See here for introduction:
http://groups.google.si/group/comp.lang.python/browse_thread/thread/370f8a1747f0fb91

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.
Effectively, this causes the interpreter to utilize one core when
threading is not used and .95 of a core when threading is utilized.

You might want to read about "The Problem with Threads":

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

and then decide to switch to an appropriate concurrency model for your use
case.

Stefan
 
T

Terry Reedy

Jure said:
See here for introduction:
http://groups.google.si/group/comp.lang.python/browse_thread/thread/370f8a1747f0fb91

Digging through my problem, I discovered Python isn't exactly thread
safe and to solve the issue, there's this Global Interpreter Lock
(GIL) in place.
Effectively, this causes the interpreter to utilize one core when
threading is not used and .95 of a core when threading is utilized.

Python the language neither has nor lacks a GIL; that is an
implementation issue.
CPython uses one, to good effect.
Is there any work in progress on core Python modules that will
permanently resolve this issue?
Is there any other way to work around the issue aside from forking new
processes or using something else?

Use one of the other implementations:
Jython, IronPython, PyPy, ...
 
O

OdarR

If you know that your (C) code is thread safe on its own, you can
release the GIL around long-running algorithms, thus using as many
CPUs as you have available, in a single process.

What do you mean?

CPython can't benefit from multi-core without multiple processes.

Olivier
 
S

skip

Olivier> What do you mean?

Olivier> CPython can't benefit from multi-core without multiple
Olivier> processes.

It can, precisely as Martin indicated. Only one thread at a time can hold
the GIL. That doesn't mean that multiple threads can't execute. Suppose
you have two threads, one of which winds up executing some bit of C code
which doesn't mess with the Python run-time at all (say, a matrix multiply).
Before launching into the matrix multiply, the extension module releases the
GIL then performs the multiply. With the GIL released another thread can
acquire it. Once the multiply finishes the first thread needs to reacquire
the GIL before executing any calls into the Python runtime or returning.
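
A rough way to see this from pure Python, assuming (as CPython's hashlib
documents) that the C-level hash update releases the GIL for buffers larger
than a couple of kilobytes; the timings are illustrative, not a benchmark:

import hashlib
import threading
import time

DATA = b"x" * (64 * 1024 * 1024)   # 64 MB buffer; the hashing itself runs in C

def work():
    hashlib.sha256(DATA).hexdigest()

start = time.time()
work()
work()
print("sequential :", time.time() - start)

start = time.time()
threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.time() - start)

With the GIL dropped during the C-level hashing, the two threads can run on
two cores; a pure-Python workload in the same harness would show no such gain.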
 
O

OdarR

    Olivier> What do you mean?

    Olivier> CPython can't benefit from multi-core without multiple
    Olivier> processes.

It can, precisely as Martin indicated.  Only one thread at a time can hold
the GIL.  That doesn't mean that multiple threads can't execute.  Suppose

I don't say multiple threads can't execute....(?).
I say that with the Python library, I don't see (yet) benefit with
multiple threads *on* multiple CPU/core.

Ever seen this recent video/presentation ? :
http://blip.tv/file/2232410
http://www.dabeaz.com/python/GIL.pdf
you have two threads, one of which winds up executing some bit of C code
which doesn't mess with the Python run-time at all (say, a matrix multiply).

I don't know how to do that with common Python operations...
Only one thread will really be running at a time (meanwhile the
other threads are waiting).
Are you referring to specialized code?
Before launching into the matrix multiply, the extension module releases the
GIL then performs the multiply.  With the GIL released another thread can
acquire it.  Once the multiply finishes the first thread needs to reacquire
the GIL before executing any calls into the Python runtime or returning.

I don't see such an improvement in the Python library, or maybe you can
point us to some meaningful example...?

I currently only use CPython, with PIL, ReportLab, etc.
I don't see an improvement on a Core 2 Duo CPU with Python. How do I
proceed (following what you wrote)?

By contrast, I saw *real* improvement in parallel computing with the
Py 2.6 multiprocessing module.

Olivier
 
C

Carl Banks

There is a group of people working on an alternative implementation of Python
that, among other things, will not have a GIL: http://code.google.com/p/unladen-swallow/


That's not a foregone conclusion. Well, it's not a foregone conclusion
that unladen-swallow will succeed at all, but even if it does, they
only say they intend to remove the GIL, not that they necessarily
will.

The GIL actually "solves" two problems: the overhead of synchronizing
reference counts, and the difficulty of writing threaded extensions.
The unladen-swallow team only address the first problem in their
plans. So, even if they do remove the GIL, I doubt GvR will allow it
to be merged back into CPython unless it makes extensions just as
easy to write. That is something I have serious doubts they can pull
off.

Which means a GIL-less unladen-swallow is likely to end up being another
fork, like IronPython and Jython. Those projects already have no GIL.


Carl Banks
 
C

Carl Banks

I don't say multiple threads can't execute....(?).
I say that with the Python library, I don't see (yet) benefit with
multiple threads *on* multiple CPU/core.


He's saying that if your code involves extensions written in C that
release the GIL, the C thread can run on a different core than the
Python-thread at the same time. The GIL is only required for Python
code, and C code that uses the Python API. C code that spends a big
hunk of time not using any Python API (like, as Skip pointed out, a
matrix multiply) can release the GIL and the thread can run on a
different core at the same time.

I always found this to be a *terribly* weak rationalization. The fact is,
few Python developers can take much advantage of this.

(Note: I'm not talking about releasing the GIL for I/O operations,
it's not the same thing. I'm talking about the ability to run
computations on multiple cores at the same time, not to block in 50
threads at the same time. Multiple cores aren't going to help that
much in the latter case.)

I wish Pythonistas would be more willing to acknowledge the (few)
drawbacks of the language (or implementation, in this case) instead of
all this rationalization. It's like people here in Los Angeles who
complain about overcast days. What, 330 days of sunshine not enough?
Jesus. I wish people would just say, "This is a limitation of
CPython. There are reasons why it's there, and it helps some people,
but unfortunately it has drawbacks for others", instead of the typical
"all u hav 2 do is rite it in C LOL".


Carl Banks
 
J

Jure Erznožnik

Thanks guys, for all the replies.
That was some very interesting reading / watching.

Seems to me, Unladen Swallow might in time produce code that has this
problem lessened a bit. Their roadmap suggests at least
modifying the GIL principles if not fully removing it. On top of this,
they seem to have a pretty aggressive schedule with good results
expected by Q3 this year. I'm hoping that their patches will be
accepted into the CPython codebase in a timely manner. I definitely liked
the speed improvements they showed for the Q1 modifications, though those
improvements don't help my case yet...

The presentation from Mr. Beazley was hilarious :D
I find it curious to learn that just a simple replacement of events with
actual mutexes already lessens the problem a lot. This should already
be implemented in the CPython codebase IMHO.

As for multiprocessing alternatives, I'll have to look into them. I
haven't yet done multiprocessing code and don't really know what will
happen when I try. I believe that threads would be much more
appropriate for my project, but it's definitely worth a shot. Since my
project is supposed to be cross-platform, I'm not really looking
forward to learning cross-platform C++. All my C++ experience is
DOS + Windows derivatives till now :(
 
O

OdarR

I've seen a single Python process using the full capacity of up to 8
CPUs. The application is making heavy use of lxml for large XSL
transformations, a database adapter and my own image processing library
based upon FreeImage.

Interesting...

Of course both lxml and my library are written with the GIL in mind.
They release the GIL around every call to C libraries that don't touch
Python objects. PIL releases the lock around ops as well (although it
took me a while to figure it out because PIL uses its own API instead of
the standard macros). reportlab has some optional C libraries that
increase the speed, too. Are you using them?

I don't. Or maybe I did, but I have no clue what to test.
Do you have a real example, some code snippet that can show activity
on multiple cores?
I accept your explanation, but I also like experimenting :)
By the way threads are evil
(http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf) and
not *the* answer to concurrency.

I don't see threads as evil from my little experience on the subject,
but we need them.
I'm reading about what's happening in the Java world too; it can be
interesting.

Olivier
 
J

Jure Erznožnik

Sorry, just a few more thoughts:

Does anybody know why the GIL can't be made more granular? I mean, use
different locks for different parts of the code?
This way there would be way less blocking, and the plugin interface
could remain the same (the interpreter would know which lock it used
for the plugin, so the actual functions for releasing / reacquiring the
lock could remain the same).
On second thought, forget this. This is probably exactly the cause of
the free-threading patch's reduced performance. Fine-graining the locks
increased the lock count, and their implementation is rather slow per se.
Strange that *nix variants don't have InterlockedExchange, probably
because they aren't x86-specific. I find it strange that other
architectures wouldn't have such instructions, though... Also, an OS
should still be able to support such a function even if the underlying
architecture doesn't have it. After all, a kernel knows what it's
currently running, and kernels are typically not preempted themselves.

Also, a side question: why does Python so like to use events instead
of "true" synchronization objects? Almost every library I looked at
did that. IMHO that's quite irrational: using objects that are
intended for something else while there are plenty of "true" options
supported in every OS out there.

Still, the free-threading mod could work just fine if there was
just one more global variable added: the current Python thread count. A
simple check for a value greater than 1 would trigger the
synchronization code, while having just one thread would introduce no
locking at all. Still, I didn't like the performance figures of the
mod (0.6x execution speed, pretty bad core / processor scaling).

I don't know why it's so hard to do simple locking just for writes to
globals. I used to do it massively and it always worked with almost no
penalty at all. It's true that those were all Windows programs, using
critical sections.
 
O

OdarR

He's saying that if your code involves extensions written in C that
release the GIL, the C thread can run on a different core than the
Python-thread at the same time.  The GIL is only required for Python
code, and C code that uses the Python API.  C code that spends a big
hunk of time not using any Python API (like, as Skip pointed out, a
matrix multiply) can release the GIL and the thread can run on a
different core at the same time.

I understand the idea, even if I don't see any examples in the
standard library.
Any examples?
(Note: I'm not talking about releasing the GIL for I/O operations,
it's not the same thing.  I'm talking about the ability to run
computations on multiple cores at the same time, not to block in 50
threads at the same time.  Multiple cores aren't going to help that
much in the latter case.)

Yes, I'm also talking about heavy computation that could benefit from
multiple cores.

I wish Pythonistas would be more willing to acknowledge the (few)
drawbacks of the language (or implementation, in this case) instead of
all this rationalization.  It's like people here in Los Angeles who
complain about overcast days.  What, 330 days of sunshine not enough?
Jesus.  I wish people would just say, "This is a limitation of
CPython.  There are reasons why it's there, and it helps some people,
but unfortunately it has drawbacks for others", instead of the typical
"all u hav 2 do is rite it in C LOL".

"LOL"
I would like to say such thing about my weather...I live in Europe in
a rainy country.

Olivier
 
J

Jesse Noller

What do you mean?

CPython can't benefit from multi-core without multiple processes.

Olivier

Sorry, you're incorrect. I/O-bound threads do, in fact, take advantage
of multiple cores.
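
For reference, this is the pattern in question (the URLs are just
placeholders); the GIL is released while each thread blocks inside the C
socket layer, so the waits overlap even though only one thread runs Python
bytecode at any instant:

import threading
import urllib.request

URLS = [
    "http://www.python.org/",
    "http://docs.python.org/",
    "http://pypi.org/",
]

def fetch(url):
    # urlopen() blocks in C socket code with the GIL released.
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()), "bytes")

threads = [threading.Thread(target=fetch, args=(u,)) for u in URLS]
for t in threads:
    t.start()
for t in threads:
    t.join()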
 
J

Jure Erznožnik

I don't. Or maybe I did, but I have no clue what to test.
Do you have a real example, some code snippet that can show activity
on multiple cores?
I accept your explanation, but I also like experimenting :)


I don't see threads as evil from my little experience on the subject,
but we need them.
I'm reading about what's happening in the Java world too; it can be
interesting.

Olivier

Olivier,
What Christian is saying is that you can write a C/C++ Python plugin,
release the GIL inside it and then process stuff in threads inside the
plugin.
All this is possible if the programmer doesn't use any Python objects,
and it's fairly easy to write such a plugin. Any counting example will
do just fine.

The problem with this solution is that you have to write the code in C,
which quite defeats the purpose of using an interpreter in the first
place...
Of course, no pure Python code will currently utilize multiple cores
(because of the GIL).
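
A quick way to see that is the counting demo from Beazley's slides, more or
less (timings illustrative):

import threading
import time

def count(n):
    # Pure-Python busy loop; only the thread holding the GIL executes it.
    while n > 0:
        n -= 1

N = 10 ** 7

start = time.time()
count(N)
count(N)
print("sequential :", time.time() - start)

start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print("two threads:", time.time() - start)

On CPython the threaded run is no faster than the sequential one, and
Beazley's measurements show it often getting worse on multiple cores.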

I do agree, though, that threading is important. Regardless of any
studies showing that threads suck, they are here and they offer
relatively simple concurrency. IMHO they should never have been
crippled like this. Even though the GIL solves access violations, it's
not the right approach. It simply kills all threading benefits except
for the situation where you work with multiple I/O-blocking threads.
That's just about the only situation where this problem is not
apparent.

We're way past single processor single core computers now. An
important product like Python should support these architectures
properly even if only 1% of applications written in it use threading.

But as Guido himself said, I should not complain but instead try to
contribute to a solution. That's the hard part, especially since
there's lots of code that actually needs the locking.
 
J

Jure Erznožnik

Sorry, you're incorrect. I/O Bound threads do in fact, take advantage
of multiple cores.

Incorrect. They take advantage of OS threading support, where another
thread can run while one is blocked for I/O.
That is not equal to running on multiple cores (though it actually
does do that; it's just that the cores are not well utilized - sum(x) <
100% of one core).
You will get better performance running on a single core, because of
the way the GIL is implemented, in all cases.
 
P

Paul Boddie

(Note: I'm not talking about releasing the GIL for I/O operations,
it's not the same thing.  I'm talking about the ability to run
computations on multiple cores at the same time, not to block in 50
threads at the same time.  Multiple cores aren't going to help that
much in the latter case.)

There seems to be a mixing together of these two things when people
talk about "concurrency". Indeed, on the concurrency-sig mailing list
[1] there's already been discussion about whether a particular example
[2] is really a good showcase of concurrency. According to Wikipedia,
concurrency is about "computations [...] executing
simultaneously" [3], not about whether one can handle hundreds of
communications channels sequentially, although this topic is obviously
relevant when dealing with communications between processing contexts.

I agree with the over-rationalisation assessment: it's not convenient
(let alone an advantage) for people to have to switch to C so that
they can release the GIL, nor is it any comfort that CPython's
limitations are "acceptable" for the socket multiplexing server style
of solution when that isn't the kind of solution being developed.
However, there are some reasonable tools out there (and viable
alternative implementations), and I'm optimistic that the situation
will only improve.

Paul

[1] http://mail.python.org/mailman/listinfo/concurrency-sig
[2] http://wiki.python.org/moin/Concurrency/99Bottles
[3] http://en.wikipedia.org/wiki/Concurrency_(computer_science)
 
