Python, multithreading & GIL

Ivan Voras

I've read articles about it but I'm not sure I've got everything right. Here
are some statements about the subject that I'm not 100% sure about:

- when interpreter (cpython) is compiled with pthreads, python programs can
make use of multiple processors (other statements below are for
cpython+pthreads environment)?
- the GIL is only placed on global variables (and makes access to global
variables essentially serialized)? (--> if I don't use global variables, I'm
free from GIL?)
- python can make use of multiple IO accesses across threads: if one thread
does file.read(), others are not blocked by it?
- only one thread can do IO access: if one thread does file.read(), others
cannot (they wait until the 1st read() call ends)?
- all of the above stays the same for network IO (socket.read())?
- all of the above is true for any call to a C function?

Can someone say which statements are true, which are false (and an
explanation of what is more correct :) )?

Thanks!
 
Donn Cave

Ivan Voras <[email protected]> said:
I've read articles about it but I'm not sure I've got everything right. Here
are some statements about the subject that I'm not 100% sure about:

- when interpreter (cpython) is compiled with pthreads, python programs can
make use of multiple processors (other statements below are for
cpython+pthreads environment)?

Depends on what you mean by "make use of".
- the GIL is only placed on global variables (and makes access to global
variables essentially serialized)? (--> if I don't use global variables, I'm
free from GIL?)

No, it's a global variable that serializes thread execution.
- python can make use of multiple IO accesses across threads: if one thread
does file.read(), others are not blocked by it?

Yes, because file.read releases the lock before it calls its
underlying C function (and acquires it again afterwards before
proceeding.)
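A small runnable sketch of what that buys you (the names and the scratch file here are mine, not from the post): two threads read the same file, and because read() drops the lock around the underlying C call, neither thread has to wait for the other's read to finish before starting its own.

```python
import tempfile
import threading

# Scratch file to read back from two threads at once.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("x" * 1024)
    path = f.name

results = {}

def reader(name):
    # file.read() releases the GIL around the underlying C read(),
    # so neither thread blocks the other while waiting on the OS.
    with open(path) as fh:
        results[name] = len(fh.read())

threads = [threading.Thread(target=reader, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert results == {"a": 1024, "b": 1024}
```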
- all of the above stays the same for network IO (socket.read())?
Yes.

- all of the above is true for any call to a C function?

No, C function interfaces are not required to release the lock,
and in fact might reasonably elect not to. For example, a function
that does some trivial computation, like peeking at some value in
library state, would incur a lot of unnecessary overhead by releasing
the lock. Other interfaces might neglect to release the lock just
because the author didn't care about it.

Donn Cave, (e-mail address removed)
 
Ivan Voras

Donn said:
Depends on what you mean by "make use of".

"Simultaneously execute different threads on different processors". I
mean all kinds of threads: IO-based and computation-based.
No, it's a global variable that serializes thread execution.

Now I'm puzzled - how is that different from GIL?

For example: if I have two or more threads that do numerical and string
computations not involving global variables, will they execute without
unexpected locking?
 
Jarek Zgoda

Ivan Voras said:
Now I'm puzzled - how is that different from GIL?

For example: if I have two or more threads that do numerical and string
computations not involving global variables, will they execute without
unexpected locking?

I think it's high time someone wrote a definitive document on how the GIL
can affect our programs and how to avoid headaches when using threading
with Python. I know what the acronym (GIL) means, I know the definition,
but I have very limited knowledge of threading issues in languages that
use VM environments.

Anyone?
 
Martin v. Löwis

Ivan said:
"Simultaneously execute different threads on different processors". I
mean all kinds of threads: IO-based and computation-based.

In Python, no two threads will ever simultaneously interpret byte code
instructions.

It might be that two threads started in Python simultaneously execute
non-Python code (like a C extension), or that one thread blocks in IO
and the other executes byte code. However, once one thread executes
Python byte code, no other thread in the same process will do so.
Now I'm puzzled - how is that different from GIL?

That is the GIL: a global variable that serializes thread execution.

However, it is *not* *only* placed on global variables. It is placed
on any kind of byte code, and data access, with the few exceptions
of long-running C code. So if you have two functions

def thread1():
    while 1: pass

def thread2():
    while 1: pass

and you run them in two separate threads, you will *not* be free from
the GIL. Both loops hold the GIL while executing, and give it up every
100 or so byte code instructions.
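A runnable variant of those two loops (bounded so it terminates; the threading spelling is my addition): both threads make progress because the interpreter periodically hands the GIL over, yet byte code never executes in two threads at the same moment.

```python
import threading

N = 100_000
counts = [0, 0]

def spin(i):
    # Pure-Python loop: the GIL is held while this byte code runs,
    # and released every so often so the other thread gets a turn.
    for _ in range(N):
        counts[i] += 1

threads = [threading.Thread(target=spin, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both loops finished, interleaved -- but never in parallel.
assert counts == [N, N]
```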
For example: if I have two or more threads that do numerical and string
computations not involving global variables, will they execute without
unexpected locking?

Depends on what you expect. There will be locking, and the threads will
not use two processors effectively (i.e. you typically won't see any
speedup from multiple processors if your computation is written in
Python).

Regards,
Martin
 
Ivan Voras

Martin v. Löwis wrote:

In Python, no two threads will ever simultaneously interpret byte code
instructions.

It might be that two threads started in Python simultaneously execute
non-Python code (like a C extension), or that one thread blocks in IO
and the other executes byte code. However, once one thread executes
Python byte code, no other thread in the same process will do so.

Thank you, your post gave the sort of answers I was looking for. :)
 
Donn Cave

"Simultaneously execute different threads on different processors". I
mean all kinds of threads: IO-based and computation-based.

I'm sorry, when I wrote that I expected that I would be saying
more about it later on, but maybe I went over it a little too
lightly, and I didn't mean to be cryptic. My point is that
a Python program is (at least) two layers: the part actually
written in Python, and the part written in C - modules written
in C and all the library functions they call. A Python program
can make use of multiple processors. Only one processor can
actually be executing the interpreter, but the interpreter may
be calling an external cryptography function in another thread,
and listening to a socket in another, etc.

I/O is a natural case of this - Python can't do any kind of I/O
on its own, so we can reasonably expect concurrent I/O. Computation
depends.
Now I'm puzzled - how is that different from GIL?

I meant, the GIL isn't placed on global variables, but it is one.
For example: if I have two or more threads that do numerical and string
computations not involving global variables, will they execute without
unexpected locking?

They will execute serially. I believe the interpreter schedules
threads for some number of instructions, so each thread won't
have to run to completion before the next one can execute - they'll
all probably finish about the same time - but there will be only
one interpreter thread executing at any time.

This has been known to bother people, and some years back a very
capable programmer on the Python scene at the time tried to fix
it with a version of Python that was `free threaded.' I think
the reason it's not the version of Python we're using today is
1. It's a hard problem, and
2. It doesn't make that much practical difference.

That's my opinion, anyway. There are a few lengthy discussions
of the matter in the comp.lang.python archives, for anyone who
wants to see more opinions.

Donn Cave, (e-mail address removed)
 
Project2501

Surely there is a case for a Python VM/interpreter to be able to handle
threads without the GIL. That is, map them to whatever underlying OS
facilities are available, and only if they are not available, do bytecode
interleaving. After all, Python relies on OS facilities for many other
tasks.
 
Donn Cave

Project2501 said:
Surely there is a case for a Python VM/interpreter to be able to handle
threads without the GIL. That is, map them to whatever underlying OS
facilities are available, and only if they are not available, do bytecode
interleaving. After all, Python relies on OS facilities for many other
tasks.

Python definitely provides meaningful support for several
types of operating system threads, including POSIX, and
the way I understand you, it does what you say. It's not
like some other interpreted languages (or some versions of
Stackless Python) that implement threads inside a single OS
thread: these are real OS threads, and your Python code
(i.e., the interpreter) runs "in" them.

It's just that part of the support for concurrency is a
lock that protects Python internal data structures from
unsound concurrent access. That's the reason for the GIL.

And as I asserted, it isn't a significant problem in practice.

Donn Cave, (e-mail address removed)
 
Ivan Voras

Donn said:
It's just that part of the support for concurrency is a
lock that protects Python internal data structures from
unsound concurrent access. That's the reason for the GIL.

And as I asserted, it isn't a significant problem in practice.

Except if you're planning for multiple processors :(
 
Donn Cave

Quoth Ivan Voras <ivoras@__geri.cc.fer.hr>:
| Donn Cave wrote:
|
| > It's just that part of the support for concurrency is a
| > lock that protects Python internal data structures from
| > unsound concurrent access. That's the reason for the GIL.
| >
| > And as I asserted, it isn't a significant problem in practice.
|
| Except if you're planning for multiple processors :(

Usually even then. Most applications with a really serious
computational load will implement the compute-intensive parts
in C, as a Python module (or will use an existing module.)
The ones that will implement that part in pure Python, as
part of a multithreaded architecture that relies on SMP hardware,
are very few. It wouldn't be a good idea even if it worked.

Donn
 
Roger Binns

Ivan said:
Except if you're planning for multiple processors :(

To better illustrate this, when you write C code that interfaces
with Python, it looks like this example from my libusb wrapper:

Py_BEGIN_ALLOW_THREADS
res=usb_bulk_read(dev, ep, bytesoutbuffer, *bytesoutbuffersize, timeout);
Py_END_ALLOW_THREADS

Any C code between BEGIN_ALLOW_THREADS and END_ALLOW_THREADS can
run concurrently with any other code meeting the same criteria.
This typically includes most forms of I/O, networking, operating
system access etc. Consequently, code that does a lot of that
scales to multiple processors (assuming your OS scales).

You can do the BEGIN/END threads thing in any C extensions you need.
In practice this is good enough for most people. Their Python code
doesn't spend much time processing. And if they did have something
that did a time-consuming calculation (eg complex crypto), they are
likely to have it in a C extension, or move it into a separate process
(eg that is what a database is :)

Worst case code would be this as the body of each thread:

while True: pass

It would not improve no matter how many processors you have.
You would need to scale that by splitting your program into
multiple processes. That then also has the benefit that
you could put the processes on multiple machines (assuming you
use TCP to connect them) and scale away.
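The multi-process route can be sketched with the stdlib multiprocessing module (a later addition to Python than this thread, so this is an update on my part rather than anything Roger names): each worker process gets its own interpreter and its own GIL, so CPU-bound work genuinely runs in parallel.

```python
from multiprocessing import Pool

def burn(n):
    # CPU-bound work: each worker process has its own interpreter
    # and its own GIL, so these really do run in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(burn, [50_000, 50_000])
    assert results[0] == results[1] == burn(50_000)
```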

Roger
 
Carl Banks

Donn said:
Quoth Ivan Voras <ivoras@__geri.cc.fer.hr>:
| Donn Cave wrote:
|
| > It's just that part of the support for concurrency is a
| > lock that protects Python internal data structures from
| > unsound concurrent access. That's the reason for the GIL.
| >
| > And as I asserted, it isn't a significant problem in practice.
|
| Except if you're planning for multiple processors :(

Usually even then. Most applications with a really serious
computational load

You don't know if his application has a serious computational load.

will implement the compute-intensive parts
in C, as a Python module (or will use an existing module.)
The ones that will implement that part in pure Python, as
part of a multithreaded architecture that relies on SMP hardware,
are very few.

I highly disagree. It's reasonable to want a multi-threaded, pure
Python program to run faster with multiple processors, and without
having to rewrite the thing in C. The GIL limits the ability of pure
Python to take advantage of SMP, and that's a definite flaw in Python.

It wouldn't be a good idea even if it worked.

Why?
 
Simon Burton

I've read articles about it but I'm not sure I've got everything right.
Here are some statements about the subject that I'm not 100% sure about:

- when interpreter (cpython) is compiled with pthreads, python programs
can make use of multiple processors (other statements below are for
cpython+pthreads environment)?

Not really.
- the GIL is only placed on global variables (and makes access to global
variables essentially serialized)? (--> if I don't use global variables,
I'm free from GIL?)

No. By "Global" we mean "everything".

Simon.
 
Ivan Voras

Carl said:
You don't know if his application has a serious computational load.

Depends on what you mean by computing - in my case it's not bare number
crunching but the stuff python is good at and convenient to use, mostly
string manipulation.

I highly disagree. It's reasonable to want a multi-threaded, pure
Python program to run faster with multiple processors, and without
having to rewrite the thing in C. The GIL limits the ability of pure
Python to take advantage of SMP, and that's a definite flaw in Python.

I agree :)
But now, looking at some other scripting languages, I don't see any that
claim to be able to do what we're discussing here. Does anybody know of a
scripting language good at "string crunching" that can exploit SMP with
threading?

ObNote: forking is another way, but very inconvenient...
 
Donn Cave

You don't know if his application has a serious computational load.

I don't intend to guess at what his application is about,
but that's the only case I ever hear about where it even
theoretically matters. An application with a trivial
computational aspect will run more or less concurrently.
I highly disagree. It's reasonable to want a multi-threaded, pure
Python program to run faster with multiple processors, and without
having to rewrite the thing in C. The GIL limits the ability of pure
Python to take advantage of SMP, and that's a definite flaw in Python.



Why?

Because it would still be slow.

I'm not arguing that the GIL is a feature, though there may
be a weak case for that (I've had pretty good luck with my
Python programs in a multithreaded system that is supposed
to be a big headache for C++ application programmers, and
I've wondered if the modest amount of extra serialization
Python imposes is actually helping me out there. But I haven't
worked that idea out, because - it doesn't matter, this issue
isn't going anywhere regardless.)

I'm not arguing that no one cares at all, or that it's
unreasonable to wish for it. I'm saying that the need for
free threading doesn't add up to enough motivation for anyone
to take on the very hairy task of implementing it. (Greg
Stein being the exception that proves the rule - he did
implement it, and we still have a GIL.)

I don't know how the advent of Python compilation options will
change this. Obviously it makes Python more attractive for
compute intensive work, but ... can the compilers use "safe"
data structures so you can run unlocked?

For the sidebar, ocaml has the same system - works with native
OS threads if built that way, but protects itself with a global
lock. Not a global interpreter lock, because this is compiled
code, not interpreted, but still there are data structures.
I happened to be reading a Linux man page for pthread mutexes,
and Xavier Leroy's name appeared at the bottom - one of the
implementors of ocaml, I believe. I'd be interested to hear
about other languages' support for free threading.

Donn Cave, (e-mail address removed)
 
Ivan Voras

Roger said:
To better illustrate this, when you write C code that interfaces
with Python, it looks like this example from my libusb wrapper:

Py_BEGIN_ALLOW_THREADS
res=usb_bulk_read(dev, ep, bytesoutbuffer, *bytesoutbuffersize, timeout);
Py_END_ALLOW_THREADS

Any C code between BEGIN_ALLOW_THREADS and END_ALLOW_THREADS can
run concurrently with any other code meeting the same criteria.
This typically includes most forms of I/O, networking, operating
system access etc. Consequently, code that does a lot of that
scales to multiple processors (assuming your OS scales).

Thanks, this clarifies a lot :)

So, during the usb_bulk_read() call above, Python can and will execute
another pure-Python (or similarly mixed C) thread if one is available?
 
Jeff Epler

I'm not arguing that the GIL is a feature, though there may
be a weak case for that (I've had pretty good luck with my
Python programs in a multithreaded system that is supposed
to be a big headache for C++ application programmers, and
I've wondered if the modest amount of extra serialization
Python imposes is actually helping me out there. But I haven't
worked that idea out, because - it doesn't matter, this issue
isn't going anywhere regardless.)

I have to relate this story:

The application I work on recently switched from C to "C compiled by a
C++ compiler, plus a little bit of C++ code". Basically, this sucks.
Anyway, we've started to use parts of Boost, and I was excited to learn
that Boost has a counted-pointer implementation.

The simplest program I decided to try was to create and destroy a
collection of references to a reference-counted object (only one C
instance is created, and each of the 2^22 elements in the container is
a reference or pointer to that object). In Python, this looked
like so:
class C(object): pass
v = [C()] * (1<<22)
and in C++ with boost:
#include <boost/shared_ptr.hpp>
#include <vector>

class C { };

int main(void) {
    boost::shared_ptr<C> p(new C);
    std::vector<boost::shared_ptr<C> > v((1<<22), p);
}

The C++ program consumes 35 megs and runs in 3.7 seconds, the Python
program runs in .5 seconds and uses 22 megs. The Python program runs
just fine with a list of size 1<<25, but boost can't handle it.
If I compile the C++ program without support for threads, that at least
trims the runtime to 1.5 seconds.

The relevant detail here (oh, are you still reading?) is that making all
those reference counts threadsafe in boost more than doubled runtime.
Python does a *lot* of refcount modification!

Jeff
 
Alan Kennedy

[Carl Banks]
[Ivan Voras]
> I agree :)
> But now, looking at some other scripting languages, I don't see any
> that claim to be able to do what we're discussing here. Does anybody
> know of a scripting language good at "string crunching" that can
> exploit SMP with threading?

http://www.jython.org
 
Roger Binns

Ivan said:
So, during the usb_bulk_read() call above, Python can and will execute
another pure-Python (or similarly mixed C) thread if one is available?

Yes. The call to Py_BEGIN_ALLOW_THREADS releases the GIL and the call to
Py_END_ALLOW_THREADS claims it again. Only one thread at a time can
own the GIL.

The Python interpreter itself will continuously execute bytecode in
one thread until sys.getcheckinterval() bytecode instructions have
been executed, at which point it can switch to another eligible
interpreter thread.
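The knob Roger mentions can be inspected directly. Note that CPython 3.2+ replaced the byte-code check interval with a time-based switch interval, so this snippet uses the modern names (an update on my part, not something from the thread):

```python
import sys

# How long a thread may run before the interpreter considers
# switching to another one (in seconds; the default is 0.005).
interval = sys.getswitchinterval()
assert interval > 0

# It can be tuned, though that is rarely worthwhile:
sys.setswitchinterval(interval)
```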

I did see mention in one of these groups about how someone did try
replacing the GIL with finer-grained locking, and it actually performed
noticeably worse.

Roger
 
