More than one interpreter per process?

sturlamolden

Python has a GIL that impairs scalability on computers with more than
one processor. The problem seems to be that there is only one GIL per
process. Attempts to remove the GIL have always foundered on the need
for 'fine-grained locking' of reference counts. I believe there is a
second way, which has been overlooked: having one GIL per interpreter
instead of one GIL per process.

Currently, the Python C API - as I understand it - only allows for a
single interpreter per process. Here is how Python would be embedded
in a multi-threaded C program today, with the GIL shared among the C
threads:

#include <windows.h>
#include <process.h>
#include <Python.h>

/* interpreter state of the (single) interpreter, shared by all threads */
static PyInterpreterState *maininterp = NULL;

void threadproc(void *data)
{
    PyThreadState *threadstate;

    /* create a thread state for this thread */
    PyEval_AcquireLock();
    threadstate = PyThreadState_New(maininterp);
    PyEval_ReleaseLock();

    /* swap this thread in, do whatever we need */
    PyEval_AcquireLock();
    PyThreadState_Swap(threadstate);
    PyRun_SimpleString("print 'Hello World!'\n");
    PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();

    /* clear and delete the thread state for this thread */
    PyEval_AcquireLock();
    PyThreadState_Clear(threadstate);
    PyThreadState_Delete(threadstate);
    PyEval_ReleaseLock();

    /* tell Windows this thread is done */
    _endthread();
}

int main(int argc, char *argv[])
{
    HANDLE threads[3];
    PyThreadState *mainstate;

    Py_Initialize();
    PyEval_InitThreads(); /* the main thread now holds the GIL */
    maininterp = PyThreadState_Get()->interp;

    /* release the GIL so the worker threads can take turns on it */
    mainstate = PyEval_SaveThread();

    threads[0] = (HANDLE)_beginthread(threadproc, 0, NULL);
    threads[1] = (HANDLE)_beginthread(threadproc, 0, NULL);
    threads[2] = (HANDLE)_beginthread(threadproc, 0, NULL);
    WaitForMultipleObjects(3, threads, TRUE, INFINITE);

    /* take the GIL back before shutting the interpreter down */
    PyEval_RestoreThread(mainstate);
    Py_Finalize();
    return 0;
}

In the Java Native Interface (JNI), every function takes an
environment pointer for the VM. The same thing could be done for
Python, with the VM, including its GIL, encapsulated in a single object:

#include <windows.h>
#include <process.h>
#include <Python.h>

void threadproc(void *data)
{
    PyVM *vm = Py_Initialize(); /* create a new interpreter */
    PyRun_SimpleString(vm, "print 'Hello World!'\n");
    Py_Finalize(vm);
    _endthread();
}

int main(int argc, char *argv[])
{
    HANDLE threads[3];
    threads[0] = (HANDLE)_beginthread(threadproc, 0, NULL);
    threads[1] = (HANDLE)_beginthread(threadproc, 0, NULL);
    threads[2] = (HANDLE)_beginthread(threadproc, 0, NULL);
    WaitForMultipleObjects(3, threads, TRUE, INFINITE);
    return 0;
}

Doesn't that look a lot nicer?

If one can have more than one interpreter in a single process, it is
possible to create a pool of them and implement concurrent programming
paradigms such as fork/join (to appear in Java 7, already in C# 3.0).
It would also be possible to emulate fork() on platforms that lack a
native fork(), such as Windows; Perl does this in 'perlfork'. This
would deal with the GIL issue on computers with more than one CPU.
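
For illustration, a pool under the hypothetical API above could look
something like this (PyVM and the vm-taking functions are still
entirely imaginary, and the includes from the example above are
assumed):

#define POOLSIZE 4

/* hypothetical: each worker runs a script in its own interpreter,
   under its own GIL, so the pool executes in parallel */
void worker(void *script)
{
    PyVM *vm = Py_Initialize();
    PyRun_SimpleString(vm, (char *)script);
    Py_Finalize(vm);
    _endthread();
}

void fork_join(char *script)
{
    HANDLE threads[POOLSIZE];
    int i;
    for (i = 0; i < POOLSIZE; i++) /* 'fork' */
        threads[i] = (HANDLE)_beginthread(worker, 0, script);
    /* 'join' */
    WaitForMultipleObjects(POOLSIZE, threads, TRUE, INFINITE);
}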

One could actually use ctypes to embed a pool of Python interpreters
in a process already running Python.

Most of the conversion of the current Python C API could be automated.
Python would also need to be linked against a multi-threaded version
of the C library.
 
Michael L Torrie

sturlamolden said:
Python has a GIL that impairs scalability on computers with more than
one processor. The problem seems to be that there is only one GIL per
process. Attempts to remove the GIL have always foundered on the need
for 'fine-grained locking' of reference counts. I believe there is a
second way, which has been overlooked: having one GIL per interpreter
instead of one GIL per process.

How would this handle Python resources that a programmer would want to
share among the threads? What facilities for IPC between the
interpreters would be used?
 
Roger Binns

sturlamolden said:
If one can have more than one interpreter in a single process,

You can. Have a look at mod_python and mod_wsgi, which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).
Most of the conversion of the current Python C API could be automated.

The biggest stumbling block is what to do when the external environment
makes a new thread and then eventually calls back into Python. It is
hard to know which interpreter that callback should go to.

You are also asking for every extension module to be changed. The
vast majority are not part of the Python source tree and would also
have to keep supporting the Python versions that predate such a change.

You would have more luck getting this sort of change into Python 3,
since that requires most extension modules to be modified a bit anyway
(e.g. to deal with string and unicode issues).

But before doing that, why not show how much better your scheme would
make things? The costs of doing it are understood, but what are the
benefits in terms of CPU consumption, memory consumption, OS
responsiveness, cache utilisation, multi-core utilisation etc.? If the
answer is 1%, then that is noise.

Roger
 
sturlamolden

How would this handle Python resources that a programmer would want to
share among the threads? What facilities for IPC between the
interpreters would be used?

There would be no need for IPC, as they would live in the same process;
a thread-safe queue would suffice. Python objects would have to be
serialized before being placed in the queue. One could also allow
NumPy-like arrays in two separate interpreters to share the same
memory buffer.
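
For the serialization part, something like this would work already
with today's C API (a sketch using Python 2 and cPickle; the
thread-safe queue itself is assumed to exist elsewhere and is not
shown):

#include <stdlib.h>
#include <string.h>
#include <Python.h>

/* Pickle 'obj' into a malloc'ed byte buffer that can safely be handed
   to another interpreter through a thread-safe queue. Returns NULL on
   error. Must be called with the sending interpreter's GIL held. */
char *serialize_for_queue(PyObject *obj, Py_ssize_t *len)
{
    char *copy = NULL, *buf;
    PyObject *pickle, *data;

    pickle = PyImport_ImportModule("cPickle");
    if (pickle == NULL)
        return NULL;
    data = PyObject_CallMethod(pickle, "dumps", "(Oi)", obj, 2);
    Py_DECREF(pickle);
    if (data == NULL)
        return NULL;
    if (PyString_AsStringAndSize(data, &buf, len) == 0) {
        copy = malloc(*len);
        if (copy != NULL)
            memcpy(copy, buf, *len);
    }
    Py_DECREF(data);
    return copy; /* the receiver unpickles with cPickle.loads() */
}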

With an emulated fork() the whole interpreter would be cloned,
possibly deferred in a 'copy on write' scheme.

Multiple processes and IPC are what we have today with e.g. mpi4py.
 
sturlamolden

The biggest stumbling block is what to do when the external environment
makes a new thread and then eventually calls back into Python. It is
hard to know which interpreter that callback should go to.

Not if you explicitly have to pass a pointer to the interpreter in
every API call, which is what I suggested.

You are also asking for every extension module to be changed. The
vast majority are not part of the Python source tree and would also
have to keep supporting the Python versions that predate such a change.

It would break a lot of stuff.

But porting could be automated by a simple Python script. It just
involves changing PySomething(...) to PySomething(env, ...), with env
being a pointer to the interpreter. Since an extension only needs to
know about a single interpreter, it could possibly be done by
preprocessor macros:

#define PySomething(var) PySomething(env, var)
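
(A macro is not expanded recursively, so the trick above is legal C.
As a sketch of how an old one-argument call site might be bridged to
the hypothetical env-taking API; 'env' is assumed to be a visible
pointer to the current interpreter:)

/* hypothetical: wraps old call sites with the env-taking API */
#define PyRun_SimpleString(s) PyRun_SimpleString(env, (s))

void hello(PyVM *env)
{
    /* expands to: PyRun_SimpleString(env, "print 'hello'\n"); */
    PyRun_SimpleString("print 'hello'\n");
}
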
You would have more luck getting this sort of change into Python 3 since
that requires most extension modules to be modified a bit (eg to deal
with string and unicode issues).

PEPs are closed for Python 3.
 
sturlamolden

You can. Have a look at mod_python and mod_wsgi, which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).

mod_python uses Py_NewInterpreter() to create sub-interpreters. They
all share the same GIL: the GIL is declared static in ceval.c and is
shared by the whole process. But OK, if PyEval_AcquireLock() took a
pointer to a 'sub-GIL', sub-interpreters could run concurrently on
SMPs. It would, however, require a separate thread scheduler in each
sub-interpreter.
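
For reference, the existing sub-interpreter machinery looks like this
(a minimal single-threaded sketch using the real Python 2 C API):

#include <Python.h>

int main(void)
{
    Py_Initialize();
    PyThreadState *main_tstate = PyThreadState_Get();

    /* Py_NewInterpreter creates a sub-interpreter and makes a fresh
       thread state for it current; the single GIL stays shared. */
    PyThreadState *sub = Py_NewInterpreter();
    PyRun_SimpleString("print 'hello from a sub-interpreter'\n");
    Py_EndInterpreter(sub);          /* 'sub' must be current here */

    PyThreadState_Swap(main_tstate); /* restore the main thread state */
    Py_Finalize();
    return 0;
}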
 
Aahz

PEPs are closed for Python 3.

That's true for core interpreter changes and only applies to Python 3.0.
Overall, what Roger said is true: if there is any hope for your proposal,
you must ensure that you can make it happen in Python 3.
 
Graham Dumpleton

mod_python uses Py_NewInterpreter() to create sub-interpreters. They
all share the same GIL: the GIL is declared static in ceval.c and is
shared by the whole process. But OK, if PyEval_AcquireLock() took a
pointer to a 'sub-GIL', sub-interpreters could run concurrently on
SMPs. It would, however, require a separate thread scheduler in each
sub-interpreter.

In current versions of Python it is possible for multiple
sub-interpreters to access the same instance of a Python object which
is notionally independent of any particular interpreter. In other
words, sharing of objects exists between sub-interpreters. If you
remove the global GIL and make it per sub-interpreter, you would lose
this ability. This may have an impact on some third-party C extension
modules, or on embedded systems, which are able to cache simple Python
data objects for use in multiple sub-interpreters so that memory usage
is reduced.
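
For instance, a C extension might cache a small object once and hand
it out to every sub-interpreter, along these lines (a sketch;
get_empty_tuple is made up for illustration):

#include <Python.h>

/* One shared instance for the whole process. Safe only because a
   single GIL serializes the refcount updates from every
   sub-interpreter; per-interpreter GILs would break this. */
static PyObject *empty_tuple = NULL;

PyObject *get_empty_tuple(void)
{
    if (empty_tuple == NULL)
        empty_tuple = PyTuple_New(0); /* created once, reused forever */
    Py_XINCREF(empty_tuple);
    return empty_tuple;              /* new reference for the caller */
}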

Graham
 
Graham Dumpleton

You can. Have a look at mod_python and mod_wsgi, which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).

When using mod_wsgi there is no problem with C extension modules which
use the simplified GIL API, provided that one configures mod_wsgi to
delegate that specific application to run in the context of the first
interpreter instance created by Python.

In theory the same should be the case for mod_python, but there is
currently a bug in the way that mod_python works such that some C
extension modules using the simplified GIL API still don't work even
when made to run in the first interpreter.

Graham
 
Roger Binns

sturlamolden said:
Not if you explicitly have to pass a pointer to the interpreter in
every API call, which is what I suggested.

You missed my point. What if the code calling back into Python doesn't
know which interpreter it belongs to? Think of a web server with Python
callbacks registered for handling various things. Currently that
situation works just fine, as the simplified GIL APIs just pick the
main interpreter.

You have now imposed a requirement on all extension modules that they
need to keep track of interpreters in such a way that callbacks from new
threads not started by Python know which interpreter they belong to.
This is usually possible because you can supply callback data in the
external environment's APIs, but your mechanism would prevent any that
lack that ability from working at all. We wouldn't find those
"broken" implementations until changing to your mechanism.
But porting could be automated by a simple Python script.

Have you actually tried it? See if you can do it for the sqlite module,
which is a standard part of the Python library.
PEPs are closed for Python 3.

You glossed over my "prove the benefit outweighs the costs" bit :)

This project will let you transparently use multiple processes:

http://cheeseshop.python.org/pypi/processing

There are other techniques for parallelization using multiple processes
and even the network. For example:

http://www.artima.com/forums/flat.jsp?forum=106&thread=214303
http://www.artima.com/weblogs/viewpost.jsp?thread=214235

Roger
 
Roger Binns

Graham said:
When using mod_wsgi there is no problem with C extension modules which
use the simplified GIL API, provided that one configures mod_wsgi to
delegate that specific application to run in the context of the first
interpreter instance created by Python.

Graham, I've asked you before but never quite got a straight answer.
What *exactly* should extension authors change their code to in order to
be fully compatible? For example how should the C function below be
changed:

void somefunc(void)
{
    PyGILState_STATE gilstate = PyGILState_Ensure();

    abc();

    Py_BEGIN_ALLOW_THREADS
    def();
    Py_END_ALLOW_THREADS

    ghi();

    PyGILState_Release(gilstate);
}

Roger
 
Graham Dumpleton

Graham, I've asked you before but never quite got a straight answer.

Maybe because it isn't that simple. :)
What *exactly* should extension authors change their code to in order to
be fully compatible? For example how should the C function below be
changed:

void somefunc(void)
{
    PyGILState_STATE gilstate = PyGILState_Ensure();

    abc();

    Py_BEGIN_ALLOW_THREADS
    def();
    Py_END_ALLOW_THREADS

    ghi();

    PyGILState_Release(gilstate);
}

What you do depends on what the overall C extension module does. It
isn't really possible to say how the above may need to be changed, as
a great deal of context is missing about how that function comes to be
called. Presented with that function in isolation, I can only say that
using the simplified GIL API is probably the only way of doing it, and
it can therefore only be used safely against the first interpreter
created by Python.

If the direction of calling for a C extension module is always Python
code into C code, and that is as far as it goes, then none of this is
an issue, as you only need to use Py_BEGIN_ALLOW_THREADS and
Py_END_ALLOW_THREADS.
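
That simple case looks like this (a minimal sketch; do_blocking_io
stands in for any blocking C call):

#include <Python.h>

extern void do_blocking_io(void);

/* Called from Python; releases the GIL around blocking C work and
   behaves identically no matter which sub-interpreter called it. */
static PyObject *py_work(PyObject *self, PyObject *args)
{
    Py_BEGIN_ALLOW_THREADS
    do_blocking_io();
    Py_END_ALLOW_THREADS
    Py_RETURN_NONE;
}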

The problem case is where C code needs to call back into Python code
and you are not using the simplified GIL API, in order to be able to
support multiple sub-interpreters. The easiest thing to do here is to
cache a thread state object for the interpreter instance when you
first obtain the handle for the object which allows you to interact
with the C extension module's internals. Later, when a callback from C
to Python code needs to occur, you look up the cached thread state
object and use that as the argument to PyEval_AcquireThread().
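
Condensed to its essentials, the pattern looks something like this
sketch (real API; the per-interpreter cache and the locking around it
are left out, and it assumes only one foreign thread calls back at a
time, since a thread state must not be used by two threads at once):

#include <Python.h>

/* saved, with the GIL held, when the extension object is created */
static PyThreadState *cached_tstate = NULL;

void remember_thread_state(void)
{
    cached_tstate = PyThreadState_Get();
}

/* called later from a thread the external environment created */
void callback_into_python(PyObject *callable)
{
    PyEval_AcquireThread(cached_tstate); /* GIL + right interpreter */
    PyObject *r = PyObject_CallObject(callable, NULL);
    Py_XDECREF(r);
    PyEval_ReleaseThread(cached_tstate);
}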

As an example, see:

http://svn.dscpl.com.au/ose/trunk/software/python/opydispatch.cc

The thread state object is cached when the handle to an instance is
first created. Any callbacks which are registered remember the
interpreter pointer, and that is then used as the key to look up the
cached thread state.

This code was done a long time ago. It is possible that it needs to be
revised based on what has since been learnt about the simplified GIL API.

This way of doing things will also not work where sub-interpreters may
be created and then destroyed again prior to process exit, because the
interpreter pointer is cached. But then, enough C extension modules
have this problem that recycling sub-interpreters in a process isn't
practical anyway.

Note that the indicated file uses a global cache. The agent.hh/
opyagent.cc files at that same location implement a more complicated
caching system based on individual objects.

The software which this is from is probably a more extreme example of
what is required, but your example was too simple to draw any
conclusions from.

Graham
 
Christian Heimes

Roger said:
You can. Have a look at mod_python and mod_wsgi, which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).

No, you can't. Sub-interpreters share a single GIL and other state. Why
don't you run multiple processes? It's one of the oldest and
best-working ways to use the full potential of your system. Lots of
Unix servers like postfix, qmail, apache (with some workers) et al.
use processes.

Christian
 
sturlamolden

No, you can't. Sub-interpreters share a single GIL and other state. Why
don't you run multiple processes? It's one of the oldest and
best-working ways to use the full potential of your system. Lots of
Unix servers like postfix, qmail, apache (with some workers) et al.
use processes.

Because there is a prominent, broken OS that doesn't support fork()?

MPI works with multiple processes, though, and can be used from Python
(mpi4py) even under Windows.
 
