sturlamolden
Python has a GIL that impairs scalability on computers with more than
one processor. The problem seems to be that there is only one GIL per
process. Solutions for removing the GIL have always foundered on the
need for 'fine-grained locking' of reference counts. I believe there is
a second way, which has been overlooked: having one GIL per interpreter
instead of one GIL per process.
Currently, the Python C API - as I understand it - only allows for a
single interpreter per process. Here is how Python would be embedded
in a multi-threaded C program today, with the GIL shared among the C
threads:
#include <windows.h>
#include <Python.h>
#include <process.h>

/* the single shared interpreter; set up by main() before any worker starts */
static PyInterpreterState *g_interp = NULL;

void threadproc(void *data)
{
    /* create a thread state for this thread in the shared interpreter */
    PyThreadState *threadstate = PyThreadState_New(g_interp);

    /* acquire the shared GIL, swap this thread in, do whatever we need */
    PyEval_AcquireLock();
    PyThreadState_Swap(threadstate);
    PyRun_SimpleString("print 'Hello World1'\n");
    PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();

    /* clear and delete the thread state for this thread */
    PyEval_AcquireLock();
    PyThreadState_Clear(threadstate);
    PyEval_ReleaseLock();
    PyThreadState_Delete(threadstate);

    /* tell Windows this thread is done */
    _endthread();
}

int main(int argc, char *argv[])
{
    HANDLE threads[3];
    PyThreadState *mainstate;

    Py_Initialize();
    PyEval_InitThreads();             /* the main thread now holds the GIL */
    mainstate = PyEval_SaveThread();  /* release it so the workers can run */
    g_interp = mainstate->interp;

    threads[0] = (HANDLE) _beginthread(threadproc, 0, NULL);
    threads[1] = (HANDLE) _beginthread(threadproc, 0, NULL);
    threads[2] = (HANDLE) _beginthread(threadproc, 0, NULL);
    WaitForMultipleObjects(3, threads, TRUE, INFINITE);

    PyEval_RestoreThread(mainstate);  /* take the GIL back for shutdown */
    Py_Finalize();
    return 0;
}
In the Java Native Interface (JNI), every function takes an
environment pointer for the VM. The same thing could be done for
Python, with the VM - GIL included - encapsulated in a single object:
#include <windows.h>
#include <Python.h>
#include <process.h>

void threadproc(void *data)
{
    PyVM *vm = Py_Initialize();   /* create a new interpreter */
    PyRun_SimpleString(vm, "print 'Hello World1'\n");
    Py_Finalize(vm);
    _endthread();
}

int main(int argc, char *argv[])
{
    HANDLE threads[3];
    threads[0] = (HANDLE) _beginthread(threadproc, 0, NULL);
    threads[1] = (HANDLE) _beginthread(threadproc, 0, NULL);
    threads[2] = (HANDLE) _beginthread(threadproc, 0, NULL);
    WaitForMultipleObjects(3, threads, TRUE, INFINITE);
    return 0;
}
Doesn't that look a lot nicer?
If one can have more than one interpreter in a single process, it is
possible to create a pool of them and implement concurrent programming
paradigms such as 'fork/join' (to appear in Java 7, already in C# 3.0).
It would also be possible to emulate fork() on platforms that lack a
native one, such as Windows; Perl does this with its 'perlfork'
emulation. This would deal with the GIL issue on computers with more
than one CPU. One could even use ctypes to embed a pool of Python
interpreters in a process that is already running Python. A rough
sketch of the fork/join idea follows below.
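
To make that concrete, here is a minimal fork/join sketch built on the
hypothetical per-interpreter API from the example above (PyVM,
Py_Initialize() returning a PyVM*, PyRun_SimpleString() taking a vm
argument). None of these functions exist in CPython today; each worker
"forks" its own interpreter - and therefore its own GIL - runs a
script, and the main thread "joins" by waiting for all of them:

#include <windows.h>
#include <Python.h>
#include <process.h>

#define NWORKERS 4

/* each worker runs its own interpreter, and hence its own GIL */
void workerproc(void *script)
{
    PyVM *vm = Py_Initialize();                   /* hypothetical API */
    PyRun_SimpleString(vm, (const char *)script); /* hypothetical API */
    Py_Finalize(vm);
    _endthread();
}

int main(int argc, char *argv[])
{
    HANDLE workers[NWORKERS];
    int i;

    /* fork: start one interpreter per worker thread */
    for (i = 0; i < NWORKERS; i++)
        workers[i] = (HANDLE) _beginthread(workerproc, 0,
                                           "print 'partial result'\n");

    /* join: block until every interpreter has finished */
    WaitForMultipleObjects(NWORKERS, workers, TRUE, INFINITE);
    return 0;
}

A real implementation would presumably keep a reusable pool of PyVM
objects rather than creating and destroying one per task, but the
structure would be the same.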
Most of the conversion of the current Python C API could be automated.
Python would also need to be linked against a multi-threaded version
of the C library.
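
For illustration, the mechanical part of that conversion might amount
to nothing more than adding the interpreter handle as a first parameter
to each existing entry point, just as every JNI call takes the JNIEnv
pointer. The declarations below are purely hypothetical; apart from the
new first argument, each signature is unchanged from today's C API:

/* hypothetical "pyvm.h": each entry point gains a PyVM* argument */
typedef struct PyVM PyVM;          /* opaque: one interpreter plus its GIL */
typedef struct _object PyObject;   /* as in today's CPython */

PyVM     *Py_Initialize(void);     /* create an interpreter */
void      Py_Finalize(PyVM *vm);   /* destroy it */

int       PyRun_SimpleString(PyVM *vm, const char *command);
PyObject *PyImport_ImportModule(PyVM *vm, const char *name);
PyObject *PyObject_GetAttrString(PyVM *vm, PyObject *o, const char *attr_name);
PyObject *PyObject_CallObject(PyVM *vm, PyObject *callable, PyObject *args);

Since objects would not be shared across interpreters, each
interpreter's own GIL would still be enough to protect its reference
counts, so no fine-grained locking would be needed.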