Idea for removing the GIL...

Discussion in 'Python' started by Vishal, Feb 8, 2011.

  1. Vishal

    Vishal Guest

    Hello,

    This might sound crazy... I don't know if it's even possible, but:

    Is it possible for the Python process to create copies of the
    interpreter for each thread that is launched, with each thread
    somehow bound to its own interpreter?

    This will increase the Python process size, for sure; however, data
    sharing will remain just like it is with threads.

    And it "may" also allow two threads to run in parallel, assuming
    today's processors can dispatch independent instructions from the
    same process to multiple cores.

    Comments, suggestions, brush-offs are welcome :))

    I heard that this has been tried before...any info about that?

    Thanks and best regards,
    Vishal Sapre
     
    Vishal, Feb 8, 2011
    #1

  2. On Tue, 2011-02-08 at 01:39 -0800, Vishal wrote:
    > Is it possible for the Python process to create copies of the
    > interpreter for each thread that is launched, with each thread
    > somehow bound to its own interpreter?
    > And it "may" also allow two threads to run in parallel, assuming
    > today's processors can dispatch independent instructions from the
    > same process to multiple cores.
    > Comments, suggestions, brush-offs are welcome :))


    Yes, it is possible, and done. See the multiprocessing module. It
    works very well.
    <http://docs.python.org/library/multiprocessing.html>

    It isn't exactly the same as threads, but provides many similar
    constructs.
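    For instance, a minimal sketch (names here are illustrative) of farming
    work out to a process pool, where each worker is a full process with its
    own interpreter and therefore its own GIL:

```python
from multiprocessing import Pool

def square(n):
    # Runs in a separate worker process, so it is not serialized
    # by the parent interpreter's GIL.
    return n * n

if __name__ == "__main__":  # guard required on Windows, which has no fork()
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

    pool.map() has the same shape as the built-in map(), which is what makes
    it feel thread-like despite using separate processes.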
     
    Adam Tauno Williams, Feb 8, 2011
    #2

  3. Vishal

    Vishal Guest

    On Feb 8, 3:05 pm, Adam Tauno Williams <> wrote:
    > On Tue, 2011-02-08 at 01:39 -0800, Vishal wrote:
    > > Is it possible for the Python process to create copies of the
    > > interpreter for each thread that is launched, with each thread
    > > somehow bound to its own interpreter?
    > > And it "may" also allow two threads to run in parallel, assuming
    > > today's processors can dispatch independent instructions from the
    > > same process to multiple cores.
    > > Comments, suggestions, brush-offs are welcome :))

    >
    > Yes, it is possible, and done.  See the multiprocessing module.  It
    > works very well.
    > <http://docs.python.org/library/multiprocessing.html>
    >
    > It isn't exactly the same as threads, but provides many similar
    > constructs.


    Hi,

    Pardon me for my ignorance here, but 'multiprocessing' creates actual
    processes using fork() or CreateProcess(). I was talking about a single
    process running multiple instances of the interpreter, each thread
    bound to its own interpreter, so the GIL won't be an issue anymore:
    each interpreter has only one thing to do, and that one thing holds
    the lock on its own interpreter. Since it's still the same process,
    data sharing should happen just like in threads.

    Also, multiprocessing has issues on Windows (most probably because of
    the way CreateProcess() functions...).

    Thanks and best regards,
    Vishal
     
    Vishal, Feb 8, 2011
    #3
  4. On Feb 8, 7:34 am, Vishal <> wrote:
    > On Feb 8, 3:05 pm, Adam Tauno Williams <> wrote:
    >
    > > On Tue, 2011-02-08 at 01:39 -0800, Vishal wrote:
    > > > Is it possible for the Python process to create copies of the
    > > > interpreter for each thread that is launched, with each thread
    > > > somehow bound to its own interpreter?
    > > > And it "may" also allow two threads to run in parallel, assuming
    > > > today's processors can dispatch independent instructions from the
    > > > same process to multiple cores.
    > > > Comments, suggestions, brush-offs are welcome :))

    >
    > > Yes, it is possible, and done.  See the multiprocessing module.  It
    > > works very well.
    > > <http://docs.python.org/library/multiprocessing.html>

    >
    > > It isn't exactly the same as threads, but provides many similar
    > > constructs.

    >
    > Hi,
    >
    > Pardon me for my ignorance here, but 'multiprocessing' creates actual
    > processes using fork() or CreateProcess(). I was talking about a single
    > process running multiple instances of the interpreter, each thread
    > bound to its own interpreter, so the GIL won't be an issue anymore:
    > each interpreter has only one thing to do, and that one thing holds
    > the lock on its own interpreter. Since it's still the same process,
    > data sharing should happen just like in threads.


    CPython does support multiple interpreters in a single process.
    However, you cannot have your cake and eat it too. If you create
    multiple interpreters, then why do you think you'll be able to share
    objects between them for free? In what sense would you have
    *multiple* interpreters in that scenario?

    You will need some sort of locking between the interpreters. Then
    you're either back to the GIL or to some more limited form of
    sharing, such as you might get with the multiprocessing module.

    Jean-Paul
     
    Jean-Paul Calderone, Feb 8, 2011
    #4
  5. Robert Kern

    Robert Kern Guest

    On 2/8/11 10:11 AM, Brian Curtin wrote:
    > On Tue, Feb 8, 2011 at 06:34, Vishal <> wrote:
    >
    > Also, multiprocessing has issues on Windows (most probably because of
    > the way CreateProcess() functions...)
    >
    > Such as?


    Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
    semantics for initializing the memory of the new process. If you want to pass
    data to the children, the data must be pickled and sent across the process
    boundary. He's not saying that multiprocessing isn't useful at all on Windows,
    just less useful for the scenarios he is considering here.
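    A small sketch of the constraint Robert describes (the names are
    illustrative): on Windows, multiprocessing pickles the target's arguments
    and ships them to the freshly created child, so anything passed must be
    picklable.

```python
import pickle
from multiprocessing import Process, Queue

def worker(q, data):
    # On Windows, ``q`` and ``data`` are pickled and sent to the child
    # process; nothing is inherited copy-on-write as it would be after
    # a Unix fork().
    q.put(sum(data))

if __name__ == "__main__":
    data = list(range(1000))
    pickle.dumps(data)  # anything passed to the child must survive this
    q = Queue()
    p = Process(target=worker, args=(q, data))
    p.start()
    print(q.get())  # 499500
    p.join()
```

    On Unix the same code runs, but the fork() start method makes the
    explicit data transfer mostly unnecessary.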

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Feb 8, 2011
    #5
  6. Roy Smith

    Roy Smith Guest

    In article <>,
    Robert Kern <> wrote:

    > Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
    > semantics for initializing the memory of the new process. If you want to pass
    > data to the children, the data must be pickled and sent across the process
    > boundary. He's not saying that multiprocessing isn't useful at all on
    > Windows, just less useful for the scenarios he is considering here.


    Amen, brother! I used to work on a project that had a build system
    which was very fork() intensive (lots of little Perl and shell scripts
    driven by make). A full system build on a Linux box took 30-60 minutes.
    Building the same code on Windows/Cygwin took about 12 hours, on
    identical hardware (an 8-core, 16 gig Dell server, or something like that).

    As far as we could tell, it was entirely due to how bad Windows was at
    process creation.
     
    Roy Smith, Feb 8, 2011
    #6
  7. Roy Smith, 08.02.2011 17:52:
    > Robert Kern wrote:
    >
    >> Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
    >> semantics for initializing the memory of the new process. If you want to pass
    >> data to the children, the data must be pickled and sent across the process
    >> boundary. He's not saying that multiprocessing isn't useful at all on
    >> Windows, just less useful for the scenarios he is considering here.

    >
    > Amen, brother! I used to work on a project that had a build system
    > which was very fork() intensive (lots of little Perl and shell scripts
    > driven by make). A full system build on a Linux box took 30-60 minutes.
    > Building the same code on Windows/Cygwin took about 12 hours, on
    > identical hardware (an 8-core, 16 gig Dell server, or something like that).
    >
    > As far as we could tell, it was entirely due to how bad Windows was at
    > process creation.


    Unlikely. Since you mention Cygwin, it was more likely due to the heavy
    lifting Cygwin does in order to emulate fork() on Windows.

    http://www.cygwin.com/faq/faq-nochunks.html#faq.api.fork

    Stefan
     
    Stefan Behnel, Feb 8, 2011
    #7
  8. On Tue, 2011-02-08 at 11:52 -0500, Roy Smith wrote:
    > In article <>,
    > Robert Kern <> wrote:
    > > Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
    > > semantics for initializing the memory of the new process. If you want to pass
    > > data to the children, the data must be pickled and sent across the process
    > > boundary. He's not saying that multiprocessing isn't useful at all on
    > > Windows, just less useful for the scenarios he is considering here.

    > Amen, brother! I used to work on a project that had a build system
    > which was very fork() intensive (lots of little perl and shell scripts


    Comparing a workload that simply calls fork() to one that uses
    "multiprocessing" is a bit of a false comparison. multiprocessing
    provides a fairly large set of information-sharing techniques; just
    doing a fork isn't really using multiprocessing, and forking scripts
    isn't at all equivalent to using threads.

    > As far as we could tell, it was entirely due to how bad Windows was at
    > process creation.


    Nope. If you want performance, DO NOT USE Cygwin.
     
    Adam Tauno Williams, Feb 8, 2011
    #8
  9. John Nagle

    John Nagle Guest

    On 2/8/2011 1:39 AM, Vishal wrote:
    > Hello,
    >
    > This might sound crazy... I don't know if it's even possible, but:
    >
    > Is it possible for the Python process to create copies of the
    > interpreter for each thread that is launched, with each thread
    > somehow bound to its own interpreter?
    >
    > This will increase the Python process size, for sure; however, data
    > sharing will remain just like it is with threads.
    >
    > And it "may" also allow two threads to run in parallel, assuming
    > today's processors can dispatch independent instructions from the
    > same process to multiple cores.


    Won't work. You'd have two threads updating the same shared data
    structures without locking. In CPython, there's a reference count
    shared across threads, but no locking at the object level.

    The real reason for the GIL, though, is to support dynamic
    code modification in multi-threaded programs. It's the ability
    to replace a function while it's being executed in another thread
    that's hard to do without a global lock. If it were just a data-side
    problem, local object locks, a lock at the allocator, and a
    concurrent garbage collector would work.

    John Nagle
     
    John Nagle, Feb 8, 2011
    #9
  10. Carl Banks

    Carl Banks Guest

    On Feb 8, 11:49 am, John Nagle <> wrote:
    >     The real reason for the GIL, though, is to support dynamic
    > code modification in multi-threaded programs.  It's the ability
    > to replace a function while it's being executed in another thread
    > that's hard to do without a global lock.  If it were just a data-side
    > problem, local object locks, a lock at the allocator, and a
    > concurrent garbage collector would work.


    I realize that you believe that Python's hyper-dynamicism is the cause
    of all evils in the world, but in this case you're not correct.

    Concurrent garbage collectors work just fine in IronPython and Jython,
    which are just as dynamic as CPython. I'm not sure why you think an
    executing function would be considered inaccessible and subject to
    collection. If you replace a function (code object, actually) in
    another thread it only deletes the reference from that namespace,
    references on the executing stack still exist.

    The real reason they never replaced the GIL is that fine-grained
    locking is expensive with reference counting. The only way the cost
    of finer-grained locking would be acceptable, then, is if they got rid
    of the reference counting altogether, and that was considered too
    drastic a change.


    Carl Banks
     
    Carl Banks, Feb 8, 2011
    #10
  11. sturlamolden

    sturlamolden Guest

    On 8 Feb, 10:39, Vishal <> wrote:

    > Is it possible for the Python process to create copies of the
    > interpreter for each thread that is launched, with each thread
    > somehow bound to its own interpreter?



    In .NET lingo this is called an 'AppDomain'. This is also how Tcl
    works: one interpreter per thread. I once had a mock-up of that
    using ctypes and Python's C API. However, the problem with 'app
    domains' is that OS handles are global to the process. To make OS
    handles private, the easiest solution is to use multiple processes,
    which incidentally is what the 'multiprocessing' module does (or just
    os.fork if you are on Unix).

    Most people would not consider 'app domains' to be a true GIL-free
    Python, but rather think of free threading comparable to .NET, Java
    and C++. However, removing the GIL will do no good as long as CPython
    uses reference counting. Any access to reference counts must be atomic
    (e.g. requiring a mutex or spinlock). Here we can imagine using
    fine-grained locking instead of a global interpreter lock. There is a
    second problem, which might not be as obvious: in parallel computing
    there is something called 'false sharing', which in this case will be
    incurred on the reference counts. That is, any update will dirty the
    cache lines everywhere; all processors must stop whatever they are
    doing to synchronize cache with RAM. This 'false sharing' will put the
    scalability down the drain.

    To make a GIL-free Python, we must start by removing reference
    counting in favour of a generational garbage collector. That also
    comes with a cost: the interpreter will sometimes pause to collect
    garbage, and memory use will be larger as well, as garbage remains
    uncollected for a while and is not immediately reclaimed. Many rely on
    CPython because the interpreter does not pause and a Python process
    has a small footprint. If we change this, we have 'yet another
    Java'. There are already IronPython and Jython for those who want
    this.
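    The refcount traffic sturlamolden describes is visible from Python
    itself. In this small sketch, merely binding another name to an object
    writes to that object's header, which is exactly the memory that threads
    on different cores would contend for:

```python
import sys

x = object()
before = sys.getrefcount(x)  # includes the temporary reference made by the call
y = x                        # binding another name bumps the refcount
after = sys.getrefcount(x)
print(before, after)         # the count grew by exactly one
```

    Under free threading every such increment would have to be an atomic
    write to a shared cache line, which is where false sharing bites.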


    Sturla
     
    sturlamolden, Feb 8, 2011
    #11
  12. Paul Rubin

    Paul Rubin Guest

    sturlamolden <> writes:
    > comes with a cost: the interpreter will sometimes pause to collect
    > garbage, and memory use will be larger as well, as garbage remains
    > uncollected for a while and is not immediately reclaimed. Many rely on
    > CPython because the interpreter does not pause and a Python process
    > has a small footprint.


    We've had that discussion before: CPython's refcount scheme can also
    pause (if the last reference to a large structure is released), CPython
    has its own GC for cyclic structures with its own pauses, and Python is
    fairly memory-hungry compared to plenty of small Lisp systems or even
    something like J2ME. Python has many nice qualities, which is why I use
    it every day. But the refcount scheme is just an implementation hack
    that gets rationalized way too much. I hope PyPy abandons it.
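    Both behaviours are observable from Python: reference counting reclaims
    most objects immediately, while the cyclic collector (with its own
    pauses) handles the cycles that refcounting alone cannot. A small sketch:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a  # build a reference cycle
del a, b             # the refcounts never reach zero...

found = gc.collect() # ...so the cyclic collector must step in
print(found >= 2)    # True: at least the two Node objects were unreachable
```

    gc.collect() returns the number of unreachable objects it found, so a
    nonzero result here shows work that pure refcounting could never do.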
     
    Paul Rubin, Feb 9, 2011
    #12
  13. On Feb 8, 7:12 pm, Paul Rubin <> wrote:
    > But the refcount scheme is just an implementation hack
    > that gets rationalized way too much.  I hope PyPy abandons it.


    Done. :)

    Jean-Paul
     
    Jean-Paul Calderone, Feb 9, 2011
    #13
  14. Aahz

    Aahz Guest

    In article <>,
    Carl Banks <> wrote:
    >
    >The real reason they never replaced the GIL is that fine-grained
    >locking is expensive with reference counting. The only way the cost
    >of finer-grained locking would be acceptable, then, is if they got rid
    >of the reference counting altogether, and that was considered too
    >drastic a change.


    ...especially given CPython's goal of easy integration with C libraries.
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Programming language design is not a rational science. Most reasoning
    about it is at best rationalization of gut feelings, and at worst plain
    wrong." --GvR, python-ideas, 2009-03-01
     
    Aahz, Mar 1, 2011
    #14
  15. Aahz, 01.03.2011 03:02:
    > Carl Banks wrote:
    >>
    >> The real reason they never replaced the GIL is that fine-grained
    >> locking is expensive with reference counting. The only way the cost
    >> of finer-grained locking would be acceptable, then, is if they got rid
    >> of the reference counting altogether, and that was considered too
    >> drastic a change.

    >
    > ...especially given CPython's goal of easy integration with C libraries.


    +1. The GIL is much more rarely a problem than some people want to make
    it appear, especially people who don't understand why it's there, or who
    fail to notice that threading is not the only way to do parallel
    processing (and certainly not the easiest either).

    Stefan
     
    Stefan Behnel, Mar 1, 2011
    #15