Idea for removing the GIL...

Vishal

Hello,

This might sound crazy... and I don't know if it's even possible, but...

Is it possible for the Python process to create a copy of the
interpreter for each thread that is launched, with each thread somehow
bound to its own interpreter?

This will increase the Python process size, for sure, but data
sharing will remain just like it is with threads.

And it "may" also allow two threads to run in parallel, assuming
today's processors can dispatch independent instructions from the
same process to multiple cores?

Comments, suggestions, brush-offs are welcome :))

I heard that this has been tried before... any info about that?

Thanks and best regards,
Vishal Sapre
 
Adam Tauno Williams

Is it possible for the Python process to create a copy of the
interpreter for each thread that is launched, with each thread somehow
bound to its own interpreter?
And it "may" also allow two threads to run in parallel, assuming
today's processors can dispatch independent instructions from the
same process to multiple cores?
Comments, suggestions, brush-offs are welcome :))

Yes, it is possible, and done. See the multiprocessing module. It
works very well.
<http://docs.python.org/library/multiprocessing.html>

It isn't exactly the same as threads, but provides many similar
constructs.
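Adam's suggestion can be sketched with a minimal example (the worker and its names are illustrative, not from the thread): each `Process` gets its own interpreter with its own GIL, and `Queue` objects give thread-like message passing across the process boundary.

```python
from multiprocessing import Process, Queue

def square(inbox, outbox):
    # Runs in a separate process, with its own interpreter and its own GIL.
    for n in iter(inbox.get, None):   # loop until the None sentinel arrives
        outbox.put(n * n)

if __name__ == "__main__":            # required on Windows, where spawn re-imports this module
    inbox, outbox = Queue(), Queue()
    worker = Process(target=square, args=(inbox, outbox))
    worker.start()
    for n in (2, 3, 4):
        inbox.put(n)
    inbox.put(None)                   # tell the worker to stop
    results = [outbox.get() for _ in range(3)]
    worker.join()
    print(results)                    # [4, 9, 16]
```

With a single worker draining the queue in order, the results come back in the order the inputs were sent.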
 
Vishal

Yes, it is possible, and done.  See the multiprocessing module.  It
works very well.
<http://docs.python.org/library/multiprocessing.html>

It isn't exactly the same as threads, but provides many similar
constructs.

Hi,

Pardon my ignorance here, but 'multiprocessing' creates actual
processes using fork() or CreateProcess().
I was talking about a single process running multiple instances of the
interpreter, each thread bound to its own interpreter.
So the GIL won't be an issue anymore... each interpreter has only one
thing to do, and that one thing holds the lock on its own interpreter.
Since it's still the same process, data sharing should happen just like
with threads.

Also, multiprocessing has issues on Windows (most probably because of
the way CreateProcess() functions...)

Thanks and best regards,
Vishal
 
Jean-Paul Calderone

Hi,

Pardon my ignorance here, but 'multiprocessing' creates actual
processes using fork() or CreateProcess().
I was talking about a single process running multiple instances of the
interpreter, each thread bound to its own interpreter.
So the GIL won't be an issue anymore... each interpreter has only one
thing to do, and that one thing holds the lock on its own interpreter.
Since it's still the same process, data sharing should happen just like
with threads.

CPython does support multiple interpreters in a single process.
However, you cannot have your cake and eat it too. If you create
multiple interpreters, then why do you think you'll be able to share
objects between them for free?

In what sense would you have *multiple* interpreters in that scenario?

You will need some sort of locking between the interpreters. Then
you're either back to the GIL or to some more limited form of sharing -
such as you might get with the multiprocessing module.
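Jean-Paul's "more limited form of sharing" looks roughly like this sketch (the counter example is mine, not from the thread): `multiprocessing.Value` places a single C int in shared memory, and each access is synchronized explicitly per object rather than by one interpreter-wide lock.

```python
from multiprocessing import Process, Value

def bump(counter, times):
    # The shared Value carries its own lock -- synchronization is
    # explicit and per-object, not a single interpreter-wide GIL.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)           # a C int in shared memory
    workers = [Process(target=bump, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)              # 4000
```

Without `get_lock()`, the four processes would race on the increment and the final count would usually come up short.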

Jean-Paul
 
Robert Kern

On Tue, Feb 8, 2011 at 06:34, Vishal wrote:

Also, multiprocessing has issues on Windows (most probably because of
the way CreateProcess() functions...)

Such as?

Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
semantics for initializing the memory of the new process. If you want to pass
data to the children, the data must be pickled and sent across the process
boundary. He's not saying that multiprocessing isn't useful at all on Windows,
just less useful for the scenarios he is considering here.
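The practical consequence can be seen without starting any processes (a sketch; the function names are mine): under the spawn start method Windows uses, multiprocessing pickles the target and its arguments to send them to the fresh child interpreter, so anything unpicklable is ruled out as a target.

```python
import pickle

def picklable(x):
    # Module-level functions pickle by reference, so they work as targets.
    return x + 1

pickle.dumps(picklable)               # fine

try:
    pickle.dumps(lambda x: x + 1)     # a lambda target would fail at spawn time
    lambda_failed = False
except pickle.PicklingError:
    lambda_failed = True
print("lambda picklable?", not lambda_failed)  # lambda picklable? False
```

On Unix, fork() sidesteps this: the child inherits the parent's memory, so no pickling of the target is needed.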

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Roy Smith

Robert Kern said:
Unlike a UNIX fork, CreateProcess() does not have the same copy-on-write
semantics for initializing the memory of the new process. If you want to pass
data to the children, the data must be pickled and sent across the process
boundary. He's not saying that multiprocessing isn't useful at all on
Windows, just less useful for the scenarios he is considering here.

Amen, brother! I used to work on a project that had a build system
which was very fork() intensive (lots of little perl and shell scripts
driven by make). A full system build on a linux box took 30-60 minutes.
Building the same code on windows/cygwin took about 12 hours. Identical
hardware (8-core, 16 gig Dell server, or something like that).

As far as we could tell, it was entirely due to how bad Windows was at
process creation.
 
Stefan Behnel

Roy Smith, 08.02.2011 17:52:
Amen, brother! I used to work on a project that had a build system
which was very fork() intensive (lots of little perl and shell scripts
driven by make). A full system build on a linux box took 30-60 minutes.
Building the same code on windows/cygwin took about 12 hours. Identical
hardware (8-core, 16 gig Dell server, or something like that).

As far as we could tell, it was entirely due to how bad Windows was at
process creation.

Unlikely. Since you mention cygwin, it was likely due to the heavy lifting
cygwin does in order to emulate fork() on Windows.

http://www.cygwin.com/faq/faq-nochunks.html#faq.api.fork

Stefan
 
Adam Tauno Williams

Amen, brother! I used to work on a project that had a build system
which was very fork() intensive (lots of little perl and shell scripts

Comparing a workload that is simply fork()-intensive to using
"multiprocessing" is a bit of a false comparison. multiprocessing
provides a fairly large set of information-sharing techniques; just
doing a fork isn't really using multiprocessing, and forking scripts
isn't at all equivalent to using threads.
As far as we could tell, it was entirely due to how bad Windows was at
process creation.

Nope. If you want performance, DO NOT USE cygwin.
 
John Nagle

Hello,

This might sound crazy... and I don't know if it's even possible, but...

Is it possible for the Python process to create a copy of the
interpreter for each thread that is launched, with each thread somehow
bound to its own interpreter?

This will increase the Python process size, for sure, but data
sharing will remain just like it is with threads.

And it "may" also allow two threads to run in parallel, assuming
today's processors can dispatch independent instructions from the
same process to multiple cores?

Won't work. You'd have two threads updating the same shared data
structures without locking. In CPython, there's a reference count
shared across threads, but no locking at the object level.

The real reason for the GIL, though, is to support dynamic
code modification in multi-threaded programs. It's the ability
to replace a function while it's being executed in another thread
that's hard to do without a global lock. If it were just a data-side
problem, local object locks, a lock at the allocator, and a
concurrent garbage collector would work.

John Nagle
 
Carl Banks

    The real reason for the GIL, though, is to support dynamic
code modification in multi-threaded programs.  It's the ability
to replace a function while it's being executed in another thread
that's hard to do without a global lock.  If it were just a data-side
problem, local object locks, a lock at the allocator, and a
concurrent garbage collector would work.

I realize that you believe that Python's hyper-dynamicism is the cause
of all evils in the world, but in this case you're not correct.

Concurrent garbage collectors work just fine in IronPython and Jython,
which are just as dynamic as CPython. I'm not sure why you think an
executing function would be considered inaccessible and subject to
collection. If you replace a function (a code object, actually) from
another thread, that only deletes the reference in that namespace;
references on the executing stack still exist.
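Carl's point about the executing stack can be demonstrated directly (a small sketch, with made-up names and timings): rebinding a module-level name doesn't affect a call that is already running, because the frame holds its own reference to the old function.

```python
import threading
import time

def slow():
    time.sleep(0.2)                   # stand-in for a long-running computation
    return "original"

results = []
t = threading.Thread(target=lambda: results.append(slow()))
t.start()
time.sleep(0.05)                      # let the thread enter slow() first
slow = lambda: "replacement"          # rebind the name mid-execution
t.join()
print(results)                        # ['original'] -- the old code ran to completion
```

The old function object stays alive until the frame executing it returns; only then does its reference count drop.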

The real reason they never replaced the GIL is that fine-grained
locking is expensive with reference counting. The only way the cost
of finer-grained locking would be acceptable, then, is if they got rid
of the reference counting altogether, and that was considered too
drastic a change.


Carl Banks
 
sturlamolden

Is it possible that the Python process, creates copies of the
interpreter for each thread that is launched, and some how the thread
is bound to its own interpreter ?


In .NET lingo this is called an 'AppDomain'. This is also how Tcl
works -- one interpreter per thread. I once had a mock-up of that
using ctypes and Python's C API. However, the problem with 'app
domains' is that OS handles are global to the process. To make OS
handles private, the easiest solution is to use multiple processes,
which incidentally is what the 'multiprocessing' module does (or just
os.fork if you are on Unix).

Most people would not consider 'app domains' to be a true GIL-free
Python, but rather think of free threading comparable to .NET, Java
and C++. However, removing the GIL will do no good as long as CPython
uses reference counting. Any access to reference counts must be atomic
(e.g. requiring a mutex or spinlock). Here we can imagine using fine-
grained locking instead of a global interpreter lock. There is a
second problem, which might not be as obvious: In parallel computing
there is something called 'false sharing', which in this case will be
incurred on the reference counts. That is, any updating will dirty the
cache lines everywhere; all processors must stop whatever they are
doing to synchronize cache with RAM. This 'false sharing' will put the
scalability down the drain.
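The reference-count traffic Sturla describes is visible from Python itself (a sketch using `sys.getrefcount`): even a read-only alias writes to the object's refcount field, which is exactly the kind of write that would ping-pong cache lines between cores under free threading.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)
alias = data                          # a plain "read-only" use of the object...
after = sys.getrefcount(data)
print(after - before)                 # 1 -- ...still wrote to its refcount field
```

Every name binding, argument pass, and container insertion performs such a write, so under free threading the counts would have to be updated atomically on every one of them.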

To make a GIL free Python, we must start by removing reference
counting in favour of a generational garbage collector. That also
comes with a cost. The interpreter will sometimes pause to collect
garbage. Memory use will be larger as well, as garbage remains
uncollected for a while and is not immediately reclaimed. Many rely on
CPython because the interpreter does not pause and a Python process
has a small memory footprint. If we change this, we have 'yet another
Java'. There are already IronPython and Jython for those who want
this.


Sturla
 
Paul Rubin

sturlamolden said:
comes with a cost. The interpreter will sometimes pause to collect
garbage. Memory use will be larger as well, as garbage remains
uncollected for a while and is not immediately reclaimed. Many rely on
CPython because the interpreter does not pause and a Python process
has a small memory footprint.

We've had that discussion before: CPython's refcount scheme can also
pause (if the last reference to a large structure is released), CPython
has its own gc for cyclic structures with its own pauses, and Python is
fairly memory hungry compared to plenty of small Lisp systems or even
something like J2ME. Python has many nice qualities which is why I use
it every day. But the refcount scheme is just an implementation hack
that gets rationalized way too much. I hope PyPy abandons it.
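Paul's first point is easy to reproduce (a rough sketch; the size and timing are illustrative): dropping the last reference to a large structure triggers a cascade of deallocations, which is itself a pause.

```python
import time

# Two million small lists, all reachable only through `big`.
big = [[i] for i in range(2_000_000)]

t0 = time.perf_counter()
del big                               # refcounts hit zero: a cascade of frees
pause = time.perf_counter() - t0
print(f"deallocation pause: {pause * 1000:.1f} ms")
```

So reference counting trades one big, scheduled collection pause for many small, unpredictable ones wherever a last reference happens to die.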
 
Aahz

The real reason they never replaced the GIL is that fine-grained
locking is expensive with reference counting. The only way the cost
of finer-grained locking would be acceptable, then, is if they got rid
of the reference counting altogether, and that was considered too
drastic a change.

...especially given CPython's goal of easy integration with C libraries.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"Programming language design is not a rational science. Most reasoning
about it is at best rationalization of gut feelings, and at worst plain
wrong." --GvR, python-ideas, 2009-03-01
 
Stefan Behnel

Aahz, 01.03.2011 03:02:
...especially given CPython's goal of easy integration with C libraries.

+1. The GIL is much more rarely a problem than some people want to
make it appear, especially to those who don't understand why it's
there, or who fail to notice that threading is not the only way to do
parallel processing (and certainly not the easiest either).

Stefan
 
