Will Python 3.0 remove the global interpreter lock (GIL)?


Steven D'Aprano

Steven, You forgot this part:

"And if you decide to answer, please add a true/false response to this
statement - 'CPython in the late 1990s ran too slow'."


No, I ignored it, because it doesn't have a true/false response. It's a
malformed request. "Too slow" for what task? Compared to what
alternative? Fast and slow are not absolute terms, they are relative. A
sloth is "fast" compared to continental drift, but "slow" compared to the
space shuttle.

BUT even if we all agreed that CPython was (or wasn't) "too slow" in the
late 1990s, why on earth do you imagine that is important? It is no
longer the late 1990s, it is now 2007, and we are not using Python 1.4
any more.
 

Steven D'Aprano

Paul, it's a pleasure to see that you are not entirely against
complaints.

I'm not against complaints either, so long as they are well-thought out.
I've made a few of my own over the years, some of which may have been
less well-thought out than others.

The very fastest Intel processor of the late 1990s that I found came
out in October 1999 and had a speed around 783 MHz. Current fastest
processors are something like 3.74 GHz, with larger caches. Memory is
also faster and larger. It appears that someone running a non-GIL
implementation of CPython today would have significantly faster
performance than a GIL CPython implementation of the late 1990s.

That's an irrelevant comparison. It's a STUPID comparison. The two
alternatives aren't "non-GIL CPython on 2007 hardware" versus "GIL
CPython on 1999 hardware" because we aren't using GIL CPython on 1999
hardware, we're using it on 2007 hardware. *That's* the alternative to
the non-GIL CPython that you need to compare against.

Why turn your back on eight years of faster hardware? What's the point of
getting rid of the GIL unless it leads to faster code? "Get the speed and
performance of 1999 today!" doesn't seem much of a selling point in 2007.

Correct me if I am wrong, but it seems that saying non-GIL CPython is
too slow, while once valid, has become invalid due to the increase in
computing power that has taken place.

You're wrong, because the finishing line has shifted -- performance we
were satisfied with in 1998 would be considered unbearable to work with
in 2007.

I remember in 1996 (give or take a year) being pleased that my new
computer allowed my Pascal compiler to compile a basic, bare-bones GUI
text editor in a mere two or four hours, because it used to take up to
half a day on my older computer. Now, I expect to compile a basic text
editor in minutes, not hours.

According to http://linuxreviews.org/gentoo/compiletimes/ the whole
of Openoffice-ximian takes around six hours to compile. Given
the speed of my 1996 computer, it would probably take six YEARS to
compile something of Openoffice's complexity.


As a purely academic exercise, we might concede that the non-GIL version
of CPython 1.5 running on a modern, dual-core CPU with lots of RAM will
be faster than CPython 2.5 running on an eight-year-old CPU with minimal
RAM. But so what? That's of zero practical interest for anyone running
CPython 2.5 on a modern PC.

If you are running a 1999 PC, your best bet is to stick with the standard
CPython 1.5 including the GIL, because it is faster than the non-GIL
version.

If you are running a 2007 PC, your best bet is *still* to stick with the
standard CPython (version 2.5 now, not 1.5), because it will still be
faster than the non-GIL version (unless you have four or more processors,
and maybe not even then).

Otherwise, there's always Jython or IronPython.
 

Paul Rubin

TheFlyingDutchman said:
The very fastest Intel processor of the late 1990s that I found came
out in October 1999 and had a speed around 783 MHz. Current fastest
processors are something like 3.74 GHz, with larger caches. Memory is
also faster and larger. It appears that someone running a non-GIL
implementation of CPython today would have significantly faster
performance than a GIL CPython implementation of the late 1990s.
Correct me if I am wrong, but it seems that saying non-GIL CPython is
too slow, while once valid, has become invalid due to the increase in
computing power that has taken place.

This reasoning is invalid. For one thing, disk and memory sizes and
network bandwidth have increased by a much larger factor than CPU speed
since the late 1990s. A big disk drive in 1999 was maybe 20 GB; today
it's 750 GB, almost 40x larger, way outstripping the 5x CPU MHz
increase. A fast business network connection was a 1.5 Mbit/sec T-1
line; today it's often 100 Mbit or more, again far outstripping CPU
MHz. If Python was just fast enough to firewall your T1 net
connection or index your 20 GB hard drive in 1999, it's way too slow to
do the same with today's net connections and hard drives, just because
of that change in the hardware landscape. We have just about stopped
seeing increases in CPU MHz: that 3.74 GHz speed was probably reached a
couple of years ago. We get CPU speed increases now through parallelism,
not MHz. Intel and AMD both have 4-core CPUs now and Intel has a
16-core chip coming. Python is at a serious disadvantage compared
with other languages if the other languages keep up with developments
and Python does not.

Also, Python in the late 90s was pitched as a "scripting language",
intended for small throwaway tasks, while today it's used for complex
applications, and the language has evolved accordingly. CPython is
way behind the times, not only because of the GIL, but because of its
slow bytecode interpreter, its non-compacting GC, etc. The platitude that
performance doesn't matter, that programmer time is more valuable than
machine time, etc. is at best an excuse for laziness. And more and
more often, in the application areas where Python is deployed, it's
just plain wrong. Take web servers: a big site like Google has
something like a half million of them. Even the comparatively wimpy
site where I work has a couple thousand. If each server uses 150
watts of power (plus air conditioning), then if making the software 2x
faster lets us shut down 1000 of them, the savings in electricity
bills alone are larger than my salary. Of course that doesn't include
environmental benefits, hardware and hosting costs, the costs and
headaches of administering that many boxes, etc. For a lot of Python
users, significant speedups are a huge win.

However, I don't think fixing CPython (through GIL removal or anything
else) is the answer, and Jython isn't the answer either. Python's
future is in PyPy, or should be. Why would a self-respecting Python
implementation be written in (yikes) C or (yucch) Java, if it can be
written in Python? So I hope that PyPy's future directions include
true parallelism.
 

Steven D'Aprano

We get CPU speed increases now through parallelism, not MHz. Intel and
AMD both have 4-core CPUs now and Intel has a 16-core chip coming.
Python is at a serious disadvantage compared with other languages if the
other languages keep up with developments and Python does not.

I think what you mean to say is that Python _will be_ at a serious
disadvantage if other languages keep up and Python doesn't. Python can't
be at a disadvantage _now_ because of what happens in the future.

Although, with the rapid take-up of multi-core CPUs, the future is
*really close*, so I welcome the earlier comment from Terry Reedy that
Guido has said he is willing to make changes to the CPython internals to
support multiprocessors, and that people have begun to investigate
practical methods of removing the GIL (as opposed to just bitching about
it for the sake of bitching).

The platitude that performance doesn't matter

Who on earth says that? I've never heard anyone say that.

What I've heard people say is that _machine_ performance isn't the only
thing that needs to be maximized, or even the most important thing.
Otherwise we'd all be writing hand-optimized assembly language, and there
would be a waiting line of about five years to get access to the few
programmers capable of writing that hand-optimized assembly language.

that programmer time is more valuable than machine time

Programmer time is more valuable than machine time in many cases,
especially when tasks are easily parallelisable across many machines.
That's why your "comparatively wimpy site" preferred to throw extra web
servers at the job of serving webpages rather than investing in smarter,
harder-working programmers to pull the last skerricks of performance out
of the hardware you already had.

etc. is at best an excuse for laziness.

What are you doing about solving the problem? Apart from standing on the
side-lines calling out "Get yer lazy behinds movin', yer lazy bums!!!" at
the people who aren't even convinced there is a problem that needs
solving?

And more and more often, in the
application areas where Python is deployed, it's just plain wrong. Take
web servers: a big site like Google has something like a half million of
them. Even the comparatively wimpy site where I work has a couple
thousand. If each server uses 150 watts of power (plus air
conditioning), then if making the software 2x faster lets us shut down
1000 of them,

What on earth makes you think that would be anything more than a
temporary, VERY temporary, shutdown? My prediction is that the last of
the machines wouldn't have even been unplugged before management decided
that running twice as fast, or servicing twice as many people at the same
speed, is more important than saving on the electricity bill, and they'd
be plugged back in.

the savings in electricity bills alone are larger than my
salary. Of course that doesn't include environmental benefits, hardware
and hosting costs, the costs and headaches of administering that many
boxes, etc. For a lot of Python users, significant speedups are a huge
win.

Oh, I wouldn't say "No thanks!" to a Python speed up. My newest PC has a
dual-core CPU (no cutting edge for me...) and while Python is faster on
it than it was on my old PC, it isn't twice as fast.

But Python speed ups don't come for free. For instance, I'd *really*
object if Python ran twice as fast for users with a quad-core CPU, but
twice as slow for users like me with only a dual-core CPU.

I'd also object if the cost of Python running twice as fast was for the
startup time to quadruple, because I already run a lot of small scripts
where the time to launch the interpreter is a significant fraction of the
total run time. If I wanted something like Java, that runs fast once it
is started but takes a LONG time to actually start, I know where to find
it.

I'd also object if the cost of Python running twice as fast was for Guido
and the rest of the Python-dev team to present me with their wages bill
for six months of development. I'm grateful that somebody is paying their
wages, but if I had to pay for it myself it wouldn't be done. It simply
isn't that important to me (and even if it was, I couldn't afford it).

Now there's a thought... given that Google:

(1) has lots of money;
(2) uses Python a lot;
(3) already employs both Guido and (I think...) Alex Martelli and
possibly other Python gurus;
(4) is not shy in investing in Open Source projects;
(5) and most importantly uses technologies that need to be used across
multiple processors and multiple machines

one wonders if Google's opinion of where core Python development needs to
go is the same as your opinion?
 

TheFlyingDutchman

On Wed, 19 Sep 2007 19:14:39 -0700, Steven D'Aprano wrote:


What are you doing about solving the problem? Apart from standing on the
side-lines calling out "Get yer lazy behinds movin', yer lazy bums!!!" at
the people who aren't even convinced there is a problem that needs
solving?

He's trying to convince the developers that there is a problem. That
is not the same as your strawman argument.
What on earth makes you think that would be anything more than a
temporary, VERY temporary, shutdown? My prediction is that the last of
the machines wouldn't have even been unplugged before management decided
that running twice as fast, or servicing twice as many people at the same
speed, is more important than saving on the electricity bill, and they'd
be plugged back in.
Plugging back in 1000 servers would be preferable to buying and
plugging in 2000 new servers, which is what would occur if the software
in this example had not been sped up 2x and management had still
desired a 2x speed-up in system performance, as you suggest.
 

TheFlyingDutchman

"Terry Reedy" <[email protected]> wrote in message

This is a little confusing because Google Groups does not show your
original post (not uncommon for them to lose a post in a thread - but
somehow still reflect the fact that it exists in the total-posts
number that they display) that you are replying to.

This assumes that comparing versions of 1.5 is still relevant. As far as I
know, his patch has not been maintained to apply against current Python.
This tells me that no one to date really wants to dump the GIL at the cost
of half Python's speed. Of course not. The point of dumping the GIL is to
use multiprocessors to get more speed! So with two cores and extra
overhead, Stein-patched 1.5 would not even break even.

Quad (and more) cores are a different matter. Hence, I think, the
resurgence of interest.

I am confused about the benefits/disadvantages of the "GIL removal".
Is it correct that the GIL is preventing CPython from having threads?

Is it correct that the only issue with the GIL is the prevention of
being able to do multi-threading?

If you only planned on writing single-threaded applications would GIL-
removal have no benefit?

Can threading have a performance benefit on a single-core machine
versus running multiple processes?
So now this question for you: "CPython 2.5 runs too slow in 2007: true or
false?"

I guess I gotta go with Steven D'Aprano - both true and false
depending on your situation.
If you answer false, then there is no need for GIL removal.

OK, I see that.
If you answer true, then cutting its speed for 90+% of people is bad.

OK, seems reasonable, assuming that multi-threading cannot be
implemented without a performance hit on single-threaded applications.
Is that a computer science maxim - that giving an interpreted language
multi-threading will always negatively impact the performance of
single-threaded applications?
| Most people are not currently bothered by the GIL and would not want its
| speed halved.

And another question: why should such people spend time they do not have to
make Python worse for themselves?
Saying they don't have time to make a change, any change, is always
valid in my book. I cannot argue against that. Ditto for them saying
they don't want to make a change with no explanation. But it seems that if
they make statements about why a change is not good, then it is fair
to make a counter-argument. I do agree with the theme of Steven
D'Aprano's comments in that it should be a cordial counter-argument
and not a demand.
 

TheFlyingDutchman


This assumes that comparing versions of 1.5 is still relevant. As far as I
know, his patch has not been maintained to apply against current Python.
This tells me that no one to date really wants to dump the GIL at the cost
of half Python's speed. Of course not. The point of dumping the GIL is to
use multiprocessors to get more speed! So with two cores and extra
overhead, Stein-patched 1.5 would not even break even.

Is the only point in getting rid of the GIL to allow multi-threaded
applications?

Can't multiple threads also provide a performance boost versus
multiple processes on a single-core machine?
So now this question for you: "CPython 2.5 runs too slow in 2007: true or
false?"

Ugh, I guess I have to agree with Steven D'Aprano - it depends.
If you answer false, then there is no need for GIL removal.

OK, I can see that.
If you answer true, then cutting its speed for 90+% of people is bad.

OK, have to agree. Sounds like it could be a good candidate for a
fork. One question - is it a computer science maxim that an
interpreter that implements multi-threading will always be slower when
running single-threaded apps?
And another question: why should such people spend time they do not have to
make Python worse for themselves?

I can't make an argument for someone doing something for free that
they don't have the time for. Ditto for doing something for free that
they don't want to do. But it does seem that if they give a reason for
why it's the wrong thing to do, it's fair to make a counter-argument.
Although I agree with Steven D'Aprano's theme in that it should be a
cordial rebuttal and not a demand.
 

Hendrik van Rooyen

Steven D'Aprano said:
I think a better question is, how much faster/slower would Stein's code
be on today's processors, versus CPython being hand-simulated in a giant
virtual machine made of clockwork?

This obviously depends on whether or not the clockwork is orange

- Hendrik
 

Bruno Desthuilliers

Ben Finney wrote:
(snip)
One common response to that is "Processes are expensive on Win32". My
response to that is that if you're programming on Win32 and expecting
the application to scale well, you already have problems that must
first be addressed that are far more fundamental than the GIL.

Lol! +1 QOTW!
 

Paul Boddie

Paul, it's a pleasure to see that you are not entirely against
complaints.

Well, it seems to me that I'm usually the one making them. ;-)
The very fastest Intel processor of the late 1990s that I found came
out in October 1999 and had a speed around 783 MHz. Current fastest
processors are something like 3.74 GHz, with larger caches.

True, although you're paying silly money for a 3.8 GHz CPU with a
reasonable cache. However, as always, you can get something not too
far off for a reasonable sum. When I bought my CPU two or so years
ago, there was a substantial premium for as little as 200 MHz over the
3.0 GHz CPU I went for, and likewise a 3.4 GHz CPU can be had for
a reasonable price these days in comparison to the unit with an extra
400 MHz.

Returning to the subject under discussion, though, one big difference
between then and now is the availability of dual core CPUs, and these
seem to be fairly competitive on price with single cores, although the
frequencies of each core are lower and you have to decide whether you
believe the AMD marketing numbers: is a dual 2.2 GHz core CPU "4200+"
or not, for example? One can argue whether it's better to have two
cores, especially for certain kinds of applications (and CPython,
naturally), but if I were compiling lots of stuff, the ability to do a
"make -j2" and have a decent speed-up would almost certainly push me
in the direction of multicore units, especially if the CPU consumed
less power. And if anyone thinks all this parallelism is just
hypothetical, they should take a look at distcc to see a fairly clear
roadmap for certain kinds of workloads.
Memory is also faster and larger. It appears that someone running a non-GIL
implementation of CPython today would have significantly faster
performance than a GIL CPython implementation of the late 1990s.
Correct me if I am wrong, but it seems that saying non-GIL CPython is
too slow, while once valid, has become invalid due to the increase in
computing power that has taken place.

Although others have picked over these arguments, I can see what
you're getting at: even if we take a fair proportion of the increase
in computing power since the late 1990s, rather than 100% of it,
CPython without the GIL would still be faster and have more potential for
further speed increases in more parallel architectures, rather than
running as fast as possible on a "sequential" architecture where not
even obscene amounts of money will buy you significantly better
performance. But I don't think it's so interesting to consider this
situation as merely a case of removing the GIL and using lots of
threads.

Let us return to the case I referenced above: even across networks,
where the communications cost is significantly higher than that of
physical memory, distributed compilation can provide a good
performance curve. Now I'm not arguing that every computational task
can be distributed in such a way, but we can see that some
applications of parallelisation are mature, even mainstream. There are
also less extreme cases: various network services can be scaled up
relatively effectively by employing multiple processes, as is the UNIX
way; some kinds of computation can be done in separate processes and
the results collected later on - we do this with relational databases
all the time. So, we already know that monolithic multithreaded
processes are not the only answer. (Java put an emphasis on extensive
multithreading and sandboxing because of the requirements of running
different people's code side-by-side on embedded platforms with
relatively few operating system conveniences, as well as on Microsoft
Windows, of course.)
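
To make that concrete, here is a minimal sketch of the "compute in
separate processes, collect the results later" pattern, using the
multiprocessing module that later entered the standard library (in
Python 2.6); the worker function and the numbers are hypothetical:

    from multiprocessing import Pool

    def crunch(n):
        # Stand-in for a CPU-bound task: sum of squares below n.
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        pool = Pool(processes=4)                 # one worker per core, say
        results = pool.map(crunch, [10**6] * 8)  # farm out, then collect
        pool.close()
        pool.join()
        print(sum(results))

Each worker is a separate interpreter with its own GIL, so the
CPU-bound work genuinely runs in parallel across cores.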

If the programmer cost in removing the GIL and maintaining a GIL-free
CPython ecosystem is too great, then perhaps it is less expensive to
employ other, already well-understood mechanisms instead. Of course,
there's no "global programmer lock", so everyone interested in doing
something about removing the GIL, writing their own Python
implementation, or whatever they see to be the solution can freely do
so without waiting for someone else to get round to it. Like those
more readily parallelisable applications mentioned above, more stuff
can get done provided that everyone doesn't decide to join the same
project. A lesson from the real world, indeed.

Paul
 

Chris Mellon

This is a little confusing because Google Groups does not show your
original post (not uncommon for them to lose a post in a thread - but
somehow still reflect the fact that it exists in the total-posts
number that they display) that you are replying to.



I am confused about the benefits/disadvantages of the "GIL removal".
Is it correct that the GIL is preventing CPython from having threads?

No. Python has threads, and they're wrappers around true OS level
system threads. What the GIL does is prevent *Python* code in those
threads from running concurrently.
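
A minimal timing sketch shows the effect; the workload is an arbitrary
pure-Python loop, and on a GIL CPython the threaded run takes roughly
as long as the sequential one, even with two cores:

    import threading, time

    def spin():
        total = 0
        for i in range(5 * 10**6):   # pure Python, so the GIL serialises it
            total += i

    start = time.time()
    threads = [threading.Thread(target=spin) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('two threads: %.2fs' % (time.time() - start))

    start = time.time()
    spin(); spin()
    print('sequential:  %.2fs' % (time.time() - start))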
Is it correct that the only issue with the GIL is the prevention of
being able to do multi-threading?

This sentence doesn't parse in a way that makes sense.
If you only planned on writing single-threaded applications would GIL-
removal have no benefit?

Yes.

Can threading have a performance benefit on a single-core machine
versus running multiple processes?

A simple question with a complicated answer. With the qualifier "can",
I have to say yes to be honest, although you will only see absolute
performance increases on a single core from special-purpose APIs that
call into C code anyway - and the GIL doesn't affect those, so GIL
removal won't have an effect on the scalability of those operations.

Pure CPU bound threads (all pure Python code) will not increase
performance on a single core (there's CPU level concurrency that can,
but not OS level threads). You can improve *perceived* performance
this way (latency at the expense of throughput), but not raw
performance.

Very, very few operations are CPU bound these days, and even fewer of
the ones where Python is involved. The largest benefits of multiple
cores to the desktop user are increases in cross-process performance
(multitasking), not in single applications.

Servers vary more widely. However, in general, there's not a huge
benefit to faster threading when you can use multiple processes
instead. Python is not especially fast in terms of pure CPU time, so
if you're CPU bound anyway, moving your CPU-bound code into C (or
something else) is likely to reap far more benefits - and it sidesteps
the GIL in the process.

In short, I think any problem that would be directly addressed by
removing the GIL is better addressed by other solutions.
I guess I gotta go with Steven D'Aprano - both true and false
depending on your situation.


OK, I see that.


OK, seems reasonable, assuming that multi-threading cannot be
implemented without a performance hit on single-threaded applications.
Is that a computer science maxim - that giving an interpreted language
multi-threading will always negatively impact the performance of
single-threaded applications?

It's not a maxim, per se - it's possible to have lockless concurrency,
although when you do this it's more like the shared nothing process
approach - but in general, yes. The cost of threading is the cost of
the locking needed to ensure safety, and the amount of locking is
proportional to the amount of shared state. Most of the common uses of
threading in the real world do not improve absolute performance and
won't, no matter how many cores you use.
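
As a toy illustration of "locking is proportional to shared state"
(the workloads here are hypothetical): a counter shared between
threads pays for a lock on every single update, while thread-private
tallies need no locking until one merge step at the end:

    import threading

    N = 10**5
    lock = threading.Lock()
    counter = [0]

    def contended():
        for _ in range(N):
            lock.acquire()        # shared state: every update takes the lock
            counter[0] += 1
            lock.release()

    partials = []

    def shared_nothing():
        local = 0                 # private state: no locks in the hot loop
        for _ in range(N):
            local += 1
        partials.append(local)    # one merge step at the very end

    for worker in (contended, shared_nothing):
        workers = [threading.Thread(target=worker) for _ in range(4)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()

    print(counter[0], sum(partials))  # both tally 4 * N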
 

Grant Edwards

Is the only point in getting rid of the GIL to allow multi-threaded
applications?

That's the main point.
Can't multiple threads also provide a performance boost versus
multiple processes on a single-core machine?

That depends on the algorithm, the code, and the
synchronization requirements.
OK, have to agree. Sounds like it could be a good candidate
for a fork. One question - is it a computer science maxim that
an interpreter that implements multi-threading will always be
slower when running single-threaded apps?

I presume you're referring to Amdahl's law.

http://en.wikipedia.org/wiki/Amdahl's_law
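
For reference, Amdahl's law says that if a fraction p of a program
parallelises perfectly over n processors, the overall speedup is
1 / ((1 - p) + p / n). A couple of worked values:

    def amdahl(p, n):
        # Speedup when a fraction p of the work runs on n processors.
        return 1.0 / ((1.0 - p) + p / float(n))

    print(amdahl(0.9, 4))      # ~3.08: 90% parallel code on 4 cores
    print(amdahl(0.9, 16))     # ~6.40: more cores, diminishing returns
    print(amdahl(0.9, 10**9))  # ~10.0: the serial 10% caps the speedup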

Remember there are reasons other than speed on a
multi-processor platform for wanting to do multi-threading.
Sometimes it just maps onto the application better than
a single-threaded solution.
 

Paul Rubin

Steven D'Aprano said:
That's why your "comparatively wimpy site" preferred to throw extra web
servers at the job of serving webpages rather than investing in smarter,
harder-working programmers to pull the last skerricks of performance out
of the hardware you already had.

The compute intensive stuff (image rendering and crunching) has
already had most of those skerricks pulled out. It is written in C
and assembler (not by us). Only a small part of our stuff is written
in Python: it just happens to be the part I'm involved with.
But Python speed ups don't come for free. For instance, I'd *really*
object if Python ran twice as fast for users with a quad-core CPU, but
twice as slow for users like me with only a dual-core CPU.

Hmm. Well if the tradeoff were selectable at Python configuration
time, then this option would certainly be worth doing. You might not
have a 4-core cpu today but you WILL have one soon.
What on earth makes you think that would be anything more than a
temporary, VERY temporary, shutdown? My prediction is that the last of
the machines wouldn't have even been unplugged

Of course that example was a reductio ad absurdum. In reality they'd
use the speedup to compute 2x as much stuff, rather than ever powering
any servers down. Getting the extra computation is more valuable than
saving the electricity. It's just easier to put a dollar value on
electricity than on computation in an example like this. It's also
the case for our specific site that our server cluster is in large
part a disk farm and not just a compute farm, so even if we sped up
the software infinitely we'd still need a lot of boxes to bolt the
disks into and keep them spinning.
Now there's a thought... given that Google:

(1) has lots of money;
(2) uses Python a lot;
(3) already employs both Guido and (I think...) Alex Martelli and
possibly other Python gurus;
(4) is not shy in investing in Open Source projects;
(5) and most importantly uses technologies that need to be used across
multiple processors and multiple machines

one wonders if Google's opinion of where core Python development needs to
go is the same as your opinion?

I think Google's approach has been to do cpu-intensive tasks in other
languages, primarily C++. It would still be great if they put some
funding into PyPy development, since I think I saw something about the
EU funding being interrupted.
 

Paul Rubin

Chris Mellon said:
No. Python has threads, and they're wrappers around true OS level
system threads. What the GIL does is prevent *Python* code in those
threads from running concurrently.

Well, C libraries can release the GIL if they are written for thread
safety, but as far as I know, most don't release it. For example I
don't think cElementTree releases the GIL, and it's a huge CPU
consumer in some of the code I run, despite being written in C pretty
carefully. Also, many of the most basic builtin types (such as dicts)
are implemented in C and don't release the GIL.
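
Where a C routine does release the GIL, threads really can use
multiple cores. A rough sketch (hashlib is one stdlib module that I
believe releases the GIL while hashing large buffers; buffer size and
timings are illustrative):

    import hashlib, threading, time

    data = b'x' * (50 * 1024 * 1024)  # a 50 MB buffer

    def hash_it():
        # C code; the GIL can be released while the digest is computed.
        hashlib.sha256(data).hexdigest()

    start = time.time()
    threads = [threading.Thread(target=hash_it) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('two threads: %.2fs' % (time.time() - start))

    start = time.time()
    hash_it(); hash_it()
    print('sequential:  %.2fs' % (time.time() - start))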
Very, very few operations are CPU bound these days, and even fewer of
the ones where Python is involved. The largest benefits of multiple
cores to the desktop user are increases in cross-process performance
(multitasking), not in single applications.

If you add up all the CPU cycles being used by Python everywhere,
I wonder how many of them are on desktops and how many are on servers.
Python is not especially fast in terms of pure CPU time, so
if you're CPU bound anyway, moving your CPU-bound code into C (or
something else) is likely to reap far more benefits - and it sidesteps
the GIL in the process.

If moving code into C is so easy, why not move all the code there
instead of just the CPU-bound code? Really, coding in C adds a huge
cost in complexity and unreliability. Python makes life a lot better
for developers, and so reimplementing Python code in C should be seen
as a difficult desperation measure rather than an easy way to get
speedups. Therefore, Python's slowness is a serious weakness and not
just a wart with an easy workaround.
In short, I think any problem that would be directly addressed by
removing the GIL is better addressed by other solutions.

It does sound like removing the GIL from CPython would have very high
costs in more than one area. Is my hope that Python will transition
from CPython to PyPy overoptimistic?
 

Chris Mellon

The compute intensive stuff (image rendering and crunching) has
already had most of those skerricks pulled out. It is written in C
and assembler (not by us). Only a small part of our stuff is written
in Python: it just happens to be the part I'm involved with.

That means that this part is also unaffected by the GIL.
Hmm. Well if the tradeoff were selectable at python configuration
time, then this option would certainly be worth doing. You might not
have a 4-core cpu today but you WILL have one soon.


Of course that example was a reductio ad absurdum. In reality they'd
use the speedup to compute 2x as much stuff, rather than ever powering
any servers down. Getting the extra computation is more valuable than
saving the electricity. It's just easier to put a dollar value on
electricity than on computation in an example like this. It's also
the case for our specific site that our server cluster is in large
part a disk farm and not just a compute farm, so even if we sped up
the software infinitely we'd still need a lot of boxes to bolt the
disks into and keep them spinning.

I think this is instructive, because it's pretty typical of GIL
complaints. Someone gives an example where the GIL is limiting, but
upon inspection it turns out that the actual bottleneck is elsewhere,
that the GIL is being sidestepped anyway, and that the supposed
benefits of removing the GIL wouldn't materialize because the problem
space isn't really as described.
I think Google's approach has been to do cpu-intensive tasks in other
languages, primarily C++. It would still be great if they put some
funding into PyPy development, since I think I saw something about the
EU funding being interrupted.

At the really high levels of scalability, such as across a server
farm, threading is useless. The entire point of threads, rather than
processes, is that you've got shared, mutable state. A shared nothing
process (or Actor, if you will) model is the only one that makes sense
if you really want to scale because it's the only one that allows you
to distribute over machines. The fact that it also scales very well
over multiple cores (better than threads, in many cases) is just
gravy.

The only hard example I've seen given of the GIL actually limiting
scalability is on single server, high volume Django sites, and I don't
think that the architecture of those sites is very scalable anyway.
 

Paul Rubin

Chris Mellon said:
That means that this part is also unaffected by the GIL.

Right, it was a counterexample against the "speed doesn't matter"
meme, not specifically against the GIL. And that code is fast because
someone undertook comparatively enormous effort to code it in messy,
unsafe languages instead of Python, because Python is so slow.
At the really high levels of scalability, such as across a server
farm, threading is useless. The entire point of threads, rather than
processes, is that you've got shared, mutable state. A shared nothing
process (or Actor, if you will) model is the only one that makes sense
if you really want to scale because it's the only one that allows you
to distribute over machines. The fact that it also scales very well
over multiple cores (better than threads, in many cases) is just
gravy.

In reality you want to organize the problem so that memory intensive
stuff is kept local, and that's where you want threads, to avoid the
communications costs of serializing stuff between processes, either
between boxes or between cores. If communications costs could be
ignored, there would be no need for gigabytes of RAM in computers.
We'd just use disks for everything. As it is, we use tons of RAM,
most of which is usually twiddling its thumbs doing nothing (as DJ
Bernstein put it) because the CPU isn't addressing it at that instant.
The memory just sits there waiting for the CPU to access it. We
actually can get better-than-linear speedups by designing the hardware
to avoid this. See:
http://cr.yp.to/snuffle/bruteforce-20050425.pdf
for an example.
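
One way to see the communications cost being traded against shared
memory (the structure and sizes here are arbitrary): handing a big
object to another process means serialising it across the boundary,
while a thread in the same process just shares the reference:

    import pickle, time

    big = list(range(10**6))      # a big pile of Python objects

    start = time.time()
    blob = pickle.dumps(big, pickle.HIGHEST_PROTOCOL)  # process-boundary cost
    back = pickle.loads(blob)
    print('pickle round trip: %.2fs, %d bytes'
          % (time.time() - start, len(blob)))

    start = time.time()
    alias = big                   # what a thread "pays" to see the same data
    print('share a reference: %.6fs' % (time.time() - start))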
The only hard example I've seen given of the GIL actually limiting
scalability is on single server, high volume Django sites, and I don't
think that the architecture of those sites is very scalable anyway.

The stuff I'm doing now happens to work ok with multiple processes but
would have been easier to write with threads.
 

Terry Reedy

| funding into PyPy development, since I think I saw something about the
| EU funding being interrupted.

As far as I know, the project was completed and promised funds paid. But I
don't know of any major follow-on funding, which I am sure they could use.
 

Terry Reedy

| It does sound like removing the GIL from CPython would have very high
| costs in more than one area. Is my hope that Python will transition
| from CPython to PyPy overoptimistic?

I presume you mean 'will the leading edge reference version transition...',
or more plainly, 'will Guido switch to PyPy for further development of
Python?' I once thought so, but 1) Google sped the arrival of Py3.0 by
hiring Guido with a major chunk of time devoted to Python development, so
he started before PyPy was even remotely ready (and it still is not); and
2) PyPy did not focus only or specifically on being a CPython replacement
but became an umbrella for a variety of experiments (including, for
instance, a Scheme frontend).
 
