Python's biggest compromises

Robin Becker · Aug 6, 2003

(Yes, there are issues with Python on SMP machines, but to call Python's
built-in threading "non-existent SMP scalability" is either a lie or
revelatory of near-complete ignorance. That doesn't even count the
various IPC mechanisms.)

I'm not an expert, but the various grid computation schemes seem to
prefer either Java or C/C++. I suspect that those schemes aren't mainly
using threads; after all, they seem to be running between machines in
different parts of the world. I suspect Python would be in better shape
if we could migrate threads or tasklets from one processor to another.

I believe pyro can almost do that, but I haven't tried it.

Syver Enstad · Aug 6, 2003

(Yes, there are issues with Python on SMP machines, but to call
Python's built-in threading "non-existent SMP scalability" is either
a lie or revelatory of near-complete ignorance. That doesn't even
count the various IPC mechanisms.)

It's an interesting subject though. How does Python threading on SMP
machines compare with, e.g., Java and C++? I know that at least the
MSVC compiler has a GIL-like problem with heap access (new, malloc,
delete, free), which is guarded with a global lock.

Would migrating the global data for a thread to some sort of thread
local storage help Python SMP performance? If Java has better
threading performance than Python, how have they solved the interpreter
state problem? Java is interpreted, isn't it?

enoch · Aug 6, 2003

Would you care to back up your claim with some actual evidence?

(Yes, there are issues with Python on SMP machines, but to call Python's
built-in threading "non-existent SMP scalability" is either a lie or
revelatory of near-complete ignorance.

Ok, I confess, the term you cited might be a little bit exaggerated. But
there's no need to get personal. I'm surely not a liar (w.r.t. this
thread; everything else is not a matter of public concern). The
ignorance part, well, we can talk about that ...

That doesn't even count the various IPC mechanisms.)

Correct me if I'm wrong, but I don't think any form of IPC is a
measurement of scalability of something like the python interpreter.

Here are some sources which show that I'm not alone with my assessment
that python has deficiencies w.r.t. SMP systems:

http://www.python.org/pycon/papers/deferex/
"""
It is optimal, however, to avoid requiring threads for any part of a
framework. Threading has a significant cost, especially in Python. The
global interpreter lock destroys any performance benefit that
threading may yield on SMP systems, [...]
"""

http://groups.google.com/groups?hl=...-8&safe=off&[email protected]
(note the author of that post)
"""

My project will be running on an SMP box and requires scalability.
However, my test shows that Python threading has very poor performance
in terms of scaling. In fact it doesn't scale at all.

That's true for pure Python code.
"""

I'm aware that you know quite well about these facts, so I'll leave it
at that. But let me just add one more link which maybe you don't know:

http://www.zope.org/Members/glpb/solaris/multiproc

"""
Well, in worst case, it can actually give you performance UNDER 1X.
The latency switching the GIL between CPUs comes right off your
ability to do work in a quanta. If you have a 1 gigahertz machine
capable of doing 12,000 pystones of work, and it takes 50 milliseconds
to switch the GIL(I dont know how long it takes, this is an example)
you would lose 5% of your peak performance for *EACH* GIL switch.
Setting sys.setcheckinterval(240) will still yield the GIL 50 times a
second. If the GIL actually migrates only 10% of the time its
released, that would 50 * .1 * 5% = 25% performance loss. The cost
to switch the GIL is going to vary, but will probably range between .1
and .9 time quantas (scheduler time intervals) and a typical time
quanta is 5 to 10ms.
[...]
However, I have directly observed a 30% penalty under MP constraints
when the sys.setcheckinterval value was too low (and there was too
much GIL thrashing).
"""
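The arithmetic in the quoted estimate can be checked directly. This is only a sketch that reuses the figures from the quote (a 50 ms switch cost, 50 GIL releases per second, 10% of releases migrating); the variable names are illustrative and not from the original page. Note also that `sys.setcheckinterval` no longer exists in current interpreters (it was removed in Python 3.9); the time-based `sys.getswitchinterval` / `sys.setswitchinterval` pair is the modern equivalent.

```python
import sys

# Figures taken from the quoted example; they are illustrative, not measured.
switch_cost_s = 0.050      # assumed cost of moving the GIL to another CPU
releases_per_s = 50        # how often the GIL is released per second
migrate_fraction = 0.10    # share of releases that actually change CPU

# Each switch costs 50 ms out of every second of work, i.e. 5%.
loss_per_switch = switch_cost_s / 1.0
migrations_per_s = releases_per_s * migrate_fraction
total_loss = migrations_per_s * loss_per_switch

print(f"estimated loss: {total_loss:.0%}")

# Modern interpreters switch on a time interval rather than a bytecode count.
current_interval = sys.getswitchinterval()
print(f"current switch interval: {current_interval} s")
```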

So, although Python is capable of taking advantage of SMP systems
under certain circumstances (I/O bound systems etc.), there are
real world situations where Python's performance is _hurt_ by running
on an SMP system.

Btw. I think even IPC might not help you there, because the different
processes might bounce between CPUs, so only processor binding might
help.
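The CPU-bound case is easy to try for yourself. The following is a minimal sketch, not a benchmark: it runs the same pure-Python loop twice sequentially and then in two threads, and prints both timings. The function names and the loop size `N` are made up for illustration. On a GIL-based interpreter the threaded run is typically no faster; on a free-threaded build the result may differ.

```python
import threading
import time

N = 200_000  # illustrative workload size


def count_down(n):
    # Pure-Python, CPU-bound loop: holds the GIL except at switch points.
    while n > 0:
        n -= 1


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential():
    count_down(N)
    count_down(N)


def threaded():
    threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


seq_time = timed(sequential)
thr_time = timed(threaded)
print(f"sequential: {seq_time:.4f} s, two threads: {thr_time:.4f} s")
```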

I did quite a bit of googling on this problem - several times -
because I'm selling zope solutions. Sometimes, the client wants to run
the solution on an existing SMP system, and worse, the system has to
fulfill some performance requirements. Then I have the problem of
explaining to him that his admins need to undertake some special tasks
in order for zope to be able to exploit the multiple procs in his
system.

Aahz, I've been lurking in this newsgroup for approx. 3 years, so I know
who you are. You have participated in nearly every discussion about
threads, I know your slides, and there's no doubt that you have forgotten
more about this subject than I'll ever know.

Aahz · Aug 7, 2003

It's an interesting subject though. How does Python threading on SMP
machines compare with, e.g., Java and C++? I know that at least the
MSVC compiler has a GIL-like problem with heap access (new, malloc,
delete, free), which is guarded with a global lock.

Sure, but that's not where a C++ application usually spends its time.

Would migrating the global data for a thread to some sort of thread
local storage help Python SMP performance? If Java has better
threading performance than Python, how have they solved the interpreter
state problem? Java is interpreted, isn't it?

Well, that's a good question. *Does* Java have better threading
performance than Python? If it does, to what extent is that performance
bought at the cost of complexity for the programmer?

Keep in mind that the GIL exists not because of issues with thread-local
storage but because every Python object is global and can have bindings
to it in any -- or every -- thread. Python uses objects *everywhere*;
the GC uses Python objects, stack frames are Python objects, modules are
Python objects. To create "thread-local" storage as you suggest would
require a wholesale revision of Python's object model that would make it
something other than what Python is today.
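To make the point about shared objects concrete, here is a small sketch. It assumes a modern Python with `threading.local` (added to the standard library after this thread was written). An ordinary object is reachable from every thread, while `threading.local` only gives each thread its own attribute namespace; it does not change the fact that the objects themselves live in one shared interpreter. The names `shared`, `local` and `worker` are illustrative.

```python
import threading

shared = []                 # an ordinary object: any thread can bind to it
local = threading.local()   # per-thread attribute namespace


def worker(tag):
    local.tag = tag          # visible only inside this thread
    shared.append(tag)       # visible to every thread


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set local.tag, so it does not see the workers' values.
seen_in_main = hasattr(local, "tag")
print(sorted(shared), seen_in_main)
```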

Based on recent discussions about restricted execution, I suspect that
security would be much more likely to drive such changes; if that
happens, perhaps revisiting the way GIL works might happen with it.

enoch · Aug 7, 2003

<snip>
Since, as you say, you've done some research, that's why I flamed you.
There's just no call for making such an overstated claim -- it is *NOT*
"a little bit exaggerated".

Well, I based this phrase on the fact that while under some
circumstances (e.g. your web spider) Python does scale somewhat, under
others (e.g. Zope) it may perform even worse on an SMP system. If you
sum these two facts up ...

That I won't argue. But Python's approach also has some benefits even
on SMP systems. And if you choose a multi-process approach, the same
advantages that accrue to Python's approach on a single-CPU box apply
just as much to an SMP system.

Yes, and these advantages also include a simpler threading model, as
far as I understand it, on every system. It's a compromise, that's why
I posted in this thread.

http://www.python.org/pycon/papers/deferex/
"""
It is optimal, however, to avoid requiring threads for any part of a
framework. Threading has a significant cost, especially in Python. The
global interpreter lock destroys any performance benefit that
threading may yield on SMP systems, [...]
"""

Just because it's a published PyCon paper doesn't mean that it's correct.
The multi-threaded spider that I use as my example is a toy version of a
spider that was used on an SMP box. (That's why I became a threading
expert in the first place -- Tim Peters probably remembers me pestering
him with questions four years ago. ;-) I guarantee you that SMP made
that spider much faster.
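The spider case works because threads blocked on I/O release the GIL. A rough sketch of that effect, using `time.sleep` as a stand-in for a blocking network fetch (the `fetch` function and the `example.invalid` URLs are hypothetical, and no real network access happens):

```python
import threading
import time


def fetch(url, results):
    # Stand-in for blocking network I/O; sleeping releases the GIL,
    # so other threads can run in the meantime.
    time.sleep(0.1)
    results[url] = len(url)


urls = [f"http://example.invalid/page{i}" for i in range(8)]
results = {}

start = time.perf_counter()
threads = [threading.Thread(target=fetch, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Eight sequential fetches would take about 0.8 s; overlapped waits take far less.
print(f"fetched {len(results)} pages in {elapsed:.2f} s")
```

This shows overlap of waiting time only; it says nothing about CPU-bound work, which is where the disagreement in this thread lies.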

But how big is the significance of software which has the same
characteristics as your web spider example versus application servers?

Absolutely. But that's true of any system with threading that isn't
designed and tuned for the needs of a specific application. Python
trades performance in some situations for a clean and simple model of
threading.

Again, the compromise we were talking about. I'm not in a position to
weigh the pros and cons of it against each other, but I think I can
point out some cons of the current approach. I'm not doing that to
spread FUD, but to give an outsider's perspective on what I think might
hurt Python in the future, and I want Python to thrive because I like
using it a lot.

My understanding is that most OSes are designed to avoid this; I'd be
interested in seeing some information if I'm wrong. In any event, I do
know that IPC speeds things up in real-world applications on SMP boxes.

For example, there are always lots of discussions about CPU affinity
on linux-kernel, and it seems to be a hard problem. Hyperthreading and
other non-symmetric architectures make this problem even harder.
Add to that the problem of the GIL getting shuffled around and you
have a system where you'll have trouble predicting the performance
characteristics. Admins don't like that. It's not as if there are no
problems without the GIL, though; it just adds to the complication.
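For what "processor binding" can look like from inside Python today, here is a hedged sketch. It assumes a Linux system where `os.sched_setaffinity` is available (it is not present on every platform, and did not exist in Python when this thread was written); the helper name `pin_to_first_allowed_cpu` is made up for illustration.

```python
import os


def pin_to_first_allowed_cpu():
    """Pin the current process to one CPU; return the new CPU set or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform does not expose affinity control
    try:
        allowed = os.sched_getaffinity(0)    # 0 means "this process"
        os.sched_setaffinity(0, {min(allowed)})
        return os.sched_getaffinity(0)
    except OSError:
        return None  # e.g. the environment forbids changing affinity


pinned = pin_to_first_allowed_cpu()
print("pinned to:", pinned)
```

Whether pinning helps depends on the workload and the scheduler; this only shows the mechanism.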

Even if Zope is the 800-pound gorilla of the Python world, Python isn't
going to change just for Zope. If you want to talk about ways of
improving Zope's performance on SMP boxes, I'll be glad to contribute
what I can. But spreading false information isn't the way to get me
interested.

I wasn't even aware that Zope is the "800-pound gorilla" of the Python
world. I used it just as an example of a typical larger server app,
because, well, I know it.
Incidentally, the PyCon paper above, which you seem to dismiss as
false, is also from a guy who is working on a larger server app.
Maybe there's a pattern?

Keep in mind that one reason IPC has gained popularity is because it
scales more than threading does, in the end. Blade servers are cheaper
than big SMP boxes, and IPC works across multiple computers.
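As an illustration of the multi-process approach, the sketch below splits a CPU-bound sum across separate interpreter processes and collects the partial results over pipes. Each process has its own GIL, so the operating system can schedule them on different CPUs. The `WORKER` snippet and `parallel_sum` helper are hypothetical names; it uses the `subprocess` module from a modern Python 3 rather than anything that was available in 2003.

```python
import subprocess
import sys

# Code run by each worker process: sum a half-open range given on argv.
WORKER = (
    "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); "
    "print(sum(range(lo, hi)))"
)


def parallel_sum(n, workers=2):
    step = n // workers
    # The last worker takes any remainder so the ranges cover 0..n exactly.
    bounds = [
        (i * step, n if i == workers - 1 else (i + 1) * step)
        for i in range(workers)
    ]
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(lo), str(hi)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for lo, hi in bounds
    ]
    # Results come back over each child's stdout pipe (a simple form of IPC).
    return sum(int(p.communicate()[0]) for p in procs)


total = parallel_sum(1_000_000)
print(total)
```

The same pattern extends across machines once the pipe is replaced by a network connection, which is the scaling argument made above.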

Allow me a comment on the nature of this discussion (Python and SMP
in general, not just this thread). I've seen it before and the
ingredients are:

- a major open source project
- developers who love this project
- some "outsider" who points out some perceived deficiency of said
project
- said developers pointing out (rightly or wrongly) reasons why this
deficiency doesn't matter, or that there are other (better) ways for
the "outsider" to achieve what he wants

In most cases this discussion then develops into a big fat flamewar.

Two examples are Linux and its threading capabilities, and MySQL and
ACID compliance.
A nice quote from the Linux discussion, btw., was from Alan Cox:

"A Computer is a state machine. Threads are for people who can't
program state machines."

But today, Linux's thread support is orders of magnitude better than it was.

You wrote in another message in this thread:

Well, that's a good question. *Does* Java have better threading
performance than Python? If it does, to what extent is that performance
bought at the cost of complexity for the programmer?

While I can't comment on the second question, here's an article which
sheds some light on the SMP scalability of an older Java JDK; the meat
is on the third page:
http://www.javaworld.com/javaworld/jw-08-2000/jw-0811-threadscale.html

Seems that java does indeed have better threading performance than
python.

Robin Becker · Aug 11, 2003

Irmen de Jong said:
Robin Becker wrote:
......

Could you please elaborate on this a bit?
What exactly did you have in mind when talking about
"migrating threads or tasklets" ?

Well I had in mind the grid concept, which I believe implies the
distribution of code to multiple nodes and then the ability to execute
on them (I suppose that includes re-sending data to already distributed
instances).

I imagine that a proper grid would allow reloading of modules as the
overall application requires, but that would be relatively trivial if we
could capture 'execution state'.

Moving a running thread to another process would be fairly hard I
imagine, but I guess that's what we want for load balancing etc.
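Capturing the state of an arbitrary running thread is not something standard Python offers, but the idea can be approximated when a task keeps its execution state as explicit data. The sketch below is only an illustration of that restricted case: `CountingTask` is a hypothetical class, and `pickle` stands in for whatever transport a grid system would use to ship the state to another node.

```python
import pickle


class CountingTask:
    """A task whose whole execution state is explicit, picklable data."""

    def __init__(self, limit):
        self.limit = limit
        self.position = 0
        self.total = 0

    def step(self):
        # Do one unit of work; return True while more work remains.
        if self.position < self.limit:
            self.total += self.position
            self.position += 1
        return self.position < self.limit


task = CountingTask(10)
for _ in range(4):
    task.step()                     # run part of the job "here"

snapshot = pickle.dumps(task)       # bytes that could be sent to another node
resumed = pickle.loads(snapshot)    # "there": rebuild the task and carry on
while resumed.step():
    pass

print(resumed.total)
```

Here the resumed copy finishes the job while the original stays at step 4; a real migration scheme would also have to stop the original and deal with open files, sockets and C-level state, which is the hard part alluded to above.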
