2.6, 3.0, and truly independent interpreters

Andy

Dear Python dev community,

I'm CTO at a small software company that makes music visualization
software (you can check us out at www.soundspectrum.com). About two
years ago we made the decision to use embedded python in a couple of
our new products, given all the great things about python. We were
close to using lua but for various reasons we decided to go with
python. However, over the last two years, there's been one area of
grief that sometimes makes me think twice about our decision to go
with python...

Some background first... Our software is used for entertainment and
centers around real time, high-performance graphics, so python's
performance, embedded flexibility, and stability are the most
important issues for us. Our software targets a large cross section
of hardware and we currently ship products for Win32, OS X, and the
iPhone, and since our customers are end users, our products have to be
robust, have a tidy install footprint, and be foolproof. Basically,
we use embedded python and use it to wrap our high performance C++
class set which wraps OpenGL, DirectX and our own software renderer.
In addition to wrapping our C++ frameworks, we use python to perform
various "worker" tasks on worker threads (e.g. image loading and
processing). However, we require *true* thread/interpreter
independence, so python 2 has been frustrating at times, to say the
least. Please don't start with "but really, python supports multiple
interpreters" because I've been there many many times with people.
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.
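
(For concreteness, the "multiple interpreters" support being referred
to here is CPython's sub-interpreter C API, which looks roughly like
the sketch below. This is a minimal illustration with no error
handling, and it is exactly the rub: every interpreter created this
way still shares the one process-wide GIL, so two cores can never run
python code at the same time.)

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();                  /* main interpreter; acquires the GIL */

        PyThreadState *main_tstate = PyThreadState_Get();
        PyThreadState *sub = Py_NewInterpreter();   /* "independent" interpreter */

        /* Runs inside the sub-interpreter, but still under the shared GIL. */
        PyRun_SimpleString("print('hello from a sub-interpreter')");

        Py_EndInterpreter(sub);           /* sub must be the current thread state */
        PyThreadState_Swap(main_tstate);

        Py_Finalize();
        return 0;
    }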

Sadly, the only way we could get truly independent interpreters was to
put python in a dynamic library, have our installer make a *duplicate*
copy of it during the installation process (e.g. python.dll/.bundle ->
python2.dll/.bundle) and load each one explicitly in our app, so we
can get truly independent interpreters. In other words, we load a
fresh dynamic lib for each thread-independent interpreter (you can't
reuse the same dynamic library because the OS will just reference the
already-loaded one).
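
(Roughly, the loading side of that workaround looks like the Win32
sketch below. The renamed DLL file names and the handful of entry
points resolved here are just illustrative, not actual shipping code;
the point is simply that each physical copy of the DLL gets its own
static state, and therefore its own interpreter and its own GIL.)

    #include <windows.h>

    typedef void (*Py_Initialize_t)(void);
    typedef int  (*PyRun_SimpleString_t)(const char *);

    typedef struct {
        HMODULE              dll;
        Py_Initialize_t      init;
        PyRun_SimpleString_t run;
    } PyRuntime;

    static int load_runtime(PyRuntime *rt, const char *dll_name)
    {
        rt->dll = LoadLibraryA(dll_name);   /* each renamed copy is a separate image */
        if (!rt->dll)
            return 0;
        rt->init = (Py_Initialize_t)GetProcAddress(rt->dll, "Py_Initialize");
        rt->run  = (PyRun_SimpleString_t)GetProcAddress(rt->dll, "PyRun_SimpleString");
        return rt->init != NULL && rt->run != NULL;
    }

    int main(void)
    {
        PyRuntime a, b;
        /* Two physical copies of the same DLL -> two fully independent
           interpreters, one per worker thread. */
        if (load_runtime(&a, "python25.dll") && load_runtime(&b, "python25_2.dll")) {
            a.init();
            b.init();
            a.run("print 'interpreter A'");
            b.run("print 'interpreter B'");
        }
        return 0;
    }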

From what I gather from the python community, the basis for not
offering "real" multi-threaded support is that it'd add too much
internal overhead--and I couldn't agree more. As a high performance C
and C++ guy, I fully agree that thread safety should be at the high
level, not at the low level. BUT, the lack of truly independent
interpreters is what ultimately prevents using python in cool,
powerful ways. This shortcoming alone has caused game developers--
both large and small--to choose other embedded interpreters over
python (e.g. Blizzard chose lua over python). For example, Apple's
QuickTime API is powerful in that high-level instance objects can
leverage performance gains associated with multi-threaded processing.
Meanwhile, the QuickTime API simply lists the responsibilities of the
caller regarding thread safety, and that's all it needs to do. In
other words, CPython doesn't need to step in and provide a thread-safe
environment; it just needs to establish the rules and make sure that
its own implementation supports those rules.

More than once, I had actually considered expending company resources
to develop a high performance, truly independent interpreter
implementation of the python core language and modules but in the end
estimated that the size of that project would just be too much, given
our company's current resources. Should such an implementation ever
be developed, it would be very attractive for companies to support,
fund, and/or license. The truth is, we just love python as a
language, but its lack of true interpreter independence (in an
interpreter sense as well as in a thread sense) remains a *huge* liability.

So, my question becomes: is python 3 ready for true multithreaded
support?? Can we finally abandon our Frankenstein approach of loading
multiple identical dynamic libs to achieve truly independent
interpreters?? I've reviewed all the new python 3 C API module stuff,
and all I have to say is: whew--better late than never!! So, although
that solves modules offering truly independent interpreter support,
the following questions remain:

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over? Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!

- How close is python 3 really to true multithreaded use? The
assumption here is that the caller ensures safety (e.g. ensuring that
neither interpreter is in use when serializing data from one to
another).

I believe that true python independent thread/interpreter support is
paramount and should become the top priority because this is the key
consideration used by developers when they're deciding which
interpreter to embed in their app. Until there's a hello world that
demonstrates running independent python interpreters on multiple app
threads, lua will remain the clear choice over python. Python 3 needs
true interpreter independence and multi-threaded support!


Thanks,
Andy O'Meara
 
Thomas Heller

Andy said:
Dear Python dev community,

[...] Basically,
we use embedded python and use it to wrap our high performance C++
class set which wraps OpenGL, DirectX and our own software renderer.
In addition to wrapping our C++ frameworks, we use python to perform
various "worker" tasks on worker threads (e.g. image loading and
processing). However, we require *true* thread/interpreter
independence, so python 2 has been frustrating at times, to say the
least. [...]

Sadly, the only way we could get truly independent interpreters was to
put python in a dynamic library, have our installer make a *duplicate*
copy of it during the installation process (e.g. python.dll/.bundle ->
python2.dll/.bundle) and load each one explicitly in our app, so we
can get truly independent interpreters. In other words, we load a
fresh dynamic lib for each thread-independent interpreter (you can't
reuse the same dynamic library because the OS will just reference the
already-loaded one).

Interesting questions you ask.

A random note: py2exe also does something similar for executables built
with the 'bundle = 1' option. The python.dll and .pyd extension modules
in this case are not loaded into the process in the 'normal' way (with
some kind of windows LoadLibrary() call); instead they are loaded by code
in py2exe that /emulates/ LoadLibrary - the code segments are loaded into
memory, fixups are made for imported functions, and the pages are marked
executable.

The result is that separate COM objects implemented as Python modules and
converted into separate dlls by py2exe do not share their interpreters even
if they are running in the same process. Of course this only works on windows.
In effect this is similar to using /statically/ linked python interpreters
in separate dlls. Can't you do something like that?
So, my question becomes: is python 3 ready for true multithreaded
support?? Can we finally abandon our Frankenstein approach of loading
multiple identical dynamic libs to achieve truly independent
interpreters?? I've reviewed all the new python 3 C API module stuff,
and all I have to say is: whew--better late than never!! So, although
that solves modules offering truly independent interpreter support,
the following questions remain:

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over? Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!

I don't think this is the case (currently). But you could submit patches
to Python so that at least the 'official' modules (builtin and extensions)
would behave correctly in the case of multiple interpreters. At least
this is a much lighter task than writing your own GIL-less interpreter.
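
For what it's worth, the kind of patch involved is mostly mechanical:
move a module's static C globals into the per-module state that PEP
3121 introduced for Python 3.0. A toy sketch (the module and field
names are made up for illustration):

    #include <Python.h>

    typedef struct {
        long call_count;            /* was: static long call_count; */
    } examplemod_state;

    static PyObject *
    examplemod_bump(PyObject *module, PyObject *args)
    {
        /* Each interpreter that imports the module gets its own copy
           of this state block, instead of sharing one C static. */
        examplemod_state *st = (examplemod_state *)PyModule_GetState(module);
        st->call_count++;
        return PyLong_FromLong(st->call_count);
    }

    static PyMethodDef examplemod_methods[] = {
        {"bump", examplemod_bump, METH_NOARGS, "Increment this interpreter's counter."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef examplemod_def = {
        PyModuleDef_HEAD_INIT,
        "examplemod",
        "PEP 3121 per-module state demo.",
        sizeof(examplemod_state),   /* m_size > 0: state allocated per module object */
        examplemod_methods,
        NULL, NULL, NULL, NULL
    };

    PyMODINIT_FUNC
    PyInit_examplemod(void)
    {
        PyObject *m = PyModule_Create(&examplemod_def);
        if (m == NULL)
            return NULL;
        /* The state block starts zero-filled; initialize explicitly anyway. */
        ((examplemod_state *)PyModule_GetState(m))->call_count = 0;
        return m;
    }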

My 2 cents,

Thomas
 
Martin v. Löwis

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over?

No, none of them.
Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!

So you will have to suffer pain.
- How close is python 3 really to true multithreaded use?

Python is as thread-safe as ever (i.e. completely thread-safe).
I believe that true python independent thread/interpreter support is
paramount and should become the top priority because this is the key
consideration used by developers when they're deciding which
interpreter to embed in their app. Until there's a hello world that
demonstrates running independent python interpreters on multiple app
threads, lua will remain the clear choice over python. Python 3 needs
true interpreter independence and multi-threaded support!

So what patches to achieve that goal have you contributed so far?

In open source, pleas have nearly zero effect; code contributions are
what have an effect.

I don't think any of the current committers has a significant interest
in supporting multiple interpreters (and I say that as the one who wrote
and implemented PEP 3121). To make a significant change, you need to
start with a PEP, offer to implement it once accepted, and offer to
maintain the feature for five years.

Regards,
Martin
 
Andy

Hi Thomas -

I appreciate your thoughts and time on this subject.
The result is that separate COM objects implemented as Python modules and
converted into separate dlls by py2exe do not share their interpreters even
if they are running in the same process.  Of course this only works on windows.
In effect this is similar to using /statically/ linked python interpreters
in separate dlls.  Can't you do something like that?

You're definitely correct that homebrew loading and linking would do
the trick. However, because our python stuff makes callbacks into our
C/C++, that complicates the linking process (if I understand you
correctly). Also, then there's the problem of OS X.

I don't think this is the case (currently).  But you could submit patches
to Python so that at least the 'official' modules (builtin and extensions)
would behave correctly in the case of multiple interpreters.  At least
this is a much lighter task than writing your own GIL-less interpreter.

I agree -- and I've been considering that (or rather, having our
company hire/pay part of the python dev community to do the work). To
consider that, the question becomes: how many modules are we talking
about, do you think? 10? 100? I confess that I'm not familiar enough
with the full C python suite to have a good idea of how much work
we're talking about here.

Regards,
Andy
 
Andy

No, none of them.
:^)


Python is as thread-safe as ever (i.e. completely thread-safe).

If you're referring to the fact that the GIL does that, then you're
certainly correct. But if you've got multiple CPUs/cores and actually
want to use them, that GIL means you might as well forget about them.
So please take my use of "true multithreaded" to mean "turning off"
the GIL and pushing the responsibility of object safety to the client/API
level (such as in my QuickTime API example).

So what patches to achieve that goal have you contributed so far?

In open source, pleas have nearly zero effect; code contributions are
what have an effect.

This is just my second email, please be a little patient. :^) But
more seriously, I do represent a company ready, able, and willing to
fund the development of features that we're looking for, so please
understand that I'm definitely not coming to the table empty-handed
here.

I don't think any of the current committers has a significant interest
in supporting multiple interpreters (and I say that as the one who wrote
and implemented PEP 3121). To make a significant change, you need to
start with a PEP, offer to implement it once accepted, and offer to
maintain the feature for five years.

Nice to meet you! :^) Seriously though, thank you for all your work on
3121 and taking the initiative with it! It's definitely a first
step toward what attracts companies like ours to embedding an
interpreted language in the first place. Specifically: unrestricted,
thread-independent interpreter use.

I would *love* for our company to be 10 times larger and be able to
add another zero to what we'd be able to hire/offer the python dev
community for work that we're looking for, but we unfortunately have
limits at the moment. And I would love to see python become the
leading choice when companies look to use an embedded interpreter, and
I offer my comments here to paint a picture of what can make python
more appealing to commercial software developers. Hopefully, the
python dev community doesn't underestimate the dev funding that could
potentially come in from companies if python grew in certain ways!

So, that said, I represent a company willing to fund the development
of features that move python towards thread-independent operation. No
software engineer can deny that we're entering a new era of
multithreaded processing where support frameworks (such as python)
need to be open-minded about how they're used in a multi-threaded
environment--that's all I'm saying here.

Anyway, I can definitely tell you and anyone else interested that
we're willing to put our money where our wish-list is. As I mentioned
in my previous post to Thomas, the next step is to get an
understanding of the options available that will satisfy our needs.
We have a budget for this, but it's not astronomical (it's driven by
the cost associated with dropping python and going with lua--or,
making our own pared-down interpreter implementation). Please let me
be clear--I love python (as a language) and I don't want to switch.
BUT, we have to be able to run interpreters in different threads (and
get unhindered/full CPU core performance--ie. no GIL).

Thoughts? Also, please feel free to email me off-list if you prefer.

Oh, while I'm at it, if anyone in the python dev community (or anyone
that has put real work into python) is interested in our software,
email me and I'll hook you up with a complimentary copy of the
products that use python (music visuals for iTunes and WMP).

Regards,
Andy
 
Martin v. Löwis

I would *love* for our company to be 10 times larger and be able to
add another zero to what we'd be able to hire/offer the python dev
community for work that we're looking for, but we unfortunately have
limits at the moment.

There is another thing about open source that you need to consider:
you don't have to do it all on your own.

It needs somebody to take the lead, start a project, define a plan,
and lay out small steps to approach it. If it's really something that the
community desperately needs, and if you make it clear that you will
just lead, but get nowhere without contributions, then the
contributions will come in.

If there won't be any contributions, then the itch in the
community isn't strong enough that it needs scratching.

Regards,
Martin
 
Terry Reedy

Andy said:
I agree -- and I've been considering that (or rather, having our
company hire/pay part of the python dev community to do the work). To
consider that, the question becomes: how many modules are we talking
about, do you think? 10? 100?

In your Python directory, everything in Lib is Python, I believe.
Everything in DLLs is compiled C extensions. I see about 15 in the Windows
3.0 build. These reflect two separate directories in the source tree. Builtin
classes are part of pythonxx.dll in the main directory. I have no idea
if things such as lists (from listobject.c), for instance, are a
potential problem for you.

You could start with the module of most interest to you, or perhaps a
small one, and see if it needs patching (from your viewpoint) and how
much effort it would take to meet your needs.

Terry Jan Reedy
 
Jesse Noller

And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.

So, as the guy-on-the-hook for multiprocessing, I'd like to know what
you might suggest to make it more apt for your - and other -
environments.

Additionally, have you looked at:
https://launchpad.net/python-safethread
http://code.google.com/p/python-safethread/w/list
(by Adam Olsen)

-jesse
 
Terry Reedy

Andy said:
This is just my second email, please be a little patient. :^)

As a 10-year veteran, I welcome new contributors with new viewpoints and
information.
more appealing to commercial software developers. Hopefully, the
python dev community doesn't underestimate the dev funding that could
potentially come in from companies if python grew in certain ways!

This seems to be something of a chicken-and-egg problem.
So, that said, I represent a company willing to fund the development
of features that move python towards thread-independent operation.

Perhaps you know of and can persuade other companies to contribute to
such a focused effort.
No
software engineer can deny that we're entering a new era of
multithreaded processing where support frameworks (such as python)
need to be open minded with how they're used in a multi-threaded
environment--that's all I'm saying here.

The *current* developers seem to be more interested in exploiting
multiple processors with multiprocessing. Note that Google chose that
route for Chrome (as I understood their comic introduction). 2.6 and 3.0
come with a new multiprocessing module that mimics the threading module
api fairly closely. It is now being backported to run with 2.5 and 2.4.

Advances in multithreading will probably require new ideas and
development energy.

Terry Jan Reedy
 
Jesse Noller

The *current* developers seem to be more interested in exploiting multiple
processors with multiprocessing. Note that Google chose that route for
Chrome (as I understood their comic introduction). 2.6 and 3.0 come with a
new multiprocessing module that mimics the threading module api fairly
closely. It is now being backported to run with 2.5 and 2.4.

That's not exactly correct. Multiprocessing was added to 2.6 and 3.0
as an *additional* method for parallel/concurrent programming that
allows you to use multiple cores - however, as I noted in the PEP:

" In the future, the package might not be as relevant should the
CPython interpreter enable "true" threading, however for some
applications, forking an OS process may sometimes be more
desirable than using lightweight threads, especially on those
platforms where process creation is fast and optimized."

Multiprocessing is not a replacement for a "free threading" future
(ergo my mentioning Adam Olsen's work) - it is a tool in the
"batteries included" box. I don't want my cheerleading and driving of
this to somehow imply that the rest of Python-Dev thinks this is
the "silver bullet" or final answer in concurrency.

However, a free-threaded python has a lot of implications, and if we
were to do it, it requires we not only "drop" the GIL - it also
requires we consider the ramifications of enabling true threading a la
Java et al - just having "true threads" lying around is only great if
you've spent a ton of time learning locking, avoiding shared data/etc,
stepping through and cursing poor debugger support for multiple
threads, etc.

This is why I've been a fan of Adam's approach - enabling free
threading via GIL removal is actually secondary to the project's
stated goal: Enable Safe Threading.

In any case, I've jumped the rails - let's just say there's room in
python for multiprocessing, threading and possibly a concurrent
package ala java.util.concurrent - but it really does have to be
thought out and done right.

Speaking of which: If you wanted "real" threads, you could use a
combination of JCC (http://pypi.python.org/pypi/JCC/) and Jython. :)

-jesse
 
Rhamphoryncus

Andy said:
Dear Python dev community,

I'm CTO at a small software company that makes music visualization
software (you can check us out at www.soundspectrum.com). [...] We
require *true* thread/interpreter independence, so python 2 has been
frustrating at times, to say the least. [...] The bottom line is that
if you want to perform independent processing (in python) on different
threads, using the machine's multiple cores to the fullest, then
you're out of luck under python 2.

[...]

I believe that true python independent thread/interpreter support is
paramount and should become the top priority [...] Until there's a
hello world that demonstrates running independent python interpreters
on multiple app threads, lua will remain the clear choice over python.
Python 3 needs true interpreter independence and multi-threaded
support!

What you describe, truly independent interpreters, is not threading at
all: it is processes, emulated at the application level, with all the
memory cost and none of the OS protections. True threading would
involve sharing most objects.

Your solution depends on what you need:
* Killable "threads" -> OS processes
* multicore usage (GIL removal) -> OS processes or alternative Python
implementations (PyPy/Jython/IronPython)
* Sane shared objects -> safethread
 
Andy

What you describe, truly independent interpreters, is not threading at
all: it is processes, emulated at the application level, with all the
memory cost and none of the OS protections.  True threading would
involve sharing most objects.

Your solution depends on what you need:
* Killable "threads" -> OS processes
* multicore usage (GIL removal) -> OS processes or alternative Python
implementations (PyPy/Jython/IronPython)
* Sane shared objects -> safethread


I realize what you're saying, but it's better said that there are two issues
at hand:

1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis, but is FAR from
being carried through in modules as he pointed out). As you point
out, this doesn't directly relate to multi-threading BUT it is
intimately tied to the issue because if, in principle, every module
used instance data (rather than static data), then python would be
WELL on its way to "free threading" (as Jesse Noller calls it), or as
I was calling it "true multi-threading".

2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)

Anyway, I've been at this issue for quite a while now (we're
approaching our 3rd release cycle), so I'm pretty comfortable with the
principles at hand. I'd say the theme of your comments share the
theme of others here, so perhaps consider where end-user software
houses (like us) are coming from. Specifically, developing commercial
software for end users imposes some restrictions that open source
development communities aren't often as sensitive to, namely:

- Performance -- emulation is a no-go (e.g. Jython)
- Maturity and Licensing -- experimental/academic projects are no-go
(PyPy)
- Cross platform support -- love it or hate it, Win32 and OS X are all
that matter when you're talking about selling (and supporting)
software to the masses. I'm just the messenger here (ie. this is NOT
flamebait). We publish for OS X, so IronPython is therefore out.

Basically, our company is at a crossroads where we really need light,
clean "free threading" as Jesse calls it (e.g. on the iPhone, using
our python drawing wrapper to do primary drawing while running python
jobs on another thread doing image decoding and processing). In our
current iPhone app, we achieve this by using two python bundles
(dynamic libs) in the way I described in my initial post. Sure, this
solves our problem, but it's pretty messy, sucks up resources, and has
been a pain to maintain.

Moving forward, please understand my posts here are also intended to
give the CPython dev community a glimpse of the issues that may not be
as visible to you guys (as they are for dev houses like us). For
example, it'd be pretty cool if Blizzard went with python instead of
lua, wouldn't you think? But some of the issues I've raised here no
doubt factor in to why end-user dev houses ultimately may have to pass
up python in favor of another interpreted language.

Bottom line: why give prospective devs any reason to turn down python--
there's just so many great things about python!

Regards,
Andy
 
Andy

Jesse, Terry, Martin -

First off, thanks again for your time and interest in this matter.
It's definitely encouraging to know that time and real effort are being
put into the matter, and I hope my posts on this subject are an
informative data point for everyone here.

Thanks for that link to Adam Olsen's work, Jesse--I'll definitely look
more closely at it. As I mentioned in my previous post, end-user devs
like me are programmed to get nervous around new mods, but at first
glance it definitely seems interesting. My initial reaction,
as interesting as the project is, goes back to my previous post about
putting all the object safety responsibility on the shoulders of the
API client. That way, one gets the best of both worlds: free
threading and no unnecessary object locking/blocking (ie. the API
client will manage the synchronization req'd to move objects
from one interpreter to another). I could have it wrong, but it seems
like safethread inserts some thread-safety features but they come at
the cost of performance. I know I keep mentioning it, but I think the
QuickTime API (and its documentation) is a great model for how any API
should approach threading. Check out their docs to see how they
address it; conceptually speaking, there's not a single line of thread
safety in QuickTime:

http://developer.apple.com/technotes/tn/tn2125.html

In short: multithreading is tricky; it's the responsibility of the
API client not to do hazardous things.

And for the record: the multiprocessing module is a totally great answer
for python-level MP stuff--very nice work, Jesse!

I'd like to post and discuss more, but I'll pick it up tomorrow...
All this stuff is fun and interesting to talk about, but I have to get
to some other things and it unfortunately comes down to cost
analysis. Sadly, I look at it this way: I can allocate 2-3 man-months (~
$40k) to build our own basic python interpreter implementation that
solves our need for free threading and increased performance (we've
built various internal interpreters over the years so we have good
experience in house, our tools are high performance, and we only use a
pretty small subset of python). Or, there's the more attractive
approach of working with the python dev community and putting that dev
expenditure into a form everyone can benefit from.


Regards,
Andy
 
Rhamphoryncus

I realize what you're saying, but it's better said that there are two issues
at hand:

1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis, but is FAR from
being carried through in modules as he pointed out).  As you point
out, this doesn't directly relate to multi-threading BUT it is
intimately tied to the issue because if, in principle, every module
used instance data (rather than static data), then python would be
WELL on its way to "free threading" (as Jesse Noller calls it), or as
I was calling it "true multi-threading".

If you want processes, use *real* processes. Your arguments fail to
get traction because you don't provide a good, justified reason why
they don't and can't work.

Although isolated interpreters would be convenient to you, it's a
specialized use case, and bad language design. There are far more use
cases that aren't isolated (actual threading), so why exclude them?

2) Barriers to "free threading".  As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example).  Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules.  Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)

You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Getting threading right would have been a massive investment even back
then, and we probably wouldn't have as mature of a python we do
today. Make no mistake, the GIL has substantial benefits. It may be
old and tired, surrounded by young bucks, but it's still winning most
of the races.

Anyway, I've been at this issue for quite a while now (we're
approaching our 3rd release cycle), so I'm pretty comfortable with the
principles at hand.  I'd say the theme of your comments share the
theme of others here, so perhaps consider where end-user software
houses (like us) are coming from.  Specifically, developing commercial
software for end users imposes some restrictions that open source
development communities aren't often as sensitive to, namely:

- Performance -- emulation is a no-go (e.g. Jython)

Got some real benchmarks to back that up? How about testing it on a
16 core (or more) box and seeing how it scales?

- Maturity and Licensing -- experimental/academic projects are no-go
(PyPy)
- Cross platform support -- love it or hate it, Win32 and OS X are all
that matter when you're talking about selling (and supporting)
software to the masses.  I'm just the messenger here (ie. this is NOT
flamebait).  We publish for OS X, so IronPython is therefore out.

You might be able to use Java on one, IronPython on another, and PyPy
in between. Regardless, my point is that CPython will *never* remove
the GIL. It cannot be done in an effective, highly scalable fashion
without a total rewrite.

Basically, our company is at a crossroads where we really need light,
clean "free threading" as Jesse calls it (e.g. on the iPhone, using
our python drawing wrapper to do primary drawing while running python
jobs on another thread doing image decoding and processing).  In our
current iPhone app, we achieve this by using two python bundles
(dynamic libs) in the way I described in my initial post.  Sure, this
solves our problem, but it's pretty messy, sucks up resources, and has
been a pain to maintain.

Is the iPhone multicore, or is it an issue of fairness (ie a soft
realtime app)?

Moving forward, please understand my posts here are also intended to
give the CPython dev community a glimpse of the issues that may not be
as visible to you guys (as they are for dev houses like us).  For
example, it'd be pretty cool if Blizzard went with python instead of
lua, wouldn't you think?  But some of the issues I've raised here no
doubt factor in to why end-user dev houses ultimately may have to pass
up python in favor of another interpreted language.

Bottom line: why give prospective devs any reason to turn down python--
there's just so many great things about python!

I'd like to see python used more, but fixing these things properly is
not as easy as believed. Those in the user community see only their
immediate problem (threads don't use multicore). People like me see
much bigger problems. We need consensus on the problems, and how to
solve them, and a commitment to invest what's required.
 
Andy

You seem confused.  PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence. I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.
Got some real benchmarks to back that up?  How about testing it on a
16 core (or more) box and seeing how it scales?

I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air. But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro. Look around you: no real commercial software
title that sells to soccer moms and gamers uses java. There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business). I love to go into that one, but
it's frankly just not a good use of my time right now. Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker. This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and is meant to entertain people.
I'd like to see python used more, but fixing these things properly is
not as easy as believed.  Those in the user community see only their
immediate problem (threads don't use multicore).  People like me see
much bigger problems.  We need consensus on the problems, and how to
solve it, and a commitment to invest what's required.

Well, you seem to come down pretty hard on people that are at your
doorstep saying they're WILLING and INTERESTED in supporting python
development. And, you're exactly right: users see only their
immediate problem--but that's the definition of being a user. If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.

Andy
 
Rhamphoryncus

Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence.  I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.

I think the confusion is a matter of context. Your app, written in C
or some other non-python language, shares data between the threads and
thus treats them as real threads. However, from python's perspective
nothing is shared, and thus they are effectively processes.

Although this contradiction is fine for embedding purposes, python is
a general purpose language, and needs to be capable of directly
sharing objects. Imagine you wanted to rewrite the bulk of your app
in python, with only a relatively small portion left in a C extension
module.

I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air.  But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro.  Look around you: no real commercial software
title that sells to soccer moms and gamers uses java.  There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business).  I love to go into that one, but
it's frankly just not a good use of my time right now.  Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker.  This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and is meant to entertain people.

Consider it accepted. I understand that PyPy/Jython/IronPython don't
fit your needs. Likewise though, CPython cannot fit my needs. What
we both need simply does not exist today.

Well, you seem to come down pretty hard on people that are at your
doorstep saying they're WILLING and INTERESTED in supporting python
development.  And, you're exactly right:  users see only their
immediate problem--but that's the definition of being a user.  If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.

I'm sorry if I came across harshly. My intent was merely to push you
towards supporting long-term solutions, rather than short-term ones.
 
Rhamphoryncus

I've been following this discussion with interest, as it certainly seems
that multi-core/multi-CPU machines are the coming thing, and many
applications will need to figure out how to use them effectively.



Reading this PDF paper is extremely interesting (albeit somewhat
dependent on understanding abstract theories of computation; I have
enough math background to follow it, sort of, and most of the text can
be read even without fully understanding the theoretical abstractions).

I have already heard people talking about "Java applications are
buggy".  I don't believe that general sequential programs written in
Java are any buggier than programs written in other languages... so I
had interpreted that to mean (based on some inquiry) that complex,
multi-threaded Java applications are buggy.  And while I also don't
believe that complex, multi-threaded programs written in Java are any
buggier than complex, multi-threaded programs written in other
languages, it does seem to be true that Java is one of the currently
popular languages in which to write complex, multi-threaded programs,
because of its language support for threads and concurrency primitives.  
These reports were from people that are not programmers, but are field
IT people, that have bought and/or support software and/or hardware with
drivers, that are written in Java, and seem to have non-ideal behavior,
(apparently only) curable by stopping/restarting the application or
driver, or sometimes requiring a reboot.

The paper explains many traps that lead to complex, multi-threaded
programs being buggy, and being hard to test.  I have worked with
parallel machines, applications, and databases for 25 years, and can
appreciate the succinct expression of the problems explained within the
paper, and can, from experience, agree with its premises and
conclusions.  Parallel applications only have been commercial successes
when the parallelism is tightly constrained to well-controlled patterns
that could be easily understood.  Threads, especially in "cooperation"
with languages that use memory pointers, have the potential to get out
of control, in inexplicable ways.

Although the paper is correct in many ways, I find it fails to
distinguish the core of the problem from the chaff surrounding it, and
thus is used to justify poor language designs.

For example, the amount of interaction may be seen as a spectrum: at
one end are C or Java threads, with complicated memory models and a
tendency to just barely control things using locks. At the other end
would be completely isolated processes with no form of IPC. The
former is considered the worst possible, while the latter is the best
possible (purely sequential).

However, the latter is too weak for many uses. At a minimum we'd like
some pipes to communicate. Helps, but it's still too weak. What if
you have a large amount of data to share, created at startup but
otherwise not modified? So we add some read only types and ways to
define your own read only types. A couple of those types need a
process associated with them, so we make sure process handles are
proper objects too.

What have we got now? It's more on the thread end of the spectrum
than the process end, but it's definitely not a C or Java thread, and
it's definitely not an OS process. What is it? Does it have the
problems in the paper? Only some? Which?

Another peeve I have is his characterization of the observer pattern.
The generalized form of the problem exists both in single-threaded
sequential programs, in the form of unexpected reentrancy, and in
message passing, with infinite CPU usage or an unbounded number of
pending messages.

Perhaps threading makes it much worse; I've heard many anecdotes that
would support that. Or perhaps it's the lack of automatic deadlock
detection, giving a clear and diagnosable error for you to fix.
Certainly, the mystery and extremeness of a deadlock could explain how
much it scares people. Either way the paper says nothing.

This statement, after reading the paper, seems somewhat in line with the
author's premise that language acceptability requires that a language be
self-contained/monolithic, and potentially sufficient to implement
itself.  That seems to also be one of the reasons that Java is used
today for threaded applications.  It does seem to be true, given current
hardware trends, that _some mechanism_ must be provided to obtain the
benefit of multiple cores/CPUs to a single application, and that Python
must either implement or interface to that mechanism to continue to be a
viable language for large scale application development.

Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application.  This actually corresponds to the
model promulgated in the paper as being most likely to succeed.  Of
course, it maps nicely into a model using separate processes,
coordinated by an outer process, also.  The differences seem to be:

1) Most applications are historically perceived as corresponding to
single processes.  Language features for multi-processing are rare, and
such languages are not in common use.

2) A single address space can be convenient for the coordinating outer
application.  It does seem simpler and more efficient to simply "copy"
data from one memory location to another, rather than send it in a
message, especially if the data are large.  On the other hand,
coordination of memory access between multiple cores/CPUs effectively
causes memory copies from one cache to the other, and if memory is
accessed from multiple cores/CPUs regularly, the underlying hardware
implements additional synchronization and copying of data, potentially
each time the memory is accessed.  Being forced to do message passing of
data between processes can actually be more efficient than access to
shared memory at times.  I should note that in my 25 years of parallel
development, all the systems created used a message passing paradigm,
partly because the multiple CPUs often didn't share the same memory
chips, much less the same address space, and that a key feature of all
the successful systems of that nature was an efficient inter-CPU message
passing mechanism.  I should also note that Herb Sutter has a recent
series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism
and a variety of implementation pitfalls, that I found to be very
interesting reading.

Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).

Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.

It's quite possible they'll come up with something that seems quite
different, but in reality is the same sort of rearrangement. Add
hardware support for transactions, move the caching partly into
software, etc.
I have noted the multiprocessing module that is new to Python 2.6/3.0
being feverishly backported to Python 2.5, 2.4, etc... indicating that
people truly find the model/module useful... seems that this is one way,
in Python rather than outside of it, to implement the model Andy is
looking for, although I haven't delved into the details of that module
yet, myself.  I suspect that a non-Python application could load one
embedded Python interpreter, and then indirectly use the multiprocessing
module to control other Python interpreters in other processes.  I
don't know that multithreading primitives such as described in the paper
are available in the multiprocessing module, but perhaps they can be
implemented in some manner using the tools that are provided; in any
case, some interprocess communication primitives are provided via this
new Python module.
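
A minimal sketch of that idea, assuming a POSIX host where
multiprocessing can simply fork() (an embedded interpreter on Windows
needs extra setup, e.g. a proper sys.executable, which is omitted
here); the script string and the worker function are placeholders:

    #include <Python.h>

    static const char *script =
        "import sys\n"
        "sys.argv = ['embedded']       # some stdlib code expects argv to exist\n"
        "from multiprocessing import Pool\n"
        "def square(x):\n"
        "    return x * x\n"
        "if __name__ == '__main__':\n"
        "    pool = Pool(4)             # four worker *processes*, no GIL contention\n"
        "    print(pool.map(square, range(10)))\n"
        "    pool.close()\n"
        "    pool.join()\n";

    int main(void)
    {
        Py_Initialize();              /* one embedded interpreter in the host app */
        PyRun_SimpleString(script);   /* ...which farms work out to child processes */
        Py_Finalize();
        return 0;
    }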

There could be opportunity to enhance Python with process creation and
process coordination operations, rather than have it depend on
easy-to-implement-incorrectly coordination patterns or
easy-to-use-improperly libraries/modules of multiprocessing primitives
(this is not a slam of the new multiprocessing module, which appears to
be filling a present need in rather conventional ways, but just to point
out that ideas promulgated by the paper, which I suspect 2 years later
are still research topics, may be a better abstraction than the
conventional mechanisms).

One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python.  I can think of a
number of possibilities:

A) (Historical) It existed, then the desire for extensions was seen, and
Python was seen as a good extension language.

B) Python is inappropriate (performance?) for some of the algorithms
(but should they be coded instead as Python extensions, with the core
application being in Python?)

C) Unavailability of Python wrappers for particularly useful 3rd-party
libraries

D) Other?

"It already existed" is definitely the original reason, but now it
includes single-threaded performance and multi-threaded scalability.
Although the idea of "just write an extension that releases the GIL"
is a common suggestion, it needs to be fairly coarse to be effective,
and ensure little of the CPU time is left in python. If the app
spreads its CPU time around, it is likely impossible to use python
effectively.
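
For reference, the "extension that releases the GIL" pattern looks
roughly like this (the function and its workload are hypothetical, and
the module registration boilerplate is omitted); the key is that the
GIL is only dropped around one coarse chunk of pure-C work that
touches no Python objects:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    static void decode_image_in_c(const char *data, Py_ssize_t len)
    {
        /* ...long-running, pure-C work that touches no Python objects... */
        (void)data; (void)len;
    }

    static PyObject *
    decode_image(PyObject *self, PyObject *args)
    {
        const char *data;
        Py_ssize_t len;

        if (!PyArg_ParseTuple(args, "s#", &data, &len))
            return NULL;

        Py_BEGIN_ALLOW_THREADS          /* other python threads may run now */
        decode_image_in_c(data, len);
        Py_END_ALLOW_THREADS            /* re-acquire the GIL before touching python */

        Py_RETURN_NONE;
    }

    static PyMethodDef worker_methods[] = {
        {"decode_image", decode_image, METH_VARARGS, "Decode an image without holding the GIL."},
        {NULL, NULL, 0, NULL}
    };
    /* (module init omitted; see the PEP 3121 sketch earlier in the thread) */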
 
greg

Andy said:
1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis

Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps

No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.
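
To make the first point concrete: things like the small-int cache and
the built-in type objects are single, statically allocated C globals
shared by every interpreter in the process. A hedged sketch against
the Python 3.0 C API (the program just reports what it observes):

    #include <Python.h>
    #include <stdio.h>

    int main(void)
    {
        Py_Initialize();
        PyThreadState *main_ts = PyThreadState_Get();

        PyObject *one_in_main = PyLong_FromLong(1);   /* in 3.0: from a static small-int cache */

        PyThreadState *sub = Py_NewInterpreter();
        PyObject *one_in_sub = PyLong_FromLong(1);    /* same static cache */

        printf("same small-int object in both interpreters: %s\n",
               one_in_main == one_in_sub ? "yes" : "no");
        printf("&PyList_Type (one static global either way): %p\n",
               (void *)&PyList_Type);

        Py_DECREF(one_in_sub);
        Py_EndInterpreter(sub);
        PyThreadState_Swap(main_ts);
        Py_DECREF(one_in_main);
        Py_Finalize();
        return 0;
    }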
 
Martin v. Löwis

You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Just a small remark: this wasn't the primary objective of the PEP.
The primary objective was to support module cleanup in a reliable
manner, to eventually allow modules to be garbage-collected properly.
However, I also kept the isolated interpreters feature in mind there.

Regards,
Martin
 
sturlamolden

Instead of "appdomains" (one interpreter per thread), or free
threading, you could use multiple processes. Take a look at the new
multiprocessing module in Python 2.6. It has roughly the same
interface as Python's threading and queue modules, but uses processes
instead of threads. Processes are scheduled independently by the
operating system. The objects in the multiprocessing module also tend
to have much better performance than their threading and queue
counterparts. If you have a problem with threads due to the GIL, the
multiprocessing module will most likely take care of it.

There is a fundamental problem with using homebrew loading of multiple
(but renamed) copies of PythonXX.dll that is easily overlooked. That
is, extension modules (.pyd) are DLLs as well. Even if required by two
interpreters, they will only be loaded into the process image once.
Thus you have to rename all of them as well, or you will get havoc
with refcounts. Not to speak of what will happen if a Windows HANDLE
is closed by one interpreter while still needed by another. It is
almost guaranteed to bite you, sooner or later.

There are other options as well:

- Use IronPython. It does not have a GIL.

- Use Jython. It does not have a GIL.

- Use pywin32 to create isolated outproc COM servers in Python. (I'm
not sure what the effect of inproc servers would be.)

- Use os.fork() if your platform supports it (Linux, Unix, Apple,
Cygwin, Windows Vista SUA). This is the standard posix way of doing
multiprocessing. It is almost unbeatable if you have a fast copy-on-
write implementation of fork (that is, all platforms except Cygwin).
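
From the embedding side, the fork() option looks roughly like this on
a POSIX system (the job strings are placeholders): each worker process
gets its own address space, and therefore a completely independent
interpreter and GIL, with copy-on-write keeping the fork itself cheap.

    #include <Python.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        const char *jobs[] = {
            "print('decoding images in worker 0')",
            "print('crunching audio in worker 1')",
        };

        for (int i = 0; i < 2; i++) {
            pid_t pid = fork();
            if (pid == 0) {                 /* child: its own interpreter, its own GIL */
                Py_Initialize();
                PyRun_SimpleString(jobs[i]);
                Py_Finalize();
                _exit(0);
            }
        }
        while (wait(NULL) > 0)              /* parent: reap both workers */
            ;
        return 0;
    }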
 
