Glenn, great post and points!
Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, the limitations will be understood, and
maybe solutions will become achievable.
Let me define some speculative Python interpreters; I think the first is
today's Python:
PyA: Has a GIL. PyA threads can run within a process, but are
effectively serialized between the points where the GIL is
obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code, is code that uses global (C
global, or C static) variables – note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.
PyB: No GIL. PyB threads acquire/release a lock around each reference
to a global variable (like the "with" statement). Requires massive
recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.
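To make the PyB cost concrete, here is a minimal sketch at the Python level of what it looks like when every touch of a shared global must go through a lock (the names `worker` and `counter` are illustrative, not from any real proposal):

```python
import threading

counter = 0                      # a shared "global" in the PyB sense
counter_lock = threading.Lock()  # every access must go through this lock

def worker(n):
    global counter
    for _ in range(n):
        # PyB-style: acquire/release around *each* reference to the global.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: correct, but every increment paid for a lock round-trip
```

The answer comes out right, but the per-access acquire/release is exactly the overhead PyB would impose everywhere.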
PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.
PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.
The assumptions here are that:
Data-1) A Python interpreter doesn't provide any mechanism to share
normal data among threads; they are independent... but message passing
works.
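For the message-passing side of Data-1, today's multiprocessing module already behaves this way: workers share no ordinary objects with their parent and exchange data only as messages. A minimal sketch (the `worker` function and the None shutdown sentinel are illustrative choices):

```python
import multiprocessing as mp

def worker(inbox, outbox):
    # The worker shares no ordinary objects with its parent; data
    # arrives and leaves only as messages (pickled copies).
    for item in iter(inbox.get, None):   # None is the shutdown sentinel
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in (2, 3, 4):
        inbox.put(n)
    inbox.put(None)                      # tell the worker to finish
    results = [outbox.get() for _ in range(3)]
    p.join()
    print(results)  # [4, 9, 16]
```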
Data-2) A Python interpreter could be extended to provide mechanisms to
share special data, and the data would come with an implicit lock.
Data-3) A Python interpreter could be extended to provide unlocked
access to special data, requiring the application to handle the
synchronization between threads. Data of type 2 could be used to control
access to data of type 3. This type of data could be large, or
frequently referenced data, but only by a single thread at a time, with
major handoffs to a different thread synchronized by the application in
whatever way it chooses.
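multiprocessing can already approximate both kinds of special data: Value carries an implicit lock (Data-2 style), while Array(lock=False) is raw shared memory whose handoff the application must synchronize itself (Data-3 style). A minimal sketch, using process start/join as the application-chosen handoff points (the `worker` function is invented for illustration):

```python
import multiprocessing as mp

def worker(total, buf):
    # We are the sole user of buf during our lifetime (Data-3 style:
    # no per-access locking); the handoff points are start() and join().
    s = 0
    for i in range(len(buf)):
        buf[i] *= 2
        s += buf[i]
    with total.get_lock():   # Data-2 style: the value carries its own lock
        total.value = s

if __name__ == "__main__":
    total = mp.Value("i", 0)                       # implicitly locked scalar
    buf = mp.Array("i", [1, 2, 3, 4], lock=False)  # raw, unlocked shared memory
    p = mp.Process(target=worker, args=(total, buf))
    p.start()      # handoff to the worker...
    p.join()       # ...and back to us; no locks touched for buf itself
    print(list(buf), total.value)  # [2, 4, 6, 8] 20
```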
Context-1) A Python interpreter would know about threads it spawns, and
could pass in a block of context (in addition to the state structure) as
a parameter to a new thread. That block of context would belong to the
thread as long as it exists, and return to the spawner when the thread
completes. An embedded interpreter would also be given a block of
context (in addition to the state structure). This would allow
application context to be created and passed around. Pointers to shared
memory structures, might be typical context in the embedded case.
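At the Python level, today's threading API already approximates Context-1: a context block can be handed to a thread at spawn time and treated as owned by that thread until it completes. A minimal sketch (the dict-as-context convention is just an illustration):

```python
import threading

def worker(context, results):
    # The context dict is handed to the thread at spawn time and is
    # treated as owned by it until the thread completes.
    results[context["name"]] = context["factor"] * 10

results = {}
ctx = {"name": "t1", "factor": 3}   # the per-thread context block
t = threading.Thread(target=worker, args=(ctx, results))
t.start()
t.join()
print(results)  # {'t1': 30}
```

The harder part of Context-1 is the ownership contract (the block "returns to the spawner"), which nothing in today's API enforces.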
Context-2) Embedded Python interpreters could be spawned either as PyA
threads or PyC threads. PyC threads would be limited to modules that are
reentrant.
I think that PyB and PyC are the visions that people see, which argue
against implementing independent interpreters. PyB isn't truly
independent, because data are shared, recoding is required, and
performance suffers. Ick. PyC requires "recoding the whole library"
potentially, if it is the only solution. PyD allows access to the whole
standard library of modules, exactly like today, but the existing
limitations still obtain for PyA threads using that model – very limited
concurrency. But PyC threads would execute in their own little
environments, and not need locking. Pure Python code would be
immediately happy there. Properly coded (reentrant, global-free)
extensions would be happy there. Lots of work could be done there, to
use up multi-core/multi-CPU horsepower (shared-memory architecture).
Questions for people that know the Python internals: Is PyD possible?
How hard? Is a PyC thread an effective way of implementing a Python
sandbox? If it is, and if it would attract the attention of Brett
Cannon, who at least once wanted to do a thesis on Python sandboxes, he
could be a helpful supporter.
Questions for Andy: is the type of work you want to do in independent
threads mostly pure Python? Or with libraries that you can control to
some extent? Are those libraries reentrant? Could they be made
reentrant? How much of the Python standard library would need to be
available in reentrant mode to provide useful functionality for those
threads? I think you want PyC.
Questions for Patrick: So if you had a Python GUI using the whole
standard library -- would it likely run fine in PyA threads, and still
be able to use PyC threads for the audio scripting language? Would it be
a problem for those threads to have limited library support (only
reentrant modules)?
That's the rub... In our case, we're doing image and video
manipulation--stuff that's not well suited to messaging from address
space to address space. The same argument holds for numerical
processing with large data sets. Having the workers hand back huge
data sets via messaging isn't very attractive.
In the multiprocessing module's environment, could you not use shared
memory, then, for the large shared data items?
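For what it's worth, multiprocessing's shared ctypes arrays do let a large buffer live in shared memory so it is never pickled through a pipe; whether that fits a real-time image pipeline is another question. A minimal sketch with a flat byte buffer standing in for a frame (the sizes and the `invert` operation are invented for illustration):

```python
import multiprocessing as mp

WIDTH, HEIGHT = 64, 64

def invert(frame):
    # The child gets the same shared buffer, not a pickled copy,
    # so a large frame is never serialized through a pipe.
    for i in range(len(frame)):
        frame[i] = 255 - frame[i]

if __name__ == "__main__":
    # One flat, zero-initialized byte buffer standing in for image data.
    frame = mp.Array("B", WIDTH * HEIGHT)
    p = mp.Process(target=invert, args=(frame,))
    p.start()
    p.join()
    print(frame[0])  # 255
```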
Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features. Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.
The other area of pain that I mentioned in one of my other posts is
that what we ship, above all, can't be flaky. The lack of module
cleanup (intended to be addressed by PEP 3121), using a duplicate copy
of the python dynamic lib, and namespace black magic to achieve
independent interpreters are all examples that have made using python
for us much more challenging and time-consuming than we ever
anticipated.
Again, if it turns out nothing can be done about our needs (which
appears more and more to be the case), I think it's important for
everyone here to consider the points raised here in the last week.
Moreover, realize that the python dev community really stands to gain
from making python usable as a tool (rather than a monolith). This
fact alone has caused lua to *rapidly* rise in popularity with
software companies looking to embed a powerful, lightweight
interpreter in their software.
As a python language fan and enthusiast, don't let lua win! (I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).
Thanks for the further explanations.