Python C extension providing... Python's own API?

Adam Atlas

Does anyone know if it would be possible to create a CPython extension
-- or use the ctypes module -- to access Python's own embedding API
(http://docs.python.org/api/initialization.html &c.)? Could a Python
program itself create a sub-interpreter, and work with it with all the
privileges and capabilities that an actual C program would have?

I realize that this may be a bit too... mystical? ... for a lot of
people's tastes, but I'm just curious if it's possible. :)
 
Matimus

I think that is what the "code" module is for. Maybe not exactly what
you were expecting, but the capability you describe is already there.
Being able to access its own interpreter is one of the things that
makes Python a dynamic language.
 
Adam Atlas

That's not really what I'm after. Believe me, I've searched the
standard library in length and breadth. :) The `code` module runs in
the same interpreter, basically executing as though it were just a
separate module (except interactively). What I'm interested in is
accessing Python's C API for CREATING interpreters -- i.e. what
mod_python or anything else embedding the Python interpreter would do.
Creating a whole environment, not just a module; it has its own `sys`
parameters, its own `__builtin__`, and so on, so you can safely mess
with those without changing them in the parent interpreter.
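
To illustrate the point about the `code` module, here is a minimal
sketch using only the standard library. It shows that code run through
`code.InteractiveInterpreter` gets its own namespace but still sees the
very same `sys` module as the caller. The attribute name
`shared_marker` is made up for the demonstration.

```python
import code
import sys

# A fresh namespace for the "inner" code, as the code module provides.
inner_namespace = {}
inner = code.InteractiveInterpreter(inner_namespace)

# The inner code tags the sys module it sees with a made-up attribute.
inner.runsource("import sys; sys.shared_marker = 'set by inner code'")

# Names are separate: the inner namespace holds its own globals...
inner_has_sys = "sys" in inner_namespace

# ...but sys itself is one shared module object, so the tag shows up here too.
marker_seen_by_caller = getattr(sys, "shared_marker", None)
same_sys_object = inner_namespace["sys"] is sys
```

A true sub interpreter would instead get its own `sys`, so a change
like this would not leak back into the parent.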

So far, I've tried ctypes, and it doesn't work; then I noticed the
docs said it wouldn't anyway. But I think I'll try writing a C
extension module. This could be interesting for sandboxing &c.
 
Graham Dumpleton

Having played around in that area more than most, I would say it is
entirely possible to create a C extension module for Python that
lets one create additional sub interpreters. The only real question
is how one wants to use it and what the interface should look like.

A simple interface might be to say: in the named sub interpreter,
import this module. If the named sub interpreter doesn't exist yet, it
is created. To have code run in the separate interpreter on an
ongoing basis, the imported module could create a thread. Personally
though, I don't like triggering side effects from module import, so it
might be better to say: import this module and, if that succeeds,
call the named object in that module with the specified arguments. The
latter is better in that it allows data to be passed across as well,
although you would want to limit that to basic data types for various
reasons.
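
The second interface described above ("import this module, then call
a named object with arguments") could be modelled in pure Python
roughly as follows. This is only a sketch of the calling convention: it
runs in the current interpreter, and the switch into a separate sub
interpreter would have to happen in C where the comment indicates. The
function name `call_named_object` and the JSON round trip used to
enforce basic data types are assumptions for illustration, not part of
any existing module.

```python
import importlib
import json


def call_named_object(module_name, object_name, *args):
    """Import module_name, then call module_name.object_name(*args).

    Arguments and the result are forced through JSON so that only basic
    data types (numbers, strings, lists, dicts, booleans, None) can cross
    the boundary. json.dumps raises TypeError for anything else.
    """
    wire_args = json.dumps(args)

    # A real implementation would switch to the target sub interpreter
    # here (in C); this sketch stays in the current interpreter.
    module = importlib.import_module(module_name)
    target = getattr(module, object_name)
    result = target(*json.loads(wire_args))

    return json.loads(json.dumps(result))


root = call_named_object("math", "sqrt", 16.0)
joined = call_named_object("posixpath", "join", "srv", "app")
```

One side effect of the JSON round trip is that tuples come out as
lists, which is in keeping with only passing plain data across.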

The only tricky issues here are ensuring that the thread state is given
up when calling out of the first interpreter, and ensuring that a new
thread state is created against the new interpreter before calling
into it. You can't reuse the same thread state, because then you run
into exceptions about running in restricted mode.

Another issue to contend with is whether you allow sub interpreters to
be deleted. This can be tricky for various reasons. First, when a sub
interpreter is deleted, anything registered with atexit.register()
isn't run, as that only happens for the main Python interpreter; one
can ensure those functions do run by calling sys.exitfunc() within
that sub interpreter. Even then, if you have separate threads running
in that sub interpreter and they don't register an atexit function
that allows them to be shut down, the attempt to delete the sub
interpreter could hang. Even if that works, you may find that some
third party extension modules can't cope with sub interpreters being
killed, because they cache a reference to the interpreter, and that
reference might end up pointing elsewhere after the sub interpreter is
killed.

There are also other minor details that one has to worry about, like
providing a sys.argv, as sub interpreters don't have one but some
Python modules blindly expect to be able to access it even though they
aren't the actual main function of a program. You would possibly also
need to restrict access to sys.stdin from sub interpreters and only
allow the main interpreter to use it, and so on.
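
A minimal sketch of those two fix-ups, written against a stand-in
object rather than a real sub interpreter's `sys` module. The helper
name `prepare_sys` and the choice of an empty in-memory stream as the
replacement for `sys.stdin` are assumptions for illustration.

```python
import io
import types


def prepare_sys(sys_like, is_main):
    """Apply the fix-ups to a sys-like object (stand-in for a sub interpreter's sys)."""
    # Some modules read sys.argv unconditionally, so make sure it exists.
    if not hasattr(sys_like, "argv"):
        sys_like.argv = [""]
    # Only the main interpreter keeps the real stdin; others read EOF at once.
    if not is_main:
        sys_like.stdin = io.StringIO("")
    return sys_like


fake_sub_sys = prepare_sys(types.SimpleNamespace(), is_main=False)
```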

Anyway, what you want to do is something I have thought about before
in relation to mod_wsgi and whether there is any use for the ability
to call between sub interpreters. In the context of mod_wsgi though
such a thing is probably just an avenue for creating more mischief,
especially within a hosting environment where different people's
applications may be running in different sub interpreters.

On the other hand, it might be useful in a standalone Python based
WSGI web server which you have more direct control over. It might
take a bit of design work as to how to do it in practice, but you
could create different sub interpreters through the module for
distinct WSGI applications. The main interpreter could be running the
web server and somehow then hand the WSGI environ etc. off to a
manager module in the other sub interpreter that then deals with
communicating with the WSGI application in that sub interpreter for
that specific request. You could conceivably have a hand off
arrangement whereby, when an application code base changes, you
start routing requests for that URL subset to a new instance of the
application in a new sub interpreter and kill off the old sub
interpreter when able to.
Would certainly be an interesting area to look at. For mod_wsgi at
least though I ruled out allowing sub interpreters to be killed off
and allowing reloading of a complete application by such a handoff
mechanism as too dangerous a feature in an ISP based web hosting
environment as user code could too easily hang the Apache process when
trying to kill off the sub interpreter.

Anyway, you have got my curiosity going again as to how hard it would
be to write such a module. Personally I don't think it would be that
hard at all, and maybe when I get the first version of mod_wsgi out I might
have a play with the idea. :)

Graham
 
Graham Dumpleton

One more thought. It would actually be quite cute if you could make
this whole encapsulation of pushing a WSGI request into a distinct sub
interpreter a WSGI middleware component. That way you could just drop
it into any existing web server infrastructure that supports WSGI. Now
I'm getting really interested to have a play. :)
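
As a rough sketch of what such a middleware might look like: the class
below wraps a WSGI application and routes each request through a
`dispatch` callable. In a real version `dispatch` would hand the
request to a manager in another sub interpreter; nothing in the
standard library does that, so here it is a hypothetical placeholder
that simply calls the application in-process. The names
`SubInterpreterMiddleware`, `dispatch` and `run_locally` are all made up
for illustration.

```python
from wsgiref.util import setup_testing_defaults


def run_locally(app, environ, start_response):
    """Placeholder dispatcher: calls the app in the current interpreter."""
    return app(environ, start_response)


class SubInterpreterMiddleware:
    """WSGI middleware that sends every request through a dispatcher."""

    def __init__(self, app, dispatch=run_locally):
        self.app = app
        self.dispatch = dispatch

    def __call__(self, environ, start_response):
        # A real dispatcher would presumably pass only basic data types
        # from environ across, as suggested earlier in the thread.
        return self.dispatch(self.app, environ, start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from " + environ["PATH_INFO"].encode("ascii")]


environ = {}
setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
captured = {}
body = b"".join(
    SubInterpreterMiddleware(hello_app)(
        environ, lambda status, headers: captured.update(status=status)
    )
)
```

Because the wrapper is itself a WSGI callable, it could be dropped in
front of any application without the server knowing about it.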

Graham
 
Adam Atlas

Wow! I'll have to read through that tomorrow when I'm (hopefully) less
tired. :D

Anyway, I somehow already managed to get this working. I'm calling it
DoublePy.
Here's the alpha or proof-of-concept or whatever we're to call it.
http://adamatlas.org/2007/03/doublepy-0.1.tar.gz

Not bad for an 0.1 written in a couple of hours, methinks. This could
have some really interesting possibilities when it's more mature.
 
