On-topic: alternate Python implementations

  • Thread starter Steven D'Aprano

Stefan Behnel

Paul Rubin, 04.08.2012 20:18:
Calling CPython hardly counts as compiling Python into C.

CPython is written in C, though. So anything that CPython does can be done
in C. It's not like the CPython project used a completely unusual way of
writing C code.

Besides, I find your above statement questionable. You will always need
some kind of runtime infrastructure when you "compile Python into C", so
you can just as well use CPython for that instead of reimplementing it
completely from scratch. Both Cython and Nuitka do exactly that, and one of
the major advantages of that approach is that they can freely interact with
arbitrary code (Python or not) that was written for CPython, regardless of
its native dependencies. What good would it be to throw all of that away,
just for the sake of having "pure C code generation"?

You're going to compile the whole Python program into a single C
function so that you can do gotos inside of it? What happens if the
program imports a generator?

No, you are going to compile only the generator function into a function
that uses gotos, maybe with an additional in-out struct parameter that
holds its state. Then, on entry, you read the label (or its ID) from the
previous state, reset local variables and jump to the label. On exit, you
store the state back and return. Cython does it that way. Totally
straightforward, as I said.
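
For illustration, here is a minimal sketch of that scheme. Python has no goto, so the jump to the saved label is emulated by branching on a label ID. The names `CountdownState`, `countdown_resume` and `drain` are made up for this example and are not taken from Cython's generated code.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is NOT Cython's generated code.
DONE = -1  # label ID meaning "generator exhausted"


@dataclass
class CountdownState:
    """In-out state record: resume label ID plus the saved local variable."""
    label: int = 0  # 0 = not started, 1 = resume after the yield, DONE = finished
    n: int = 0      # saved copy of the generator's only local


def countdown_resume(state):
    """One resumption of: def countdown(n): while n > 0: yield n; n -= 1

    Returns (True, value) for a yielded value, (False, None) when exhausted.
    """
    n = state.n                 # on entry: restore locals from the state
    if state.label == DONE:
        return (False, None)
    if state.label == 1:        # "goto" the code right after the yield
        n -= 1
    # label 0 falls through to the loop test, like a fresh call
    if n > 0:
        state.n = n             # on exit: store locals and the resume label
        state.label = 1
        return (True, n)
    state.label = DONE
    return (False, None)


def drain(start):
    """Drive the state machine until it is exhausted and collect the values."""
    state = CountdownState(label=0, n=start)
    out = []
    while True:
        ok, value = countdown_resume(state)
        if not ok:
            return out
        out.append(value)
```

In generated C the same shape would presumably be a switch or computed jump on the saved label at function entry, with the struct passed in by pointer.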

You mean you're going to have all the same INCREF/DECREF stuff on every
operation in compiled data? Ugh.

If you don't like that, you can experiment with anything from a dedicated
GC to transactional memory.

What implementations would those be? There's the Boehm GC which is
useful for some purposes but not really suitable at large scale, from
what I can tell. Is there something else?

No idea - I'll look it up when I need one. Last I heard, PyPy had a couple
of GCs to choose from, but I don't know how closely they are tied into its
infrastructure.

You're going to let the program just leak memory until it crashes??

Well, it's not like CPython leaks memory until it crashes, now does it? And
it's written in C. So there must be ways to handle this also in C.

Remember that CPython didn't even have a GC before something around 2.0,
IIRC. That worked quite ok in most cases and simply left the tricky cases
to the programmers. It really depends on what your requirements are. Small
embedded systems, time critical code and real-time systems are often much
better off without garbage collection. It's pure convenience, after all.

Compare that to the performance gain of LuaJIT and it starts to look
like something is wrong with that approach, or maybe some issue inherent
in Python itself.

Huh? LuaJIT is a reimplementation of Lua that uses an optimising JIT
compiler specifically for Lua code. How is that similar to the Jython
runtime that runs *on top of* the JVM with its generic byte code based JIT
compiler?

Basically, LuaJIT's JIT compiler works at the same level as the one in
PyPy, which is why both can theoretically provide the same level of
performance gains.

It seems very hard to do reasonable optimizations in the presence of
standard Python techniques like dynamically poking class instance
attributes. I guess some optimizations are still possible, like storing
attributes named as literals in the program in fixed slots, saving some
dictionary lookups even though the slot contents would have to still be
mutable.

Sure. Even when targeting the CPython runtime with the generated C code
(like Cython or Nuitka), you can still do a lot. And sure, static code
analysis will never be able to infer everything that a JIT compiler can see.
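
As a rough illustration of the fixed-slot idea, Python's own `__slots__` lets the programmer request it by hand: attributes named up front get fixed per-instance storage and no per-instance dictionary, while the slot contents stay mutable. This is only a manual analogue, not what a compiler or JIT would do automatically; the class `Point` and the helper `slots_demo` are hypothetical names for this sketch.

```python
class Point:
    """Hypothetical class whose attributes live in fixed slots."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def slots_demo():
    p = Point(1, 2)
    p.x = 10                          # slot contents are still mutable
    has_dict = hasattr(p, "__dict__")  # no per-instance dict is allocated
    try:
        p.z = 3                       # not a declared slot
        rejected = False
    except AttributeError:
        rejected = True
    return p.x, has_dict, rejected
```

The price is exactly the dynamism discussed here: instances of such a class can no longer grow arbitrary attributes at runtime.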

Stefan
 
MRAB

Stefan Behnel said:
Remember that CPython didn't even have a GC before something around 2.0,
IIRC. That worked quite ok in most cases and simply left the tricky cases
to the programmers. It really depends on what your requirements are. Small
embedded systems, time critical code and real-time systems are often much
better off without garbage collection. It's pure convenience, after all.
[snip]
CPython relied entirely on reference counting, so memory could leak if
you inadvertently created a cycle of references. That problem was
fixed when a cycle-detecting collector was added (it runs
occasionally to collect any unreachable cycles).
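
A small sketch of that behaviour on CPython, using only the standard `gc` and `weakref` modules. The class `Node` and the function `cycle_demo` are hypothetical names for this example. With the cycle collector switched off, two objects that refer to each other survive after the last outside reference is dropped; an explicit `gc.collect()` then reclaims them.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can refer to another Node."""

    def __init__(self):
        self.other = None


def cycle_demo():
    """Return (alive after del, alive after collect) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                      # reference counting only from here on
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # build the reference cycle
        probe = weakref.ref(a)        # observe a without keeping it alive
        del a, b                      # refcounts stay at 1 because of the cycle
        alive_after_del = probe() is not None
        gc.collect()                  # run the cycle collector by hand
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()               # restore the caller's GC setting
```

On CPython this should return `(True, False)`; other implementations that do not use reference counting may behave differently.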
 
Paul Rubin

Stefan Behnel said:
CPython is written in C, though. So anything that CPython does can be
done in C. It's not like the CPython project used a completely unusual
way of writing C code.

CPython is a relatively simple interpreter, and executing code
by invoking such an interpreter IMHO doesn't count as "compiling" it in
any meaningful way.

You will always need some kind of runtime infrastructure when you
"compile Python into C", so you can just as well use CPython for that
instead of reimplementing it completely from scratch.

Maybe there's parts of CPython you can re-use, but having the CPython
interpreter be the execution engine for "compiled" Python generators
again fails the seriousness test of what it means to compile code. If
you mean something other than that, you might explain more clearly.

Both Cython and Nuitka do exactly that,

I didn't know about Nuitka; it looks interesting but (at least after a
few minutes looking) I don't have much sense of how it works.

No, you are going to compile only the generator function into a function
that uses gotos, maybe with an additional in-out struct parameter that
holds its state.

Yeah, ok, I guess that can work, given python generators are limited
to returning through just one stack level. You might want to avoid
copying locals by just putting everything into a struct, that has to
be retained across entries/exits.

If you don't like that, you can experiment with anything from a dedicated
GC to transactional memory.

OK, but then CPython is no longer managing the memory.

Last I heard, PyPy had a couple of GCs to choose from,

PyPy doesn't compile to C, but I guess compiling to C doesn't preclude
precise GC, as long as the generated C code carefully tracks what C
objects can contain GC-able pointers, and follows some constraints about
when the GC can run. Some other compilers do this so it's not as big a
deal as it sounded like at first. OK.

Well, it's not like CPython leaks memory until it crashes...

I was counting CPython's reference counting as a rudimentary form of GC,
though I guess that's terminology that not everyone agrees on.

Huh? LuaJIT is a reimplementation of Lua that uses an optimising JIT
compiler specifically for Lua code. How is that similar to the Jython
runtime that runs *on top of* the JVM with its generic byte code based
JIT compiler?

I thought LuaJIT compiles the existing Lua VM code, but I haven't
looked at it closely or used it.

Sure. Even when targeting the CPython runtime with the generated C
code (like Cython or Nuitka), you can still do a lot. And sure, static
code analysis will never be able to infer everything that a JIT
compiler can see.

I think even a JIT can't avoid a lot of pain and slowdown, without
complex whole-program analysis and requiring the application to follow
some special conventions, like never importing at "runtime".
 
Stefan Behnel

Paul Rubin, 04.08.2012 22:43:
CPython is a relatively simple interpreter, and executing code
by invoking such an interpreter IMHO doesn't count as "compiling" it in
any meaningful way.

Oh, CPython is substantially more than an interpreter. The eval loop is
only *one* way to use the runtime environment. Remember that it has many
builtin types and functions as well as a huge standard library. Much of the
runtime environment is already written in C or can be compiled down to C.
If you compile Python code into C code that avoids the eval loop and only
uses the CPython runtime environment (which is what Cython does), I think
that qualifies as compiling Python code to C. It's definitely the most
practical and user friendly way to do it.

Maybe there's parts of CPython you can re-use, but having the CPython
interpreter be the execution engine for "compiled" Python generators
again fails the seriousness test of what it means to compile code. If
you mean something other than that, you might explain more clearly.

See above.

I didn't know about Nuitka; it looks interesting but (at least after a
few minutes looking) I don't have much sense of how it works.

It's mostly like Cython but without the type system, i.e. without all the
stuff that makes it useful in real life. Just a bare
Python-to-C++-in-CPython compiler, without much of a way to make it do what
you want.

PyPy doesn't compile to C

RPython (usually) does, though, and my guess is that the memory management
part of the runtime is written in RPython.

but I guess compiling to C doesn't preclude
precise GC, as long as the generated C code carefully tracks what C
objects can contain GC-able pointers, and follows some constraints about
when the GC can run. Some other compilers do this so it's not as big a
deal as it sounded like at first. OK.

Yep, C really becomes a lot nicer when you generate it.

I thought LuaJIT compiles the existing Lua VM code, but I haven't
looked at it closely or used it.

Ok. It obviously reuses code, but the VM part of it is really different
from standard Lua.

Stefan
 
Jürgen A. Erhard

Steven D'Aprano, 04.08.2012 08:15:

And not to forget Cython, which is the only static Python compiler that is
widely used. Compiles and optimises Python to C code that uses the CPython
runtime and allows for easy manual optimisations to get C-like performance
out of it.

Cython is certainly *not* a Python *implementation*, since it always
uses the CPython runtime (and compiling Cython C files requires
Python.h).

None of the other implementations require Python for actually
compiling or running Python source.

Oh, yes, you can create a stand-alone... wait, a "stand-alone" app.
By embedding the Python runtime (dynamic linking with libpythonX.Y...
maybe static too? Didn't test, because it's irrelevant for making the
point).

Grits, J
 
Steven D'Aprano

C isn't so great for high-assurance stuff either, compared to (say) Ada.
People do use it in critical apps, but that's just because it is (or
anyway used to be) so ubiquitous.

And then they are shocked, SHOCKED I say!, when their app has enough
buffer overflow security vulnerabilities to sink a battleship.

[half a wink]

Haskell doesn't sound all that great as a translation target for Python
either, unfortunately, because its execution semantics are so different.

I have no opinion on that either way, except to say that if some
developer wants to experiment with Python-in-Haskell, good on him or her.
Trying something new is how progress is made.


[...]
Finally, Python itself isn't all that well suited for compilation, given
its high dynamicity. It will be interesting to see if the language
evolves due to PyPy.

Python is a dynamic language, but most Python code is relatively static.
Runtime optimizations that target the common case, but fall back to
unoptimized code in the rare cases that the optimization doesn't apply,
offer the opportunity of big speedups for most code at the cost of
trivial slowdowns when you do something unusual.
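
A toy sketch of that pattern, written in plain Python purely to show the structure: a cheap guard selects a specialised path for the common case and everything else falls back to the generic path. In a real JIT the fast path would be specialised machine code; here both paths end up doing the same addition, so no speedup should be expected. The names `generic_add`, `guarded_add` and `run` are hypothetical.

```python
import operator


def generic_add(a, b):
    """Unoptimised path: full dynamic dispatch through the + protocol."""
    return operator.add(a, b)


def guarded_add(a, b, stats):
    """Optimistic path behind a guard, with a fallback when the guard fails."""
    if type(a) is int and type(b) is int:   # guard: exact-type check
        stats["fast"] += 1
        return a + b                        # stands in for a specialised int add
    stats["slow"] += 1                      # the unusual case
    return generic_add(a, b)


def run(pairs):
    """Add each pair and report how often each path was taken."""
    stats = {"fast": 0, "slow": 0}
    results = [guarded_add(a, b, stats) for a, b in pairs]
    return results, stats
```

The guard itself is the cost under discussion: it runs on every call, including the ones where the unusual case never occurs.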
 
Paul Rubin

Steven D'Aprano said:
Runtime optimizations that target the common case, but fall back to
unoptimized code in the rare cases that the optimization doesn't apply,
offer the opportunity of big speedups for most code at the cost of
trivial slowdowns when you do something unusual.

The problem is you can't always tell if the unusual case is being
exercised without an expensive dynamic check, which in some cases must
be repeated in every iteration of a critical inner loop, even though it
turns out that the program never actually uses the unusual case.
 
Steven D'Aprano

The problem is you can't always tell if the unusual case is being
exercised without an expensive dynamic check, which in some cases must
be repeated in every iteration of a critical inner loop, even though it
turns out that the program never actually uses the unusual case.

I never said optimizing Python was easy :)

Obviously if the check is expensive enough, the optimization isn't going
to be worth doing. But often the check is not so expensive, or is just a
matter of tedious and careful book-keeping.

I don't wish to dispute that optimizing Python is hard, but it's not a
Hard Problem like factorizing huge integers, or solving the Palestine/
Israeli conflict. It's hard like cleaning your house after a gang of
drunken frat boys have partied all weekend.
 
Dennis Lee Bieber

C isn't so great for high-assurance stuff either, compared to (say) Ada.
People do use it in critical apps, but that's just because it is (or
anyway used to be) so ubiquitous.

And then they are shocked, SHOCKED I say!, when their app has enough
buffer overflow security vulnerabilities to sink a battleship.

[half a wink]
{adding the other half}

One has to realize that it is quite difficult to sink said
battleship -- a complete electrical system failure will still leave it
floating...

In contrast, keeping an inherently unstable, fly by wire, fighter
jet in the air is much more difficult...
 
Stefan Behnel

Paul Rubin, 05.08.2012 03:38:
The problem is you can't always tell if the unusual case is being
exercised without an expensive dynamic check, which in some cases must
be repeated in every iteration of a critical inner loop, even though it
turns out that the program never actually uses the unusual case.

Cython does a lot of optimistic optimisations. That's where a large part of
that huge C file comes from that Cython generates from even simple Python code.

For example, in CPython, C function calls are so ridiculously faster than
Python function calls that it's worth some effort if it saves you from
packing an argument tuple to call into a Python function. In fact, we've
been thinking about ways to export C signatures from Python function
objects, so that code implemented in C (or a C compatible language) can be
called directly from other code implemented in C. That's very common in the
CPython ecosystem.

There are a lot of simple things that quickly add up into a much better
performance on average.

Stefan
 
Stefan Behnel

Jürgen A. Erhard, 05.08.2012 01:25:
Cython is certainly *not* a Python *implementation*, since it always
uses the CPython runtime (and compiling Cython C files requires
Python.h).

Yes, it avoids an unnecessary duplication of effort as well as a
substantial loss of compatibility that all non-CPython based
implementations suffer from.

You'd be surprised to see how much of Python we implement, though,
including some of the builtins. You might want to revise your opinion once
you start digging into it. It's always easy to disagree at the surface.

None of the other implementations require Python for actually
compiling or running Python source.

Nuitka was on the list as well.

Oh, yes, you can create a stand-alone... wait, a "stand-alone" app.
By embedding the Python runtime (dynamic linking with libpythonX.Y...
maybe static too?

Sure, that works.

Stefan
 
Stefan Behnel

Stefan Behnel, 05.08.2012 07:46:
Jürgen A. Erhard, 05.08.2012 01:25:

Nuitka was on the list as well.

Oh, and Stackless was also on Steven's list, as well as WPython. That means
that 50% of the "other implementations" that Steven presented are not
"implementations" according to your apparent definition.

BTW, what is your definition?

Stefan
 
Jürgen A. Erhard

Stefan Behnel said:

Yes, it avoids an unnecessary duplication of effort as well as a
substantial loss of compatibility that all non-CPython based
implementations suffer from.

But it's not a Python *implementation*, "just" an extension.

Mind you, this is not intended as a slight of Cython as such. I
really like it, though I haven't had need for it yet, but I sure
prefer it to writing extensions in pure C. *brrrr*

Nuitka was on the list as well.

True, which I realized only after my missive. But doesn't change
much, only that the list is wrong.

Sure, that works.

My definition, to also answer your following post, is "does not rely
on any executable part of the CPython source (which includes .c files
and executable code in header files if any, but of course can exclude
the stdlib)". Not sure that's precise enough, but... if it can't
run/work on a system that has no shred of CPython installed, it's not
an alternative *implementation*. The big three don't need CPython
(except PyPy for building, and even it can use a precompiled PyPy I think).

Grits, J
 
Stefan Behnel

Jürgen A. Erhard, 05.08.2012 14:28:
True, which I realized only after my missive. But doesn't change
much, only that the list is wrong.

Agreed.


My definition, to also answer your following post, is "does not rely
on any executable part of the CPython source (which includes .c files
and executable code in header files if any, but of course can exclude
the stdlib)". Not sure that's precise enough, but... if it can't
run/work on a system that has no shred of CPython installed, it's not
an alternative *implementation*.

I can live with that definition. Cython is (by design) not an independent
reimplementation of Python.

Stefan
 
rusi

Most people are aware, if only vaguely, of the big Four Python
implementations:

I think the question about where Cython fits into this, raises the
need for a complementary list to Steven's. What are the different
ways in which python can be extended/embedded. eg

1. 'Classic' extending/embedding
2. SCXX
3. PyCXX
4. Boost
5. Cython
6. Swig
7. Sip
8. ctypes

Is such a list maintained somewhere?
 
Stefan Behnel

rusi, 07.08.2012 06:23:
I think the question about where Cython fits into this, raises the
need for a complementary list to Steven's. What are the different
ways in which python can be extended/embedded. eg

1. 'Classic' extending/embedding
2. SCXX
3. PyCXX
4. Boost
5. Cython
6. Swig
7. Sip
8. ctypes

Is such a list maintained somewhere?

Hijacking this page would be a good place to start it IMHO:

http://wiki.python.org/moin/Embedding and Extending

Stefan
 
