unladen swallow: python and llvm

Luis M. González · Jun 5, 2009

I am very excited by this project (as well as by PyPy), and I read
their whole plan, which looks quite practical and impressive.
But I must confess that I can't understand why LLVM is so great for
Python and why it will make a difference.

AFAIK, LLVM is a lot of things at the same time (a compiler
infrastructure, a compilation strategy, a virtual instruction set,
etc.).
I am also confused by their use of the term "JIT" (is LLVM a JIT? Can
it be used to build a JIT?).
Is it something like the .NET or Java JIT? Or can it be used to
implement a custom JIT (à la Psyco, for example)?

Also, as some PyPy folks said, it seems they intend to do "upfront
compilation". How?
Is it something along the lines of the V8 JavaScript engine (no
interpreter, no intermediate representation)?
Or will it be another interpreter implementation? If so, how will it
be any better...?

Well, these are a lot of questions and they only show my confusion...
I would highly appreciate it if someone knowledgeable could shed some
light on this for me...

Thanks in advance!
Luis

Jesse Noller · Jun 5, 2009

You can email these questions to the unladen-swallow mailing list.
They're very open to answering questions.

Neuruss · Jun 7, 2009

CPython uses a C compiler to compile the python code (written in C)
into native machine code.

unladen-swallow uses an llvm-specific C compiler to compile the CPython
code (written in C) into LLVM opcodes.

The LLVM virtual machine executes those LLVM opcodes. The LLVM
virtual machine also has a JIT (just in time compiler) which converts
the LLVM op-codes into native machine code.

So both CPython and unladen-swallow compile C code into native machine
code in different ways.

So why use LLVM? This enables unladen swallow to modify the python
virtual machine to target LLVM instead of the python vm opcodes.
These can then be run using the LLVM JIT as native machine code and
hence run all python code much faster.

The unladen swallow team have a lot more ideas for optimisations, but
this seems to be the main one.

It is an interesting idea for a number of reasons, the main one as far
as I'm concerned is that it is more of a port of CPython to a new
architecture than a complete re-invention of python (like PyPy /
IronPython / jython) so stands a chance of being merged back into
CPython.

Thanks Nick,
ok, let me see if I got it:
The Python VM is written in C, and generates its own bytecodes which
in turn get translated to machine code (one at a time).
Unladen Swallow aims to replace this VM by one compiled with the LLVM
compiler, which I guess will generate different bytecodes, and in
addition, supplies a JIT for free. Is that correct?

It's confusing to think about a compiler which is also a virtual
machine, which also has a JIT...
Another thing that I don't understand is the "upfront"
compilation.
Actually, the project plan doesn't mention it, but I read a comment on
PyPy's blog about a PyCon presentation, where they said it would be
upfront compilation (?). What does that mean?

I guess it has nothing to do with the V8 strategy, because Unladen
Swallow will be a virtual machine, while V8 compiles everything to
machine code on the first run. But I still wonder what this means and
how it is different.

By the way, I already posted a couple of questions on Unladen's site.
But now I see the discussion is way too "low level" for me, and I
wouldn't want to interrupt with my silly basic questions...

Luis

MRAB · Jun 7, 2009

Neuruss said:
Thanks Nick,
ok, let me see if I got it:
The Python VM is written in C, and generates its own bytecodes which
in turn get translated to machine code (one at a time).
Unladen Swallow aims to replace this VM by one compiled with the LLVM
compiler, which I guess will generate different bytecodes, and in
addition, supplies a JIT for free. Is that correct?

[snip]
No. CPython is written in C (hence the name). It compiles Python source
code to bytecodes. The bytecodes are instructions for a VM which is
written in C, and they are interpreted one by one. There's no
compilation to machine code.
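
To make "interpreted one by one" concrete, here is a toy sketch of a stack-based bytecode loop written in Python. It is not CPython's actual VM (which is written in C and far more involved); the opcode names PUSH, ADD and MUL and the run() helper are invented purely for illustration.

```python
# Toy stack machine: NOT CPython's real VM, just the same basic idea.
# The opcode names and the run() helper are made up for this sketch.

def run(program):
    """Interpret a list of (opcode, argument) pairs one at a time."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# (2 + 3) * 4 expressed as instructions for the toy machine
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)
```

Every instruction goes through the same decode-and-branch step in the loop, and nothing is ever turned into native machine code. That per-instruction overhead is what a JIT tries to remove.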

bearophileHUGS · Jun 7, 2009

Luis M. González:

it seems they intend to do "upfront
compilation". How?

Unladen Swallow developers want to try everything (except black magic
and necromancy) to increase the speed of CPython. So they will try to
compile up-front if/where they can. For example, most regular
expressions are known at compile time, so there's no need to compile
them at run time (I don't know if CPython compiles them before run
time).
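
As a small illustration of the regular-expression point: in CPython a pattern is compiled when re.compile() (or a module-level function such as re.findall()) actually runs, not when the source file is compiled to bytecode. The closest everyday equivalent of "up-front" is compiling once at import time and reusing the result. The names WORD_RE, count_words_precompiled and count_words_inline below are made up for this sketch.

```python
import re

# Compiled once, when the module is imported, and then reused.
WORD_RE = re.compile(r"[A-Za-z]+")


def count_words_precompiled(lines):
    return sum(len(WORD_RE.findall(line)) for line in lines)


def count_words_inline(lines):
    # Passes the pattern string on every call; the re module has to look
    # it up in its cache (and compile it the first time) at run time.
    return sum(len(re.findall(r"[A-Za-z]+", line)) for line in lines)


lines = ["unladen swallow", "python and llvm"]
print(count_words_precompiled(lines), count_words_inline(lines))
```

Both functions give the same answer. The difference is only where the compilation cost is paid, and the re module's internal cache of recent patterns already hides most of it for the inline version.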

What I like about Unladen Swallow is that it's a very practical
approach, very different in style from ShedSkin and PyPy (and it's more
ambitious than Psyco). I also like Unladen Swallow because they are
among the few people that have the boldness to do something to increase
the performance of Python for real.
They have a set of reference benchmarks that are very real, not
synthetic at all, so for example they refuse to use Pystones and the
like.

What I don't like about Unladen Swallow is that integrating LLVM with
the CPython codebase looks rather improbable. Another thing I
don't like is the name of the project: it's not easy to pronounce for
non-English speaking people.

Bye,
bearophile

Paul Rubin · Jun 7, 2009

What I like about Unladen Swallow is that it's a very practical
approach, very different in style from ShedSkin and PyPy (and it's more
ambitious than Psyco). I also like Unladen Swallow because they are
among the few people that have the boldness to do something to increase
the performance of Python for real.

IMHO the main problem with the Unladen Swallow approach is that it
would surprise me if CPython really spends that much of its time
interpreting byte code. Is there some profiling output around? My
guess is that CPython spends an awful lot of time in dictionary
lookups for method calls, plus incrementing and decrementing ref
counts and stuff like that. Plus, the absence of a relocating garbage
collector may mess up cache hit ratios pretty badly. Shed Skin as I
understand it departs in some ways from Python semantics in order to
get better compiler output, at the expense of breaking some Python
programs. I think that is the right approach, as long as it's not
done too often. That's the main reason why I think it's unfortunate
that Python 3.0 broke backwards compatibility at the particular time
that it did.
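
The two costs mentioned above (dictionary lookups behind method calls, and reference counting) can be observed from Python itself. This is only a rough sketch: the Swallow class and the variable names are hypothetical, and sys.getrefcount() is specific to CPython, with absolute numbers that differ between versions.

```python
import sys


class Swallow:
    def speed(self):
        return "fast enough"


bird = Swallow()

# The method is not stored on the instance; calling bird.speed() means
# looking the name up, and it is found in the class dictionary.
print("speed" in bird.__dict__)      # instance dict
print("speed" in Swallow.__dict__)   # class dict

# Every new reference to an object bumps its reference count.
payload = object()
before = sys.getrefcount(payload)
holder = [payload]                   # the list now holds one more reference
after = sys.getrefcount(payload)
print(after - before)
```

None of this work is bytecode dispatch as such, which is the point being made: removing the interpreter loop alone would leave these costs in place.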

bearophileHUGS · Jun 7, 2009

Paul Rubin:

IMHO the main problem with the Unladen Swallow approach is that it would surprise me if CPython really spends that much of its time interpreting byte code.

Note that Py3 already has a way to speed up byte code interpretation
when compiled with GCC or the Intel compiler. It's a very old strategy
used by Forth compilers, and Py3 uses it only partially: the strongest
optimizations used many years ago in Forth aren't used yet in Py3,
probably to keep the Py3 virtual machine simpler, or maybe because
there are not enough Forth experts among them.

Unladen Swallow developers are smart and they attack the problem from
every side they can think of, plus some sides they can't think of. Be
ready for some surprises.

Is there some profiling output around?

I am sure they swim every day in profiler outputs. But you have to
ask them.

Plus, the absence of a relocating garbage collector may mess up cache hit ratios pretty badly.

I guess they will try to add a relocating GC too, of course. Plus some
other strategy. And more. And then some cherries on top, with whipped
cream just to be sure.

Shed Skin as I understand it departs in some ways from Python semantics in order to get better compiler output, at the expense of breaking some Python programs. I think that is the right approach, as long as it's not done too often.

ShedSkin (SS) is a beast almost totally different from CPython: SS
compiles an implicitly static subset of Python to C++. So it breaks
most real Python programs, it doesn't use the Python std lib (it
rebuilds one in C++ or compiled Python), and so on.
SS may be useful for people who don't want to mess with the
intricacies of Cython (ex-Pyrex) and its tricky reference counting to
create compiled Python extensions. But so far I think nearly no one is
using SS for that purpose, so it may be a failed experiment (SS is
also slow at compiling; it's hard to make it compile more than
1000-5000 lines of code).
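
A rough sketch of what "implicitly static" means in practice. Both functions below are valid Python, but only the first keeps every variable at a single type throughout. The function names are hypothetical, and whether Shed Skin accepts or rejects any particular construct is an assumption here, not something checked against the tool.

```python
def total_length(words):
    # 'words' is always a list of str and 'total' is always an int:
    # the types are never declared, but they never change either.
    total = 0
    for word in words:
        total += len(word)
    return total


def describe(value):
    # 'result' is an int on one path and a str on the other, and the
    # returned list mixes element types: fine for CPython, but no single
    # static type can be inferred for 'result' or for the list.
    if value > 0:
        result = value
    else:
        result = "non-positive"
    return [result, 1.5, None]


print(total_length(["shed", "skin"]))
print(describe(-1))
```

Code in the style of describe() is common in real programs, which is why a static subset tends to break them.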

Bye,
bearophile

Kay Schluehr · Jun 8, 2009

ShedSkin (SS) is a beast almost totally different from CPython: SS
compiles an implicitly static subset of Python to C++. So it breaks
most real Python programs, it doesn't use the Python std lib (it
rebuilds one in C++ or compiled Python), and so on.
SS may be useful for people who don't want to mess with the
intricacies of Cython (ex-Pyrex) and its tricky reference counting to
create compiled Python extensions.

I don't understand your Cython complaint. The only tricky part of Cython
is the doublethink regarding Python types and C types. I once attempted
to write a ShedSkin-like code transformer from Python to Cython
based on type recordings, but never found the time for it because I
have to work on EasyExtend on all fronts at the same time. Maybe next
year, or when Unladen Swallow becomes a success - never. The advantage
of this approach over ShedSkin was that every valid Cython program is
also a Python extension module, so one can advance the translator in
small increments and still make continuous progress on the execution
speed front.

Tim Wintle · Jun 8, 2009

It is an interesting idea for a number of reasons, the main one as far
as I'm concerned is that it is more of a port of CPython to a new
architecture than a complete re-invention of python (like PyPy /
IronPython / jython) so stands a chance of being merged back into
CPython.

Blatant fanboyism. PyPy also has a chance of being merged back into
Python trunk.

How?

I believe that Unladen Swallow has already had many of its
optimisations back-ported to CPython, but I can't see how backporting a
Python interpreter written in Python into C is going to be as easy as
merging from Unladen Swallow, which is (until the LLVM part) a branch of
CPython.

Personally, I think that PyPy is a much better interpreter from a
theoretical point of view, and opens up massive possibilities for
writing interpreters in general. Unladen Swallow on the other hand is
something we can use _now_ - on real work, on real servers. It's a more
interesting engineering project, and something that shouldn't require
re-writing of existing python code.

Tim W

bearophileHUGS · Jun 8, 2009

Kay Schluehr:

I don't understand your Cython complaint. The only tricky part of Cython is the doublethink regarding Python types and C types. I once attempted to write a ShedSkin-like code transformer from Python to Cython based on type recordings, but never found the time for it because I have to work on EasyExtend on all fronts at the same time.

I tried to create a certain data structure with a recent version
of Pyrex on Windows, and I wasted a lot of time looking for missing
reference count updates that didn't happen, or memory that didn't get
freed.

The C code produced by ShedSkin is a bit hairy but it's 50 times more
readable than the C jungle produced by Pyrex, where I lost a lot of
time looking for the missing reference counts, etc.

In the end I used D with Pyd to write an extension in a very
quick way. For me writing D code is much simpler than writing Pyrex
code. I have never used Pyrex since.

I'm sure a lot of people like Cython, but I prefer a more transparent
language, one that doesn't hide from me how it works inside.

Bye,
bearophile

skip · Jun 8, 2009

bearophile> I'm sure a lot of people like Cython, but I prefer a more
bearophile> transparent language, one that doesn't hide from me how it
bearophile> works inside.

Why not just write extension modules in C then?

Paul Boddie · Jun 8, 2009

The C code produced by ShedSkin is a bit hairy but it's 50 times more
readable than the C jungle produced by Pyrex, where I lost a lot of
time looking for the missing reference counts, etc.

The C++ code produced by Shed Skin can actually provide an explicit,
yet accurate summary of the implicit type semantics present in a
program, in the sense that appropriate parameterisations of template
classes may be chosen for the C++ program in order to model the data
structures used in the original program. The analysis required to
achieve this is actually rather difficult, and it's understandable
that in languages like OCaml that are widely associated with type
inference, one is encouraged (if not required) to describe one's types
in advance.

People who claim (or imply) that Shed Skin has removed all the
dynamicity from Python need to look at how much work the tool still
needs to do even with all the restrictions imposed on input programs.

Paul

bearophileHUGS · Jun 8, 2009

skip:

Why not just write extension modules in C then?

In the past I have used some C for that purpose, but have you tried
the D language (used from Python with Pyd)? It's way better,
especially if you, for example, use libs similar to itertools functions,
etc.

Bye,
bearophile

greg · Jun 9, 2009

I tried to create a certain data structure with a recent version
of Pyrex on Windows, and I wasted a lot of time looking for missing
reference count updates that didn't happen, or memory that didn't get
freed.

Can you elaborate on those problems? The only way
you should be able to get reference count errors
in Pyrex code is if you're casting between Python
and non-Python types.

Neal Becker · Jun 9, 2009

bearophileHUGS:

In the past I have used some C for that purpose, but have you tried
the D language (used from Python with Pyd)? It's way better,
especially if you, for example, use libs similar to itertools functions,
etc.

Bye,
bearophile

Is Pyd maintained? I'm interested, but was scared away when I noticed that
it had not been updated for some time. (I haven't looked recently).

bearophileHUGS · Jun 9, 2009

Greg:

Can you elaborate on those problems?

I can't, I am sorry, I don't remember the details anymore.
Feel free to ignore what I have written about Pyrex; a lot of people
appreciate it, so it must be good enough, even if I was not
smart/expert enough to use it well. I even failed for several days to
use it on Windows, so probably I am/was quite ignorant. I use
Python also because it's handy. For programmers being lazy is
sometimes a quality.

-----------------

This part is almost OT, I hope it will be tolerated.

Neal Becker:

Is Pyd maintained? I'm interested, but was scared away when I noticed that it had not been updated for some time. (I haven't looked recently).

I think Pyd works (with no or small changes) with a D1 compiler like
DMD, but you have to use the Phobos standard library, which is worse
than Tango. If you have problems with Pyd you will probably find
people willing to help you on the D IRC channel.

The problem is that the D language isn't used by a lot of people, and
most libraries are developed by a few university students who stop
maintaining the libraries once they find a job. So most D libraries are
almost abandoned. There are just not enough people in the D community.
And some people don't like to develop new libs because the D2 language
(currently still in alpha) makes D1 look like a dead end. On the other
hand this is also a good thing, because the D1 language has stopped
evolving, so you are often able to compile even "old" code.

Using Pyd is quite easy, but the D1 language is not as simple as Python,
despite being three times simpler than C++. The good thing is that
it's not difficult to adapt C code to D; it's almost a mechanical
translation (probably a tool simpler than 2to3 would be enough to
perform such a translation of C to D, but of course no one has written
such a thing).

Another problem with Pyd is that it may have scalability problems,
that is it may have problems if you want to wrap hundreds of classes
and functions. So before using it for real projects it's better to
test it well.

I have no idea if D will ever have some true success, even if it's
nice. The history of informatics is full of thousands of nice dead
languages. In the meantime I'll keep using it and writing libs, etc. I
have seen that several Python-lovers like D. The new LDC compiler
allows D1 code to be most times about as fast as C++. This is more
than enough. At the moment it seems that D is appreciated by people
who write video games.

Bye,
bearophile

Stefan Behnel · Jun 11, 2009

Nick said:
CPython uses a C compiler to compile the python code (written in C)
into native machine code.

That would be Cython: compile Python code to (optimised) C code and then
run a C compiler over that to get native machine code.

http://cython.org/

CPython compiles Python code to *byte-code* and then *interprets* that in a
virtual machine (which happens to be written in C, hence the name).
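
The byte-code step can be inspected directly with the standard dis module. A minimal sketch (the add function is just a placeholder, and the exact opcode names printed depend on the CPython version):

```python
import dis


def add(a, b):
    return a + b


# The compiled byte-code lives on the function's code object.
code = add.__code__
print(type(code.co_code))

# Disassemble it; exact opcode names vary between CPython versions.
for instruction in dis.get_instructions(add):
    print(instruction.opname, instruction.argrepr)
```

What gets printed is a short list of VM instructions, not machine code. The interpreter loop written in C is what executes them.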

Stefan

Stefan Behnel · Jun 11, 2009

bearophileHUGS:

I tried to create a certain data structure with a recent version
of Pyrex on Windows, and I wasted a lot of time looking for missing
reference count updates that didn't happen, or memory that didn't get
freed.

I wonder what you did then. Apparently, you didn't just compile a Python
program, but tried to use Pyrex/Cython to avoid writing C code. That can
work, but depends on your C expertise.

If you only compile Python code, you will not get in touch with any
ref-counting or memory leaks (any more than in CPython, that is). You only
have to care about freeing memory if you manually allocate that memory
through malloc(), in which case it's your own fault if it doesn't get freed.

I'm sure a lot of people like Cython, but I prefer a more transparent
language, one that doesn't hide from me how it works inside.

Cython doesn't hide anything. Most of the magic that happens is done in
code tree transformations, which you can look up in the compiler code. The
code generation is mostly just mapping the final code tree to C code, with
some type specialisations. Cython will even copy your complete source code
line-by-line into the C code it writes, so that you can easily read up what
your code gets translated to.

I admit that the generated C code is not always simple and obvious, as it
contains lots of runtime type specialisations and optimisations. But that's
the price you pay for fast code. And you can make Cython leave out most of
the duplicated code (i.e. the pessimistic fallbacks) by adding type hints
to your code.

Stefan
