Please enlighten me about PyPy

Ray

Hello!

I've been reading about PyPy, but there are some things that I don't
understand about it. I hope I can get some enlightenment in this
newsgroup :)

First, the intro:

<excerpt>
"The PyPy project aims at producing a flexible and fast Python
implementation. The guiding idea is to translate a Python-level
description of the Python language itself to lower level languages."
</excerpt>

So the basic idea is that PyPy is an implementation of Python in Python
(i.e.: writing Python's interpreter in Python), and then translate that
into another language such as C or Java? How is it different from
CPython or Jython then?

Also, what does "translation" here mean? Translation as in, say, "Use
Jython to translate PyPy to Java classes"? Or "Use Psyco to translate
PyPy to native exec"?

<excerpt>
"Rumors have it that the secret goal is being faster-than-C which is
nonsense, isn't it?"
</excerpt>

Why is this supposed to be nonsense if it's been translated to C? I
mean, C version of PyPy vs. CPython, both are in C, then why is this
supposed to be nonsense?

It seems that I'm missing a lot of nuances in the word "translation"
here.

Also, this one:

<excerpt>
We have written a Python interpreter in Python, without many references
to low-level details. (Because of the nature of Python, this is already
a complicated task, although not as much as writing it in - say - C.)
Then we use this as a "language specification" and manipulate it to
produce the more traditional interpreters that we want. In the above
sense, we are generating the concrete "mappings" of Python into
lower-level target platforms.
</excerpt>

So the "language specification" in this paragraph _is_ the Python
implementation in Python, a.k.a.: PyPy? Then what does "manipulate it
to produce the more traditional interpreters" mean?

I mean, it seems from what I read that PyPy is more about a translator
that translates Python code into something else rather than
implementing Python in Python. In that case, it could have been any
other project, right? As in implementing X in Python, and then
translate to another language?

Thanks for any pointers!
 
Luis M. González

Hmmm... I know it's complicated, and all these questions can make your
head explode.
I'll tell you what I understand about Pypy and, at the same time, I'll
leave the door open for further explanations or corrections.

As you know, Python is a dynamic language.
That means, amongst other things, that the programmer doesn't provide
type information when declaring variables, as one does in statically
typed languages.
Its code doesn't get translated to machine code through a compiler,
as in C.
Instead, it is "interpreted" by the interpreter, which finds out each
variable's type at run-time.
This interpretation makes scripting languages like Python much slower
than traditional statically typed languages.

Recently, Python got a speed boost via Psyco, which is something like a
proof of concept for a just-in-time compiler. It is a CPython extension
and it can improve Python's speed by analyzing run-time information and
generating machine code on the fly.
However, Psyco can only analyze Python code, and as you know, Python
relies on many extensions coded in C for performance.
So its author decided that having a Python implementation written in
Python would lay a much better basis for implementing psyco-like
techniques.
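
(For the curious, this is roughly how Psyco was used in practice; a small
sketch assuming the historical psyco extension, with its psyco.bind() /
psyco.full() calls, is installed on a supported CPython -- the timings are
only illustrative.)

import time

def spin(n):
    # A plain numeric loop: the kind of code Psyco specialized well.
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.time()
spin(10 ** 6)
print("plain CPython: %.3fs" % (time.time() - start))

try:
    import psyco
    psyco.bind(spin)   # generate machine code specialized for spin()
    # psyco.full() would instead try to compile every function.
    start = time.time()
    spin(10 ** 6)
    print("with psyco:    %.3fs" % (time.time() - start))
except ImportError:
    print("psyco is not available on this interpreter")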

This implementation requires a minimal core, written in a restricted
subset of Python called "RPython". This subset avoids many of the most
dynamic aspects of Python, making it easier to automatically translate
it to C through a tool that uses top-notch type inference techniques.
This translated version of the RPython interpreter (already
auto-translated to C) is the basis of PyPy.
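
(To give a feel for what "restricted" means, here is an illustrative
sketch -- the exact rules are defined by the PyPy toolchain, not by this
example. Roughly, every variable has to keep one inferable type:)

def rpython_friendly(numbers):
    # Every name keeps a single, inferable type for its whole life:
    # 'total' and 'n' are always ints, so a C type can be assigned.
    total = 0
    for n in numbers:
        total += n * n
    return total

def too_dynamic(flag):
    # Here the same name is bound to an int or a string depending on a
    # run-time value, so no single low-level type fits 'x'.
    if flag:
        x = 42
    else:
        x = "forty-two"
    return x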

On top of it, new psyco-like and just-in-time techniques will be
implemented to achieve maximum performance.

However, I still doubt that I really understood it...
I'm still not sure if the type inference techniques will be used to
improve the performance of programs running on PyPy, or if these
techniques were only intended for getting the RPython interpreter
translated to C.

As far as I know, PyPy is currently about 10 to 20 times slower than
CPython, although many optimizations remain to be done.
And I'm not sure, but I think that its developers rely on the
psyco-like techniques to achieve the big speed boost they're looking for.

Luis
 
Ray

Hi Luis!

Thanks for your reply :) Some further questions below...
So its author decided that having a Python implementation written in
Python would lay a much better basis for implementing psyco-like
techniques.

OK, so far I get it... I think. So implementing the Python
interpreter in a Python subset called RPython makes it more amenable
to translation with psyco-like techniques. But how is this superior
to CPython? Is it because Psyco is a specializer, which can
generate potentially different code for different sets of data? So the
assumption is that the interpreter deals with a very specific set of
data that Psyco will be able to make use of to generate very efficient
machine code?

I still don't get how this can be superior to the hand-coded C version
though?

Also, this sounds like it involves implementing the Python's libraries
that are currently implemented in C, in Python, so that they can be
translated. Did I get that correctly?
This implementation requires a minimal core, written in a restricted
subset of Python called "RPython". This subset avoids many of the most
dynamic aspects of Python, making it easier to automatically translate
it to C through a tool that uses top-notch type inference techniques.

OK, now I understand this bit about RPython, thanks.
This translated version of the RPython interpreter (already
auto-translated to C) is the basis of PyPy.

Now I'm confused again--Psyco translates Python into machine code--so
how does this tie in with the fact that the interpreter written in
Python is translated into another language (in this case C)?
However, I still doubt that I really understood it...
I'm still not sure if the type inference techniques will be used to
improve the performance of programs running on PyPy, or if these
techniques were only intended for getting the RPython interpreter
translated to C.

As far as I know, PyPy is currently about 10 to 20 times slower than
CPython, although many optimizations remain to be done.
And I'm not sure, but I think that its developers rely on the
psyco-like techniques to achieve the big speed boost they're looking for.

This is another one I don't get--this approach seems to imply that when
PyPy is reasonably complete, it is expected that it'll be faster than
CPython. I mean, I don't get how something that's translated into C can
be faster than the hand-coded C version?

Thanks,
Ray
 
Steve Holden

Kevin said:
On 21 Dec 2005 19:33:20 -0800, Luis M. González wrote:

... ...
This implementation requires a minimal core, written in a restricted
subset of Python called "RPython". This subset avoids many of the most
dynamic aspects of Python, making it easier to automatically translate
it to C through a tool that uses top-notch type inference techniques.


Why not directly write the minimal core in C?
Because then you'd have to maintain it in C. This way, once you have the
first working translator you can translate it into C to improve its
performance, and use it to translate the *next* working translator, and
so on. Consequently your maintenance work is done on the Python code
rather than hand-translated C.

Fairly standard bootstrapping technique, though it's sometimes difficult
to appreciate the problems involved in writing a compiler for language X
in X itself. Typical is the fact that the compiler for version m has to
be written in version (m-1), for example :)

used-to-do-that-stuff-for-a-living-ly y'rs - steve
 
Luis M. González

Well, first and foremost, when I said that I leave the door open for
further explanations, I meant explanations by other people more
knowledgeable than me :)
Now I'm confused again--Psyco translates Python into machine code--so
how does this tie in with the fact that the interpreter written in
Python is translated into another language (in this case C)?

No, the psyco-like techniques come later, after the RPython interpreter
is auto-translated to C. They are not used to translate the interpreter
to C (this is done through a tool that uses type inference, flow-graph
analysis, etc.).
Getting the RPython interpreter auto-translated to C is the first goal
of the project (already achieved).
That means having a minimal core, written in a low-level language (C for
speed) that hasn't been written by hand, but auto-translated to C from
the Python source -> much easier to improve and maintain from now on.

Think about this: improving and maintaining a hand-coded C
implementation like CPython is a nightmare. The more complex the code,
the more difficult it is to improve and experiment with.
Now, they have it all written in Python (RPython) instead, which is
easier, nicer and more flexible. And this Python code can get
automatically translated to C (no hand coding, this is done by the
tool I mentioned above).

Now this is both a conclusion and a question (because I also have many
doubts about it :) ):
At this moment, the translated python-in-python version is, or intends
to be, something more or less equivalent to CPython in terms of
performance. Because it is in essence almost the same thing: another C
Python implementation. The only difference is that while CPython was
written by hand, PyPy was written in Python and auto-translated to C.

What remains to be done now is implementing the psyco-like techniques
for improving speed (amongst many other things, like stackless, etc).

Luis
 
Claudio Grondi

Steve said:
Because then you'd have to maintain it in C. This way, once you have the
first working translator you can translate it into C to improve its
performance, and use it to translate the *next* working translator, and
so on. Consequently your maintenance work is done on the Python code
rather than hand-translated C.

Fairly standard bootstrapping technique, though it's sometimes difficult
to appreciate the problems involved in writing a compiler for language X
in X itself. Typical is the fact that the compiler for version m has to
be written in version (m-1), for example :)

used-to-do-that-stuff-for-a-living-ly y'rs - steve

I am glad someone asked the question about PyPy, because I need the same
enlightenment. Reading what has been written up to now, I would like to
present my current understanding here, so that it can be corrected if I
have got something the wrong way.

Do I understand it right that:

Translating Python code to C for compilation is the way to avoid having
to write a Python compiler as hand-coded assembler for each platform
(i.e. operating system / processor combination)?

This hand-coding is already done by the people providing a C compiler
for a platform, and it can be assumed that a C compiler is always
available, so why dig that deep, when in practice it is sufficient to
begin at the abstraction level of a C compiler rather than hand-coded
assembler?

Sticking to ANSI C/C++ makes it possible to be multi-platform without
having to write one's own hand-coded assembler for each platform, since
that work is usually already done by the people providing the platform
and the C compiler for it.

So to use PyPy to create a Python-based operating system where e.g.
IDLE replaces the usual command-line interface and Tkinter becomes the
core of the GUI, it would be sufficient to replace the step of
translating to C code and compiling with a C compiler by a hand-coded
assembler Python compiler for each specific platform?

I understand the expectation of becoming faster than CPython with the
PyPy approach as a hope that creating another Python engine architecture
(i.e. the hierarchy of software pieces/modules the entire Python
scripting engine consists of) can lead to improvements not possible when
sticking to the architecture of the current CPython implementation. Once
it has been demonstrated that PyPy can be faster than the current CPython
implementation, it will surely be possible to totally rewrite the CPython
implementation to achieve the same speed by changing the architecture of
its elementary modules.

Is this perhaps what causes confusion about the expectation that PyPy can
come with a speed improvement over CPython? The fact that another
architecture of elementary modules can lead to a speed improvement, and
the fact that the same could then also be implemented in the CPython
approach to achieve the same speed, but would need a total rewrite of
CPython, i.e. a duplication of the PyPy effort?

Claudio
 
Carl Friedrich Bolz

Hi!
Well, first and foremost, when I said that I leave the door open for
further explanations, I meant explanations by other people more
knowledgeable than me :)

You did a very good job of describing what PyPy is in this and the
previous mail! I will try to give a justification of why PyPy is done
the way it is done.
No, the psyco-like techniques come later, after the RPython interpreter
is auto-translated to C. They are not used to translate the interpreter
to C (this is done through a tool that uses type inference, flow-graph
analysis, etc.).
Getting the RPython interpreter auto-translated to C is the first goal
of the project (already achieved).
That means having a minimal core, written in a low-level language (C for
speed) that hasn't been written by hand, but auto-translated to C from
the Python source -> much easier to improve and maintain from now on.

Indeed. The fact that the core is written in RPython has a number of
advantages:

The first point is indeed maintainability: Python is a lot more flexible
and more concise than C, so changes and enhancements become much easier.
Another point is that our interpreter can not only be translated, but
also run on top of CPython! This makes testing very fast, because you
don't need to translate the interpreter before testing it -- just
run it on CPython.

The most important advantage of writing the interpreter in Python is
that of flexibility. In CPython a lot of implementation choices are made
rather early: the choice to use C as the platform the interpreter works
on, the choice to use reference counting (which is reflected
everywhere), the choice to have a GIL, the choice to not be stackless.
All these choices are deeply embedded into the implementation and are
rather hard to change. Not so in PyPy. Since the interpreter is written
in Python and then translated, the translation process can change
different aspects of the interpreter while translating it. The
interpreter implementation does not need to concern itself with all
these aspects.

One example of this is that we are not restricted to translating our
interpreter to C. There are currently backends to translate RPython to
C, LLVM (llvm.org) and JavaScript (incomplete), and there are plans to
write a Smalltalk and a Java backend. That means that we could
potentially generate something that is similar to Jython -- which is not
entirely true, because the interfacing with Java libraries would not
work, but pypy-java would run on the JVM.

Another example is that we can choose at translation time which garbage
collection strategy to use. At the moment we even have two different
garbage collectors implemented: one simple reference counting one and
one that uses the Boehm garbage collector. We have also started (as part
of my Summer of Code project) an experimental garbage collection
framework which allows us to implement garbage collectors in Python. This
framework is not finished yet and needs to be integrated with the rest
of PyPy.

In a similar manner we hope to make different threading models choosable
at translation time.
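
(As a toy illustration of "choosing an aspect at translation time" --
this is not PyPy's actual machinery, and the option names below are
invented -- the interpreter source never mentions a garbage collector; a
hypothetical translation driver picks one and bakes it in:)

class RefCountingGC(object):
    name = "refcounting"

class BoehmGC(object):
    name = "boehm"

def translate(interpreter_source, gc="refcounting", stackless=False):
    # Hypothetical driver: pick the policy, then pretend to emit C code
    # with the chosen behaviour baked in.  The interpreter source itself
    # never mentions reference counts or the Boehm library.
    policies = {"refcounting": RefCountingGC, "boehm": BoehmGC}
    gc_policy = policies[gc]()
    return "C code for %r with %s GC, stackless=%s" % (
        interpreter_source, gc_policy.name, stackless)

print(translate("pypy interpreter", gc="boehm", stackless=True))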

[snip]
Now this is both a conclusion and a question (because I also have many
doubts about it :) ):
At this moment, the translated python-in-python version is, or intends
to be, something more or less equivalent to CPython in terms of
performance. Because it is in essence almost the same thing: another C
Python implementation. The only difference is that while CPython was
written by hand, PyPy was written in Python and auto-translated to C.

Yes, at the moment pypy-c is rather similar to CPython, although slower
(a bit better than ten times slower than CPython at the moment), except
that we can already choose between different aspects (see above).
What remains to be done now is implementing the psyco-like techniques
for improving speed (amongst many other things, like stackless, etc).

Stackless is already implemented. In fact, it took around three days to
do this at the Paris sprint :). It is another aspect that we can choose
at translation time (that means you can also choose to not be stackless
if you want to). With stackless we can support arbitrarily deep
recursion (until the heap is full, that is). We don't export any
task-switching capabilities to the user, yet.

About the psyco-like JIT techniques: we hope to be able to not write the
JIT by hand but to generate it as part of the translation process. But
this is at the moment still quite unclear, in heavy flux and nowhere
near finished yet.

Cheers,

Carl Friedrich Bolz
 
Luis M. González

Thanks Carl for your explanation!
I just have one doubt regarding the way PyPy is supposed to work when
it's finished:

We know that for translating the RPython interpreter to C, the PyPy
team developed a tool that relies heavily on static type inference.

My question is:
Will this type inference also work when running programs on PyPy?
Is type inference a way to speed up programs running on PyPy, or was it
just a means to translate the RPython interpreter to C?

In other words:
Will type inference work on running programs to speed them up, or is
this task only carried out by psyco-like techniques?
 
Carl Friedrich Bolz

Hi!

some more pointers in addition to the good stuff that Luis wrote...
So the basic idea is that PyPy is an implementation of Python in Python
(i.e.: writing Python's interpreter in Python), and then translate that
into another language such as C or Java? How is it different from
CPython or Jython then?

CPython and Jython both need to implement the Python interpreter. So the
work to capture the Python semantics (which is quite big) is done twice,
once in C and once in Java. In PyPy we hope to do that only once and
then write translator backends for C and Java (which is a minor task,
compared to writing a whole Python interpreter).
Also, what does "translation" here mean? Translation as in, say, "Use
Jython to translate PyPy to Java classes"? Or "Use Psyco to translate
PyPy to native exec"?

It's more like "use the translator (which is another Python program) to
translate PyPy to whatever platform you are targeting".
<excerpt>
"Rumors have it that the secret goal is being faster-than-C which is
nonsense, isn't it?"
</excerpt>

Why is this supposed to be nonsense if it's been translated to C? I
mean, C version of PyPy vs. CPython, both are in C, then why is this
supposed to be nonsense?

The idea is that we hope that /Python code/ becomes faster than C code,
which is of course nonsense, right? :)
It seems that I'm missing a lot of nuances in the word "translation"
here.

Also, this one:

<excerpt>
We have written a Python interpreter in Python, without many references
to low-level details. (Because of the nature of Python, this is already
a complicated task, although not as much as writing it in - say - C.)
Then we use this as a "language specification" and manipulate it to
produce the more traditional interpreters that we want. In the above
sense, we are generating the concrete "mappings" of Python into
lower-level target platforms.
</excerpt>

So the "language specification" in this paragraph _is_ the Python
implementation in Python, a.k.a.: PyPy? Then what does "manipulate it
to produce the more traditional interpreters" mean?

The "manipulation" means the translation process that is used to
translate it into a target lanugage.
I mean, it seems from what I read that PyPy is more about a translator
that translates Python code into something else rather than
implementing Python in Python. In that case, it could have been any
other project, right? As in implementing X in Python, and then
translate to another language?

In theory, yes. But the translator is really not the main product of the
PyPy project. It is more like a means to an end that we use to get a
more flexible, better, faster, ... implementation of Python. This means
that the translator at the moment is very much customized to our needs
when writing the interpreter, which makes the translator quite a bit
harder to use for other projects. But in theory it is possible to use
the translator for other projects, as long as these projects adhere to
the necessary staticness conditions.

This fact could be used nicely: If someone writes, say, a Ruby or Perl
interpreter in RPython he will get all the benefits of PyPy for free:
different target platforms, different garbage collectors, stacklessness,
maybe a JIT (which is still unclear at the moment).

Cheers,

Carl Friedrich Bolz
 
Carl Friedrich Bolz

Hi!
Thanks Carl for your explanation!
I just have one doubt regarding the way PyPy is supposed to work when
it's finished:

We know that for translating the RPython interpreter to C, the PyPy
team developed a tool that relies heavily on static type inference.

My question is:
Will this type inference also work when running programs on PyPy?
Is type inference a way to speed up programs running on PyPy, or was it
just a means to translate the RPython interpreter to C?

In other words:
Will type inference work on running programs to speed them up, or is
this task only carried out by psyco-like techniques?

The static type inference is just a means. It will not be used for the
speeding up of running programs. The problem with the current type
inference is that it is really very static and most python programs are
not static enough for it.

Therefore we will rather use techniques that are similar to Psyco (note
that our JIT work is still in the early beginnings and that my comments
reflect only what we currently think might work :) ). The idea is that
the JIT looks at the running code and assumes some things it finds there
to be constant (like the type of a variable), inserts a check that this
still holds, and then optimizes the code under this assumption.
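
(A hand-written toy version of that idea, nothing to do with PyPy's
actual JIT, might look like this:)

def make_specialized_add(sample_x, sample_y):
    # Observe the types seen at run-time and "compile" a version of the
    # operation that assumes they stay the same.
    observed = (type(sample_x), type(sample_y))

    if observed == (int, int):
        def fast_add(x, y):
            # Guard: the assumption made at specialization time must still hold.
            if type(x) is int and type(y) is int:
                return x + y            # the cheap, specialized path
            return generic_add(x, y)    # fall back to the general case
        return fast_add
    return generic_add

def generic_add(x, y):
    # The fully dynamic path the interpreter would normally take.
    return x + y

add = make_specialized_add(2, 3)    # specialize on the first values seen
print(add(2, 3))                    # fast path: 5
print(add("a", "b"))                # guard fails, falls back: ab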

Cheers,

Carl Friedrich
 
Luis M. González

Carl said:
The static type inference is just a means. It will not be used for the
speeding up of running programs. The problem with the current type
inference is that it is really very static and most python programs are
not static enough for it.

Therefore we will rather use techniques that are similar to Psyco (note
that our JIT work is still in the early beginnings and that my comments
reflect only what we currently think might work :) ). The idea is that
the JIT looks at the running code and assumes some things it finds there
to be constant (like the type of a variable), inserts a check that this
still holds, and then optimizes the code under this assumption.


Thanks!
I think I completely understand the whole thing now :)

Anyway, I guess it's just a matter of time until we can use this
translation tool to translate other programs, provided they are written
in restricted Python, right?
So we will have two choices:
1) running normal Python programs on PyPy.
2) translating RPython programs to C and compiling them to stand-alone
executables.

Is that correct?
 
Carl Friedrich Bolz

Luis said:
Thanks!
I think I completely understand the whole thing now :)

If only we could say the same :)
Anyway, I guess it's just a matter of time until we can use this
translation tool to translate other programs, provided they are written
in restricted Python, right?

Yes. This is even possible right now, with one caveat: basically it is
not so hard to write a new program in RPython. RPython is still kind of
nice, and it is testable on CPython, so this is not such a bad task.
There are problems with that, though: you don't have most of the stdlib
in RPython (mostly only a few functions from os, sys and math work). The
other problem is that it is surprisingly hard to convert /existing/
programs to RPython, because they will most probably not adhere to the
staticness conditions.
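
(For what it's worth, a freshly written RPython program tends to look
like ordinary, if unusually disciplined, Python. The sketch below assumes
the entry_point/target convention used by PyPy's standalone translation
targets -- that convention is not described in this thread, so treat it
as an assumption -- and it runs unchanged on CPython for testing:)

def entry_point(argv):
    # Only a small slice of the stdlib survives translation, so keep to
    # simple, type-stable operations.
    if len(argv) > 1:
        name = argv[1]
    else:
        name = "world"
    print("hello, " + name)
    return 0                    # becomes the process exit code

def target(driver, args):
    # Hook that the translation toolchain looks for in a standalone target.
    return entry_point, None

if __name__ == "__main__":
    import sys
    entry_point(sys.argv)       # plain CPython run, no translation needed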
So we will have two choices:
1) running normal python programs on Pypy.
2) translating rpython programs to C and compiling them to stand-alone
executables.

Is that correct?

Indeed. Another possibility is to write a PyPy extension module in
RPython, have that translated to C and then use this in your pure python
code. Actually, one of our current rather wild ideas (which might not be
followed) is to be able to even use RPython to write extension modules
for CPython.

Cheers,

Carl Friedrich Bolz
 
Luis M. González

Anyway, I guess it's just a matter of time until we can use this
translation tool to translate other programs, provided they are written
in restricted Python, right?
So we will have two choices:
1) running normal Python programs on PyPy.
2) translating RPython programs to C and compiling them to stand-alone
executables.

Is that correct?

Oh, forget this question...
You already made this clear in another post in this thread...
Thanks!
Luis
 
Luis M. González

Carl said:
Actually, one of our current rather wild ideas (which might not be
followed) is to be able to even use RPython to write extension modules
for CPython.

I don't think this is a wild idea. In fact, it is absolutely
reasonable.
I'm sure that creating this translation tool was a titanic task, and
now that you have it, why not use it? This is a treasure that opens up
many possibilities...
Even if PyPy ends up not being as fast as intended (I hope not!), the
fact that you guys created this translation tool was well worth the
effort.

Thanks again for your explanations and keep up the good work!
Cheers,
Luis
 
Scott David Daniels

Luis said:
At this moment, the translated python-in-python version is, or intends
to be, something more or less equivalent to CPython in terms of
performance.
Actually, I think here it is more or less equivalent in behavior.
Because it is in essence almost the same thing: another C Python
interpreter implementation. The only difference is that while CPython
was written by hand, PyPy was written in Python and auto-translated to C.
That is not the only difference. It becomes a lot easier to experiment
with alternative implementations of features and run timing tests.
What remains to be done now is implementing the psyco-like techniques
for improving speed (amongst many other things, like stackless, etc).
While the psyco-like tricks for specialization should definitely improve
the interpreter, there is a second trick (watch for exploding heads
here). The big trick is that you can specialize the interpreter for
running _its_ input (a Python program), thus giving you a new
interpreter that only runs your Python program -- a very specialized
interpreter indeed.

--Scott David Daniels
 
Carl Friedrich Bolz

Hi!
Scott David Daniels wrote:
Actually, I think here it is more or less equivalent in behavior.

Yes, apart from some minor differences (obscure one: in CPython you
cannot subclass str or tuple while adding slots, for no good reason,
while you can do that in PyPy).

[snip]
While the psyco-like tricks for specialization should definitely improve
the interpreter, there is a second trick (watch for exploding heads
here). The big trick is that you can specialize the interpreter for
running _its_ input (a Python program), thus giving you a new
interpreter that only runs your Python program -- a very specialized
interpreter indeed.

Indeed! And this specialized interpreter can with some justification be
called a compiled version of the user program! That means that an
interpreter together with a specializer is a compiler.

Now it is possible to take that fun game even one step further: you
specialize the _specializer_ for running its input (which is the
interpreter), thus giving you a new specializer which can specialize
only the interpreter for a later given user program -- a very
specialized specializer indeed. This can then be called a just-in-time
compiler. (Note that this is not quite what the JIT of PyPy will look like :)
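
(Here is a toy illustration of the first step of that game; the
mini-language and the "specializer" below are invented for this sketch,
and the real thing works on the full Python language. An interpreter plus
one fixed program, partially evaluated, yields something that behaves
like a compiled version of that program:)

def interpret(program, x):
    # A general interpreter for a tiny language: each instruction is
    # ("add", n) or ("mul", n), applied to the running value x.
    for op, n in program:
        if op == "add":
            x = x + n
        elif op == "mul":
            x = x * n
    return x

def specialize(program):
    # "Specializer": run the dispatch loop ahead of time and emit Python
    # source with this particular program's operations baked in.
    lines = ["def compiled(x):"]
    for op, n in program:
        if op == "add":
            lines.append("    x = x + %d" % n)
        elif op == "mul":
            lines.append("    x = x * %d" % n)
    lines.append("    return x")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]   # the interpreter, specialized for 'program'

prog = [("add", 3), ("mul", 2)]
compiled = specialize(prog)
print(interpret(prog, 10))   # 26, via the general interpreter
print(compiled(10))          # 26, via the specialized version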

recursively-yours,

Carl Friedrich Bolz
 
Ray

Luis said:
Well, first and foremost, when I said that I leave the door open for
further explanations, I meant explanations by other people more
knowledgeable than me :)

<snip>

Thanks for clearing up some of my confusion with PyPy, Luis!

Cheers,
Ray
 
Ray

Carl said:
Hi!

some more pointers in addition to the good stuff that Luis wrote...

<snip>

Thanks Carl! That solidified my mental picture of PyPy a lot more :)

Warm regards,
Ray
 
Bugs

Scott said:
[snip] The big trick is that you can specialize the interpreter for
running _its_ input (a Python program), thus giving you a new
interpreter that only runs your Python program -- a very specialized
interpreter indeed.
Now THAT will be slick!

What is the current roadmap/timeline for PyPy?

Anyone know if Guido is interested in ever becoming deeply involved in
the PyPy project?
 
Luis M. González

Thanks for clearing up some of my confusion with PyPy, Luis!

Hey, I'm glad you brought up this topic!
This thread really helped me to understand some dark corners of this
exciting project.

I also want to thank Carl and all the other Pypy developers for their
outstanding work!
I've been quietly following the evolution of PyPy through its mailing
list, and I eagerly await every new announcement they make, but I
never dared to ask any question, fearing that I would look like a fool
amongst these rocket scientists...

Cheers,
Luis
 
