[ann] first release of PyPy


Christian Tismer

Torsten Bronger wrote:

....
I've been told by so many books and on-line material that Python
cannot be compiled (unless you cheat). So how is this possible?

Have a look at Psyco, which will be folded into and improved
by PyPy.

--
Christian Tismer :^) <mailto:[email protected]>
tismerysoft GmbH : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
 

Christian Tismer

Ville said:
Torsten> What's supposed to be compiled? Only PyPy itself or also
Torsten> the programs it's "interpreting"?

PyPy is written in Python; if it can be compiled, then the programs can
be as well.

Well, this is not really true. PyPy is written in RPython,
a sub-language of Python that is implicitly defined by
"simple and static enough to be compilable".

We have not yet started to work on the dynamic nature of
Python; that needs different technology (Psyco).

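Christian's "simple and static enough" definition can be illustrated with a small sketch (not actual PyPy source, and the function names are made up): an RPython-style type inferencer can handle code where each variable keeps one inferable type, but not code that rebinds a name to unrelated types.

```python
# A hedged illustration of the RPython restriction (hypothetical names,
# not actual PyPy source).

def rpython_ok(n):
    # Fine in an RPython-like subset: 'total' is always an int.
    total = 0
    for i in range(n):
        total += i
    return total

def too_dynamic(flag):
    # Legal Python, but an RPython-style type inferencer would reject it:
    # 'x' is an int on one path and a str on the other.
    if flag:
        x = 42
    else:
        x = "forty-two"
    return x

print(rpython_ok(5))        # 10
print(too_dynamic(False))   # forty-two
```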
Torsten> I've been told by so many books and on-line material that
Torsten> Python cannot be compiled (unless you cheat). So how is
Torsten> this possible?

These guys are exploring new territory. OTOH, Lisp is a dynamic
language like python and it can be compiled to native code. Pyrex
demonstrates the "trivial" way to compile python to native code, the
real problem is making the resulting code fast. Typically this
requires type inference (i.e. figuring out the type of an object from
the context because there are no type declarations) to avoid dict
lookups in method dispatch.

Type inference works fine for our implementation of Python,
but it is in fact very limited for full-blown Python programs.
You cannot do much more than try to generate effective code
for the current situation that you see. But that's most often
quite fine.

 

Paul Rubin

Christian Tismer said:
Type inference works fine for our implementation of Python,
but it is in fact very limited for full-blown Python programs.
You cannot do much more than try to generate effective code
for the current situation that you see. But that's most often
quite fine.

Type inference (or static type declarations) is one part of compiling
dynamic languages but I think its importance is overblown in these
Python compiler threads. There's lots of compiled Lisp code out there
that's completely dynamic, with every operation dispatching on the
type tags in the Lisp objects. Yes, the code runs slower than when
the compiler knows the type in advance, but it's still much faster
than interpreted code.

I'd expect one of the worst bottlenecks in Python is the multiple
levels of dictionary lookup needed when you say a.x(). The
interpreter has to search through the method dictionaries for class(a)
and all of its superclasses. It has to do this every time you do the
operation, since those dictionaries can change at any time. Being
able to do that, it seems to me, is NOT in the interest of reliable or
maintainable programming--look at the cruft in socket.py, for example.
Being able to statically generate the method call (like a C++ compiler
does) or even just being able to cache a method list in each class
(avoiding searching through all the superclasses on subsequent calls
to any operation) would probably make a big difference in execution
speed in both the compiler and interpreter. It would require a change
to the Python language but I think the change would be a beneficial
one both from the software maintainability and the performance point
of view.
 

Ville Vainio

Christian> Well, this is not really true. PyPy is written in
Christian> RPython, a sub-language of Python that is implicitly
Christian> defined by "simple and static enough to be compilable".

Would it be possible to tag some modules in application code as
RPython-compatible, making it possible to implement the speed-critical
parts in RPython?
 

Christian Tismer

Ville said:
Christian> Well, this is not really true. PyPy is written in
Christian> RPython, a sub-language of Python that is implicitly
Christian> defined by "simple and static enough to be compilable".

Would it be possible to tag some modules in application code as
RPython-compatible, making it possible to implement the speed-critical
parts in RPython?

Interesting idea.
Especially since we have automatic translation from RPythonic
application code to interpreter level.
Maybe not for now, but I'm cc-ing pypy-dev.

@rpythonic :)

 

Rocco Moretti

Alex said:
The question still remains: can it run itself? ;)

I think they try, every once in a while, to self host. The only problem
at this stage of the game is the ~2000x speed slowdown. Using that
figure, a five second startup time for PyPy on CPython would take about
3 hours for PyPy on PyPy on CPython (5s*2000). Running a 1 second (on
CPython) Python program would take a month and a half for PyPy on PyPy
on CPython. (1s*2000*2000)

Once they get the speed issue licked, the self hosting problems should
be no trouble. ;)
 

Carl Friedrich Bolz

This already worked in the past, though it doesn't at the moment.
I think they try, every once in a while, to self host. The only problem
at this stage of the game is the ~2000x speed slowdown. Using that
figure, a five second startup time for PyPy on CPython would take about
3 hours for PyPy on PyPy on CPython (5s*2000). Running a 1 second (on
CPython) Python program would take a month and a half for PyPy on PyPy
on CPython. (1s*2000*2000)

Once they get the speed issue licked, the self hosting problems should
be no trouble. ;)

Speed isn't even the biggest problem when running PyPy on itself. PyPy
still 'fakes' some objects, e.g. borrows them from the underlying
Python. This is mostly the case for things that have direct access to
the OS, e.g. files. If you run PyPy on PyPy on CPython you try to fake
the faked objects again, which gives trouble. Since we have to handle
faked objects differently in the future anyway we decided that at the
moment it isn't worth the effort to keep the self-hosting working.

Regards,

Carl Friedrich
 

Shane Hathaway

Mike said:
Basically, there's a *lot* of history in programming languages. I'd
hate to see someone think that we went straight from assembler to C,
or that people didn't understand the value of dynamic languages very
early.

Yes, although I wasn't following historical events; I was following the
trends of what programmers in general have used. Theory has always been
far ahead of practice... and generalizations are never correct. ;-)

Shane
 

Kay Schluehr

Carl said:
This already worked in the past, though it doesn't at the moment.


Speed isn't even the biggest problem when running PyPy on itself. PyPy
still 'fakes' some objects, e.g. borrows them from the underlying
Python.

Does it mean you create an RPython object that runs on top of CPython,
but is just an RPython facade wrapped around a CPython object? So you
have four kinds of Pythons:

RPy  - translatable into LL code
APy  - non-translatable but interpretable by translated RPy
RPy* - non-translatable but with a consistent interface to RPy; calls APy*
APy* - not translatable and not interpretable by translated RPy

"Selfhosting" would imply vanishing RPy* and APy*. But the problem
seems to be that selfhosting must somehow be broken, because the system
needs to interact with OS-dependent libraries. As long as you run the
system upon CPython the problem does not occur, but once you drop it, a
kind of "extension objectspace" must be created which is translated
into code with nice interfacing properties. Or do you think that
RPython translations will be sufficient and another ext-objectspace is
just useless epicycling?

Kay
 

Mike Meyer

Shane Hathaway said:
Yes, although I wasn't following historical events; I was following the
trends of what programmers in general have used. Theory has always been
far ahead of practice... and generalizations are never correct. ;-)

Well, I'd say that generalization isn't correct. I recall a period before C
became popular when a plethora of different languages were widely used,
depending on the application domain. COBOL, FORTRAN, ALGOL, LISP, Pascal,
Snobol, PL/I, PL/360, APL, various assemblers and others all had their uses.

C (and later C++) has come to dominate a lot of application domains. But
it inherited a lot from those other languages, as did the VHLL's that
have started displacing C in some application domains. Those languages are
still worth studying, because you can see what they did wrong. And what,
in retrospect, they did right that was forgotten by C.

<mike
 

Carl Friedrich Bolz

Kay said:
Does it mean you create an RPython object that runs on top of CPython,
but is just an RPython facade wrapped around a CPython object?

Yes. It means that there are objects that behave like they should in
PyPy but are implemented by creating a regular CPython instance
of the object and delegating all calls on the PyPy level back to the
CPython level.

So you have four kinds of Pythons:

RPy  - translatable into LL code
APy  - non-translatable but interpretable by translated RPy
RPy* - non-translatable but with a consistent interface to RPy; calls APy*
APy* - not translatable and not interpretable by translated RPy

"Selfhosting" would imply vanishing RPy* and APy*. But the problem
seems to be that selfhosting must somehow be broken, because the system
needs to interact with OS-dependent libraries. As long as you run the
system upon CPython the problem does not occur, but once you drop it, a
kind of "extension objectspace" must be created which is translated
into code with nice interfacing properties. Or do you think that
RPython translations will be sufficient and another ext-objectspace is
just useless epicycling?

Not exactly sure what you mean here. It's clear that we have to handle
faked objects differently to get a stand-alone PyPy version. One
possibility would be that the RPython code calls certain functions
which are implemented in Python (can be regular Python), that are not
translated but replaced by a proper C function. For example, we have a
function intmask at the moment which takes a long and removes as many
bits from it as necessary to make it fit into an int again. This
function is left out when translating, since in C an int can obviously
not overflow to a long.
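A sketch of what such an intmask-style helper might look like, assuming 32-bit signed machine ints (the actual PyPy function may differ in detail):

```python
# Hedged sketch of an intmask-like helper for 32-bit signed ints: wrap a
# Python long back into the signed machine-int range. When translating to
# C this becomes a no-op, since a C int wraps around by itself.

def intmask(n):
    n = n & 0xFFFFFFFF          # keep only the low 32 bits
    if n >= 0x80000000:         # reinterpret the top bit as a sign bit
        n -= 0x100000000
    return n

print(intmask(2**31))      # -2147483648 (wrapped around)
print(intmask(-1))         # -1
print(intmask(12345))      # 12345 (already fits)
```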

Regards,

Carl Friedrich
 

ionel

So what could this PyPy do in the future, concretely?
I hope this is not a stupid question.
 

Carl Friedrich Bolz

ionel said:
So what could this PyPy do in the future, concretely?
I hope this is not a stupid question.

Maybe the description from the homepage says it best:

The PyPy project aims at producing a flexible and fast Python
implementation. The guiding idea is to translate a Python-level
description of the Python language itself to lower level languages.
Rumors have it that the secret goal is being faster-than-C, which is
nonsense, isn't it?


Regards,

Carl Friedrich
 

holger krekel

Hi Kay,

Does it mean you create an RPython object that runs on top of CPython,
but is just an RPython facade wrapped around a CPython object? So you
have four kinds of Pythons:

RPy  - translatable into LL code
APy  - non-translatable but interpretable by translated RPy
RPy* - non-translatable but with a consistent interface to RPy; calls APy*
APy* - not translatable and not interpretable by translated RPy

"Selfhosting" would imply vanishing RPy* and APy*. But the problem
seems to be that selfhosting must somehow be broken, because the system
needs to interact with OS-dependent libraries. As long as you run the
system upon CPython the problem does not occur, but once you drop it, a
kind of "extension objectspace" must be created which is translated
into code with nice interfacing properties.

You are mostly right but 'extension objectspace' is misleading.
Object Spaces are only responsible for manipulating Python
application objects.

To get rid of 'faked' objects we need implementations for IO
access and operating system interactions. Those can sometimes
even be written in pure python (applevel) as is the case for
a preliminary version of a file object.
RPython translations will be sufficient and another ext-objectspace is
just useless epi-cycling?

Conceptually, we need a good way to perform foreign
function invocation (FFI), much like ctypes and other approaches do.
However, concretely, we might at first just write some very
low-level code (even lower level than RPython) to interact
with OS-level APIs and weave that into the translation process.
This is an area we are currently beginning to explore in more depth.
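For reference, the ctypes approach holger mentions lets Python call into a C library directly; a minimal example, assuming a POSIX libc is available on the system:

```python
# Minimal ctypes FFI example (assumes a POSIX libc can be found).
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments correctly:
# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```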

cheers,

holger
 

Anton Vredegoor

Carl said:
Rumors have it that the secret goal is being faster-than-C which is
nonsense, isn't it?

Maybe not. If one can call functions from a system DLL (a la ctypes;
some other poster already mentioned there was some investigation in
this area), one can skip a layer of the hierarchy (remove the C-coded
middleman!) and this would possibly result in faster code.

I'm not involved in PyPy myself, but this would seem a logical
possibility. To go a step further, if the compiler somehow knew
about the shortest machine-code sequence which would produce the
desired effect, then there would be no reason to limit oneself to only
those relatively inefficient standard code sequences that are inside
system DLLs.

Just design specific optimized dll's on the fly :)

(Now going into turbo overdrive.) One could have a central computer
checking which data transformations (at a polymorphic level) a specific
program is accomplishing, and 'reengineer or restructure' the code
inductively to check whether some other coder had already 'said the
same thing' in 'better Python code'. So one would get a warning when
reinventing the wheel, even if one had invented a square one :) or if
one had distributed functionality in an inefficient way. Next, after
standardizing the input code this way, one could have a list of these
'frequently used standard sequences' memoized at the central location
in order to speed up the compilation phase. Of course the central
interpreter would be sensitive to local code history, so this would ease
the code recognition process.

This would work like the way human attention works in that we recognize
the word 'wheel' sooner if we first saw a picture of a car. The only
problem with this approach seems to be that it looks like a straight
path to borghood ...

Anton

'resistance is futile, all your codes are belong to us!'
 

Kay Schluehr

Anton said:
I'm not involved in PyPy myself, but this would seem a logical
possibility. To go a step further, if the compiler somehow knew
about the shortest machine-code sequence which would produce the
desired effect, then there would be no reason to limit oneself to only
those relatively inefficient standard code sequences that are inside
system DLLs.

Are you sure that this problem is effectively solvable in any
language? Since you did not make your idea precise, I'm not sure whether
you want to solve the halting problem in PyPy or not ;)

Kay
 

Anton Vredegoor

Kay said:
Are you sure that this problem is effectively solvable in any
language? Since you did not make your idea precise, I'm not sure whether
you want to solve the halting problem in PyPy or not ;)

Since PyPy is covering new territory, it seemed important to provide new
ideas so that they have something to look forward to and will not fall
asleep at the entrance of the new area. Maybe I failed with the new
part, but at least I tried :)

Whether they are supposed to solve the halting problem, or whether that
can reasonably be expected, I don't know either. Is it ethical to send
people on an impossible mission in order to harvest the spinoff? Some
evil genius might have created this universe in order to do just that!

However, people posting code to this list are often reminded of other
algorithms (are you sorting this list? why not use quicksort?) so it
seems possible at least for humans to guess the intentions of another
coder sometimes, and provide better code.

Every time something is described at a higher level (these levels
cannot be found within the original system but must be created by a
leap of the imagination or by divine intervention) there seem to be
ways to remove superfluous things and be more effective even at the
lower level.

Anton

'answering all questions destroys the universe?'
 

Rocco Moretti

Kay said:
Are you sure that this problem is effectively solvable in any
language? Since you did not make your idea precise, I'm not sure whether
you want to solve the halting problem in PyPy or not ;)

I'm always amazed at how many people take the intractability of the
halting problem as a reason to not even try. Sure, you can't tell if an
*arbitrary* program halts or not, but there are many where you can.
(Assuming a perfect Python interpreter):

c = 5 + 6

halts.

c = 5
while 1:
    c = c + 6

doesn't.

A trip through the Zope internals may be a little harder to decide. But
one might argue that code that does not give a clear "yes, it halts" is
poorly written, and should be rewritten to be better behaved.

At any rate, Anton isn't talking about solving the halting problem, but
is hinting more along the lines of introspecting the code and, for
example, substituting a bubble sort for a quicksort when the specific
circumstances determine that it would be quicker. Granted, giving the
"best" code sequence would be tough, if not impossible, to determine,
but a "better" code sequence would still make people happy.
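Rocco's point about substituting a cheaper algorithm when circumstances allow can be sketched with a toy dispatcher (entirely hypothetical; the names and the threshold are made up) that uses a low-overhead insertion sort for tiny inputs and falls back to the general-purpose sort otherwise:

```python
# Toy "better code sequence" dispatcher (hypothetical): pick the cheaper
# algorithm when the specific input makes it the faster choice.

def insertion_sort(xs):
    # Simple O(n^2) sort; low constant overhead makes it competitive
    # on very small inputs.
    xs = list(xs)
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs

def adaptive_sort(xs, threshold=16):
    # Small inputs: insertion sort; otherwise fall back to the built-in.
    if len(xs) <= threshold:
        return insertion_sort(xs)
    return sorted(xs)

print(adaptive_sort([3, 1, 2]))  # [1, 2, 3]
```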
 
