what's the point of rpython?


Brendan Miller

So I kind of wanted to ask this question on the pypy mailing list..
but there's only a pypy-dev list, and I don't want to put noise on the
dev list.

What's the point of RPython? By this, I don't mean "What is RPython"?
I get that. I mean, why?

The goals of the pypy project seem to be to create a fast python
implementation. I may be wrong about this, as the goals seem a little
amorphous if you look at their home page.

So, to do that, this RPython compiler was created? Except that it
doesn't compile python; it compiles a static subset of python that has
type inference like ML.

This RPython thing seems like a totally unnecessary intermediate step.
Instead of writing an implementation of one language, there's an
implementation of two languages.

Actually, it seems like it harms the ultimate goal. For the
interpreted python to be fast, the interpreter has to be written in a
reasonably fast language. ML and C++ compilers have had a lot of work
put into their optimization steps. A lot of the performance boost you
get from a static language is that knowing the types at compile time
lets you inline code like crazy, unroll loops, and even execute code
at compile time.

RPython is a statically typed language because I guess the developers
associate static languages with speed? Except that they use it to
generate C code, which throws away the type information they need to
get the speed increase. Huh? I thought the goal was to write a fast
dynamic language, not a slow static one?

Is this going anywhere or is this just architecture astronautics?

The RPython project seems kind of interesting to me and I'd like to
see more python implementations, but looking at the project I can't
help but think that they haven't really explained *why* they are doing
the things they are doing.

Anyway, I can tell this is the sort of question that some people will
interpret as rude. Asking hard questions is never polite, but it is
always necessary :)

Thanks,
Brendan
 

Luis M. González

> What's the point of RPython? By this, I don't mean "What is RPython"?
> I get that. I mean, why?
> [...]

Check out this thread:
http://groups.google.com/group/comp...931?lnk=gst&q=enlighten+pypy#2174f796cd687931

One of pypy's goals is having a more flexible implementation, easier
to experiment with than cpython.
Having it all written in the same language facilitates this task.
Why RPython? Because by avoiding the dynamicity of the full python
language, it is possible to translate this code to a static language
(C, Java, whatever).
The translated rpython interpreter aims to be similar in performance
to cpython. The only difference is that cpython was written from
scratch in C, while pypy's interpreter was written in rpython and then
translated automatically to C.
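
To see what "avoiding the dynamicity" means in practice, here is a
rough sketch (my own illustration, not actual PyPy source) of the kind
of code RPython's type inference can and cannot handle:

def ok(n):
    total = 0
    for i in range(n):
        total += i      # total is always an int: inferable
    return total

def not_ok(flag):
    if flag:
        x = 1
    else:
        x = "one"       # x would be int-or-str: inference fails
    return x

Both are ordinary Python, but only the first is valid RPython.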

The pypy team intends to achieve greater speed and performance by
adding a JIT compiler to this interpreter.
So pypy-c + JIT = faster python.

I hope this helps...

Luis
 

alex23

> What's the point of RPython? By this, I don't mean "What is RPython"?
> I get that. I mean, why?

This is more or less covered in the FAQ:

"The restrictions are to ensure that type inference (and so,
ultimately, translation to other languages) of RPython programs is
possible. These restrictions only apply after the full import happens,
so at import time arbitrary Python code can be executed."

http://codespeak.net/pypy/dist/pypy/doc/faq.html#id25
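
That import-time loophole matters: you can use full Python while the
module loads to *generate* the code that will later be translated. A
tiny sketch (my own illustration, not PyPy code):

def make_scaler(factor):
    def scale(x):
        return x * factor   # factor is a fixed int once imported
    return scale

double = make_scaler(2)     # arbitrary Python runs here, at import time
triple = make_scaler(3)

# After import, translation only ever sees double and triple, each of
# which has fixed, inferable types.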

> The goals of the pypy project seem to be to create a fast python
> implementation. I may be wrong about this, as the goals seem a little
> amorphous if you look at their home page.

The home page itself is ambiguous, and does oversell the performance
aspect. The *actual* goal as outlined by their official docs is to
implement Python in Python, at every level. If this means utilising a
less-dynamic subset of Python for the lower levels, it's -still- at
least more-Python-than-C.

"PyPy is both:
* a reimplementation of Python in Python, and
* a framework for implementing interpreters and virtual machines for
programming languages, especially dynamic languages."

http://codespeak.net/pypy/dist/pypy/doc/faq.html#id11

The PyPy devs feel that this will allow them to more easily experiment
with optimising the interpreter for greater speeds, but that isn't one
of the stated goals (just, apparently, their 'secret' one).

> This RPython thing seems like a totally unnecessary intermediate step.
> Instead of writing an implementation of one language, there's an
> implementation of two languages.

My understanding is that the 'higher' level Python language features
are implemented on top of the restricted RPython features, so it's
more of an organic growth of one language than two separate
implementations.

> A lot of the performance boost you
> get from a static language is that knowing the types at compile time
> lets you inline code like crazy, unroll loops, and even execute code
> at compile time.
>
> RPython is a statically typed language because I guess the developers
> associate static languages with speed?

Doesn't the first paragraph above state the same thing that you
question in the next?

> Except that they use it to
> generate C code, which throws away the type information they need to
> get the speed increase. Huh? I thought the goal was to write a fast
> dynamic language, not a slow static one?

They're not just generating C code, they currently also target LLVM,
JVM and CLI.

> Is this going anywhere or is this just architecture astronautics?

Well, they actually seem to have achieved some substantial gains, so I
think it's unfair to imply that the project isn't based on pragmatic
objectives. Their initial JIT prototype failed to work as well as
expected, and they've spent some time re-thinking the approach; I'm
happier to wait for a stably performing JIT that can be improved over
time than a short term gain.

> The RPython project seems kind of interesting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.

A lot of this is covered in the FAQ. Whether you agree with their
approach or not, they're the ones actively pushing this effort
forward.

It's been a few months since the last update, but the PyPy status blog
may have more information for you. At the very least, it's a venue to
discuss your concerns directly with the PyPy devs.

http://morepypy.blogspot.com
 

Brendan Miller

> > The goals of the pypy project seem to be to create a fast python
>
> The home page itself is ambiguous, and does oversell the performance
> aspect. The *actual* goal as outlined by their official docs is to
> implement Python in Python, at every level.

Ok fair enough. In some ways I see that as more of a purely
intellectual exercise than the practical endeavor that I assumed the
project was originally.

However, one of the links I was sent had one of the devs talking about
using the translation process to make C/Java/LLVM implementations out
of the same interpreter code. I'll say that makes a lot of sense.

Another question I was wondering about is whether they plan on
maintaining good C bindings. Will existing bindings for third party
libraries be able to work?

Also, are they going to do away with the GIL? The python devs seem to
consider the GIL a non-issue, though they may change their mind in 3
years when we all have 32-core desktops. Until then, getting rid of the
GIL would make pypy pretty attractive in some quarters. I know the
scons project was running into GIL issues.

Finally, I'm pretty unclear on what versions of python pypy is targeting.
 

alex23

> Ok fair enough. In some ways I see that as more of a purely
> intellectual exercise than the practical endeavor that I assumed the
> project was originally.

Well, it opens up Python as a whole for experimentation & modification
by Python developers, which seems pretty practical to me :)

> Another question I was wondering about is whether they plan on
> maintaining good C bindings. Will existing bindings for third party
> libraries be able to work?

You really should run through the whole faq:

http://codespeak.net/pypy/dist/pypy/doc/faq.html#can-i-use-cpython-extension-modules

(Spoiler alert: the answer is no.)

> Also, are they going to do away with the GIL?

One FAQ entry before the above:

http://codespeak.net/pypy/dist/pypy/doc/faq.html#do-threads-work-what-are-the-modules-that-work

> The python devs seem to
> consider the GIL a non-issue, though they may change their mind in 3
> years when we all have 32-core desktops. Until then, getting rid of the
> GIL would make pypy pretty attractive in some quarters. I know the
> scons project was running into GIL issues.

It's not a case of the Python devs digging their heels in and not
giving the world what it desperately wants. Here's an article by Guido
talking about the last attempt to remove the GIL and the performance
issues that arose:

"I'd welcome a set of patches into Py3k *only if* the performance for
a single-threaded program (and for a multi-threaded but I/O-bound
program) *does not decrease*."

http://www.artima.com/weblogs/viewpost.jsp?thread=214235

> Finally, I'm pretty unclear on what versions of python pypy is targeting.

I wonder where such a frequently asked question could be answered? ;)

http://codespeak.net/pypy/dist/pypy/doc/faq.html#which-python-version-2-x-does-pypy-implement
 

Paul Rubin

alex23 said:
> Here's an article by Guido talking about the last attempt to remove
> the GIL and the performance issues that arose:
>
> "I'd welcome a set of patches into Py3k *only if* the performance for
> a single-threaded program (and for a multi-threaded but I/O-bound
> program) *does not decrease*."

The performance decrease is an artifact of CPython's rather primitive
storage management (reference counts in every object). This is
pervasive and can't really be removed. But a new implementation
(e.g. PyPy) can and should have a real garbage collector that doesn't
suffer from such effects.
 

andrew cooke

Since this is a PyPy bashing thread, maybe it's an appropriate place
to suggest that the project has got a little bit waylaid by exploring
cool things instead of releasing a useful final result?

I am not questioning rpython directly - the case for something like
that is obvious. But there's a question of balance. It's possible to
go on building ever more complex systems which are theoretically
justified, but which postpone ever finishing the job. At some point
there has to be a "good enough".

To some extent I am playing devil's advocate here, but as an outsider
who looked at PyPy a while back, my uninformed and naive impression
was that the project was suffering from the kind of issues I have
caricatured above....

Andrew

PS I guess you are aware of worse is better etc? I think this may
also be a US/Euro culture issue...
 

Luis M. González

> Since this is a PyPy bashing thread, maybe it's an appropriate place
> to suggest that the project has got a little bit waylaid by exploring
> cool things instead of releasing a useful final result?
> [...]

It's worth adding that, regarding the goal of having a faster python,
we have had for a long time a solid and useful proof of concept in
psyco. Pypy builds on this concept and adds many more useful
features, not all of them related to speed but to flexibility.
I have no doubt the project will be successful. The question is how
long we will have to wait...

Luis
 

Brendan Miller

> The performance decrease is an artifact of CPython's rather primitive
> storage management (reference counts in every object). This is
> pervasive and can't really be removed. But a new implementation
> (e.g. PyPy) can and should have a real garbage collector that doesn't
> suffer from such effects.

That's interesting, I hadn't heard the reference counting mechanism
was related to the GIL. Is it just that you need to lock the reference
count before mutating it if there's no GIL? Really, that shouldn't be
the case. Reference counting can be done with a lock free algorithm.
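
To make the hazard concrete, here is a toy CPython sketch (my own
illustration): a bare increment is a read-modify-write, so
unsynchronized threads bumping a shared count can lose updates, which
is exactly what an atomic (or locked) increment is meant to prevent:

import threading

count = 0  # stand-in for an object's reference count

def bump(n):
    global count
    for _ in range(n):
        count += 1  # read-modify-write: not atomic across threads

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # can come up short of 400000: increments lost to the race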

Garbage collection is definitely in vogue right now. However, people
tend to treat it more like a religion than an algorithm. Garbage
collection vs. reference counting actually has trade-offs both
ways. GC gets you some amortized performance gains, and some space
gains because you don't need to hang on to a bunch of counts. However,
GC also has the problem of a very loose memory profile and poor
interactive performance during compaction if the heap is large. In
some cases this discussion becomes complicated with python, because
python has both reference counting and GC.
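
For what it's worth, that split is easy to see in CPython (a quick
sketch, my own illustration): reference counting reclaims most objects
promptly, and the cyclic GC handles what refcounts can't:

import gc
import sys

x = []
print(sys.getrefcount(x))     # every object carries a reference count

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle: counts never hit zero
del a, b                      # refcounting alone can't reclaim these...
print(gc.collect())           # ...so the cycle collector frees them (> 0)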
 

Paul Rubin

Brendan Miller said:
> That's interesting, I hadn't heard the reference counting mechanism
> was related to the GIL. Is it just that you need to lock the reference
> count before mutating it if there's no GIL?

Yes. Someone tried inserting such a lock but it slowed down the
single-cpu case unacceptably.

> Really, that shouldn't be
> the case. Reference counting can be done with a lock free algorithm.

That's interesting, got a reference? (no pun intended)
 

Paul Rubin


I found a paper by Detlefs et al. describing a method which is

a) patented
b) can potentially lock out some threads from ever running, and
c) relies on a hardware instruction (double compare and swap)
that's not available on most processors.

There are well known concurrent and parallel GC techniques that
don't have that problem, see for example Cheng's dissertation:
http://reports-archive.adm.cs.cmu.edu/anon/2001/CMU-CS-01-174.pdf

GHC 6.10 uses a parallel stop-the-world gc that is simpler, I think.
 

Brendan Miller

Maybe I'm missing something here but a lock free algorithm for
reference counting seems pretty trivial. As long as you can atomically
increment and decrement an integer without locking you are pretty much
done.

For a reference implementation of lock free reference counting on all
common platforms, check out boost's implementation of shared_ptr (a
reference counting smart pointer designed around multithreaded use
cases).
> There are well known concurrent and parallel GC techniques that
> don't have that problem

Hmm... I didn't really mention poor parallelism as a problem of GC. As
I see it, there are two trade-offs that have to be made for GC vs
alternative techniques: the "embarrassing pause" during compaction,
which makes it impossible to use for applications like interactive
video display that can't halt to compact a several-gigabyte heap
without causing stutter, and the loose memory profile.

Maybe the document you sent me addresses those, and I'd be interested
if it did. It's ~150 pages though, so I haven't really had time to read
it yet.

As far as parallelism problems with GC go... the only one I can
imagine is that if you had a lot of threads going, generating lots of
garbage, you would need to start to compact more frequently. Since
compaction halts all threads, this could potentially cause very
frequent compactions? Is that what you were getting at? I'd wondered
about that before, but didn't know for a fact whether it came up in
real world scenarios.
 

Paul Rubin

Brendan Miller said:
> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

What cpu's do you know of that can atomically increment and decrement
integers without locking?
 

Carl Banks

> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

You're missing that most of the platforms that Python supports can't
actually do this. Keep in mind that it has to be atomic across all
cores.

(For Python to use such a technique, it would have to be doable on
every platform Python supports. They aren't going to get rid of the
GIL for some platforms and not others.)

> For a reference implementation of lock free reference counting on all
> common platforms, check out boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).

I just looked at the boost documentation, which claims that multiple
asynchronous writes to the same shared_ptr result in undefined
behavior. That will not suffice for Python reference counting.



Carl Banks
 

Paul Rubin

Brendan Miller said:
> As long as you can atomically increment and decrement an integer
> without locking you are pretty much done.

Most cpu's can't do that.

> For a reference implementation of lock free reference counting on all
> common platforms, check out boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).

That sounds very mysterious to me--are you sure it is intended for
multiprocessing and not just multiple threads on a single processor?
Do you have a url for the code?

> As I see it, there are two trade-offs that have to be made for GC vs
> alternative techniques: the "embarrassing pause" during compaction,
> which makes it impossible to use for applications like interactive
> video display that can't halt to compact a several-gigabyte heap
> without causing stutter, and the loose memory profile.

The embarrassing pause is characteristic of a so-called "stop the
world" gc. A gc where any pauses are guaranteed bounded in size
(preferably to a small value) is called a "real time" gc. I believe
Cheng's algorithm is a real time algorithm, but it's been a while
since I read that thesis.

Anyway, nobody programs interactive video in Python, and Python's ref
counting scheme is not real time since it may have to do arbitrarily
many decrefs when you release a large structure.
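
(You can see that non-real-time behavior directly; a rough sketch, my
own illustration: build a deeply nested structure, then time the single
statement that releases it. The pause grows with the structure's size.)

import time

node = None
for _ in range(500000):
    node = (node,)          # a deeply nested chain of tuples

t0 = time.time()
node = None                 # this one statement triggers ~500000 decrefs
print(time.time() - t0)     # the pause scales with the structure's size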

> As far as parallelism problems with GC go... the only one I can
> imagine is that if you had a lot of threads going, generating lots of
> garbage, you would need to start to compact more frequently. Since
> compaction halts all threads, this could potentially cause very
> frequent compactions? Is that what you were getting at?

I'm not sure what you're asking. Parallelism is used for making
things go faster, but making things parallel usually complicates them.
A parallel realtime gc is sort of the holy grail.

Cheng's thesis is pretty readable as I remember, and discusses most of
these issues. There is also a book by Appel about gc, but it's
perhaps a little bit dated by now.
 

Kay Schluehr

> Is this going anywhere or is this just architecture astronautics?
>
> The RPython project seems kind of interesting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.

Remember that the original objective of PyPy was to improve the JIT.
Psyco is limited by the fact that the whole runtime is implemented in
C. The infamous "faster than C" actually refers to work on program
specializers on C code, i.e. treating C as a language that is JIT
compiled at a fine-grained level (block structure - whole-function
JIT compilation wouldn't obviously yield any advantages).
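
(To illustrate what a program specializer does, a toy sketch in plain
Python, entirely my own illustration: freeze one input of a generic
function and emit a residual version with the loop unrolled away.)

def power(x, n):
    # generic version: interprets n at runtime
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    # specialization time: n is known, so unroll the loop into the body
    body = " * ".join(["x"] * n) if n else "1"
    namespace = {}
    exec("def specialized(x): return %s" % body, namespace)
    return namespace["specialized"]

cube = specialize_power(3)        # behaves like: lambda x: x * x * x
assert cube(2) == power(2, 3) == 8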

So it is not just the application-level Python code that shall run
through the JIT but also the interpreter-level code. So why not equate
them and think about interpreter-level code as Python as well? This
might be the idea. But then the problem of bootstrapping comes up:
there is no interpreter-level code with the required properties. Hence
RPython, which can serve as a foundation.

I'm also not sure I like the approach. Rather than designing a whole
new runtime for having a fully reflective system, I'd model C in Python
(call it CiPy) and create a bijection between CiPy code and C code.
Once this has been established, one can study other translations of
CiPy or Python into CiPy (not unlike Pyrex/Cython), doing systematic
refactorings on CiPy code in Python, study properties using runtime
reflection, experiment with macro systems etc. All of this is done in
PyPy as well, but sometimes it seems the team has created more new
problems than they solved.

> Anyway, I can tell this is the sort of question that some people will
> interpret as rude. Asking hard questions is never polite, but it is
> always necessary :)

I think it is a good question.
 

skip

Carl> I just looked at the boost documentation, which claims that
Carl> multiple asynchronous writes to the same shared_ptr result in
Carl> undefined behavior. That will not suffice for Python reference
Carl> counting.

Carl, I'm quite unfamiliar with Boost and am not a C++ person, so I may
have read what you saw but not recognized it in the C++ punctuation soup.
I couldn't find what you referred to. Can you provide a URL?

Thanks,
 

Paul Rubin

> Carl, I'm quite unfamiliar with Boost and am not a C++ person, so I may
> have read what you saw but not recognized it in the C++ punctuation soup.
> I couldn't find what you referred to. Can you provide a URL?

http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety

Note there is also something called "interprocess shared pointers"
(link below) that don't have that caveat. I wonder if they use locks.

http://www.boost.org/doc/libs/1_37_...nterprocess.interprocess_smart_ptr.shared_ptr
 

Brendan Miller


I think you are misreading that. It says that multiple assignments to
different copies of a shared_ptr in different threads are fine. This is
the important bit, because copies of the same pointer share the same
reference count, and assignments and resets will decrement that ref
count. They say they don't handle mutations of the *same* pointer in
different threads, which is a different issue.

The programmer is responsible for synchronizing access to the pointer,
and to the pointed-to object, but not to the ref count. This may not be
obvious if you don't use shared_ptr a lot.

You also mentioned in an earlier post that most processors don't
support atomic increments... I'm hesitant to dispute you here because
this is outside my field of expertise. However, a quick google
search for "x86 atomic increment" comes up with this:

XADD

http://www.codemaestro.com/reviews/8
http://siyobik.info/index.php?module=x86&id=159

Again, I'm not an assembly guru, but this sounds like exactly what
you'd want. It gains exclusive access to system memory in a
multi-processor environment without leaving user space. Thus XADD is
an atomic increment/decrement. It would be educational if someone more
familiar with x86 than me could speak to the performance merits of this
on modern multicore machines.
 
