Is a "real" C-Python possible?

Bruno Desthuilliers

(e-mail address removed) wrote:
(snip)
I'd like to provide some evidence that Python is *faster* than Java.

Then benchmark the time taken for the interpreter (oops, sorry: "VM") to
start!-)
 
sturlamolden

I understand that the standard Python distribution is considered
the C-Python. However, the current CPython is really a combination
of C and Python implementations. There are about 2000 Python files
included in the Windows version of the Python distribution. I'm not
sure how much of CPython is implemented in C, but I think the more
modules implemented in C, the better performance and the lower memory
footprint it will get.

Donald Knuth, one of the fathers of modern computer science, is famous
for stating that "premature optimization is the root of all evil in
computer science." A typical computer program tends to have
bottlenecks that account for more than 90% of the elapsed run time.
Directing your optimizations anywhere else is futile.

Writing a program in C will not improve the speed of your hardware. If
the bottleneck is a hard disk or a network connection, using C will not
change that. Disk I/O is a typical example: it is not the language that
determines how fast Python or C can read from a disk, it is the disk
itself.

I had a data visualization program that was slowed down by the need to
move hundreds of megabytes of vertex data to video RAM. It would
obviously not help to make the handful of OpenGL calls from C instead
of Python. The problem was the amount of data and the speed of the
hardware (ram or bus). The fact that I used Python instead of C
actually helped to make the problem easier to solve.

We have seen several examples that 'dynamic' and 'interpreted'
languages can be quite efficient: There is an implementation of Common
Lisp - CMUCL - that can compete with Fortran in efficiency for
numerical computing. There are also versions of Lisp that can compete
with the latest versions of JIT-compiled Java, e.g. SBCL and Allegro.
As it happens, SBCL and CMUCL are mostly implemented in Lisp. The issue
of speed for a language like Python has a lot to do with the quality
of the implementation. What really makes CMUCL shine is the compiler
that emits efficient native code on the fly. If it is possible to make
a very fast Lisp, it should be possible to make a very fast Python as
well. I remember people complaining 10 years ago that 'Lisp is so
slow'. A huge effort has been put into making Lisp efficient enough
for AI. I hope Python some day will gain a little from that effort as
well.

We have a Python library that allows us to perform a wide range of
numerical tasks at 'native speed': NumPy (http://www.scipy.org). How
such array libraries can be used to get excellent speedups is
explained here: http://home.online.no/~pjacklam/matlab/doc/mtt/index.html
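
As a small illustration of what vectorization buys (a sketch; the exact
speedup depends on hardware and array sizes):

import numpy

def scale_loop(data, factor):
    # pure Python: one interpreted iteration per element
    return [x * factor for x in data]

def scale_numpy(arr, factor):
    # one call; the loop runs inside NumPy's compiled C code
    return arr * factor

arr = numpy.arange(1000000.0)
# scale_numpy(arr, 2.5) typically runs one to two orders of magnitude
# faster than scale_loop(list(arr), 2.5), for exactly the reasons the
# tutorial linked above explains.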

We obviously need more effort to make Python more efficient for CPU
bound tasks. Particularly JIT compilation like Java, compilation like
Lisp or data specialization like Psyco.
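
For reference, enabling Psyco is a two-line change (assuming it is
installed; it only runs on 32-bit x86 builds of CPython):

import psyco
psyco.full()        # let Psyco specialize every function it can
# or, more selectively:
# psyco.bind(my_hot_function)   # hypothetical function name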

But writing larger parts of the standard library in C is not a
solution.
 
sturlamolden

Nevertheless, it is just one algorithm that beats Python in an area that
is well known to be slow. Python's number operations are several times
slower than C code because the overhead of the dynamic language throws
lots of data out of the cache line. If you need fast and highly
optimized int and floating point operations you can rewrite the
algorithm in C and create a Python interface for it.

Lisp is a dynamically typed language. CMUCL can compete with Fortran
for numerical work. SBCL can compete with the Java server VM. If the
current CPython VM throws data out of the cache line, then it must be
a design flaw in the VM.
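
(For the record, the "rewrite it in C" route mentioned above can be as
simple as a ctypes wrapper - a sketch only, with hypothetical library
and function names:)

from ctypes import CDLL, c_int

# Hypothetical: assumes fastfib.c was compiled to fastfib.so and
# exports "int fib(int n)".
fastfib = CDLL("./fastfib.so")
fastfib.fib.argtypes = [c_int]
fastfib.fib.restype = c_int

print fastfib.fib(36)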
 
Bruno Desthuilliers

Jack wrote:
> Aahz wrote:


I think most Java-Python benchmarks you can find online will indicate
that Java is 3-10 times faster. A few here:
http://mail.python.org/pipermail/python-list/2002-January/125789.html
http://blog.snaplogic.org/?p=55

This last one points to a funny period piece (year 2k, Python 1.5.2 and
Java 1.1) - an electronic paper, of course - worth reading past the
"benchmark" part (which BTW doesn't pretend to be that serious - the
'do nothing' test is just hilarious). I really like this part (sorry, I
only kept the titles - but you can have a look at the whole text, url is
below):

"""
* Unresolved Release-Critical Bugs in Java*
1. Don't Use Swing.
(snip rant about Swing memory leaks, just kept this:)
The AWT does this too, but you could probably write an application that
ran for longer than 20 minutes using it.

2. Don't allocate memory.
(snip)
3. Don't use java.lang.String.intern
(snip)
4. Don't expect your app to run
(snip)
5. Don't print anything
(snip)
6. Don't write large apps
(snip)
7. Don't write small apps
(snip)
"""

Heck... This sure looks like a very high price to pay wrt/ "raw speed"
gain!-)

Oh, yes, the url:
http://www.twistedmatrix.com/users/glyph/rant/python-vs-java.html

Yep, that's it: Twisted. No surprise the guy "decided to move Twisted
Reality to python."

It's really worth reading. While I was at that time a total newbie to
programming, and will probably never be anything close to the author, it
looks like we took a similar decision at the same time, mostly based
on similar observations: *in practice*, Java sucks big time - when
Python JustWorks(tm).
 
Aahz

Donald Knuth, one of the fathers of modern computer science, is famous
for stating that "premature optimization is the root of all evil in
computer science."

From my .sig database:

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare (often misattributed to Knuth, who was himself quoting
Hoare)
 
Bruno Desthuilliers

sturlamolden wrote:
Lisp is a dynamically typed language. CMUCL can compete with Fortran
for numerical work. SBCL can compete with the Java server VM. If the
current CPython VM throws data out of the cache line, then it must be
a design flaw in the VM.

Or a lack of time and money. Lisp is one of the older programming
languages around, and at one time had BigBucks(tm) invested in it to try
and make it practically usable.
 
sturlamolden

Or a lack of time and money. Lisp is one of the older programming
languages around, and at one time had BigBucks(tm) invested in it to try
and make it practically usable.

Yes. But strangely enough, the two Lisp implementations that really
kick ass are both free and not particularly old. CMUCL and SBCL prove
that you can make a dynamic language implementation extremely
efficient if you try hard enough. There are also implementations of
Scheme (e.g. Bigloo) that show the same.
 
sturlamolden

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare (often misattributed to Knuth, who was himself quoting
Hoare)

Oh, it was Hoare? Thanks. Anyway, it doesn't change the argument that
optimizing in the wrong places is a waste of effort.
 
sturlamolden

The Ruby developers are allowed to be proud. They were able to optimize
some aspects of the implementation to get one algorithm about 14 times
faster. That's good work. But why was it so slow in the first place?

The thing to notice here is that Congiano spent 31.5 seconds computing
36 Fibonacci numbers in Python and 11.9 seconds doing the same in
Ruby. Those numbers are ridiculous! The only thing they prove is that
Congiano should not be programming computers. Anyone getting such
results should take a serious look at their algorithm instead of
blaming the language. I don't care if it takes 31.5 seconds to compute
36 Fibonacci numbers in Python 2.5.1 with the dumbest possible
algorithm.
 
Kay Schluehr

We obviously need more effort to make Python more efficient for CPU
bound tasks. Particularly JIT compilation like Java, compilation like
Lisp or data specialization like Psyco.

Given that the Python core team has been mostly silent about JIT
compilation and Armin Rigo's work in particular, which started 5 years
ago (Psyco will not be promoted towards Python 3.0 and there is no
indication that anyone but Armin would maintain Psyco), I wonder about
this sudden insight. But maybe you can tell us more when you work on
such stuff for CPython yourself? Given your status in the core Python
team, this would have a chance of not being just a loosely coupled
independent project, like almost all the interesting stuff that happens
around Python these days.
 
Hrvoje Niksic

sturlamolden said:
Yes. But strangely enough, the two Lisp implementations that really
kick ass are both free and not particularly old.

Not two, but one -- SBCL is simply a fork of CMU CL. As for their age,
the CMU CL project states that it has been "continually developed since
the early 1980s".
 
Duncan Booth

sturlamolden said:
The thing to notice here is that Congiano spent 31.5 seconds computing
36 Fibonacci numbers in Python and 11.9 seconds doing the same in
Ruby. Those numbers are ridiculous! The only thing they prove is that
Congiano should not be programming computers. Anyone getting such
results should take a serious look at their algorithm instead of
blaming the language. I don't care if it takes 31.5 seconds to compute
36 Fibonacci numbers in Python 2.5.1 with the dumbest possible
algorithm.
Quite so.

Take something like
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498110
and then modify the Python code from the "Ruby smokes Python" article by
the addition of @memoize(3) to decorate the otherwise unchanged fib
function: the Python runtime drops down to 0.002 seconds.

That is just slightly better than Ruby's 11.9 seconds although I'm sure the
Ruby code would also gain as much from a memoize decorator.


from memoize import memoize

@memoize(3)
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

from time import clock
start = clock()
for i in range(36):
    print "n=%d => %d" % (i, fib(i))
print clock() - start


When I run this (with output directed to a file: I'm not trying to time
windows console speed), the output is:

n=0 => 0
n=1 => 1
n=2 => 1
n=3 => 2
n=4 => 3
n=5 => 5
n=6 => 8
n=7 => 13
n=8 => 21
n=9 => 34
n=10 => 55
n=11 => 89
n=12 => 144
n=13 => 233
n=14 => 377
n=15 => 610
n=16 => 987
n=17 => 1597
n=18 => 2584
n=19 => 4181
n=20 => 6765
n=21 => 10946
n=22 => 17711
n=23 => 28657
n=24 => 46368
n=25 => 75025
n=26 => 121393
n=27 => 196418
n=28 => 317811
n=29 => 514229
n=30 => 832040
n=31 => 1346269
n=32 => 2178309
n=33 => 3524578
n=34 => 5702887
n=35 => 9227465
0.00226425425578
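
The decorator itself comes from the recipe at the URL above; a minimal
stand-in with the same effect (not the recipe's actual code - the
argument that @memoize(3) passes is accepted but ignored here) would be:

def memoize(_arg):
    # Hypothetical stand-in for ASPN recipe 498110; see the recipe URL
    # for the real implementation. _arg is ignored in this sketch.
    def decorator(func):
        cache = {}
        def wrapper(*args):
            # cache results keyed on the positional arguments
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
        return wrapper
    return decorator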
 
MonkeeSage

(snip Duncan Booth's memoized fib example and timings, quoted in full)

Another point is, the reason the ruby code shows such a performance
increase is the way it wraps native (C) types for integers in the new
bytecode compiler; i.e., it's a directed optimization, which the
example code exploits to its full extent. But with dictionary access,
for example, python still creams ruby (by a factor of two in my
tests). Speaking as someone who uses both python and ruby, I can say
that ruby 1.9 is approaching python's speed, which is very cool, but
is still not quite as fast as python in general (the whole "smokes
python" bit is just propaganda built around one specific optimized
feature, and is generally unhelpful).
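
(The dictionary test itself wasn't posted; a micro-benchmark along
these lines, using the standard library's timeit - a sketch, not the
original code - would measure it:)

from timeit import Timer

# Sketch only: time ten million lookups of one key in a small dict.
t = Timer("d['key']", "d = {'key': 1}")
print "10M dict lookups: %.2f s" % t.timeit(number=10000000)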

Regards,
Jordan
 
sturlamolden

@memoize(3)
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)


The thing I would do is:

def fibo(n):
    while 1:
        try:
            return fibo.seq[n]
        except AttributeError:
            # first call ever: create the cache
            fibo.seq = [0, 1, 1]
        except IndexError:
            # cache too short: extend it until it reaches n
            fibo.seq.append(fibo.seq[-2] + fibo.seq[-1])


Here are some timings I got on my laptop (1.7 GHz Pentium M, Windows
XP, Python 2.5.1), calculating 36 Fibonacci numbers:

First run, initializing cache: 243 µs
Second run, exploiting cache: 28 µs
Third run, exploiting cache: 27 µs

This is 6 orders of magnitude faster than Congiano's benchmark. That
is a speed up by a factor of a million.
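
One way to reproduce such timings (a sketch, assuming fibo is defined
in the same script; the original measurement code was not posted):

from timeit import Timer

# Times 1000 passes over the 36 numbers; everything after the very
# first pass hits the warm cache, so the per-pass figure reflects
# cached lookups.
t = Timer("for i in range(36): fibo(i)",
          "from __main__ import fibo")
print "%.1f microseconds per pass" % (t.timeit(number=1000) / 1000.0 * 1e6)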
 
John Nagle

We're ten years into Python, and it's still a naive interpreter.
It's time for a serious optimizing compiler. Shed Skin is going
in the right direction. But for some reason, people seem to dislike the
Shed Skin effort. Its author writes "Am I the only one seeing the potential
of an implicitly statically typed Python-like-language that runs at
practically the same speed as C++?"

"For a set of 27 non-trivial test programs (at about 7,000 lines in total;
.... measurements show a typical speedup of 2-40 times over Psyco, about 10 on
average, and 2-220 times over CPython, about 35 on average." So that's
what's possible.

I'm surprised that Google management isn't pushing Guido towards
doing something about the performance problem.

John Nagle
 
Paul Rudin

sturlamolden said:
Oh, it was Hoare? Thanks. Anyway, it doesn't change the argument that
optimizing in the wrong places is a waste of effort.

Although sometimes people seem to think that it goes "optimisation is
the root...". The "premature" bit is significant.
 
Chris Mellon

We're ten years into Python, and it's still a naive interpreter.

This is an absurd misrepresentation of the state of the Python VM.
It's time for a serious optimizing compiler. Shed Skin is going
in the right direction. But for some reason, people seem to dislike the
Shed Skin effort. Its author writes "Am I the only one seeing the potential
of an implicitly statically typed Python-like-language that runs at
practically the same speed as C++?"

"For a set of 27 non-trivial test programs (at about 7,000 lines in total;
... measurements show a typical speedup of 2-40 times over Psyco, about 10 on
average, and 2-220 times over CPython, about 35 on average." So that's
what's possible.

... with roughly a hundredth of the Python standard library, and a
bunch of standard Python features not even possible. I like
generators, thanks.

If Shed Skin can actually match Python's feature set and provide the
performance it aspires to, that's great, and I may even start using it
then. But in the meantime, hardly anything I write is CPU bound, and
when it is I can easily optimize using other mechanisms. Shed Skin
doesn't give me anything that's worth the effort, or the restrictions
it places on my code. I think JIT is the future of optimization
anyway.
I'm surprised that Google management isn't pushing Guido towards
doing something about the performance problem.

Assuming your conclusion (i.e., that there's a performance problem to do
something about) doesn't prove your case.
 
Adam Funk

We have seen several examples that 'dynamic' and 'interpreted'
languages can be quite efficient: There is an implementation of Common
Lisp - CMUCL - that can compete with Fortran in efficiency for
numerical computing. There are also versions of Lisp that can compete
with the latest versions of JIT-compiled Java, e.g. SBCL and Allegro.
As it happens, SBCL and CMUCL are mostly implemented in Lisp. The issue
of speed for a language like Python has a lot to do with the quality
of the implementation. What really makes CMUCL shine is the compiler
that emits efficient native code on the fly. If it is possible to make
a very fast Lisp, it should be possible to make a very fast Python as
well. I remember people complaining 10 years ago that 'Lisp is so
slow'. A huge effort has been put into making Lisp efficient enough
for AI. I hope Python some day will gain a little from that effort as
well.

I've been told that Torbjörn Lager's implementation of the Brill
tagger in Prolog is remarkably fast, but that it uses some
counter-intuitive arrangements of the predicate and argument
structures in order to take advantage of the way Prolog databases are
indexed.
 
Steven D'Aprano

We're ten years into Python, and it's still a naive interpreter.
It's time for a serious optimizing compiler. Shed Skin is going in the
right direction. But for some reason, people seem to dislike the Shed
Skin effort. Its author writes "Am I the only one seeing the potential
of an implicitly statically typed Python-like-language that runs at
practically the same speed as C++?"

"For a set of 27 non-trivial test programs (at about 7,000 lines in
total;
... measurements show a typical speedup of 2-40 times over Psyco, about
10 on average, and 2-220 times over CPython, about 35 on average." So
that's what's possible.

I'm surprised that Google management isn't pushing Guido towards
doing something about the performance problem.

Maybe because it isn't as much a problem as people with C envy assume it
must be? (Disclaimer: I'm not suggesting that John is one of those
people.)

Not that I'd object to anyone else doing the work to speed up Python, but
for the things I use Python for, I've never felt the need to say "Gosh
darn it, my script took twelve milliseconds to run, that's just too
slow!!!". Maybe Google are in the same boat?

Actually, in Google's case, I guess their bottleneck is not Python, but
trying to push around gigabytes of data. That will be slow no matter what
language you write in.
 
Diez B. Roggisch

John said:
We're ten years into Python, and it's still a naive interpreter.
It's time for a serious optimizing compiler. Shed Skin is going
in the right direction. But for some reason, people seem to dislike the
Shed Skin effort. Its author writes "Am I the only one seeing the potential
of an implicitly statically typed Python-like-language that runs at
practically the same speed as C++?"

"For a set of 27 non-trivial test programs (at about 7,000 lines in
total; ... measurements show a typical speedup of 2-40 times over Psyco,
about 10 on average, and 2-220 times over CPython, about 35 on
average." So that's
what's possible.

No, it's not. Shedskin is interesting, but just a small subset of Python
- and without completeness, performance is useless.

The PyPy approach is much more interesting - first create a
full-featured Python itself, then create optimizing backends for it,
including ones for just a language subset - RPython.

And if possible - which it is only in a very limited set of cases for
non-type-annotated code - identify parts that conform to RPython's
constraints, and compile those just-in-time.

Diez
 
