n-body problem at shootout.alioth.debian.org

P

Peter Maas

I have noticed that in the language shootout at shootout.alioth.debian.org
the Python program for the n-body problem is about 50% slower than the Perl
program. This is an unusual big difference. I tried to make the Python program
faster but without success. Has anybody an explanation for the difference?
It's pure math so I expected Perl and Python to have about the same speed.

Peter Maas, Aachen
 
J

John J. Lee

Peter Maas said:
I have noticed that in the language shootout at shootout.alioth.debian.org
the Python program for the n-body problem is about 50% slower than the Perl
program. This is an unusual big difference. I tried to make the Python program
faster but without success. Has anybody an explanation for the difference?
It's pure math so I expected Perl and Python to have about the same speed.

Peter Maas, Aachen

Replacing ** with multiplication in advance() cut it down to 0.78
times the original running time for me. That brings it along side
PHP, one place below Perl (I didn't bother to upload the edited script).


John
 
B

bearophileHUGS

Peter Maas:
I have noticed that in the language shootout at shootout.alioth.debian.org
the Python program for the n-body problem is about 50% slower than the Perl
program. This is an unusual big difference. I tried to make the Python program
faster but without success. Has anybody an explanation for the difference?
It's pure math so I expected Perl and Python to have about the same speed.

I don't have a good answer for you, but if you profile that code you
find that most of the running time is used here, in this nested loop:

for j in xrange(i + 1, nbodies):
b2 = bodies[j]

dx = b_x - b2.x
dy = b_y - b2.y
dz = b_z - b2.z

distance = sqrt(dx*dx + dy*dy + dz*dz)
aux = dt / (distance*distance*distance)
b_mass_x_mag = b_mass * aux
b2_mass_x_mag = b2.mass * aux

b.vx -= dx * b2_mass_x_mag
b.vy -= dy * b2_mass_x_mag
b.vz -= dz * b2_mass_x_mag
b2.vx += dx * b_mass_x_mag
b2.vy += dy * b_mass_x_mag
b2.vz += dz * b_mass_x_mag

If you are using Python you can use Psyco, it helps a lot (473 s
instead of 2323 s, against 1622 s for the Perl version):
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=nbody&lang=all
If you need speed, you can also use ShedSkin that compiles this Python
program producing an executable essentially fast as the C++ version
(about 16 s on the same computer, the Shootout autor isn't going to add
ShedSkin to the language list yet).

Bye,
bearophile
 
N

Neil Cerutti

I have noticed that in the language shootout at
shootout.alioth.debian.org the Python program for the n-body
problem is about 50% slower than the Perl program. This is an
unusual big difference. I tried to make the Python program
faster but without success. Has anybody an explanation for the
difference? It's pure math so I expected Perl and Python to
have about the same speed.

The Perl program is obfuscated to speed it up; it uses an array
of planet property arrays instead of objects.

Of course, perhaps that's the right way to do it in Perl.

A python solution that indexed lists instead of looking up
attributes of objects might be faster.
 
M

Matteo

Peter said:
I have noticed that in the language shootout at shootout.alioth.debian.org
the Python program for the n-body problem is about 50% slower than the Perl
program. This is an unusual big difference. I tried to make the Python program
faster but without success. Has anybody an explanation for the difference?
It's pure math so I expected Perl and Python to have about the same speed.

Peter Maas, Aachen

Well, one implementation difference is that the Perl script uses
parallel global arrays
(one array for each positional component, one for each velocity
component, and one for mass) whereas the python implementation uses
planet objects. Now, this may not actually make too much of a
difference as it is, but if you really want to make the python version
faster, I'd use the Numeric or Numpy (parts of which may soon be
standardized), and do array-wide computations, rather than using python
loops (at least replacing the innermost loops). My reasonably
well-informed guess is that the speed would improve markedly.

Of course, numpy is not a standard package (though there is a proposal
to add a standard 'array' package to python, based of numpy/numeric),
but if you want to do any numerics with python, you shouldn't be
without it.

-matt
 
S

skip

Peter> I have noticed that in the language shootout at
Peter> shootout.alioth.debian.org the Python program for the n-body
Peter> problem is about 50% slower than the Perl program. This is an
Peter> unusual big difference. I tried to make the Python program faster
Peter> but without success. Has anybody an explanation for the
Peter> difference? It's pure math so I expected Perl and Python to have
Peter> about the same speed.

They do have "about the same speed" (factor of 1.6x between Perl and
Python). The Python #2 improves that to under 1.4x.

I took the original version, tweaked it slightly (probably did about the
same things as Python #2, I didn't look). For N == 200,000 the time went
from 21.94s (user+sys) to 17.22s. Using psyco and binding just the advance
function on my improved version improved that further to 6.48s. I didn't
have the patience to run larger values of N. Diff appended.

I suspect the numpy users in the crowd could do somewhat better.

Skip

% diff -u nbody.py.~1~ nbody.py
--- nbody.py.~1~ 2006-10-06 15:13:31.636675000 -0500
+++ nbody.py 2006-10-06 15:24:46.048689000 -0500
@@ -5,6 +5,7 @@
# contributed by Kevin Carson

import sys
+import psyco

pi = 3.14159265358979323
solar_mass = 4 * pi * pi
@@ -14,19 +15,20 @@
pass

def advance(bodies, dt) :
- for i in xrange(len(bodies)) :
+ nbodies = len(bodies)
+ for i in xrange(nbodies) :
b = bodies

- for j in xrange(i + 1, len(bodies)) :
+ for j in xrange(i + 1, nbodies) :
b2 = bodies[j]

dx = b.x - b2.x
dy = b.y - b2.y
dz = b.z - b2.z
- distance = (dx**2 + dy**2 + dz**2)**0.5
-
- b_mass_x_mag = dt * b.mass / distance**3
- b2_mass_x_mag = dt * b2.mass / distance**3
+ dsqr = (dx*dx + dy*dy + dz*dz)
+ dtd3 = dt / dsqr ** 1.5
+ b_mass_x_mag = dtd3 * b.mass
+ b2_mass_x_mag = dtd3 * b2.mass

b.vx -= dx * b2_mass_x_mag
b.vy -= dy * b2_mass_x_mag
@@ -39,6 +41,7 @@
b.x += dt * b.vx
b.y += dt * b.vy
b.z += dt * b.vz
+psyco.bind(advance)

def energy(bodies) :
e = 0.0
 
S

skip

Skip> I took the original version, tweaked it slightly (probably did
Skip> about the same things as Python #2, I didn't look). For N ==
Skip> 200,000 the time went from 21.94s (user+sys) to 17.22s. Using
Skip> psyco and binding just the advance function on my improved version
Skip> improved that further to 6.48s. I didn't have the patience to run
Skip> larger values of N. Diff appended.

Ah, wait a moment. One more tweak. Make the body class a psyco class.
That improves the runtime to 3.02s. Diff appended.

Skip

% diff -u nbody.py.~1~ nbody.py
--- nbody.py.~1~ 2006-10-06 15:13:31.636675000 -0500
+++ nbody.py 2006-10-06 15:35:36.619976000 -0500
@@ -5,28 +5,31 @@
# contributed by Kevin Carson

import sys
+import psyco
+import psyco.classes

pi = 3.14159265358979323
solar_mass = 4 * pi * pi
days_per_year = 365.24

-class body :
+class body(psyco.classes.psyobj):
pass

def advance(bodies, dt) :
- for i in xrange(len(bodies)) :
+ nbodies = len(bodies)
+ for i in xrange(nbodies) :
b = bodies

- for j in xrange(i + 1, len(bodies)) :
+ for j in xrange(i + 1, nbodies) :
b2 = bodies[j]

dx = b.x - b2.x
dy = b.y - b2.y
dz = b.z - b2.z
- distance = (dx**2 + dy**2 + dz**2)**0.5
-
- b_mass_x_mag = dt * b.mass / distance**3
- b2_mass_x_mag = dt * b2.mass / distance**3
+ dsqr = (dx*dx + dy*dy + dz*dz)
+ dtd3 = dt / dsqr ** 1.5
+ b_mass_x_mag = dtd3 * b.mass
+ b2_mass_x_mag = dtd3 * b2.mass

b.vx -= dx * b2_mass_x_mag
b.vy -= dy * b2_mass_x_mag
@@ -39,6 +42,7 @@
b.x += dt * b.vx
b.y += dt * b.vy
b.z += dt * b.vz
+psyco.bind(advance)

def energy(bodies) :
e = 0.0
 
P

Paul McGuire

Peter Maas said:
I have noticed that in the language shootout at shootout.alioth.debian.org
the Python program for the n-body problem is about 50% slower than the
Perl
program. This is an unusual big difference. I tried to make the Python
program
faster but without success. Has anybody an explanation for the difference?
It's pure math so I expected Perl and Python to have about the same speed.

Peter Maas, Aachen

The advance method is the most fertile place for optimization, since it is
called approximately n(n-1)/2 times (where n=2E7). I was able to trim about
25% from the Python runtime with these changes:

Change:
distance = (dx**2 + dy**2 + dz**2)**0.5

b_mass_x_mag = dt * b.mass / distance**3
b2_mass_x_mag = dt * b2.mass / distance**3
to:
mag = dt / (dx*dx + dy*dy + dz*dz)**1.5

b_mass_x_mag = b.mass * mag
b2_mass_x_mag = b2.mass * mag


And by changing the loop iteration from:

for i in xrange(len(bodies)) :
b = bodies
for j in xrange(i + 1, len(bodies)) :
b2 = bodies[j]

to:

remainingBodies = bodies[:]
for b in bodies[:-1]:
del remainingBodies[0]
for b2 in remainingBodies:


-- Paul
 
B

bearophileHUGS

Ah, wait a moment. One more tweak. Make the body class a psyco class.
That improves the runtime to 3.02s. Diff appended.

Nice. Maybe you can do the same trick with:
from psyco.classes import __metaclass__

If you want you can try that trick with this version of mine:
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=nbody&lang=psyco&id=3

(I can't try it, precompiled Psyco doesn't exists yet for Py2.5). If
it's faster, then I can upload some modified version...

Bye,
bearophile
 
P

Peter Maas

John said:
Replacing ** with multiplication in advance() cut it down to 0.78
times the original running time for me. That brings it along side
PHP, one place below Perl (I didn't bother to upload the edited script).

I tried this also but got only 90%. Strange.
 
P

Peter Maas

Matteo said:
Of course, numpy is not a standard package (though there is a proposal
to add a standard 'array' package to python, based of numpy/numeric),
but if you want to do any numerics with python, you shouldn't be
without it.

I know that nbody.py could be speeded up by psyco and numpy but I was
curious why plain Python was slower than plain Perl. Thanks for your
hints, guys!
 
P

Paddy

Nice. Maybe you can do the same trick with:
from psyco.classes import __metaclass__

If you want you can try that trick with this version of mine:
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=nbody&lang=psyco&id=3

(I can't try it, precompiled Psyco doesn't exists yet for Py2.5). If
it's faster, then I can upload some modified version...

Bye,
bearophile

You might also put the outer loop calling function advance so many
times, into the advance function:

==========
def advance(bodies, dt, n) :
for i in xrange(n) :
for i,b in enumerate(bodies) :

...

def main() :
...
advance(bodies, 0.01, n)

==========

This I like. It points to areas of python where maybe we should be
adding to the standard library whilst also showing off the cream of the
non-standard libraries out their.

- Paddy.
 
G

Giovanni Bajo

Peter said:
I have noticed that in the language shootout at
shootout.alioth.debian.org the Python program for the n-body problem
is about 50% slower than the Perl program. This is an unusual big
difference. I tried to make the Python program faster but without
success. Has anybody an explanation for the difference? It's pure
math so I expected Perl and Python to have about the same speed.

Did you try using an old-style class instead of a new-style class?
 
B

bearophileHUGS

Paddy:
You might also put the outer loop calling function advance so many
times, into the advance function:

Remember that the autors of the Shootout refuse some changes to the
code (like this one), to allow a fair comparison. The rules are strict.

I have improved the Psyco version:
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=nbody&lang=all

If I find the time I'll try to speed up the Python version too
following some of the suggestions of this thread. Note that on that
site there are 4 different comparisons.

The interesting question is how the Lua JIT can be that fast, it's
often faster than Psyco:
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=luajit&lang2=psyco

http://luajit.luaforge.net/

Bye,
bearophile
 
P

Peter Maas

Remember that the autors of the Shootout refuse some changes to the
code (like this one), to allow a fair comparison. The rules are strict.

I'm only aware of the rule that the language has to be used "as-is", e.g.
you must not encapsulate time-critical parts in a C extension and announce
the result as "fast Python".

To put the outer loop inside a function is a degree of freedom which is
available in every language so should be allowed in a shoot-out. The global
arrays in the Perl program are on the same track.
 
P

Peter Maas

Giovanni said:
Did you try using an old-style class instead of a new-style class?

The original program has an old style class, changing it to a new
style class increases run time by 25% (version is 2.4.3 btw).
 
P

Peter Maas

Paul said:
The advance method is the most fertile place for optimization, since it is
called approximately n(n-1)/2 times (where n=2E7). I was able to trim about
25% from the Python runtime with these changes:
[...]

My results:

Your changes: 18% runtime decrease
Your changes + objects->lists: 25% runtime decrease

The Python program is much closer to the Perl program then (~2250 vs 1900 sec)
but replacing the objects by lists is not worth the effort because the gain is
small and the Python program becomes less readable.

Taking arrays (from module array) instead of lists increases runtime by 19%!

If it weren't for the shootout I would of course take psyco and numpy.
 
I

igouy

Peter said:
Paul said:
The advance method is the most fertile place for optimization, since it is
called approximately n(n-1)/2 times (where n=2E7). I was able to trim about
25% from the Python runtime with these changes:
[...]

My results:

Your changes: 18% runtime decrease
Your changes + objects->lists: 25% runtime decrease

The Python program is much closer to the Perl program then (~2250 vs 1900 sec)
but replacing the objects by lists is not worth the effort because the gain is
small and the Python program becomes less readable.

Taking arrays (from module array) instead of lists increases runtime by 19%!

If it weren't for the shootout I would of course take psyco and numpy.

--
Regards/Gruesse,

Peter Maas, Aachen
E-mail 'cGV0ZXIubWFhc0B1dGlsb2cuZGU=\n'.decode('base64')

We can have our Python and Psyco too :)

http://shootout.alioth.debian.org/g...y&p1=python-0&p2=psyco-4&p3=iron-0&p4=psyco-3
 
G

Giovanni Bajo

Peter said:
The original program has an old style class, changing it to a new
style class increases run time by 25% (version is 2.4.3 btw).

Ah yes. Years ago when I first saw this test it was still using new-style
classes.

Anyway, this is a bug on its own I believe. I don't think new-style classes are
meant to be 25% slower than old-style classes. Can any guru clarify this?
 
R

Richard Jones

Giovanni said:
Ah yes. Years ago when I first saw this test it was still using new-style
classes.

Anyway, this is a bug on its own I believe. I don't think new-style
classes are meant to be 25% slower than old-style classes. Can any guru
clarify this?

Please try 2.5 - there's been significant optimisation work put into 2.5


Richard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,434
Messages
2,571,689
Members
48,796
Latest member
Greg L.

Latest Threads

Top