Tightening up Shootout code


Greg Buchholz

I was browsing the C++ programs on "The Computer Language Shootout"
http://shootout.alioth.debian.org/ and for the heck of it, I thought
I'd try my hand at seeing if I couldn't shorten the entry for the
N-Body benchmark...


....Without too much concern for speed (admittedly the main emphasis of
the site), I whittled the C++ example down from 127 lines to 67 lines,
which would place it first in the lines-of-code ranking. (The Shootout
doesn't count lines consisting of only whitespace or solitary curly
braces). I'd be interested in seeing if anyone had additional ideas
for compressing it further without resorting to things like stripping
out whitespace and cramming everything onto a couple of unnaturally
long lines. Also keep in mind the constraints of the Shootout: you
only have access to the standard library, no Boost, etc. I've
included the code below, but you can see a syntax highlighted version
at ( http://sleepingsquirrel.org/cpp/body4.cpp.html ) along with an
alternative version which lays out the data structures in a different
manner ( http://sleepingsquirrel.org/cpp/body5.cpp.html ). Comments on
style would also be appreciated. (BTW, many other C++ programs on the
Shootout site look like they could use some TLC.)


Greg Buchholz

#include <iostream> /* The Great Computer Language Shootout */
#include <iomanip> /* http://shootout.alioth.debian.org/ */
#include <numeric> /* originally by:Christoph Bauer & David McCombs*/
#include <valarray>
#include <vector>

#define solar_mass (4 * M_PI * M_PI)
#define days_per_year 365.24
#define NBODIES 5
#define DIM 3

using namespace std;
typedef valarray<double> V;

template <class T> T mag(const valarray<T>& v){ return
sqrt((v*v).sum()); }

class Planet {
public: V pos;
V vel;
double mass;
Planet () : pos(DIM), vel(DIM) {};
Planet (double m, double x, double y, double z,
double vx, double vy, double vz) :
mass = m; pos[0] = x; pos[1] = y; pos[2] = z;
vel[0] = vx; vel[1] = vy; vel[2] = vz;

V momentum(V& m, const Planet& p){ return m + p.mass * p.vel; }

void offset_momentum(vector<Planet>& b)
V z(0.0,DIM);
b[0].vel = (-1/solar_mass) * accumulate(b.begin()+1,b.end(), z,

double mv_sqrd(double acc, const Planet& p)
return acc + p.mass * (p.vel * p.vel).sum();

double energy(const vector<Planet>& b)
double kinetic, potential = 0;

kinetic = 0.5 * accumulate(b.begin(),b.end(), 0.0, mv_sqrd);

for(int i=0; i<NBODIES; i++)
for(int j=i+1; j<NBODIES; j++)
potential += b.mass * b[j].mass / mag((V)(b.pos -

return kinetic - potential;

void advance(vector<Planet>& b, double dt)
for(int i=0; i<NBODIES; i++)
for(int j=i+1; j<NBODIES; j++)
V d = b.pos - b[j].pos;
double dist = mag(d);
double mag = dt / (dist * dist * dist);
b.vel -= b[j].mass * mag * d;
b[j].vel += b.mass * mag * d;
b.pos += dt * b.vel;

Planet dpy(Planet p)
p.vel *= days_per_year;
return p;

int main(int argc, char *argv[])
Planet sun(solar_mass, 0, 0, 0, 0, 0, 0);
Planet jupiter( 9.54791938424326609e-04 * solar_mass,

Planet saturn( 2.85885980666130812e-04 * solar_mass,
-2.76742510726862411e-03, 4.99852801234917238e-03,
Planet uranus( 4.36624404335156298e-05 * solar_mass,

Planet neptune( 5.15138902046611451e-05 * solar_mass,

Planet tmp[] = {sun, jupiter, saturn, uranus, neptune};
vector<Planet> bodies(tmp,tmp+NBODIES);
cout << setprecision(9) << energy(bodies) << endl;

for(long i=atoi(argv[1]); i>0; i--) advance(bodies,0.01);

cout << energy(bodies) << endl;

return 0;




Good job. I am not very surprised. The last time I looked at the
source code for some of the C++ programs on that site, I was
able to write shorter and faster C++ equivalents for at least
6 of the listed programs.

I entirely agree with you, the C++ programs need to be fixed,
firstly for performance and secondly for number of lines of code...
as you put it TLC.



Greg Buchholz

I entirely agree with you, the C++ programs need to be fixed,
firstly for performance and secondly for number of lines of code...
as you put it TLC.

Here's another update. I got almost all of the speed back (~10%
slower) by manually unrolling the implicit inner valarray loops in
advance(), while keeping the LOC constant. Maybe a sufficiently smart
compiler could figure this out on its own. The valarrays are a
constant length of 3 after all. Oh, and it does use a somewhat ugly
looking macro...


....and here's a slow (3x) but shorter version (64 non-trivial lines of


Greg Buchholz

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question