Parallel arrays revisited

Tom Anderson

Listen all a'yall,

You may remember we had a discussion a while ago, in the depths of a
thread called "Class Constants - pros and cons":

http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/8a7f6b2357c59a68

About storing large amounts of data as parallel arrays rather than arrays
of objects. The example was an astronomical model, where, roughly
speaking:

public class Universe {
    private Star[] stars;

    public class Star {
        private double x;
        private double y;
        private double z;
        public double getX() {return x;}
        public double getY() {return y;}
        public double getZ() {return z;}
    }

    public Star getStar(int i) {
        return stars[i];
    }
}

Became:

public class Universe {
    private double[] x;
    private double[] y;
    private double[] z;

    public class Star {
        private int i;
        public Star(int i) {this.i = i;}
        public double getX() {return x[i];}
        public double getY() {return y[i];}
        public double getZ() {return z[i];}
    }

    public Star getStar(int i) {
        return new Star(i);
    }
}

It was proposed that this would have better performance; this idea was
debated, and also suggested to be less important than programmer-time
concerns.

This is an interesting presentation (sadly in page-embedded PDF format or
something, although downloadable) from Michael Busch, an engineer at
Twitter, about how they optimised Lucene to handle Twitter's search needs:

http://www.box.net/shared/hivdg1hge9

The bottom line (and, indeed, the top line, given that it's the title) is
that they handle a billion searches a day, they do this with an
open-source Java stack, and they had to rewrite some of the innards of
Lucene using parallel arrays to do it.

Well, and a lot of other enjoyably evil low-level tweaks.

To put a number on it, for memory-tight environments (which search engines
are, because you want to use all your memory for data, not JVM headroom),
using parallel arrays rather than small objects brought performance
improvements of "up to 400%". Where there was plenty of headroom, the
improvement was small - 4.3% in the example given.

Clearly, Twitter's search is a case where it was worth expending
programmer time to save execution time. But parallel arrays are definitely
a technique which can save execution time.

tom
 

Joshua Cranmer

Tom Anderson said:
Clearly, Twitter's search is a case where it was worth expending
programmer time to save execution time. But parallel arrays are
definitely a technique which can save execution time.

Well, a fundamental lesson of programming is that you worry about this
kind of "microoptimization" (by which I mean the willful violation of
coding practices to eke out performance gains) only after you can
demonstrate it to be a performance bottleneck.

I think, in this case, the usage patterns can make a noticeable
difference in performance. I don't doubt that parallel arrays can save
execution time; however, I am hesitant to accept the claim that they
will save (significant) execution time in many/most/all cases.

So as always, leave the optimization until after it's found to be a
problem. :)
 

Arne Vajhøj

Tom Anderson said:
Clearly, Twitter's search is a case where it was worth expending
programmer time to save execution time. But parallel arrays are
definitely a technique which can save execution time.


Sure.

That happens all the time.

Google uses a customized Linux kernel.

Facebook uses a customized MySQL.

Etc.

When you have a billion-dollar business relying on
some software running on tens of thousands of computers
serving an astronomical number of requests per month,
then the value of optimizations can be high.

So here is Arne's rule of thumb: start looking
into that when you pass 100 M$ in revenue and cost for
your app - but forget about it until then.

Arne
 

Eric Sosman

Joshua Cranmer said:
Well, a fundamental lesson of programming is that you worry about this
kind of "microoptimization" (by which I mean the willful violation of
coding practices to eke out performance gains) only after you can
demonstrate it to be a performance bottleneck.

Microoptimization can be significant when macromagnified.

Joshua Cranmer said:
I think, in this case, the usage patterns can make a noticeable
difference in performance. I don't doubt that parallel arrays can save
execution time; however, I am hesitant to accept the claim that they
will save (significant) execution time in many/most/all cases.

They're primarily a memory-saving technique, secondarily a cache-
improvement technique (and then only for selected access patterns).
Since memory is a few hundred times slower than CPUs, such techniques
can make a significant difference -- but "can" is not "will", and
"difference" is not always in the desired direction.

Joshua Cranmer said:
So as always, leave the optimization until after it's found to be a
problem. :)

Ayeh. One recurring difficulty is that we as programmers are
woefully inclined to mis-measure "macro." Despite working with
systems that churn through terabytes in nanoseconds, we fall prey
to thinking "This loop might be executed a thousand times, ooh, wow,
a thousand; I'd better optimize it."

Now, if the loop runs its thousand iterations each time its
method is called, and if the method is called a thousand times each
time its bean is used, and if the bean is used a thousand times
a day on each of a thousand machines ... Sadly, though, I see
programmers (myself included!) diligently optimizing their static
initializers ...
 

Lew

Eric said:
Now, if the loop runs its thousand iterations each time its
method is called, and if the method is called a thousand times each
time its bean is used, and if the bean is used a thousand times
a day on each of a thousand machines ... Sadly, though, I see
programmers (myself included!) diligently optimizing their static
initializers ...

There's good optimization (algorithmic) and bad optimization (micro-managed).
It's never too early for the first kind.

Immutability is part of a toolkit of idioms to construct stable, extensible
systems that are inevitably about as fast as you can safely get. It's
especially handy for the large class of real-world applications that are
multi-threaded.

Good optimization of static initializers is structural rather than
cycles-based. You use static final initializers of immutable instances to
spread immutability. A special class of those is compile-time constants,
which have consequences for /happens-before/ relationships. Initialization
order matters, especially in enums. These optimize for correctness, but
properly done they do tend to make the program fast.
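
For instance, a minimal sketch of that idiom (the class and member names
here are made up for illustration, not from this thread):

public final class Constants {
    // An immutable value class: all of its state is final.
    public static final class Point {
        private final double x, y, z;
        public Point(double x, double y, double z) {
            this.x = x; this.y = y; this.z = z;
        }
        public double getX() {return x;}
        public double getY() {return y;}
        public double getZ() {return z;}
    }

    // A static final reference to an immutable instance: initialized once,
    // safely published, and readable from any thread without locking.
    public static final Point ORIGIN = new Point(0.0, 0.0, 0.0);

    // A compile-time constant: primitive, final, and initialized with a
    // constant expression, so it also participates in constant folding.
    public static final int MAX_STARS = 1000000;

    private Constants() {}
}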

Remember, too, that there are different kinds of fast.
 

Daniel Pitts

Lew said:
There's good optimization (algorithmic) and bad optimization
(micro-managed). It's never too early for the first kind.

Never say never. If you have an enum class and you want to create a
method which returns one instance by some property, the first approach
should be a for-loop+compare. Only if you find out this is inside a
tight inner loop should you replace it with a map lookup. (This exact
thing happened to me in a real-world situation, actually.)
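
A minimal sketch of both approaches (the enum and its property are made up
for illustration):

public enum Planet {
    MERCURY("me"), VENUS("ve"), EARTH("ea");

    private final String code;

    Planet(String code) {this.code = code;}

    // First approach: loop and compare. Perfectly fine until profiling
    // shows this lookup actually matters.
    public static Planet byCodeLoop(String code) {
        for (Planet p : values()) {
            if (p.code.equals(code)) return p;
        }
        throw new IllegalArgumentException(code);
    }

    // Later optimization: a map built once in a static initializer.
    private static final java.util.Map<String, Planet> BY_CODE =
            new java.util.HashMap<String, Planet>();
    static {
        for (Planet p : values()) BY_CODE.put(p.code, p);
    }

    public static Planet byCodeMap(String code) {
        Planet p = BY_CODE.get(code);
        if (p == null) throw new IllegalArgumentException(code);
        return p;
    }
}
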
Lew said:
Remember, too, that there are different kinds of fast.

Also, const should be something in Java. I'm very sad it is not, given
the thread-safety it could bring.
 

Eric Sosman

Daniel Pitts said:
Never say never. If you have an enum class and you want to create a
method which returns one instance by some property, the first approach
should be a for-loop+compare. Only if you find out this is inside a
tight inner loop should you replace it with a map lookup. (This exact
thing happened to me in a real-world situation, actually.)

Also, const should be something in Java. I'm very sad it is not, given
the thread-safety it could bring.

Personally, I pine for immutable arrays. Even better, for
immutable "views" of mutable arrays. Yes, there's unmodifiableList(),
but I don't find it a truly happy-making alternative. For one thing,
its immutability is a run-time property, not something the compiler
can warn you about if you accidentally try a modification.

Your mention of enums calls to mind their wretched values() method,
which is forced to make a brand-new free-standing array object at
each call lest the caller scramble the contents of the real version.
If only the real version could be immutable, and hence safe to show
to the outside world ... Ah, well.
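
For example, here is roughly what that run-time-only immutability looks
like (a made-up snippet, not from the thread):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ViewDemo {
    public static void main(String[] args) {
        List<Integer> backing = Arrays.asList(1, 2, 3);
        List<Integer> view = Collections.unmodifiableList(backing);

        backing.set(0, 42);   // fine: the backing list is still writable
        view.set(0, 99);      // compiles without a murmur, but throws
                              // UnsupportedOperationException at run time
    }
}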
 

Roedy Green

Tom Anderson said:
But parallel arrays are definitely
a technique which can save execution time.

I wonder if optimisation technology could do that automatically, or even
inline the objects, so that when the objects are of fixed size you get
true arrays of objects rather than arrays of pointers to objects, which
would give you an even better speed advantage.
 

Tom Anderson

The bottom line (and, indeed, the top line, given that it's the title)
is that they handle a billion searches a day, they do this with an
open-source java stack, and they had to rewrite some of the innards of
Lucene using parallel arrays to do it.

If you're going to hyper-optimize, then you can take this a step further.
Instead of:

int [] foo;
int [] bar;

do

long [] foobar;

and store the foo in the upper 32 bits and the bar in the lower.

Or:

int[] foobar; // this is twice as long as foo or bar would have been

And store foos and bars in adjacent elements. The memory layout will be
just the same as with the longs, but it's a little easier to work with in
code. I'm mildly curious as to how the two approaches compare in terms of
efficiency, but i doubt there's a significant difference.
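
For concreteness, a rough sketch of both layouts (the class and method
names are made up; only foo and bar come from the post above):

public class PackedPairs {
    // Layout 1: one long per record, foo in the upper 32 bits, bar in the lower.
    static long pack(int foo, int bar) {
        return ((long) foo << 32) | (bar & 0xFFFFFFFFL);
    }
    static int fooOf(long packed) {return (int) (packed >>> 32);}
    static int barOf(long packed) {return (int) packed;}

    // Layout 2: one int[] twice as long, with foo and bar of record i adjacent.
    static int fooAt(int[] foobar, int i) {return foobar[2 * i];}
    static int barAt(int[] foobar, int i) {return foobar[2 * i + 1];}
}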

tom
 

John B. Matthews

Tom Anderson said:
int[] foobar; // this is twice as long as foo or bar would have been

And store foos and bars in adjacent elements. The memory layout will
be just the same as with the longs, but it's a little easier to work
with in code. I'm mildly curious as to how the two approaches compare
in terms of efficiency, but i doubt there's a significant difference.

As a concrete example, EventListenerList uses such an arrangement to
store a type-token and listener object for each registered listener. I don't
know that it's more efficient, but it's easy to work with.

<http://download.oracle.com/javase/6/docs/api/javax/swing/event/EventListenerList.html>
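
For instance (a rough sketch; EventListenerList, add() and getListenerList()
are the real Swing API, the rest is illustrative):

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.event.EventListenerList;

public class ListenerDemo {
    public static void main(String[] args) {
        EventListenerList listeners = new EventListenerList();
        listeners.add(ActionListener.class, new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                System.out.println("fired: " + e.getActionCommand());
            }
        });

        // getListenerList() exposes the flat array: the type token sits at
        // each even index, the matching listener at the following odd index.
        Object[] pairs = listeners.getListenerList();
        for (int i = 0; i < pairs.length; i += 2) {
            Class<?> type = (Class<?>) pairs[i];
            Object listener = pairs[i + 1];
            System.out.println(type.getName() + " -> " + listener);
        }
    }
}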
 
