Optimization problem

tnorgd

Dear Group Members,

I would like to gather some opinions about a specific problem: how
should I store data in my program?

In general, this is a package for geometric manipulations. The basic
and heavily used class is:

public class Point {
    public final double x, y, z;
    public final int pointId;
    // and many methods there, for example:
    public double distanceTo(Point anotherPoint);
}

Points are gathered in shapes and these form a stage. This hierarchy
is very natural for my problem since I need to move shapes, check
which shape a given point belongs to, etc. So far everything works
fine. But now we come to the problem.

An important part of the program is bulk operations, e.g. the center of
mass, which is a simple arithmetic average of all points. The data
structure described above is very slow for that purpose: I have nested
loops iterating over thousands of objects. So far I have the following
solution:

public class Point {
    public final double[] coordinates; // <--- There we go!
    // ....
}

And somewhere else I have a global array double[N][3], where N is the
number of points, known in advance. Each Point instance actually holds
a reference to a single row of this array. A method performing a bulk
operation on points then takes this array as an argument. Some
preliminary tests show that this is much faster, but I couldn't say
how much without rewriting the whole code...
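To make the idea concrete, the kind of bulk pass this layout enables can be sketched as follows. This is only an illustration, not the actual code: the class and method names (PackedOps, centroid) are mine, and a packed layout {x0, y0, z0, x1, y1, z1, ...} in a single double[] is assumed.

```java
// Sketch of a bulk centroid over a packed coordinate array.
// Assumed layout: {x0, y0, z0, x1, y1, z1, ...} in one double[].
// Names (PackedOps, centroid) are illustrative, not from the original code.
public class PackedOps {
    /** Arithmetic mean of all points stored as {x0,y0,z0, x1,y1,z1, ...}. */
    public static double[] centroid(double[] packed) {
        int n = packed.length / 3;
        double sx = 0, sy = 0, sz = 0;
        for (int i = 0; i < packed.length; i += 3) {
            sx += packed[i];
            sy += packed[i + 1];
            sz += packed[i + 2];
        }
        return new double[] { sx / n, sy / n, sz / n };
    }

    public static void main(String[] args) {
        double[] pts = { 0, 0, 0, 2, 2, 2 }; // two points
        double[] c = centroid(pts);
        System.out.printf("%.1f %.1f %.1f%n", c[0], c[1], c[2]); // 1.0 1.0 1.0
    }
}
```

The loop touches memory strictly sequentially, which is where the cache friendliness of this layout comes from.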

SOME FACTS:
- The number of points N is known in advance.
- A typical stage has 300-1000 shapes, with around 10-20 points per
shape. The big array I am thinking of would take ~300 kB. Currently I
have to iterate through 10,000 objects.

QUESTIONS:
- What do you think in general about the global N x 3 array idea?
- I was told that memory paging is a very important factor where
efficiency is concerned (and that is one of my problems: my 10,000
objects have a huge memory overhead). Do you think a 300 kB array of
doubles is small enough?
- I am worried that when I allocate my double[N][3] array, my
coordinates (of type double[3]) will be sparsely allocated in memory
and I won't gain anything. In C++ I would put all coordinates in a
single double* buffer of size 3 x N. Each coordinate object would then
also be a double*, pointing somewhere into the middle of the allocated
memory. I could do something similar in Java:
public class Point {
    private final double[] memoryChunk; // reference to all coordinates, of size 3*N
    private final int offset;           // index of this point in the global array
    public double getX() { return memoryChunk[offset]; }
    public double getY() { return memoryChunk[offset+1]; }
    public double getZ() { return memoryChunk[offset+2]; }
    // ....
}
Do you think I will gain anything from this?

- Does anybody have a better solution to this problem?

Best regards,
Dominik
 

Joshua Cranmer

> Important part of the program are bulk operations, e.g. center of
> mass, which is a simple arithmetic average of all points. The data
> structure described above is very slow for that purpose. I have nested
> loops iterating over thousands of objects. So far I got the following
> solution:

Define "very slow." Did you actually measure your program and find that
the design overhead was the bottleneck in your code?

> And somewhere else I have a global array: double[N][3], where N is the
> number of points, known in advance. Each Point class actually has a
> reference to a single row from this array. Then a method performing a
> bulk operation on points take this array as an argument. Some
> preliminary tests show that this is much faster, but I couldn't say
> how much without rewriting the whole code...

My gut is telling me that the speed difference would amount to only a
few clock cycles under well-written, optimized, JIT-ed code; I think it
should come out to merely another memory indirection or three. But I
had no hand in writing a VM's JIT or optimization code, so I'm not a
reliable authority on the subject.

> QUESTIONS:
> - what do you think in general about the global array Nx3 idea?

I don't know how "global" this N of yours is. If it's truly global,
global enough to be a static variable on Point, then I think you could
garner much of your supposed speed improvements with some clever static
code in Point.

> - I was told that memory paging is a very important factor when
> efficiency is concerned (and that is one of my problems: my 10 000
> objects have huge memory overhead). Do you think 300kB array of
> doubles is small enough?

I'm sure everyone here was once told that Java is too slow to do
anything reasonable. The best thing I recommend is to try it and see
what works.

> - I am worried that when I allocate my double[N][3] array, my
> coordinates (of type double[3]) will be sparsely allocated in memory
> and I don't earn anything.

The |new double[N][3]| expression compiles to a multianewarray
instruction, although the OpenJDK version of HotSpot doesn't appear to
pack the result into contiguous memory. Despite that, I think the
entire allocation runs without GC, so the actual heap layout is most
likely near-contiguous, provided you have sufficient contiguous free
space.

I think your fears of sparse allocation are overblown, but I'd recommend
looking at a heap map to double-check.
 

grendal


Dominik,

Let me see if I have this right...

Ok so you have an object which is a point.
You have a second object which is a shape that contains multiple point
objects.
You have a third object which is a container called a stage.

A stage is a collection of shapes and if the set of shapes is null,
then there is no 'center of mass' and there are no dimensions to the
stage, correct?

Assuming this is the case, then a point has no center of mass, but a
shape does. So what you're really doing is finding the center of mass
for each shape, which is really a point plus some value for the mass of
the shape. Right?

Then for the stage, you're iterating through the set of all the
calculated centers of mass to determine the center of mass for the
stage, right?

Is this correct?
Can you see where I'm going with this?

I mean, why can't you have a method within each shape object that
returns the center of mass (point, value) pair for the shape? Then for
the stage, you iterate over the shape set, get each shape's center of
mass, and put that into your array. Then have the stage perform its
own center-of-mass calculation?

Also if you have 10,000 objects and it takes a relatively long amount
of time to calculate the center of mass for an object, why not do this
in parallel?
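The per-shape aggregation described above could be sketched like this. The Shape and Stage classes here are hypothetical stand-ins (the original code's classes are not shown in the thread), and a uniform unit mass per point is assumed:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the suggestion above: each shape reports its own centroid and
// mass, and the stage combines them as a weighted average, shape-by-shape
// in parallel. All names (Shape, Stage, centroid, mass) are illustrative.
class Shape {
    final double[][] points; // each row is {x, y, z}

    Shape(double[][] points) { this.points = points; }

    double mass() { return points.length; } // unit mass per point (assumption)

    double[] centroid() {
        double[] c = new double[3];
        for (double[] p : points)
            for (int k = 0; k < 3; k++) c[k] += p[k];
        for (int k = 0; k < 3; k++) c[k] /= points.length;
        return c;
    }
}

class Stage {
    final List<Shape> shapes;

    Stage(List<Shape> shapes) { this.shapes = shapes; }

    /** Weighted average of per-shape centroids, one task per shape. */
    double[] centerOfMass() {
        // Each element of the reduction is {m*cx, m*cy, m*cz, m}.
        double[] sum = shapes.parallelStream()
            .map(s -> {
                double[] c = s.centroid();
                double m = s.mass();
                return new double[] { c[0] * m, c[1] * m, c[2] * m, m };
            })
            .reduce(new double[4], (a, b) -> new double[] {
                a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] });
        return new double[] { sum[0] / sum[3], sum[1] / sum[3], sum[2] / sum[3] };
    }
}
```

Because the weighted sums are associative, the parallel reduction gives the same answer as a sequential loop.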

Does this make sense or should I go get some sleep?

-G
 

Roedy Green

> I would like to gather some opinion about a specific problem, how
> should I store data in my program.

You might be getting into trouble with too much allocation and garbage
collection. If so, consider using a cache of reusable objects.

The first problem is to find out where all the cycles are going.

See http://mindprod.com/jgloss/optimising.html
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Nationalism is an infantile disease. It is the measles of mankind."
~ Albert Einstein
 

Tom Anderson

> I would like to gather some opinion about a specific problem, how
> should I store data in my program.
>
> In general, this is a package for geometric manipulations. The basic
> and heavily used class is:
>
> public class Point {
>     public final double x, y, z;
>     public final int pointId;
>     // and many methods there, for example:
>     public double distanceTo(Point anotherPoint);
> }
>
> Points are gathered in shapes and these form a stage. This hierarchy
> is very natural for my problem since I need to move shapes, check
> which shape a given point belongs to, etc. So far everything works
> fine. But now we come to the problem.
>
> Important part of the program are bulk operations, e.g. center of
> mass, which is a simple arithmetic average of all points. The data
> structure described above is very slow for that purpose.

I don't see how you can possibly say it's "very slow". That implies you've
measured its speed, and either you've measured the speed of alternative
implementations, or you have a sensible idea of how fast it should be.

> I have nested loops iterating over thousands of objects. So far I got
> the following solution:
>
> public class Point {
>     public final double[] coordinates; // <--- There we go!
>     // ....
> }
>
> And somewhere else I have a global array: double[N][3], where N is the
> number of points, known in advance. Each Point class actually has a
> reference to a single row from this array. Then a method performing a
> bulk operation on points take this array as an argument. Some
> preliminary tests show that this is much faster, but I couldn't say how
> much without rewriting the whole code...

I'm surprised to hear that. And, in fact, some benchmarks of my own don't
have the same conclusion - see below.
> SOME FACTS:
> - number of points N is known in advance.
> - Typical stage has 300 - 1000 shapes, around 10-20 points per shape.
> The big array I am thinking of should take ~300 kB. Currently I have
> to iterate through 10 000 objects
>
> QUESTIONS:
> - what do you think in general about the global array Nx3 idea?
> - I was told that memory paging is a very important factor when
> efficiency is concerned (and that is one of my problems: my 10 000
> objects have huge memory overhead). Do you think 300kB array of
> doubles is small enough?

Paging won't be a problem. Cache interactions might be.

> - I am worried that when I allocate my double[N][3] array, my
> coordinates (of type double[3]) will be sparsely allocated in memory
> and I don't earn anything.

There's nothing in the spec that says they won't be, but in practice,
they won't be, provided that you allocate them all at once.
> In C++ I would put all coordinates in a single double vector double* of
> size 3xN. Then my coordinate object would be also double*, pointing to
> somewhere in the middle of the allocated memory. I could do something
> similar in Java:
>
> public class Point {
>     private final double[] memoryChunk; // reference to all coordinates, of size 3xN
>     private final int offset; // index of this point in the global array
>     public double getX() { return memoryChunk[offset]; }
>     public double getY() { return memoryChunk[offset+1]; }
>     public double getZ() { return memoryChunk[offset+2]; }
>     // ....
> }
>
> Do you think I will gain anything from this?

This will take up less memory than the two-dimensional array, and
more or less guarantees compactness.
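For what it's worth, here is one way such offset-based points could be wired up against a single shared array. This is only a sketch: the factory method and setter are my additions, not from the original code.

```java
// Sketch: Points as lightweight views into one shared coordinate array,
// as in the quoted snippet above. makePoints and set are my own additions.
public class Point {
    private final double[] memoryChunk; // shared array of size 3*N
    private final int offset;           // index of this point's x coordinate

    public Point(double[] memoryChunk, int index) {
        this.memoryChunk = memoryChunk;
        this.offset = 3 * index;
    }

    public double getX() { return memoryChunk[offset]; }
    public double getY() { return memoryChunk[offset + 1]; }
    public double getZ() { return memoryChunk[offset + 2]; }

    public void set(double x, double y, double z) {
        memoryChunk[offset] = x;
        memoryChunk[offset + 1] = y;
        memoryChunk[offset + 2] = z;
    }

    /** Allocate one contiguous backing array and N Point views into it. */
    public static Point[] makePoints(int n) {
        double[] chunk = new double[3 * n];
        Point[] points = new Point[n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(chunk, i);
        }
        return points;
    }
}
```

Allocating the backing array once up front is what guarantees the compactness; the Point objects themselves can then live anywhere on the heap without affecting the coordinate layout.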

I wrote a simple benchmark to compare naive points, points represented as
an N x 3 array of arrays, and points represented as a single array:

http://urchin.earth.li/~twic/tmp/PointBench/

I used a single shape with a million points, so perhaps not so similar to
your situation. But the results, in nanoseconds, are:

          minimum      median       95th centile
naive     111049000    124352000    185673000
N x 3     131850000    149605000    206983000
packed     96216000    106892000    146708000

So, a packed representation is ~20% faster than a naive representation,
whilst the N x 3 representation is actually slower.

However, that's only when the points are filled in in order. I wouldn't
expect this to matter for the N x 3 or packed representations, but for
the naive points, allocating all the points in order has the effect of
making them contiguous and sequential in memory. I don't think this is
realistic: I presume points in your application will actually be
allocated in a more dispersed order, some over here, then some over
there. So, if we instead fill in points in a random order, we get:

          minimum      median       95th centile
naive     136890000    151244000    268689000
N x 3     131995000    148993000    201279000
packed     96196000    108204000    160004000

For the array representations, the times are basically identical. However,
for the object representation, there's a ~20% slowdown.

So, the conclusion is that the packed representation is at least 20%
faster than the naive object representation, and probably more like 40%
faster in a realistic situation. However, the N x 3 representation is
never faster.

But, crucially, note the spread of values in the above data: in all cases,
the width of the range of times greatly exceeds the difference between the
medians. Thus, differences in speed between different implementations may
not really be detectable in real use.

Obviously, this simple model may not apply very well to your code. You
have a lot of small shapes rather than one big one, so the gains from
organising shapes like this may be small. You'd have to apply one of these
approaches to a whole stage, as you indicated above. Nonetheless, I
imagine the general conclusion will be similar.

tom
 

tnorgd

Dear All,

Thanks a lot for answers. They really gave me a lot of insight. I
would also like to add two things.

1) I am afraid that simplified benchmarks are not very meaningful. My
Point class carries a lot of state: many fields related to a Point
(like color, name, a reference to the shape that contains the Point,
and others). Therefore, to iterate over all points I have to walk
through a very large region of memory. For instance, far fewer Points
fit in cache than double[] entries do.

2) There is a "macro" mode of the program that computes the same set
of operations (usually the bulk ones) for a number of stages. The
stages are loaded from a file. So far, when a stage is loaded I have
to create all the objects: Points, Shapes and so on.
But in some cases the stages are exactly the same and only the
coordinates differ. With the big double[] array behind my objects, I
could simply read all the doubles from a file straight into that
array. At the moment I use serialization to solve this. It works, but
has obvious drawbacks.
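Reading raw coordinates straight into the backing array could look something like this. It is a sketch using java.nio; the class name, the file format (a headerless sequence of big-endian doubles), and the absence of any versioning are all assumptions:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: bulk-load 3*N doubles from a binary file directly into the
// shared coordinate array, skipping object-by-object deserialization.
// The file layout (headerless big-endian doubles) is an assumption.
public class CoordinateLoader {
    public static void load(Path file, double[] coordinates) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(coordinates.length * Double.BYTES)
                                   .order(ByteOrder.BIG_ENDIAN);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF
            }
        }
        buf.flip();
        buf.asDoubleBuffer().get(coordinates); // one bulk copy into the array
    }
}
```

Since every Point is just a view into the array, all points pick up the new coordinates the moment the array is refilled, with no objects rebuilt.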

Best regards,
Dominik
 

coffeymex

> 1) I am afraid that simplified benchmarks are not very meaningful. My
> Point class bears a lot of stuff. Many fields related to a Point (like
> color, name, reference to shape that contains that Point and many
> others). Therefore to iterate over all points I have to go through a
> very large part of memory. For instance I can fit far less Points in
> cache than double[]

I think your first sentence is a key point and is something others
have been hinting at-- when you get down to things like the usage
of CPU caches, there's a point at which rather than being too
theoretical about it, it's best to see which solution works best
overall in your particular case.

One thing I'll throw into the melting pot is that if you have simple
fields on your object, then in general the JIT can compile to code
that directly accesses those fields by offset within the object
data. Your idea of storing points in a separate array (especially
the separate *single* array) may buy you something if you're
essentially *just* manipulating that point data. But as soon as
you need to access the object as a whole, you're actually dispersing
the data for that object. So I'd guess that which is best overall
depends on what balance of operations you're doing.

Neil
 

Roedy Green

> I don't see how you can possibly say it's "very slow".

In the end, what counts in the marketplace is the psychological
impression a program gives to its users.

You can "speed up" a program by wasting even more cycles with progress
bars and animated spinners -- the approach pioneered by Microsoft.

If the program behaves what appears to behave like a reluctant or
lazy human, it will be perceived as slow. It appears industrious, it
can actually be a pig and get away with it.

In GUIs what most often counts most is some sort of rapid
acknowledgement feedback not the total elapsed time of some process.

--
Roedy Green Canadian Mind Products
http://mindprod.com

"Species evolve exactly as if they were adapting as best they could to a changing world, and not at all as if they were moving toward a set goal."
~ George Gaylord Simpson
 
