T
tnorgd
Dear Group Members,
I would like to gather some opinion about a specific problem, how
should I store data in my program.
In general, this is a package for geometric manipulations. The basic
and heavily used class is:
public class Point {
public final double x,y,z;
public final int pointId;
// and many methods there, for example:
public double distanceTo(Point anotherPoint);
}
Points are gathered in shapes and these form a stage. This hierarchy
is very natural for my problem since I need to move shapes, check
which shape a given point belongs to, etc. So far everything works
fine. But now we come to the problem.
Important part of the program are bulk operations, e.g. center of
mass, which is a simple arithmetic average of all points. The data
structure described above is very slow for that purpose. I have nested
loops iterating over thousands of objects. So far I got the following
solution:
public class Point {
public final double[] coordinates; // <--- There we go!
// ....
}
And somewhere else I have a global array: double[N][3], where N is the
number of points, known in advance. Each Point class actually has a
reference to a single row from this array. Then a method performing a
bulk operation on points take this array as an argument. Some
preliminary tests show that this is much faster, but I couldn't say
how much without rewriting the whole code...
SOME FACTS:
- number of points N is known in advance.
- Typical stage has 300 - 1000 shapes, around 10-20 points per shape.
The big array I am thinking of should take ~300 kB. Currently I have
to iterate through 10 000 objects
QUESTIONS:
- what do you think in general about the global array Nx3 idea?
- I was told that memory paging is a very important factor when
efficiency is concerned (and that is one of my problems: my 10 000
objects have huge memory overhead). Do you think 300kB array of
doubles is small enough?
- I am worried that when I allocate my double[N][3] array, my
coordinates (of type double[3]) will be sparsely allocated in memory
and I don't earn anything. In C++ I would put all coordinates in a
single double vector double* of size 3xN. Then my coordinate object
would be also double*, pointing to somewhere in the middle of the
allocated memory. I could do something similar in Java:
public class Point {
private final double[] memoryChunk; // reference to all coordinates,
of size 3xN
private final int offset; // Address of this point in the global
array
public double getX() { return memoryChunk[offset]; }
public double getY() { return memoryChunk[offset+1]; }
public double getZ() { return memoryChunk[offset+2]; }
// ....
}
Do you thing I will gain anything from this?
- Does anybody have a better solution to this problem?
Best regards,
Dominik
I would like to gather some opinion about a specific problem, how
should I store data in my program.
In general, this is a package for geometric manipulations. The basic
and heavily used class is:
public class Point {
public final double x,y,z;
public final int pointId;
// and many methods there, for example:
public double distanceTo(Point anotherPoint);
}
Points are gathered in shapes and these form a stage. This hierarchy
is very natural for my problem since I need to move shapes, check
which shape a given point belongs to, etc. So far everything works
fine. But now we come to the problem.
Important part of the program are bulk operations, e.g. center of
mass, which is a simple arithmetic average of all points. The data
structure described above is very slow for that purpose. I have nested
loops iterating over thousands of objects. So far I got the following
solution:
public class Point {
public final double[] coordinates; // <--- There we go!
// ....
}
And somewhere else I have a global array: double[N][3], where N is the
number of points, known in advance. Each Point class actually has a
reference to a single row from this array. Then a method performing a
bulk operation on points take this array as an argument. Some
preliminary tests show that this is much faster, but I couldn't say
how much without rewriting the whole code...
SOME FACTS:
- number of points N is known in advance.
- Typical stage has 300 - 1000 shapes, around 10-20 points per shape.
The big array I am thinking of should take ~300 kB. Currently I have
to iterate through 10 000 objects
QUESTIONS:
- what do you think in general about the global array Nx3 idea?
- I was told that memory paging is a very important factor when
efficiency is concerned (and that is one of my problems: my 10 000
objects have huge memory overhead). Do you think 300kB array of
doubles is small enough?
- I am worried that when I allocate my double[N][3] array, my
coordinates (of type double[3]) will be sparsely allocated in memory
and I don't earn anything. In C++ I would put all coordinates in a
single double vector double* of size 3xN. Then my coordinate object
would be also double*, pointing to somewhere in the middle of the
allocated memory. I could do something similar in Java:
public class Point {
private final double[] memoryChunk; // reference to all coordinates,
of size 3xN
private final int offset; // Address of this point in the global
array
public double getX() { return memoryChunk[offset]; }
public double getY() { return memoryChunk[offset+1]; }
public double getZ() { return memoryChunk[offset+2]; }
// ....
}
Do you thing I will gain anything from this?
- Does anybody have a better solution to this problem?
Best regards,
Dominik