How to store a large amount of 3D data points in Java?

J

James

Hello,

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.

Thank you,

Looking forward to hearing from you!

Regards,

James
 
M

Markus Innerebner

Hi James
Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.
Ho its because the jvm has not allocated enough memory.
Tune it by using java -Xmx<mbytes>m <yourClass>

regards Markus
 
J

John

James said:
Hello,

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.

Thank you,

Looking forward to hearing from you!

Regards,

James

I would load only the points that are required for your particular
operation. If you do need them all at the same time then you can
increase the memory allocation (Roedy Green probably has a link), and
load them in a collection or in your own data structures.

A more complicated (but versatile) method would be to store the points
in a database and maintain a local cache of the points that you are
operating on at the time. You could have some manager class that handles
requests for particular points and returns them either from the cache
(ideally) or by loading them from the database.

John
 
S

Stefan Waldmann

James said:
Hello,

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.

Thank you,

Looking forward to hearing from you!

Regards,

James

Hi,

1.) try to increase the Java VM's maximum memory size using the -Xmx
parameter in the command line.

2.) have a look at the "Flyweight" pattern, which is suitable when you
have a great number of Objects of the same type (which is the case in
your program). It helps to decrease the use of memory by not
instanciating a new object for every point.

The Flyweight pattern is described in Bruce Eckel's "Thinking in
Patterns", you can download it at

http://www.mindview.net/Books/TIPatterns/

HTH
Regards,
Stefan
 
J

Jesper Nordenberg

Hello,

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

The HashMap will have some space overhead for it's internal structure.
The most memory effective way to store a set of 3D points is to use
primitive arrays. This saves the overhead of the point objects. For
example:

class PointArray {
private float[] x;
private float[] y;
private float[] z;

public float getX(int index) {
return x[index];
}

...
}
Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.

This is not a problem related to the collection classes. You have just
run out of heap space. Have you tried increasing the heap space?

/Jesper Nordenberg
 
N

Nils F

Hi, James

I don't think Java has any restriction concerning amount of data
stored in "collections". Have you tried to increase the memory for the
JVM?

Regards, Nils F
 
C

Chris Smith

James said:
I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.


On top of your other responses:

Choice of data structures depends not just on what you're storing, but
how you plan to access the data. Storage isn't useful if you never look
at the objects you put there. Jesper is right that a primitive array is
the most compact storage you'll find, but only if it's appropriate for
your intended usage.

Your use of HashMap suggests that you intend to look up the points based
on some sort of key. Is that the case? If so, we need to know your
access requirements to access a data structure.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
R

Roedy Green

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

lets say that each point is 3 doubles plus in the order of 16 bytes of
overhead.

That is around 40 bytes a point.

If you put them in a HashMap you have need some internal glue objects
for each point with hash and reference to key and value. Lets say
these run about 32 bytes each.

We are up to perhaps 80 bytes overhead per point. That is half a
million punch cards worth of data.

You need 40 megabytes of RAM to store them. This is not an outrageous
amount of RAM. You should be able to solve it with just the command
line switch.

See http://mindprod.com/jgloss/javaexe.html

You can see how easy it would be to triple your space requirements by
adding a separate key object.
 
R

Roedy Green

esper is right that a primitive array is
the most compact storage you'll find, but only if it's appropriate for
your intended usage.

you might look into ordering the points and searching with a sort of
3D binary search, like a Wierstrasse lionhunt. Then you save the
considerable overhead of a lookup structure.

One way do to this is to interleave the bits of the three
co-ordinates, and treat them like giant unsigned binary integers, and
sort on that, then do a linear binary search to find the closest point
to the one you want.
 
D

Dimitri Maziuk

James sez:
Hello,

I am doing medical visualization using java technology.
I read the data file and try to store 3D points (about 500,000 3D
points) using collections such as HashMap.

Unfortunately, I got an error "java.lang.OutOfMemoryError".
Is there anybody can tell me how to store the large number of data in
Java? It seems "collections" cannot deal with this problem.

The real questions are:
1. Are you using all 500K points at once or just a subset?
2. Is your dataset going to grow in the future?

If the answer to 2nd question is yes, and to 1st one -- no,
I'd go with a database.

If there will always be 500K points, just add a -Xmx switch
to jvm -- regardless of whether you need them all in RAM or
not.

If you plan on dealing with zillion points and you want to
have them all in RAM at once, consider using arrays of
primitive types for storage. Rewriting the whole thing in
C++ will help, too.

Dima
 
D

Dimitri Maziuk

Roedy Green sez:
how so? C++ and Java are almost identical in memory use for large
arrays of primitives.

So fscking what? The speed isn't identical when dealing wih huge
datasets.

Besides, if you don't always have exactly 123456789 data points,
you can put your ints in a vector in C++ and not worry about resizing
it. In Java, as you well know, you have to use class Integer (extends
Object) at which point memory use is not identical anymore. Not to
mention the time gc needs to keep track of all those Integers.

Sheesh
Dima
 
C

Christophe Vanfleteren

Dimitri said:
Roedy Green sez:

So fscking what? The speed isn't identical when dealing wih huge
datasets.

Besides, if you don't always have exactly 123456789 data points,
you can put your ints in a vector in C++ and not worry about resizing
it. In Java, as you well know, you have to use class Integer (extends
Object) at which point memory use is not identical anymore. Not to
mention the time gc needs to keep track of all those Integers.

So what, making a List of primitives is pretty easy to do if you actually
need it (and this would indeed be a good case). The overhead of Integer is
not necessary in such large datasets.
 
D

Dimitri Maziuk

Christophe Vanfleteren sez:
So what, making a List of primitives is pretty easy to do if you actually
need it (and this would indeed be a good case). The overhead of Integer is
not necessary in such large datasets.

Riight. Please describe in one paragraph the algorithm for resizing
array a[x] into a[y], where y > x, in Java. Please keep in mind that
we are talking large datasets and memory usage is a concern.

Specifically, trying to allocate 2*x memory for array copy is likely
to result in OutOfMemory crash.

For bonus points, name a C library call one could use to resize an
array without copying it.

HTH
Dima
 
R

Roedy Green

Besides, if you don't always have exactly 123456789 data points,
you can put your ints in a vector in C++ and not worry about resizing
it. In Java, as you well know, you have to use class Integer (extends
Object) at which point memory use is not identical anymore. Not to
mention the time gc needs to keep track of all those Integers.

The code for a Vector of primitives is not rocket science. You could
whip it up in an hour or so especially if you peeked at Sun's code for
ArrayList. On a project of this magnitude that should not hold you
back.

On the other hand you probably DON'T want to dynamically grow giant
arrays. It is quite inefficient use of RAM. You could easily triple
your RAM requirements. You want to count first then allocate
accurately. For this sort of thing, direct access to the primitive
gives you quite a speed boost.
 
D

Dimitri Maziuk

Roedy Green sez:
The code for a Vector of primitives is not rocket science. You could
whip it up in an hour or so especially if you peeked at Sun's code for
ArrayList. On a project of this magnitude that should not hold you
back.

On the other hand you probably DON'T want to dynamically grow giant
arrays. It is quite inefficient use of RAM. You could easily triple
your RAM requirements. You want to count first then allocate
accurately. For this sort of thing, direct access to the primitive
gives you quite a speed boost.

Provided, of course, you can count first: your data is in a neat file
that you can parse once to count the values, and then to read them.
Pity it don't always work that way, e.g. when data is coming out of a
pipe.

Anyhow, reading in data is only step 1. IIRC step 2 is retrieving
points based on some key (use of Hashmap), step 3 is plotting it.

Step 3 is where Java beats C++ hands down: it has standard API for
that, and it's gonna work out of the box anywhere.

Step 2, however, is where you really want to use collections instead
of arrays of primitive types. Lookup time with array is at best log(N).
And that's only if it sorted to begin with. If it's not, you're SOL
'cause sorting comes with array copy and that same 2xN memory
requirement that won't let you grow 'em dynamically.

None of which would be a problem if you could store ints in a Java
collection, like you can in C++.

Dima
 
R

Roedy Green

None of which would be a problem if you could store ints in a Java
collection, like you can in C++.

With 1.5 autoboxing you can, even if it is implemented with objects.

Further, you can easily write analogs of ArrayList, HashMap, and
HashSet that hold primitives. People imagine there must be some black
magic that makes these classes work. The guts are very simple and you
can study the source in src.jar/src.zip for the object version. In the
old days everyone wrote these classes for themselves. Read
http://mindprod.com/jgloss/hashtable.html for example to help
understand how hashed lookup works.

You keep coming across like some helpless newbie who imagines if the
class does not exist prewritten there is some insurmountable obstacle.

C++ has a slight advantage in writing such code because you can, for
example, store pairs of ints at each slot in an array. With Java, you
need two separate arrays indexed the same, or you need to store a
reference to an object containing a pair at each slot. But you
strongly exaggerate when you state that Java can't tackle the same
problems quite efficiently. The java code is slightly less convenient
to write but it runs more efficiently since indexing does not require
multiplication, just shifts which can sometime be done free.

Another tool to look at is java.util.BitSet. It is sort of like a
Collection too.
 
D

Dimitri Maziuk

Roedy Green sez:
With 1.5 autoboxing you can, even if it is implemented with objects.

Further, you can easily write analogs of ArrayList, HashMap, and
HashSet that hold primitives. People imagine there must be some black
magic that makes these classes work. The guts are very simple and you
can study the source in src.jar/src.zip for the object version. In the
old days everyone wrote these classes for themselves. Read
http://mindprod.com/jgloss/hashtable.html for example to help
understand how hashed lookup works.

You keep coming across like some helpless newbie who imagines if the
class does not exist prewritten there is some insurmountable obstacle.

C++ has a slight advantage in writing such code because you can, for
example, store pairs of ints at each slot in an array. With Java, you
need two separate arrays indexed the same, or you need to store a
reference to an object containing a pair at each slot. But you
strongly exaggerate when you state that Java can't tackle the same
problems quite efficiently. The java code is slightly less convenient
to write but it runs more efficiently since indexing does not require
multiplication, just shifts which can sometime be done free.

You keep coming across like someone with reading comprehension
skills of a doorpost and attention span of an amoeba.

ArrayList does array copy for resizing. Whether I look at src.zip
or not, it still does array copy.

You said it yourself: that is unsuitable for large dataset/low
memory situations because it requires twice the memory.

C++ has a "slight advantage" in that you can use realloc(3) to grow
your array in situ, no array copy required.

This whole discussion started because someone suggested replacing
container objects with arrays of primitive types to store large
datasets.

(The keyword here is _primitive_types_, so 1.5 autoboxing is no use
_because_ it is implemented with objects, BTW.)

You said that right way of using arrays is to count the values first
and create array of the correct size. I added that this doesn't work
in a general case where number of elements is not known in advance;
in addition, retrieveing array elements based on value rather than
array subscript is inefficient, so you'd need to also write your own
lookup tables etfc.

"The Java code" to implement lookup tables, make sure your arrays
never get out of sync, etc., "is slightly less convenient to write"
only for very large values of "slightly".

C++ has a "slight" advantage in that you can use vectors to store
your datapoints regardless of their data type. (Ever heard the
phrase "first class citizens" applied to user-defined data types?
That's what OO languages other than Java do.)

The fact that you can store pairs in arrays is _not_ an advantage
of C++: you can store pairs in Java arrays too. The only difference
is that pair comes standard in C++; in Java you need to define it
yourself.

The problem, again, is having too many objects in your Java program.
That's why you want to store your pairs in two arrays instead of a
Pair []. (And then you need to make sure your two arrays never ever
get out of sync, and that makes your Java code "slightly less
convenient to write"... but I already said that.)

And by the way, array indexing in C++ does not necessarily
require multiplication: '*' in "*a + elt_num" is _not_
a multiplication sign, appearances notwithstanding .

Dima
 
R

Roedy Green

C++ has a "slight advantage" in that you can use realloc(3) to grow
your array in situ, no array copy required.

How can it do that without either preallocating RAM or moving other
objects out the way? or shuffling objects about to get sufficient
contiguous storage? Dynamically growing arrays mean shuffling bytes
around, even if the shuffling is hidden from your eyes.

Java has the advantage over C++ in this regard is that objects CAN be
shuffled around as needed without the application needing to be aware
of their movements.
 
R

Roedy Green

You said that right way of using arrays is to count the values first
and create array of the correct size. I added that this doesn't work
in a general case where number of elements is not known in advance;

That is the case if you make so special effort to maintain the counts.

However, if you think about it, there is no excuse not to know the
counts except possibly for the initial data import, so long as you are
in control of all the code. When you are working with large arrays it
pays to go to extraordinary lengths to maintain the counts so that you
don't have to fall back on the generic, ram-inefficient and
CPU-inefficient solution of dynamic arrays.

In some cases, it would even pay to do the work in two passes, one to
get the exact count and one to do the work.

You can also use exact upper bound arrays when you don't know the
exact count, e.g. the size of two merged arrays can't be longer than
the sum of the lengths of both.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top