How to store a large amount of 3D data points in Java?


Roedy Green

(The keyword here is _primitive_types_, so 1.5 autoboxing is no use
_because_ it is implemented with objects, BTW.)

Exactly, but I was responding to your broad assertion that Java
"can't" deal with collections of primitives.

It can in two ways: write your own or use autoboxing. This is a side
issue to this fellow's problem of trying to deal efficiently with
giant batches of primitives in either Java or C++.
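For the "write your own" route, a minimal sketch of a growable list of primitive doubles might look like this. The class name `DoubleList` and the initial capacity of 16 are assumptions for illustration, not something from the thread. The point is that the values sit in one `double[]` rather than in one boxed `Double` object per element.

```java
import java.util.Arrays;

// Hypothetical growable list of primitive doubles (not from the thread).
// Values live in one double[]; no Double objects are created.
final class DoubleList {
    private double[] data = new double[16]; // assumed initial capacity
    private int size = 0;                   // number of slots actually in use

    // Appends a value, doubling the backing array when it is full.
    void add(double value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    // Returns the value at index i, checked against the logical size.
    double get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[i];
    }

    int size() {
        return size;
    }
}
```

By contrast, an `ArrayList<Double>` filled through autoboxing typically holds a reference to a separate `Double` object for every value, which is exactly the object overhead being argued about here.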
 

Roedy Green

The problem, again, is having too many objects in your Java program.
That's why you want to store your pairs in two arrays instead of a
Pair []. (And then you need to make sure your two arrays never ever
get out of sync, and that makes your Java code "slightly less
convenient to write"... but I already said that.)

I have written code both ways, and I definitely agree that using
multiple synched arrays is a pain in the butt. However, if you
encapsulate that logic, it ends up looking to the outside world the
same as the C++ solution, or the Java solution with references to separate objects.

By localising that logic, it can't get out of sync. If you wrote it
the way you would have done it in FORTRAN II, with exposed arrays, of
course you would get a bloody unmaintainable mess.
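A minimal sketch of the encapsulated multiple-arrays approach for 3D points, assuming a hypothetical class `Points3D`: the three coordinate arrays are private, and every method that changes one of them changes all three, so callers never see them out of sync. Growth by doubling and the lack of overflow handling near `Integer.MAX_VALUE` are simplifications of this sketch.

```java
import java.util.Arrays;

// Hypothetical "structure of arrays" store: one primitive array per coordinate.
// Only this class touches the arrays, so they always have the same length and size.
final class Points3D {
    private double[] xs, ys, zs;
    private int size;

    Points3D(int initialCapacity) {
        int cap = Math.max(1, initialCapacity); // at least 1 so doubling makes progress
        xs = new double[cap];
        ys = new double[cap];
        zs = new double[cap];
    }

    // Adds one point; all three arrays grow together (no overflow handling in this sketch).
    void add(double x, double y, double z) {
        if (size == xs.length) {
            int newCap = xs.length * 2;
            xs = Arrays.copyOf(xs, newCap);
            ys = Arrays.copyOf(ys, newCap);
            zs = Arrays.copyOf(zs, newCap);
        }
        xs[size] = x;
        ys[size] = y;
        zs[size] = z;
        size++;
    }

    double x(int i) { checkIndex(i); return xs[i]; }
    double y(int i) { checkIndex(i); return ys[i]; }
    double z(int i) { checkIndex(i); return zs[i]; }
    int size() { return size; }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }
}
```

From outside, `p.x(i)` reads much like `points[i].x` would with an array of point objects, which is the point being made above.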

One other thing to think about is the RAM caching behaviour. With the
C++ solution the entire object tends to float into on-chip cache as a
lump. With the multiple-arrays solution, it is the previous and next
elements of the same field that come in together.
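If the C++-style caching behaviour is wanted in Java, one option (an assumption here, not something proposed in the thread) is to interleave the coordinates in a single `double[]`, so point `i` occupies slots `3*i` to `3*i + 2`. The class name `InterleavedPoints` is hypothetical, `Objects.checkIndex` needs Java 9 or later, and the claim that the three values end up adjacent in memory reflects how typical JVMs lay out primitive arrays rather than anything the language specification promises.

```java
import java.util.Objects;

// Hypothetical "array of structures" layout in one double[].
// Point i occupies coords[3*i], coords[3*i + 1], coords[3*i + 2].
final class InterleavedPoints {
    private final double[] coords;
    private final int count;

    InterleavedPoints(int count) {
        if (count < 0) throw new IllegalArgumentException("count < 0");
        this.count = count;
        this.coords = new double[Math.multiplyExact(count, 3)]; // throws if 3 * count overflows int
    }

    // Stores one point; its three coordinates go into adjacent slots.
    void set(int i, double x, double y, double z) {
        int base = 3 * Objects.checkIndex(i, count);
        coords[base] = x;
        coords[base + 1] = y;
        coords[base + 2] = z;
    }

    double x(int i) { return coords[3 * Objects.checkIndex(i, count)]; }
    double y(int i) { return coords[3 * Objects.checkIndex(i, count) + 1]; }
    double z(int i) { return coords[3 * Objects.checkIndex(i, count) + 2]; }

    // Squared distance of point i from the origin; reads one contiguous run of three doubles.
    double normSquared(int i) {
        int base = 3 * Objects.checkIndex(i, count);
        double x = coords[base], y = coords[base + 1], z = coords[base + 2];
        return x * x + y * y + z * z;
    }
}
```

Which layout wins depends on the access pattern: code that touches all three coordinates of each point tends to favour interleaving, while code that scans one coordinate across many points tends to favour separate arrays.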
 

Roedy Green

And by the way, array indexing in C++ does not necessarily
require multiplication: '*' in "*(a + elt_num)" is _not_
a multiplication sign, appearances notwithstanding.

It becomes a multiply if the size of the element is not a power of 2,
and you can't reduce it to an add in a loop. In this case with three
doubles, you would not have a power of 2.


In the multiple arrays solution, addressing is always just a shift on
the index, and depending on the underlying architecture, sometimes a
free shift.
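The address arithmetic behind this exchange can be spelled out in a few lines. This hypothetical `Offsets` class assumes 8-byte doubles and a struct of exactly three doubles with no padding, so 24 bytes per point. It also shows that a multiply by the constant 24 can be rewritten as two shifts and an add, and that in a sequential loop each offset is simply the previous one plus 24; this kind of strength reduction is something optimizing compilers commonly do.

```java
// Hypothetical helpers: byte offset of element i under the two layouts discussed.
// Assumes 8-byte doubles and a 3-double struct with no padding (24 bytes).
final class Offsets {
    static final int POINT_BYTES = 3 * Double.BYTES; // 24, not a power of two

    // Separate double[] per field: offset = i * 8, a single shift.
    static long separateArrayOffset(long i) {
        return i << 3;
    }

    // One 24-byte struct per element: offset = i * 24.
    static long structOffset(long i) {
        return i * POINT_BYTES;
    }

    // Same value without a multiply: 24 * i = 16 * i + 8 * i.
    static long structOffsetByShifts(long i) {
        return (i << 4) + (i << 3);
    }

    // Offsets for elements 0..n-1 when walking sequentially: each is the previous plus 24.
    static long[] structOffsetsInLoop(int n) {
        long[] offsets = new long[n];
        long offset = 0;
        for (int i = 0; i < n; i++) {
            offsets[i] = offset;
            offset += POINT_BYTES;
        }
        return offsets;
    }
}
```

For example, element 5 sits at byte 40 of a separate `double[]` but at byte 120 of a 24-byte-struct array.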
 

Chris Smith

Roedy said:
How can it do that without either preallocating RAM or moving other
objects out of the way, or shuffling objects about to get sufficient
contiguous storage? Dynamically growing arrays mean shuffling bytes
around, even if the shuffling is hidden from your eyes.

Nope, the implementation is probabilistic. If there is extra space
available at the end of the allocated block's address space, then it is
added. If not, then a newly allocated block is returned. Since most
large allocations will occur on mmap pages, there will most likely be
available address space (not necessarily mapped RAM) beyond the end of
the allocated section.

Java has the advantage over C++ in this regard that objects CAN be
shuffled around as needed without the application needing to be aware
of their movements.

That is an advantage in some cases, but here it isn't used to provide
this advantage. In fact, by compacting memory to improve cache
locality, you're reducing the possibility that you might be able to pull
this trick... not that Java would, anyway.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Roedy Green

Nope, the implementation is probabilistic. If there is extra space
available at the end of the allocated block's address space, then it is
added. If not, then a newly allocated block is returned. Since most
large allocations will occur on mmap pages, there will most likely be
available address space (not necessarily mapped RAM) beyond the end of
the allocated section.

So what you are doing is leaving lots of free address space past the
array for it to grow into. There is not much advantage in that over
just allocating the array max size to begin with, so long as your
underlying JVM does not insist on actually zeroing everything out
before it is ever referenced.

IIRC you need two different memory allocation schemes in C++, one for
movable objects and one for immovable ones. For these large arrays,
to handle things your way, they have to be movable, with extra pinning
logic etc. That all comes out in the wash in Java. C++ has no way to
update references to a moved object. All it can do is insist you use
an indirect handle and pin the object in place, and logically
invalidate any cached pointer when the object has to move.

The test would be to write the code both ways, with a team doing its
best on each, and just see how different the end results are.

A scheme in either C++ or Java that could precalculate exact array
sizes should beat out any other dynamic array solution in either C++
or Java. Then arrays can be allocated once, and stay put and allow the
most direct access.
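As one illustration of precalculating exact sizes: if the points arrive in a binary file whose layout is known, the count can be derived from the file length before anything is read, and each array allocated once. The file format here (x, y and z written back to back as big-endian doubles, the way `DataOutputStream.writeDouble` produces them, with no header) and the class name `ExactSizeLoader` are assumptions for the sketch.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader for an assumed headerless file of (x, y, z) big-endian doubles.
final class ExactSizeLoader {
    static final int BYTES_PER_POINT = 3 * Double.BYTES; // 24

    // Returns {xs, ys, zs}; each array is allocated exactly once, at its final size.
    static double[][] load(Path file) throws IOException {
        long bytes = Files.size(file);
        if (bytes % BYTES_PER_POINT != 0) {
            throw new IOException("file length " + bytes + " is not a multiple of " + BYTES_PER_POINT);
        }
        // Throws ArithmeticException if there are too many points for one Java array.
        int n = Math.toIntExact(bytes / BYTES_PER_POINT);
        double[] xs = new double[n];
        double[] ys = new double[n];
        double[] zs = new double[n];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (int i = 0; i < n; i++) {
                xs[i] = in.readDouble();
                ys[i] = in.readDouble();
                zs[i] = in.readDouble();
            }
        }
        return new double[][] {xs, ys, zs};
    }
}
```

No resizing happens after the first allocation, so the arrays stay put for the rest of the run.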
 

Dimitri Maziuk

Roedy Green sez:
It becomes a multiply if the size of the element is not a power of 2,
and you can't reduce it to an add in a loop. In this case with three
doubles, you would not have a power of 2.

Trust me, *(arr + i) correctly gives you the i-th element of array
arr even if arr is an array of doubles. That is addition, at least
at the level of language syntax (I mean, a very clever compiler could
use a bit shift instead for some values of i, etc.).

Dima
 

Dimitri Maziuk

Dale King sez:
Hello, Dimitri Maziuk !

Hey, your newsserver seems to have a serious propagation
delay problem. ;)
No such library call exists. You are probably thinking about
realloc, but it is not guaranteed to avoid a copy. When
dealing with large arrays, a copy is pretty much guaranteed.

Of course; what I meant was "a Java programmer would have
to create a new array, call System.arraycopy(), then clear the
old array, where a C programmer would just call realloc()".
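The Java sequence described above can be sketched as a small helper. `Realloc.resize` is a hypothetical name; `System.arraycopy` is the standard library method. Unlike realloc, this always copies, and "clearing" the old array amounts to dropping the last reference to it so the garbage collector can reclaim it.

```java
// Hypothetical helper: the Java counterpart of growing (or shrinking) an array with realloc.
final class Realloc {
    // Returns a new array of newLength doubles holding the first min(old.length, newLength)
    // values of old; any extra slots are 0.0. The old array is left untouched and becomes
    // garbage once the caller stops referring to it (Java has no explicit free).
    static double[] resize(double[] old, int newLength) {
        double[] fresh = new double[newLength];
        System.arraycopy(old, 0, fresh, 0, Math.min(old.length, newLength));
        return fresh;
    }
}
```

Typical use is `data = Realloc.resize(data, data.length * 2);`. The standard `java.util.Arrays.copyOf(data, newLength)` does the same allocate-and-copy in one call.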

IRL that C programmer will probably store his data table
in a C++ std::vector of rows anyway.

Dima
 

Liz

Did anybody say "GZIP"?
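If the data has to be stored or shipped in compressed form, the JDK's `java.util.zip` streams can be layered under `DataOutputStream` and `DataInputStream`. This is a sketch under assumptions: the class name `GzipDoubles` and the length-prefixed layout are made up for illustration, and how much GZIP actually saves depends heavily on the data, since noisy floating-point coordinates often compress poorly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical GZIP round trip for a double[] (length prefix, then the raw doubles).
final class GzipDoubles {
    static byte[] compress(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(values.length); // so decompress knows how many doubles follow
            for (double v : values) {
                out.writeDouble(v);
            }
        } // closing the stream finishes the GZIP data and writes its trailer
        return bytes.toByteArray();
    }

    static double[] decompress(byte[] gz) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(gz)))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        }
    }
}
```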


Dimitri Maziuk said:
Roedy Green sez:

So fscking what? The speed isn't identical when dealing with huge
datasets.

Besides, if you don't always have exactly 123456789 data points,
you can put your ints in a vector in C++ and not worry about resizing
it. In Java, as you well know, you have to use class Integer (extends
Object) at which point memory use is not identical anymore. Not to
mention the time gc needs to keep track of all those Integers.

Sheesh
Dima
Small
 

Dale King

Hello, Roedy Green !
You said:
How can it do that without either preallocating RAM or moving other
objects out of the way, or shuffling objects about to get sufficient
contiguous storage? Dynamically growing arrays mean shuffling bytes
around, even if the shuffling is hidden from your eyes.

It is definitely not guaranteed that realloc will use the same
space. It probably will if you shrink the size, or expand a little
after shrinking it, but it is definitely not guaranteed when
expanding. And in this case the subject was expanding to very
large size. That definitely will involve lots of copying along
the way.
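To put a number on "lots of copying", here is a hypothetical simulation, `GrowthCost`, that assumes the worst case described above: every time the array is full, its whole contents are copied into a new, larger block. It compares doubling the capacity (a common strategy, not one named in the thread) with growing by a fixed step.

```java
// Hypothetical simulation: counts elements copied while appending n items,
// assuming every growth copies the full contents into a new block.
final class GrowthCost {
    // Capacity starts at 1 and doubles whenever the array is full.
    static long copiesWithDoubling(int n) {
        long copies = 0;
        long capacity = 1;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size; // every existing element moves to the new block
                capacity *= 2;
            }
        }
        return copies;
    }

    // Capacity starts at step and grows by step whenever the array is full.
    static long copiesWithFixedStep(int n, int step) {
        long copies = 0;
        long capacity = step;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += step;
            }
        }
        return copies;
    }
}
```

For n = 1024 the doubling version copies 1023 elements in total, fewer than n; a fixed step of 100 already copies 4,500 elements for n = 1000, and that cost grows roughly with the square of n.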
 

Dale King

Hello, Roedy Green !
You said:
Exactly, but I was responding to your broad assertion that Java
"can't" deal with collections of primitives.

It can in two ways: write your own or use autoboxing.

Or use one that someone else has written. Googling for java
primitive collection should turn up a few.
 
