copying a really huge long array

J

jeanlutrin

Hi all,

before anyone screams in anger at me and tell
me I should go back to some proper OO design
classes, I want to say that the following
scenario is completely hypothetical and my
question is one of curiosity (and desire to
understand better the working of the
language/JVM/you-name-it too) :)

Imagine that one hypotetical application is
using a huge long[] to represent an insane
number of "things" using a very efficient
bit encoded data structure. Say that
this long[] as a size of approx 100 Mb (!).

The "things" represented by this long[] wouldn't
be representable using Java object's because
of the memory footprint this would induce (on
the order of several Gb).

Now imagine that this long[] should be duplicated
(having the two long[] is not a problem and is
even mandatory: some different modifications will
be made on those two long[], then they'll be compared
- and "swapping" to disk is out of question due to
performance concerns).

For whatever reason, the second long[] should have
the size of the first one plus one element and all
the longs besides the last one should be the same
as in the first long[].

If there's enough memory available, what options are
there to achieve this?

I'm doing something like this (in my hypothetical simulation):

long[] sec = new long[first.length+1];
for (int i = 0; i < first.length; i++) {
sec = first;
}
sec[sec.length-1] = ...;


Would this be better (if only for the readability?) ?

long[] sec = new long[first.length+1];
System.arraycopy(first, 0, sec, 0, first.length)
sec[sec.length-1] = ...;

I also have another question, what if the memory gets
"fragmented" and there's enough memory available, but not
in a contiguous region.

Does the VM (or some VM) do some kind of "defragmentation" ?

Or some kind of pointer arithmetic ("splitting" the long[]
in pieces)?

Will the VM simply prevent the long[] from being created?

Are those behavior defined by some specs or are these
allowed to change?

Thanks in advance for any informations (I googled this
group's archives and found interesting thing, but I'm still
very curious about this),


Jean
 
A

Ann

Hi all,

before anyone screams in anger at me and tell
me I should go back to some proper OO design
classes, I want to say that the following
scenario is completely hypothetical and my
question is one of curiosity (and desire to
understand better the working of the
language/JVM/you-name-it too) :)

Imagine that one hypotetical application is
using a huge long[] to represent an insane
number of "things" using a very efficient
bit encoded data structure. Say that
this long[] as a size of approx 100 Mb (!).

The "things" represented by this long[] wouldn't
be representable using Java object's because
of the memory footprint this would induce (on
the order of several Gb).

Now imagine that this long[] should be duplicated
(having the two long[] is not a problem and is
even mandatory: some different modifications will
be made on those two long[], then they'll be compared
- and "swapping" to disk is out of question due to
performance concerns).

For whatever reason, the second long[] should have
the size of the first one plus one element and all
the longs besides the last one should be the same
as in the first long[].

If there's enough memory available, what options are
there to achieve this?

I'm doing something like this (in my hypothetical simulation):

long[] sec = new long[first.length+1];
for (int i = 0; i < first.length; i++) {
sec = first;
}
sec[sec.length-1] = ...;


Would this be better (if only for the readability?) ?

long[] sec = new long[first.length+1];
System.arraycopy(first, 0, sec, 0, first.length)
sec[sec.length-1] = ...;

I also have another question, what if the memory gets
"fragmented" and there's enough memory available, but not
in a contiguous region.

Does the VM (or some VM) do some kind of "defragmentation" ?

Or some kind of pointer arithmetic ("splitting" the long[]
in pieces)?

Will the VM simply prevent the long[] from being created?

Are those behavior defined by some specs or are these
allowed to change?

Thanks in advance for any informations (I googled this
group's archives and found interesting thing, but I'm still
very curious about this),


Jean

The code is so simple and short I am surprised you have not
done it already.
 
B

bugbear

Hi all,

before anyone screams in anger at me and tell
me I should go back to some proper OO design
classes, I want to say that the following
scenario is completely hypothetical and my
question is one of curiosity (and desire to
understand better the working of the
language/JVM/you-name-it too) :)

Imagine that one hypotetical application is
using a huge long[] to represent an insane
number of "things" using a very efficient
bit encoded data structure. Say that
this long[] as a size of approx 100 Mb (!).

In virtually any langauge, I would represent such a structure
as a linked list of sub blocks, (say 1 Mb each), with an
abtract API over the top to represent the (virtual) contiguous
array. The linked list will introduce very little
performance overhead, and greatly faciliate many operations.

BugBear (who did this in C about 2 decades ago)
 
C

Chris Uppal

Would this be better [than a simple loop] (if only for the readability?) ?

long[] sec = new long[first.length+1];
System.arraycopy(first, 0, sec, 0, first.length)
sec[sec.length-1] = ...;

Maybe better, maybe worse. It /shouldn't/ be worse since the JVM
implementation could use a simple Java loop in System.arraycopy() if that were
generally the fastest way. Either assume the competence of the JVM
implementors (not /necessarily/ an unsafe assumption ;-) or measure it in your
particular application (tricky to do, I know, when that application is purely
hypothetical...)

Does the VM (or some VM) do some kind of "defragmentation" ?

Implementaton dependent. I don't know the answer for any existing
production-grade JVM, I /suspect/ that the answer is more likely to be no than
yes, although there's no technical problem with doing it.

Or some kind of pointer arithmetic ("splitting" the long[]
in pieces)?

Implementation dependent, again. I have never heard of an implementation that
does this, though.

Incidentally, if the hypothetical application were to use a more OO interface
to the data than "raw" array access, then it would become strasightforward to
implement splitting these data structures at the application level. Similarly
it might be able to implement some kind of sharing (if the arrays were normally
immutable), so that your scenario woud be more-or-less constant-time rather
than O(n). I know you eliminated that kind of option by hypothesis, but I did
want to point out that you might also be ruling out the best (fastest,
simplest) solution at the same time.

Are those behavior defined by some specs or are these
allowed to change?

Good question. The answers are not defined in any way by the spec.

-- chris
 
R

Ryan Stewart

Chris Uppal said:
Would this be better [than a simple loop] (if only for the readability?) ?

long[] sec = new long[first.length+1];
System.arraycopy(first, 0, sec, 0, first.length)
sec[sec.length-1] = ...;

Maybe better, maybe worse. It /shouldn't/ be worse since the JVM
implementation could use a simple Java loop in System.arraycopy() if that were
generally the fastest way. Either assume the competence of the JVM
implementors (not /necessarily/ an unsafe assumption ;-) or measure it in your
particular application (tricky to do, I know, when that application is purely
hypothetical...)
In the reference implementation, arraycopy is a native method, and IIRC is much
faster than using a loop.
 
J

John C. Bollinger

Chris said:
(e-mail address removed) wrote:


Implementaton dependent. I don't know the answer for any existing
production-grade JVM, I /suspect/ that the answer is more likely to be no than
yes, although there's no technical problem with doing it.

Indeed, the ability for the JVM to do this sort of thing is one of the
advantages claimed for using references instead of pointers. As you
say, whether or not any existing JVM of any grade actually does this is
is uncertain. Considering the rapid heap fragmentation likely to result
from typical Java usage patterns (use of many short-lived objects,
interspersed with a few longer-lived ones), however, I would guess the
other way: good JVMs probably do perform heap compaction at need,
probably in conjunction with GC activity.


John Bollinger
(e-mail address removed)
 
C

Chris Smith

Ann said:
The code is so simple and short I am surprised you have not
done it already.

I have to disagree strongly here. The original question was about the
following:

a. Is there a better way to do this?
b. Does the spec guarantee this?
c. Do implementations in general do this?

*None* of those questions can be answered by writing and testing code.
Developers who use quick testing as an excuse for not paying attention
to specifications or to collected knowledge about a variety of
implementations are shooting themselves in the foot.

There's also nothing wrong with asking questions on newsgroups. It gets
tiring to see people snubbed or insulted for having done so.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
C

Chris Smith

John C. Bollinger said:
Indeed, the ability for the JVM to do this sort of thing is one of the
advantages claimed for using references instead of pointers.

I'm guessing at what you mean here, and I'd choose different
terminology. In the terminology of the JLS, references *are* pointers,
unless they are null. Trying to distinguish between references and
pointers seems a tad confusing.

I suspect you mean that the ability for the JVM to do this is one of the
advantages claimed for indirect (natively pointer-to-pointer)
references, as were implemented in Sun's reference implementation of
Java 1.1 and earlier. It is true that Sun's old 1.1 reference
implementation used these indirect references to compact the heap even
when it was doing conservative collection and didn't know what is or is
not a reference type.

However, which some runtime knowledge about which memory slots are
references, the same is even true with modern direct references in 1.2
and later of the Sun reference implementation. Copying collection
(which is implicitly heap-compacting) is performed at younger
generations; and explicit heap compaction is performed in older
generations. This is still possible, because the VM keeps tabs on what
is a reference variable and what isn't.
As you say, whether or not any existing JVM of any grade actually does
this is is uncertain.

Sun's implementations do this. So definitely some implementations do.
As for other implementations, I'd consider it somewhat out of the
ordinary to see explicit heap compaction on a heap; but I'd certainly
*expect* to see a copy collector in any decent JVM, so at least the
newer generations where temporary objects are created and disposed will
be compacted as a side-effect of the collection.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
C

Chris Uppal

Chris said:
I'm guessing at what you mean here, and I'd choose different
terminology. In the terminology of the JLS, references *are* pointers,
unless they are null. Trying to distinguish between references and
pointers seems a tad confusing.

I think that John's probably using the distinction in the same way as I would.

A "pointer" is an integer-like value that holds the address-in-memory of
something or nothing. (Not necessarily the actual physical RAM address,
there's a certain amount of abstraction allowed in the concept.) Whereas a
"reference" is an abstraction of a pointer that does not allow it to appear in
its integer-like role (so pointer arithmetic is impossible, as is "just
inventing" pointer values from thin air).

It's that cut-down-ness that is necessary to allow precise GC.

-- chris
 
C

Chris Smith

Chris Uppal said:
I think that John's probably using the distinction in the same way as I would.

A "pointer" is an integer-like value that holds the address-in-memory of
something or nothing. (Not necessarily the actual physical RAM address,
there's a certain amount of abstraction allowed in the concept.) Whereas a
"reference" is an abstraction of a pointer that does not allow it to appear in
its integer-like role (so pointer arithmetic is impossible, as is "just
inventing" pointer values from thin air).

It's that cut-down-ness that is necessary to allow precise GC.

Ah yes! That makes sense. It still doesn't agree with the use of
"pointer" in Pascal or several other languages, but nothing is ever
perfect.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
P

Patricia Shanahan

Chris said:
Chris Smith wrote:




I think that John's probably using the distinction in the
same way as I would.

A "pointer" is an integer-like value that holds the
address-in-memory of something or nothing. (Not
necessarily the actual physical RAM address, there's a
certain amount of abstraction allowed in the concept.)
Whereas a "reference" is an abstraction of a pointer that
does not allow it to appear in its integer-like role (so
pointer arithmetic is impossible, as is "just inventing"
pointer values from thin air).

It's that cut-down-ness that is necessary to allow
precise GC.

-- chris

<rant>

This form of distinction gets very weird if you look at the
history of the term. By your definition, a Pascal pointer,
as described in the Pascal Report, was a reference, not a
pointer.

It also doesn't make sense in terms of English. I have a
laser pointer for use in presentations. It points. It does
not do arithmetic. Despite its total lack of arithmetic
capability, I call it a "pointer" not a "reference".

Arithmetic on pointers was a very unfortunate idea in C.
Conversion, in both directions, between pointers and
integers would have been sufficient to support
implementation of functions such as malloc. Arrays and array
indexing could have been defined in their own right, not in
terms of pointer arithmetic.

Allowing conversion would not have perverted the term
"pointer". The conversion operations would have been
appropriate only in low level system code, and programming
standards could have excluded them in normal applications.

Personally, I wish the Java designers had reclaimed
"pointer" for anything that points, and explained that you
can't do arithmetic on them.

</rant>

Patricia
 
J

John C. Bollinger

Chris said:
Chris Smith wrote:




I think that John's probably using the distinction in the same way as I would.

A "pointer" is an integer-like value that holds the address-in-memory of
something or nothing. (Not necessarily the actual physical RAM address,
there's a certain amount of abstraction allowed in the concept.) Whereas a
"reference" is an abstraction of a pointer that does not allow it to appear in
its integer-like role (so pointer arithmetic is impossible, as is "just
inventing" pointer values from thin air).

It's that cut-down-ness that is necessary to allow precise GC.

Yes, Chris, that's just what I meant. The distinction I was trying to
draw was that a "pointer" value incorporates information about the
pointee's location (I was thinking along C lines here) whereas a
reference value does not (or at least, it is not required to do). Thus,
moving a pointee in memory invalidates all pointers pointing to it, but
moving a referent does not necessarily do the same.


John Bollinger
(e-mail address removed)
 
A

Ann

Chris Smith said:
I have to disagree strongly here. The original question was about the
following:

a. Is there a better way to do this?
b. Does the spec guarantee this?
c. Do implementations in general do this?

I really was surprised, how can you disagree about surprised? ;-)

I do not have the original post, but the OP had a suggested solution
to his/her own problem. It was a simple and short solution. In my view
it would have been faster to just try it compared to getting feedback first.
<A> if a better way is found, then refactor and get credit for that too
<B> if it works it works
<C> important only if there is a better solution
 
C

Chris Uppal

Patricia Shanahan wrote:

[in reply to:]
A "pointer" is an integer-like value that holds the
address-in-memory of something or nothing. [...]
Whereas a "reference" is an abstraction of a pointer that
does not allow it to appear in its integer-like role (so
pointer arithmetic is impossible, as is "just inventing"
pointer values from thin air).
<rant>[...]<rant/>

Far be it from me to nitpick with a good rant, but I feel that Pascal has
poisoned your brain, and that consequentially you have fallen into error ;-).
It also doesn't make sense in terms of English. I have a
laser pointer for use in presentations. It points. It does
not do arithmetic. Despite its total lack of arithmetic
capability, I call it a "pointer" not a "reference".

I'd say that common English usage is has it that a pointer can (more or less by
definition) point to /anything/, whereas a reference has its referent bound
into its identity.

Consider a passing reference to an old film. If it were a reference to a
different film, then it would be a /different/ reference. On the other hand
(literally in this case), I can point with my finger at a book, or a DVD, or at
a tree outside the window -- but it's still the same finger...

That distinction between the two concepts is reflected well in (what I would
call the normal) technical/jargon use of the two words.

Mind you, I do agree that:
Personally, I wish the Java designers had reclaimed
"pointer" for anything that points, and explained that you
can't do arithmetic on them.

but not because I think that "pointer" is a better word for Java's
"references", but because the dual meaning of "reference" seems to cause a
great deal of unecessary confusion.

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top