Lists of size Integer.MAX_VALUE


Lew

What would happen from a call like

someList.addAll( Integer.MAX_VALUE, someCollection );

where someList contains at least Integer.MAX_VALUE elements and someCollection
is not empty?

According to the Javadoc:
"Inserts the specified element at the specified position in this list (optional operation)." and throws
"IndexOutOfBoundsException - if the index is out of range (index < 0 || index > size())".

Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE
elements and we
someList.add( element );
or
someList.add( Integer.MAX_VALUE, element );

According to the Javadocs, this is legal, but how then could we reference the
last element of the list?

(Presumably we're facing an OOME before these are testable problems in today's
JVMs.)
 

Guest

Lew said:
What would happen from a call like

someList.addAll( Integer.MAX_VALUE, someCollection );

where someList contains at least Integer.MAX_VALUE elements and
someCollection is not empty?
Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE
elements and we
someList.add( element );
or
someList.add( Integer.MAX_VALUE, element );

According to the Javadocs, this is legal, but how then could we
reference the last element of the list?

I think the docs are wrong.

If the list is backed by an array, then it will fail.

Arne
 

Daniel Pitts

Lew said:
What would happen from a call like

someList.addAll( Integer.MAX_VALUE, someCollection );

where someList contains at least Integer.MAX_VALUE elements and someCollection
is not empty?

According to the Javadoc [...]


Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE
elements and we
someList.add( element );
or
someList.add( Integer.MAX_VALUE, element );

According to the Javadocs, this is legal, but how then could we reference the
last element of the list?

(Presumably we're facing an OOME before these are testable problems in today's
JVMs.)

On a 64-bit machine with > 2 GB of memory, you can test this to see
what happens. I suspect that the original implementors thought "2
billion item arrays are enough for anyone" :)
 

Roedy Green

Lew said:
(Presumably we're facing an OOME before these are testable problems in today's
JVMs.)

Since corner cases are usually where programs fail, it is wise to back
up by one or two from a corner. I don't have a 64-bit JVM to test
this, but I would guess you will get some other exception such as an
ArrayIndexOutOfBoundsException.
 

Lew

Roedy said:
Since corner cases are usually where programs fail, it is wise to back
up by one or two from a corner. I don't have a 64-bit JVM to test
this, but I would guess you will get some other exception such as an
ArrayIndexOutOfBoundsException.

I have but 1 GB RAM and I don't think this motherboard accepts more than 2 GB,
so I am currently unable to test this either. I figure similarly to you and to
Arne, who said:
I think the docs are wrong.

If the list is backed by an array, then it will fail.

The point here, as with the bug in quicksort that went undetected for about
forty years [1], is that even the designers missed the corner cases.

[1]
<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>
 

Tom Hawtin

Lew said:
What would happen from a call like

someList.addAll( Integer.MAX_VALUE, someCollection );

where someList contains at least Integer.MAX_VALUE elements and
someCollection is not empty?

It should work correctly.

ArrayList will actually throw ArrayIndexOutOfBoundsException. That's a subtype of
IndexOutOfBoundsException, which I guess makes some kind of sense.

I guess there are probably bugs if you, say, addAll a collection of more
than Integer.MAX_VALUE elements to an ArrayList.

Integer types with limited ranges suck.

At least it isn't a problem for ArrayList until you have one taking up
over 16 GB.

Tom Hawtin
 

Twisted

Lew said:
Roedy said:
Since corner cases are usually where programs fail, it is wise to back
up by one or two from a corner. I don't have a 64-bit JVM to test
this, but I would guess you will get some other exception such as an
ArrayIndexOutOfBoundsException.

I have but 1 GB RAM and I don't think this motherboard accepts more than 2 GB,
so I am currently unable to test this either. I figure similarly to you and to
Arne, who said:
I think the docs are wrong.
If the list is backed by an array, then it will fail.

The point here, as with the bug in quicksort that went undetected for about
forty years [1], is that even the designers missed the corner cases.

Integer overflow is the worst, for some reason; probably because it
violates the usual mathematical intuitions about arithmetic
operations. One of those expensive space disasters -- rocket blowing
up or going in the wrong direction, Mars probe or satellite failing,
or some such -- was eventually traced to a 16-bit integer overflow bug
in some microcontroller code running on an ancient 286-era microchip
in the guts of the thing.

As for Java collections, they're pretty much all backed by arrays
aren't they? I think we may want to eventually propose HugeFoo
versions of these collections. There'd be HugeArray, with long indices
and under the hood an array of arrays indexed by the high 32 and low
32 bits, or something like that. Or perhaps the TreeArray,
conceptually an array but actually a deeper tree, which grows in depth
for every factor of 2^31 in the array length, and uses BigIntegers.
These are a somewhat slower-performing normal array that doesn't take
up much extra space for sizes below 2^31. Your more sophisticated
collection classes, such as HugeArrayList, are straight ports of their
non-Huge cousins to using a HugeArray or TreeArray and using long or
BigInteger parameters.

With TreeArray and BigIntegers, memory capacity is the ONLY limit,
unless BigIntegers have some limitation under the hood. It's likely
they're based on arrays of ints, so limited to (2^32)^(2^31) possible
values, or 2^(2^36). A HugeInteger class, whose implementation is
backed by a TreeArray class implemented to use HugeInteger indices*,
seems warranted ... :)

* Figuring out how to avoid infinite loops in recursion is left as an
exercise to the reader, but a hint: the HugeInteger should store the
lowest-order 31 bits in a loose int rather than the TreeArray.
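
For what it's worth, here is a minimal sketch of the HugeArray idea with long indices; the 2^30-element chunk size and the method names are just one possible choice, and bounds checking is omitted:

// Minimal sketch of a long-indexed array: the index is split into a chunk
// number and an offset within an ordinary int-indexed chunk.
public class HugeArray<T> {

    private static final int CHUNK_BITS = 30;             // 2^30 slots per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final Object[][] chunks;
    private final long length;

    public HugeArray(long length) {
        this.length = length;
        int chunkCount = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);  // ceiling division
        chunks = new Object[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new Object[(int) Math.min(CHUNK_SIZE, remaining)];  // last chunk may be short
        }
    }

    @SuppressWarnings("unchecked")
    public T get(long index) {
        return (T) chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    public void set(long index, T value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    public long length() {
        return length;
    }
}

A HugeArrayList could then wrap such a structure and expose long-based get/set/add, which is essentially the straight port described above.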
 

Joshua Cranmer

As for Java collections, they're pretty much all backed by arrays aren't
they?

LinkedList is not, but ArrayList is.

All the sets delegate to their respective maps; TreeMap does not use
arrays, whereas {Linked}HashMap is (I believe) backed by an array.

PriorityQueue is probably array-backed, ConcurrentLinkedQueue not,
SynchronousQueue doesn't even count (you can't even have one element),
and DelayQueue I have no idea.

That sums up to about half of all real implementations of Collections,
which makes sense (one random access, one not for each type), and is a
far cry from "pretty much all".
 

Patricia Shanahan

Twisted wrote:
....
As for Java collections, they're pretty much all backed by arrays
aren't they? I think we may want to eventually propose HugeFoo
versions of these collections. There'd be HugeArray, with long indices
and under the hood an array of arrays indexed by the high 32 and low
32 bits, or something like that. Or perhaps the TreeArray,
conceptually an array but actually a deeper tree, which grows in depth
for every factor of 2^31 in the array length, and uses BigIntegers.
These are a somewhat slower-performing normal array that doesn't take
up much extra space for sizes below 2^31. Your more sophisticated
collection classes, such as HugeArrayList, are straight ports of their
non-Huge cousins to using a HugeArray or TreeArray and using long or
BigInteger parameters.
....

I saw similar workarounds going from 16 bits to 24 bits, and from 24
bits to 32 bits. Also going from 32 bits to 64 bits for file sizes and
offsets.

Each time, the final conclusion was that we should be able to have a
single array as large as the memory can support and a single file as
large as the disks can support. I was really hoping we could skip the
intermediate steps this time, and go straight where we are going to end
up anyway.

Patricia
 

Lew

Eric said:
Lew said:
[...]
The point here, as with the bug in quicksort that went undetected for
about forty years [1], is that even the designers missed the corner
cases.

[1]
<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>


An interesting reference, but one which does not contain
the word "quicksort" nor even "quick" ...

Abashed - I meant binary search. (And two decades, not four.) I got my
buzzwords crossed.

It still makes the point about corner cases, though, and that large memory
requirements are not so unlikely.
 

Tom Hawtin

Twisted said:
Integer overflow is the worst, for some reason; probably because it
violates the usual mathematical intuitions about arithmetic
operations. One of those expensive space disasters -- rocket blowing
up or going in the wrong direction, Mars probe or satellite failing,
or some such -- was eventually traced to a 16-bit integer overflow bug
in some microcontroller code running on an ancient 286-era microchip
in the guts of the thing.

You're thinking of Logica's(?) software in Ariane-5, the first launch of
which in 1996 veered off course during ascent. The software used
Ariane-4 as the base, and wasn't sufficiently checked for assumptions.
As for Java collections, they're pretty much all backed by arrays
aren't they? I think we may want to eventually propose HugeFoo
versions of these collections. There'd be HugeArray, with long indices
and under the hood an array of arrays indexed by the high 32 and low
32 bits, or something like that.

You would just add a 64-bit array to the JRE with intrinsics. NIO
buffers already use longs for addresses under the covers. I think there
is an RFE for 64-bit indexed arrays though (personally, I don't think
it's necessary to place it in the language).

The trouble is that the List *interface* insists on using ints.
Or perhaps the TreeArray,
conceptually an array but actually a deeper tree, which grows in depth
for every factor of 2^31 in the array length, and uses BigIntegers.
These are a somewhat slower-performing normal array that doesn't take
up much extra space for sizes below 2^31. Your more sophisticated
collection classes, such as HugeArrayList, are straight ports of their
non-Huge cousins to using a HugeArray or TreeArray and using long or
BigInteger parameters.

There are lots of approaches to do that sort of thing. IIRC, the
relevant RFE evaluations suggest there is PhD material in that sort of
thing.
With TreeArray and BigIntegers, memory capacity is the ONLY limit,
unless BigIntegers have some limitation under the hood. It's likely
they're based on arrays of ints, so limited to (2^32)^(2^31) possible
values, or 2^(2^36). A HugeInteger class, whose implementation is
backed by a TreeArray class implemented to use HugeInteger indices*,
seems warranted ... :)

Hmm. I think I'd prefer to see a MediumInteger first. BigInteger isn't
very efficient at dealing with integers of only a few hundred bits.

Tom Hawtin
 

Twisted

Joshua said:
PriorityQueue is probably array-backed, ConcurrentLinkedQueue not,
SynchronousQueue doesn't even count (you can't even have one element),
and DelayQueue I have no idea.

PriorityQueue is actually probably backed by a disjoint set forest,
and so by trees (plural).

In any event I stand by what I originally said; the most frequently
used concrete collections are HashFoo and ArrayList, which are all
array-backed, with infrequent use of the queues and LinkedList, and
use of TreeFoo downright rare in my experience.
 

Twisted

Tom said:
Hmm. I think I'd prefer to see a MediumInteger first. BigInteger isn't
very efficient at dealing with integers of only a few hundred bits.

What's needed is a TrueInteger with no range limits and the strategy
pattern used to supply appropriate implementations depending on size.
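
A rough sketch of that strategy (all names here are hypothetical): keep the value in a primitive long while it fits, and promote to BigInteger only when an operation overflows. The long fast path keeps small values cheap; the BigInteger path removes the range limit.

import java.math.BigInteger;

// Rough sketch of a range-unlimited integer that picks its representation by size:
// a primitive long while the value fits, promoting to BigInteger on overflow.
public final class TrueInteger {

    private final long small;        // valid when big == null
    private final BigInteger big;    // non-null once the value outgrows a long

    private TrueInteger(long small) { this.small = small; this.big = null; }
    private TrueInteger(BigInteger big) { this.small = 0L; this.big = big; }

    public static TrueInteger of(long value) { return new TrueInteger(value); }

    public TrueInteger add(TrueInteger other) {
        if (big == null && other.big == null) {
            long sum = small + other.small;
            // Overflow check (same idea as Math.addExact): if no overflow, stay on long.
            if (((small ^ sum) & (other.small ^ sum)) >= 0) {
                return new TrueInteger(sum);
            }
        }
        return new TrueInteger(toBigInteger().add(other.toBigInteger()));
    }

    public BigInteger toBigInteger() {
        return big != null ? big : BigInteger.valueOf(small);
    }

    @Override public String toString() { return toBigInteger().toString(); }
}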
 

Twisted

Eric said:
Lew said:
[...]
The point here, as with the bug in quicksort that went undetected for
about forty years [1], is that even the designers missed the corner
cases.
[1]
<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about...>
An interesting reference, but one which does not contain
the word "quicksort" nor even "quick" ...

Abashed - I meant binary search. (And two decades, not four.) I got my
buzzwords crossed.

Every binary divide-and-conquer algorithm is likely to be affected in
a majority of implementations, quicksort included. The net effect
is to restrict utility to half the nominal capacity, e.g. 2^30
rather than 2^31 items in a typical signed-32-bit-integer-based
implementation, and often without clean failure behavior if that is
exceeded (except in Java, where an exception should be thrown, which
is fairly clean). C and C++ programmers of course are long since used
to unclean failure behavior. If your code is hardened against walking
off the ends of arrays or corrupting memory in the event of integer
overflow or other corner cases or simply bad inputs, you can bet at
least one of the libraries you're using (possibly including the
standard C runtime library) isn't, even after you avoid known hazards
like strcpy in favor of e.g. strncpy.
 

Eric Sosman

Lew said:
Eric said:
Lew said:
[...]
The point here, as with the bug in quicksort that went undetected for
about forty years [1], is that even the designers missed the corner
cases.

[1]
<http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>



An interesting reference, but one which does not contain
the word "quicksort" nor even "quick" ...

Abashed - I meant binary search. (And two decades, not four.) I got my
buzzwords crossed.

It still makes the point about corner cases, though, and that large
memory requirements are not so unlikely.

The defect Bloch's note describes is not an algorithmic
defect, but a defect in the translation of the algorithm into
terms of a concrete implementation: the algorithm requires the
computation of a midpoint index, and the implementation assumes
this can be done by summing the two extreme indices and dividing
by two. Unfortunately, the assumption fails when the indices
are large enough to make their sum overflow.

Curiously, this particular error is one a C programmer would
not have made, or at least would have been less likely to make.
The C programmer would probably have been dealing not with array
indices but with pointers, and it wouldn't occur to him to try to
form the sum of two pointers (and if it did, the compiler would
scold him). Instead, the C programmer would have found the distance
between the two endpoints, divided the distance by two, and
added/subtracted that half-step to/from the low/high endpoint:

middle = low + (high - low) / 2;

(The boundary conditions would probably be a little different from
those in Java, too, so let's not fret about a plus-or-minus one.)
There's a faint chance the `high - low' computation could overflow,
but only if searching a table of more than 2G single-byte values --
and if the programmer wastes 2GB on such a table, he deserves all
the bad things that happen!
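
For concreteness, the failing and corrected midpoint computations can be shown with plain ints near the top of the range, no 2-billion-element array required (the two values below are arbitrary):

public class MidpointOverflow {
    public static void main(String[] args) {
        int low  = 1500000000;
        int high = 2000000000;

        int broken  = (low + high) / 2;        // low + high overflows int: negative "midpoint"
        int fixed   = low + (high - low) / 2;  // the pointer-style fix: stays in range
        int shifted = (low + high) >>> 1;      // Bloch's other fix: treat the sum as unsigned

        System.out.println(broken);   // -397483648
        System.out.println(fixed);    // 1750000000
        System.out.println(shifted);  // 1750000000
    }
}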

Languages foment (and maybe ferment) their own bug idioms.
 

Piotr Kobzda

Lew said:
Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE
elements and we
someList.add( element );
or
someList.add( Integer.MAX_VALUE, element );

According to the Javadocs, this is legal, but how then could we
reference the last element of the list?

someList.listIterator(Integer.MAX_VALUE).next();


piotr
 

Googmeister

Roedy said:
The code Bloch gives to get around the overflow problem fails in other
cases. I explore the various flaws of his three algorithms at

http://mindprod.com/jgloss/average.html

Right, but these cases aren't applicable to Bloch's updated
binary search code. The original (lo + hi) / 2 code was
definitely a flaw, but I don't think it's fair to say that
his updated ones are flawed since they work exactly as claimed.
 

Piotr Kobzda

Piotr said:
someList.listIterator(Integer.MAX_VALUE).next();

Of course, it requires a ListIterator implementation which doesn't
fully support its interface (in particular, the previousIndex()/nextIndex()
methods cannot be fully supported).

A safer approach (in terms of achieving a working solution) would be:

Object lastElement = null;
for (Iterator<?> it = someList.iterator(); it.hasNext(); ) {
    lastElement = it.next();
}

But it's poor from the performance point of view...

In fact, the same performance problem applies to the ListIterator approach
(the list just has to be a bit bigger to observe it, for example
2*Integer.MAX_VALUE elements or more...).


So the List interface, as currently defined, disallows efficient work
with "large lists", IMHO.

Some additional methods are needed to support that, for example a list
implementing the following interface would be better:

public interface LargeList<T> extends List<T> {

    /**
     * Returns a view of the portion of this list between the specified
     * fromIndex, inclusive, and this list's last element, inclusive.
     */
    List<T> subList(int fromIndex);
}


Unfortunately, most current uses of lists assume that the maximum
list size is Integer.MAX_VALUE, so a large list like the above won't work
as expected with other libraries (e.g. Collections.reverse() is useless
with such a list...).

Conclusion: large lists are impractical.


piotr
 
