Java Indexing- Historical question

T

Tom Anderson

That was pretty much my argument above, replying to Norbert. He thinks
that being able to specify lower and upper bounds is unnecessary,
gimmicky and capricious. Well, if he considered the number of programmer
hours that's been wasted in tracking down bugs related to offset errors
in array indexing, he might think differently.

Hang on, are you saying that letting programmers specify starting indices
would *reduce* the number of hours spent doing that?

tom
 
T

Tom Anderson

Me too, but only because I always forget it.

Same here! Compounded by the fact that i very rarely write xpaths which
involve indices - they feel like gotos to me. But when i do need one, i
often screw it up.

tom
 
L

Lew

Eric said:
Martin said:
This is an interesting and somewhat religious topic. I'm fairly
agnostic on the choice between the two. Nor am I sure the benefits of
allowing the user to specify the bounds would outweigh the cost of
ambiguity in understanding what an array index means in an arbitrary
context. [
The ability to specify upper and lower bounds is probably more useful
in engineering and scientific programs than in commercial code and
processing logic. [...]

long[] endOfYearBalancesThisDecade = new long[2001:2010];

(Imaginary syntax, of course.)

= new long[2000:2009];
 
L

Lew

Eric said:
long[] endOfYearBalancesThisDecade = new long[2001:2010];

(Imaginary syntax, of course.)
= new long[2000:2009];

Patricia said:
This is an interesting illustration of the merits of defining arrays
with meaningful index ranges.

With zero based index ranges, all we could have told from the
declaration is whether the programmer thought there were ten years in
this decade or not. The issue of which years are in the decade would
only have arisen from comments or from looking at some of the code using
the array.

I can't tell if your argument supports varying index ranges or zero-based ones.

The term "decade" means an arbitrary span of ten years. The definition of
"decade" for a particular organization or project depends on the business
rules or domain rules for that organization or project. Things like when a
fiscal year starts affect the decision. Mergers and acquisitions, refactoring
of business processes, and other factors can cause the definition to change
withing the lifetime of the software system.

Defining an array to have its decade start with a given year, say, limits its
flexibility compared to defining only the length of the array and letting
domain-centric logic determine which years belong in it. I imagine that in
the normal case it would be better to store the start year of a "decade" as
data rather than to hard-code it into the source.
 
W

Wojtek

Patricia Shanahan wrote :
Lew said:
Eric said:
Martin Gregorie wrote:
On Sun, 25 Jan 2009 19:27:07 -0800, Tom McGlynn wrote:

This is an interesting and somewhat religious topic. I'm fairly
agnostic on the choice between the two. Nor am I sure the benefits of
allowing the user to specify the bounds would outweigh the cost of
ambiguity in understanding what an array index means in an arbitrary
context. [

The ability to specify upper and lower bounds is probably more useful in
engineering and scientific programs than in commercial code and
processing logic. [...]

long[] endOfYearBalancesThisDecade = new long[2001:2010];

(Imaginary syntax, of course.)

= new long[2000:2009];

This is an interesting illustration of the merits of defining arrays
with meaningful index ranges.

With zero based index ranges, all we could have told from the
declaration is whether the programmer thought there were ten years in
this decade or not. The issue of which years are in the decade would
only have arisen from comments or from looking at some of the code using
the array.

So how do you loop through this? Define constants for min/max? What
happens after 2010, do you need to recompile the application? If you
use variables (persistant storage) then what happens to the data before
the minimum value (2000 and before from the example)? Or does the array
"stretch"?

For storage of this type I use a Map object, usually a TreeMap with the
year being the key. Arbitrary lookup yet I can loop in order.
 
J

John B. Matthews

[QUOTE="Lew said:
long[] endOfYearBalancesThisDecade = new long[2001:2010];

(Imaginary syntax, of course.)
= new long[2000:2009];

Patricia said:
This is an interesting illustration of the merits of defining
arrays with meaningful index ranges.

With zero based index ranges, all we could have told from the
declaration is whether the programmer thought there were ten years
in this decade or not. The issue of which years are in the decade
would only have arisen from comments or from looking at some of the
code using the array.

I can't tell if your argument supports varying index ranges or zero-based
ones.

The term "decade" means an arbitrary span of ten years. The
definition of "decade" for a particular organization or project
depends on the business rules or domain rules for that organization
or project. Things like when a fiscal year starts affect the
decision. Mergers and acquisitions, refactoring of business
processes, and other factors can cause the definition to change
withing the lifetime of the software system.

Defining an array to have its decade start with a given year, say,
limits its flexibility compared to defining only the length of the
array and letting domain-centric logic determine which years belong
in it. I imagine that in the normal case it would be better to store
the start year of a "decade" as data rather than to hard-code it into
the source.[/QUOTE]

I think Patricia meant that explicit index ranges might offer improved
expressiveness. Your valid points address whether the given values
express a particular program's intent. Of course, both are essential to
evaluating correctness.

In the following examples, I compare Java to Ada not to express a
preference but merely for reference. Java strings are indexed from 0,
slightly complicating the API invariants with -1; Ada's most primitive
String type are indexed from 1, with arguably simpler invariants.
Similarly, Java arrays representing matrices are indexed from 0. In
Ada, a polynomial's coefficient array may start with index 0, while the
corresponding array of roots may start with index 1. The Ada library
support for matrices is generic, so any suitable discrete type may be
used as the index.

As another example, an enumerated type in Ada may serve as an array
index:

type Season is (Winter, Spring, Summer, Fall);
S : Season;
A : array(Season) of Natural := (others => 0);
for S in Suit'Range loop
-- only valid values of S are possible
A(S) := 1;
end loop;

Similarly, in Java, a program may safely traverse the values of an enum:

enum Season { WINTER, SPRING, SUMMER, FALL }
static int[] a = new int[Season.values().length];
public static void main(String[] args) {
for (Season season : Season.values()) {
// only valid values of season are possible
a[season.ordinal()] = 1;
}
}

Would there be any value to allowing a[season]? An EnumSet seems more
apt.

Along the same lines, Wojtek asks "So how do you loop through this?
Define constants for min/max?" In languages that support it, the array
bounds are typically defined by the declaration itself. In Java, the
for-each construct could use the explicit bounds for iteration, and the
usual array bounds checking would apply to normal indexing.
 
W

Wojtek

Patricia Shanahan wrote :
These are all very good questions about the "this decade" concept. In a
language with meaningful array index ranges, the array declaration would
effectively propose a set of answers to all of them. For example, if the
program treated "this decade" as meaning the ten year range starting at
the start of a 0 mod 10 year number containing the current year, the
array index range declaration would show that.

The point I am making is not that either of the answers implied in the
quoted messages is right, but that it is good to make the declaration
raise, and answer, the questions. Forcing arrays to start at 0 puts the
questions and their answers off, implicit in the code using the array
rather than explicit in its declaration.

I understand the concept.

However when I am dealing with arbitrary keys such as years, I tend not
to use an array. I can see no real benefit in using an array and having
to manage indice ranges as opposed to using a Map of some kind.

Languages that allow indice ranges were born before collections were
easy to use and had good speed (I think...). And trying to retrieve a
key outside the indice range would produce an array out of bounds
exception, whereas you can test for a null map.get(), which I think is
cleaner to read than a try/catch block.

YMMV
 
M

Mark Thornton

Wojtek said:
Languages that allow indice ranges were born before collections were
easy to use and had good speed (I think...). And trying to retrieve a

Arrays are still a lot faster where the key is a dense range of
integers. The early languages were mostly targeted at applications where
performance was (and remains) critical.

Mark Thornton
 
A

Arved Sandstrom

Hang on, are you saying that letting programmers specify starting
indices would *reduce* the number of hours spent doing that?

tom

Yes, I am saying that.

AHS
 
T

Tom McGlynn

Arrays are still a lot faster where the key is a dense range of
integers. The early languages were mostly targeted at applications where
performance was (and remains) critical.

Mark Thornton


And when the array is of primitives there should be a dramatic benefit
in memory usage as well. I often manipulate 16K x 16k or larger
images. I think it will be a while before one of these babies is
going to fit nicely into a collection. Though I confess I've never
actualy tried... Perhaps someday I'll be surprised.

Tom
 
T

Tom Anderson

And when the array is of primitives there should be a dramatic benefit
in memory usage as well. I often manipulate 16K x 16k or larger images.
I think it will be a while before one of these babies is going to fit
nicely into a collection. Though I confess I've never actualy tried...
Perhaps someday I'll be surprised.

It would be simple enought to write an ArrayMap<T> (or
RestrictedRangeIntegerKeyedMap<T> if you prefer) implements Map<Integer,
T> that only accepted keys in a given range, and stored its elements in an
array. Like EnumMap does. It would be as memory-efficient as an array, and
as fast apart from the boxing (which we all hope future JVMs will be
better at optimising away).

tom
 
M

Mark Thornton

Tom said:
It would be simple enought to write an ArrayMap<T> (or
RestrictedRangeIntegerKeyedMap<T> if you prefer) implements Map<Integer,
T> that only accepted keys in a given range, and stored its elements in
an array. Like EnumMap does. It would be as memory-efficient as an
array, and as fast apart from the boxing (which we all hope future JVMs
will be better at optimising away).

Boxing of primitives costs memory too. The ability to eliminate the
boxing involved in Integer[] vs int[] is still a long way off (and may
require a change in the language so that new Integer.valueOf(x) ==
Integer.valueOf(x) for all x).

Mark Thornton
 
L

Lew

Mark said:
Boxing of primitives costs memory too. The ability to eliminate the
boxing involved in Integer[] vs int[] is still a long way off (and may
require a change in the language so that
new Integer.valueOf(x) == Integer.valueOf(x) for all x).

You mean "so that mew Integer(x) == Integer.valueOf(x) for all x", do
you not?

That could run into memory trouble, too. You wouldn't be able to GC
old Integer instances.

The ability to eliminate many, if not most uses of auto(un)boxing is
already present in principle with Hotspot compilation. There is not a
whole lot of call to box and unbox entire arrays at once; generally
the need is element by element. For that, Hotspot should be able to
bypass the unnecessary conversions at run time. If it's doing so to
build an Integer[], then presumably it would use 'Integer.valueOf()'
even today, and duplicate values will share their instances. This
does not require that all Integer values always share instances for
equal values.
 
M

Mark Thornton

Lew said:
Mark said:
Boxing of primitives costs memory too. The ability to eliminate the
boxing involved in Integer[] vs int[] is still a long way off (and may
require a change in the language so that
new Integer.valueOf(x) == Integer.valueOf(x) for all x).

You mean "so that mew Integer(x) == Integer.valueOf(x) for all x", do
you not?

I meant to write "Integer.valueOf(x) == Integer.valueOf(x) for all x"
Currently this is only true for small values of x. Essentially the
problem is that == and equals() are different, while to be able to
replace Integer[] with int[] you need them to be the same, otherwise the
replacement is visible. An int element takes just 4 bytes while (on a 32
bit VM) an Integer element of an Integer array takes 20 bytes.

Mark Thornton
 
T

Tom Anderson

Boxing of primitives costs memory too.

True. I was really responding to your original comment about situations
where the key is a dense range of integers, rather than Mr McGlynn's
comment about collections where the values are also primitives. I'm not
sure why i responded to his post rather than yours - apology for any
confusion.

Anyway, in the integer-key case, there's no memory cost: the boxed
integers wouldn't be stored, they'd only exist to get the indices/keys
between the calling code and the map innards. They'd live their brief
lives and die in eden.

To expand on my description above, i'm thinking of something like:

public class ArrayMap<V> implements SortedMap<Integer, V> {
private int base;
private Object[] values; // because you can't declare a V[]
public ArrayMap(int base, int size) {
this.base = base;
values = new Object[size];
}
public V get(Integer key) {
int index = key - base;
@SuppressWarnings("unchecked")
V value (V)values[index];
return value;
}
}

Used like this:

SortedMap<Integer, Foo> foosByYear = new ArrayMap<Foo>(2005, 10);
for (int year: foosByYear.keySet()) {
process(year, foosByYear.get(year);
}

You'd actually write that as a loop over the entrySet, but that serves to
illustrate the general use pattern.

If the run-time compiler could work out that the map was definitely an
ArrayMap, i think the key operations can all be reduced to some very
simple and efficient code. Hopefully.
The ability to eliminate the boxing involved in Integer[] vs int[] is
still a long way off

Perhaps. It would certainly be needed for the array-of-pixels type
situation.
(and may require a change in the language so that new Integer.valueOf(x)
== Integer.valueOf(x) for all x).

Escape analysis means you don't need that to be able to do deboxing in at
least some cases, but yes, a change along those lines might help make it
more widely possible.

tom
 
M

Mark Thornton

Tom said:
Anyway, in the integer-key case, there's no memory cost: the boxed
integers wouldn't be stored, they'd only exist to get the indices/keys
between the calling code and the map innards. They'd live their brief
lives and die in eden.

I was thinking of primitive values as well.

Mark Thornton
 
L

Lew

Peter said:
What's "mew"?  Why would Mark have meant that instead of "new"?

Good catch - typo for "new".

And it wasn't instead of "new", it was instead of "new Integer.valueOf
(x)".
 
T

Tom McGlynn

It would be simple enought to write an ArrayMap<T> (or
RestrictedRangeIntegerKeyedMap<T> if you prefer) implements Map<Integer,
T> that only accepted keys in a given range, and stored its elements in an
array. Like EnumMap does. It would be as memory-efficient as an array, and
as fast apart from the boxing (which we all hope future JVMs will be
better at optimising away).

tom
You're absolutely right. I was thinking using more standard, existing
Collections classes, since what you are proposing simply wraps the
existing arrays without giving me the advantages that the standard
collections classes have over smaller arrays. E.g., I couldn't grow
such a huge collection without writing my own specialized routines
that handle how to efficiently grow such large arrays. So I lose the
nice notation of arrays without any corresponding benefit.

Regards,
Tom McGlynn
 
B

blue indigo

Mark said:
Boxing of primitives costs memory too. The ability to eliminate the
boxing involved in Integer[] vs int[] is still a long way off (and may
require a change in the language so that
new Integer.valueOf(x) == Integer.valueOf(x) for all x).

You mean "so that mew Integer(x) == Integer.valueOf(x) for all x", do
you not?

You mean "so that new Integer(x) == Integer.valueOf(x) for all x", do
you not?
That could run into memory trouble, too. You wouldn't be able to GC
old Integer instances.

Sure you could. You'd just have to make sure that you tracked live
instances, and if one was called for with a given value, and there was an
existing one with that value, the existing one was returned.

The more serious difficulty is that the semantics of "new" are violated if
it sometimes produces a pre-existing instance, and the semantics of "=="
are violated if it compares equal two distinct objects. In particular, new
Integer(3) == new Integer(3) must evaluate to false per the present spec,
but can't per your suggestion above.

Better is just to specify that valueOf will recycle an instance if at all
possible, and that boxing uses valueOf. I think the latter is already
true. Then it is required that Integer.valueOf(3) == Integer.valueOf(3),
and so optimizations that avoid boxing an integer don't change the ==
semantics of the integers versus if they were still boxed. (This would
also require that if code compared an int with an Integer using ==, this
was either illegal or returned true if the integers had the same value.)

That said, the specific case at issue, a specialized Map<Integer,Foo>
implementation, is probably already heavily optimizable. If the class
internals use an array with int indices, then typical usage has class
methods called with int arguments, autoboxing them, and the methods
immediately unboxing them again. Once Hotspot determines that the virtual
method calls of Map methods from some piece of heavily-used code is always
going to this particular Map implementation, it can inline the methods in
question, and then the integer is being boxed and unboxed immediately in
the same flat piece of code, the Integer reference provably doesn't
escape, and the optimizer can therefore get rid of the boxing entirely.

Of course, Hotspot will only do this in heavily-executing code, but then,
it only really matters in heavily-executing code. Anywhere else, the CPU
overhead of making Integer objects is irrelevant and it's not
memory-inefficient either since the objects will die long before being
tenured by the garbage collector. Their memory-management CPU cost becomes
a pointer-bump and their memory use is too temporary to be a concern.
 
M

Mark Thornton

blue said:
Mark said:
Boxing of primitives costs memory too. The ability to eliminate the
boxing involved in Integer[] vs int[] is still a long way off (and may
require a change in the language so that
new Integer.valueOf(x) == Integer.valueOf(x) for all x).
You mean "so that mew Integer(x) == Integer.valueOf(x) for all x", do
you not?

You mean "so that new Integer(x) == Integer.valueOf(x) for all x", do
you not?

No, because outside a small range Integer.valueOf(x) is implements as
new Integer(x).
Sure you could. You'd just have to make sure that you tracked live
instances, and if one was called for with a given value, and there was an
existing one with that value, the existing one was returned.

This is just too expensive.
The more serious difficulty is that the semantics of "new" are violated if
it sometimes produces a pre-existing instance, and the semantics of "=="
are violated if it compares equal two distinct objects. In particular, new
Integer(3) == new Integer(3) must evaluate to false per the present spec,
but can't per your suggestion above.
Exactly.

Mark Thornton
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,175
Latest member
Vinay Kumar_ Nevatia
Top