Java Indexing- Historical question

L

Lew

Ben Kaufman wrote:

I'm guessing James Gosling, and everyone else who moved Oak into Java.

From the very start - January 23, 1996, wasn't it?

Because it's the right way to do array indexes if you only allow a single
index base.

It allows convenient [start, end) semantics, 'length' represents the next
insertion index, ((rangeEnd - rangeBeginning) == rangeLength), for() loops are
compactly expressible, and it's familiar to anyone trained in Algol-family
languages, just to recap a few of the many answers upthread.
 
A

Arne Vajhøj

Because it's the right way to do array indexes if you only allow a
single index base.

It allows convenient [start, end) semantics, 'length' represents the
next insertion index, ((rangeEnd - rangeBeginning) == rangeLength),
for() loops are compactly expressible, and it's familiar to anyone
trained in Algol-family languages, just to recap a few of the many
answers upthread.

But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

Arne
 
L

Lew

Patricia said:
I don't understand the "for() loops are compactly expressible" argument.

for(int i=1; i <= array.length; i++) is only one character longer than
for(int i=0; i < array.length; i++), not a very significant gain.

I didn't say much more compactly.

I was thinking of descending indexes, but wrongly since in that case there is
no difference:

for ( int n = arr.length; n-- > 0; perform( arr [n] )); // 0
for ( int n = arr.length; n > 0; perform( arr [n--] )); // 1

You're absolutely correct.
 
A

Arne Vajhøj

Patricia said:
Arne said:
Lew said:
Now I suspect that when Java was designed, an earlier convention
was followed. I'm not sure that I would believe that it was a
capricious design decision - so does anyone know who decided that
indexing from zero was the way to go,
Because it's the right way to do array indexes if you only allow a
single index base.

It allows convenient [start, end) semantics, 'length' represents the
next insertion index, ((rangeEnd - rangeBeginning) == rangeLength),
for() loops are compactly expressible, and it's familiar to anyone
trained in Algol-family languages, just to recap a few of the many
answers upthread.

But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

And Fortran array dimensions start at one by default, presumably because
that matches conventional vector and matrix notation, simplifying
translation of mathematical formulas relating to them.

I don't understand the "for() loops are compactly expressible" argument.

for(int i=1; i <= array.length; i++) is only one character longer than
for(int i=0; i < array.length; i++), not a very significant gain. There
would be no difference at all in the for-each style of loop.

In the 1980's I worked on a Fortran compiler. Converting indexes into a
multidimensional array with a specified starting index for each
dimension to zero based requires care, but that care is only required at
one place in a compiler to be applied to all arrays in all programs
compiled by it. I would prefer to trust a piece of compiler code like
that, and let programmers use the index ranges that simplify their
programs.

I am pretty sure that the 0 versus 1 based is more tradition than
technology.

Arne
 
A

Arne Vajhøj

Lew said:
Lew said:
... it's familiar to anyone trained in Algol[sic]-family languages, ...
But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

Most of its descendants do, however. But fair enough: "ALGOL-family
languages other than ALGOL, ..."

Pascal, Modula-2, Ada do not have always start in zero arrays.

It is only the C branch of the family tree (C, C++, Java, C# etc.)
that use that convention.

Well - in all fairness - that branch is very dominant in computing.

Arne
 
M

Mark Thornton

Lew said:
Lew said:
... it's familiar to anyone trained in Algol[sic]-family languages, ...
But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

Most of its descendants do, however. But fair enough: "ALGOL-family
languages other than ALGOL, ..."

Pascal arrays don't have to start at zero either. In the early days at
least I think 1 would have been the most common lower bound in Pascal code.

Mark Thornton
 
T

Tom Anderson

Lew said:
Lew said:
... it's familiar to anyone trained in Algol[sic]-family languages, ...
But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

Most of its descendants do, however. But fair enough: "ALGOL-family
languages other than ALGOL, ..."

Pascal, Modula-2, Ada do not have always start in zero arrays.

It is only the C branch of the family tree (C, C++, Java, C# etc.)
that use that convention.

Well - in all fairness - that branch is very dominant in computing.

FWIW, XPath's list indices start at 1, which i find incredibly annoying.
DOM lists start at 0. It seems very weird that 1-basing was chosen for
XPath.

tom
 
A

Arved Sandstrom

On Sun, 25 Jan 2009 12:41:16 -0800, Patricia Shanahan wrote:

[ SNIP ]
In the 1980's I worked on a Fortran compiler. Converting indexes into a
multidimensional array with a specified starting index for each
dimension to zero based requires care, but that care is only required at
one place in a compiler to be applied to all arrays in all programs
compiled by it. I would prefer to trust a piece of compiler code like
that, and let programmers use the index ranges that simplify their
programs.

Patricia

That was pretty much my argument above, replying to Norbert. He thinks
that being able to specify lower and upper bounds is unnecessary,
gimmicky and capricious. Well, if he considered the number of programmer
hours that's been wasted in tracking down bugs related to offset errors
in array indexing, he might think differently.

AHS
 
S

Stefan Ram

Patricia Shanahan said:
for(int i=1; i <= array.length; i++) is only one character longer than
for(int i=0; i < array.length; i++), not a very significant gain. There

Assume, int values e (»ending«) and s (»start«), with e >= s.

The form »for(i=s;i<e;++i)« has the property that the number
of iterations is e-s. Some even prefer »for(i=s;i!=e;++i)«,
which has the same property, but makes it obvious that the
postcondition is »i==e« (if this postcondition is needed in
a proof).

With »for(i=s;i<=e;++i)«, the number of iterations is e-s+1,
which is slightly more complicated, and to have 0 iterations,
one must use e < s (the ending must come /before/ the start, while
in reality one goes exactly 0 miles if the ending is /at/
the start), which is less intuitive. Also, the form »i!=e«
can not be used to get the postcondition »i==e«.

Therefore, some programmers prefer the first form.
 
J

John W Kennedy

Lew said:
... it's familiar to anyone trained in Algol[sic]-family languages, ...
But Algol arrays does not always start in zero as in Java
AFAIK (never used ALGOL myself).

Most of its descendants do, however. But fair enough: "ALGOL-family
languages other than ALGOL, ..."

PL/I also has variable-base indexing, though it defaults to 1, probably
from its FORTRAN ancestry. It traditionally minimizes the performance
issue by pre-calculating the address where the (0, 0, ... 0) element
would be, even if there isn't any.
 
D

Daniel Pitts

Patricia said:
Of course. In some cases, the best code (simplest, easiest to prove
required properties ...) will result from zero based, in some cases from
one based, and I've seen code that was simpler because the programmer
had defined a three dimensional array with each dimension range -1
through +1.

I don't think replacing forced zero based with forced one based would be
any improvement.

Patricia
At the same time, I would say that the semantics of primitive types
(lists, arrays, etc..) should be consistent across the language. If it
makes more sense to have a 1 based index into a data structure, then
have that be the conventions on the domain data-structure, rather than
complicate the primitive abstraction.
 
T

Tom McGlynn

Ben Kaufman wrote:

I'm guessing James Gosling, and everyone else who moved Oak into Java.

From the very start - January 23, 1996, wasn't it?

Because it's the right way to do array indexes if you only allow a single
index base.

It allows convenient [start, end) semantics, 'length' represents the next
insertion index, ((rangeEnd - rangeBeginning) == rangeLength), for() loops are
compactly expressible, and it's familiar to anyone trained in Algol-family
languages, just to recap a few of the many answers upthread.

This is an interesting and somewhat religious topic. I'm fairly
agnostic on the choice between the two. Nor am I sure the benefits of
allowing the user to specify the bounds would outweigh the cost of
ambiguity in understanding what an array index means in an arbitrary
context. [

One thing that I believe affects how we see this is our paradigm of
the array index. Is it measuring the distance from the beginning of
the array? Here the zero index makes sense. This is a natural model
for image arrays, or arrays specifying a temporal sequence.
Alternatively an array index can be enumerating a set in which there
is no natural order, e.g., people in an organization or a particle in
a simulation. In this view starting counting at one is natural (in
English at least).

A couple of points in favor of one-indexing that I haven't seen
discussed fully here -- apologies if I missed it.

As the OP's noted ot makes it easier and more natural to indicate the
last element, it's just arr[n] rather than arr[n-1]. Forgetting to
subtract one when looking at the last element is a nit that I
occasionally forget [probably about as often as I would forget the =
in the <= I'd need in the for loop were one-based arrays adopted].

Also, when talking about about arrays either orally or in
documentation, there's a real disconnect when you talk about the first
element and second elements of the array and their indices are 0 and
1. I have heard people talk about the 'zeroth' element -- not
realizing that they are thus suggesting that the 'first' element is
actually the second element. At best this is awkward and at worst it
can lead to real misunderstanding of what's going on. There is no
natural way to match ordinal numbers as used in standard English and 0-
based array indexes.

Tom McGlynn
 
L

Lew

Stefan said:
Some even prefer »for(i=s;i!=e;++i)«,
which has the same property, but makes it obvious that the
postcondition is »i==e« (if this postcondition is needed in
a proof).

The problem with that idiom is it is sensitive to the loop body mucking with
'i'. Suppose there were an extraneous '++i' inside the loop.

Suppose 'e' were a more complex collection, say a 'List'.

for( int ix = 0; ix != list.size(); ++ix )
{
foo( list.get( ix ));
}

Something that changes the size of the list could mess up the logic.
for( int ix = 0; ix < list.size(); ++ix )
is safer.
 
M

Mike Schilling

John said:
PL/I also has variable-base indexing, though it defaults to 1,
probably from its FORTRAN ancestry. It traditionally minimizes the
performance issue by pre-calculating the address where the (0, 0,
...
0) element would be, even if there isn't any.

I don't (I'm glad to say) know enough COBOL to know how it does
arrays. Can anyone enlighten me?
 
M

Martin Gregorie

I don't (I'm glad to say) know enough COBOL to know how it does arrays.
Can anyone enlighten me?
COBOL table indexing is 1 based.

01 TABLE-TOTAL PIC S9(18) COMP SYNC.
01 I PIC 9(3) COMP SYNC.
01 MY-TABLE.
05 MY-ELEMENT PIC S9(9) COMP SYNC OCCURS 100 TIMES.

MOVE ZERO TO TABLE-TOTAL.
PERFORM VARYING I FROM 1 BY 1 UNTIL I > 100
ADD MY-ELEMENT(I) TO TABLE-TOTAL
END-PERFORM.
DISPLAY "Table total is " TABLE-TOTAL.


But note the classic COBOL bugbear: the full stop MATTERS but is small
and can be hard to see, especially in the output from some line printers.
A full stop at the end of either of the first two lines of the in-line
perform would cause a compilation error.
 
M

Martin Gregorie

This is an interesting and somewhat religious topic. I'm fairly
agnostic on the choice between the two. Nor am I sure the benefits of
allowing the user to specify the bounds would outweigh the cost of
ambiguity in understanding what an array index means in an arbitrary
context. [
The ability to specify upper and lower bounds is probably more useful in
engineering and scientific programs than in commercial code and
processing logic. For instance, if you're calculating performance for a
small UAV wing section across a speed range you'll probably be using
Reynolds numbers. The interesting range might be from Re 100,000 to
500,000, so declaring lift coefficient array as

double Cl[100000..500000];

or even better (with dynamic declarations) as:

int lowRe = 100000;
int highRe = 500000;
double Cl[lowRe..highRe];

makes for a more readable program.
 
S

Stefan Ram

Eric Sosman said:
long[] endOfYearBalancesThisDecade = new long[2001:2010];

In Java, one might consider arrays not be intended to be
used in applications, but rather as a low-level means to
be used in libraries, which in turn are used by applications.

Thus,

final BalanceList endOfYearBalances = new BalanceList( 2001, 2010 );

.
 
M

Martin Gregorie

Eric Sosman said:
long[] endOfYearBalancesThisDecade = new long[2001:2010];

In Java, one might consider arrays not be intended to be used in
applications, but rather as a low-level means to be used in libraries,
which in turn are used by applications.

Thus,

final BalanceList endOfYearBalances = new BalanceList( 2001, 2010 );

.
Besides, most non-trivial applications of this type will be using a
database table for, e.g. year-related data, rather than an array.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,175
Latest member
Vinay Kumar_ Nevatia
Top