Java Indexing- Historical question

G

Gilbert

I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive. It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

Regards
 
M

Mike Schilling

Gilbert said:
I was asked this weekend by one of our trainee programmers, why for
a
list with a single entry does list.size() report '1' but to access
it
you need to use list.get(0). Now it's a convention that I have
accepted for years without really thinking about it - and he has a
point, it is slightly counter-intuitive. It's not always been like
this, I remember when I started programming in the mid 19<mumbles>
on
IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was
followed. I'm not sure that I would believe that it was a capricious
design decision - so does anyone know who decided that indexing from
zero was the way to go, when, and most interestingly why?


All languages that derive from C use zero-based indexing. Whether
that's counter-intuitive or nor depends on what you're used to.
 
J

Joshua Cranmer

Gilbert said:
Now I suspect that when Java was designed, an earlier convention was
followed. I'm not sure that I would believe that it was a capricious
design decision - so does anyone know who decided that indexing from
zero was the way to go, when, and most interestingly why?

Java was originally based off of C++, whose lineage came from C. C is
designed to be close to the machine architecture (indeed, I have seen it
called "portable assembly"). If you're working directly at the level of
memory, 0-based indexing makes a lot more sense: RAM chips start
counting at 0; pointers start pointing at 0, etc. An array access
arr[ind] is translated into *(arr + ind)--if you start at 0, the pointer
points to the first element.

Edgar Dijkstra also makes a case for 0-based indexing based on the fact
that an array iteration should be of the form |a <= i < b| over |a < i
<= b|, |a < i < b|, |a <= i <= b|, which leads to the more natural
choice of 0 <= i < N. The latter two forms are disfavored because (b-a)
is not the length of the array; the second form is disfavored because a
lower bound of |<| is more difficult to type.

I strongly dislike any programming language with 1-based indexing.
 
N

Norbert Ziegler

Gilbert said:
I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive. It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

Regards

1. Next time could you be so kind as to use an normal
newsgroup-textformat an not to make longlines which needs hor-scrolling?
That makes it nearly impossible to read and to answer to it.
Thanks.

2. If an array / collection starts with Index 0 or 1 has nothing to do
with the size of the array and even if there is only 1 element or
thousands in it.

The decision to define the first Index by 0 or 1 is just a matter of
taste of those guys who invented it. Both sides have advantages:
- starting with 0: you will make the experience that working with
indices of an array you'll need less use of using offset (adding /
subtracting 1)

- starting with 1: seems to be a little more intuitive. But that's only
the force of habit. If you work with the "0" for a long time and switch
to 1 than you will just think the other way round

3. When Java was designed they had several design goals. Starting with 0
or 1 has - in my opinion - not a lot to do with design: simple think
about it a short while an than decide. And in this case I suppose they
just took over the 0 from C.
Design is -in this case- something like "don't let us make the same
mistakes again" that leads to things like GC (no nees to clean up the
memory yourself) and avoiding multiple inheritance.

By the way: some programming languages allow it to specify the
start-index of an array as you like: e.g. array starts at index -42.
That's just a gimmick and capricious, not really necessary.

Regards, Norbert
 
D

Daniel Pitts

Gilbert said:
I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive. It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

Regards
It is only counter intuitive if you don't think about the "array access"
concept. In C (as well as many other languages, including many machine
languages), an "array" is stored as a pointer to the first value.
The "index" is the number you need to add to that pointer to get the
specified value. So, zero makes sense for the first value, one for the
second, etc...

The way that I think about lists/arrays is that the valid elements are
[0,lengthof list)
 
A

angrybaldguy

I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive.  It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

Regards

Dijkstra explains it all:
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

-o
 
J

Joshua Cranmer

Wayne said:
Joshua has it right. Starting with zero means arrays can be used
with:
base + (index * sizeof(element) )

Most hardware has efficient support for this multiply-and-add
operation, making zero-based arrays extremely efficient. To
start with 1 or an arbitrary lower limit the calculation must be:

base + ( (index-low_limit) * sizeof(element) )

Two rebuttals that make this point rather moot in modern days:
1. You could just offset base by low_limit*sizeof(element) (although
that is annoying)
2. In many cases, optimizers are probably making tight loop array
accesses mere pointer arithmetic anyways, so there's no overbearing
performance limitation.
On a related note, Java is missing an integer range which could
be used something like this:
Foo [] foo = new Foo[1..10];
or as:
int[1..100] num;

Then you get the problem that foo[1] could be the 1st element, the 2nd
element, 9th element, etc. And what would you do when passing around arrays?

I would classify user-specified array ranging as "good in theory, not so
good in practice."
 
M

Mike Schilling

Wayne said:
<demented_rambling>
I dislike programming languages that force a decision like this
on the programmer and sometimes have wished for constructors that
take the initial lower bound of the index as an optional argument,
so when it makes sense to do so I could create an array or List
with a lower index of 1, -7 (which came up recently when coding
a backtracking solution to the 8-queens problem), or any value
that simplifies the implementation of the logic.

IIRC, Pascal allows that, or even indexing on an enumerated type. In
Java, it would be pretty trivial to create a List-like class that
allows indexes between an arbitrary m and n. Strictly speaking, it
wouldn't be a List, of course.
 
M

Martin Gregorie

Then you get the problem that foo[1] could be the 1st element, the 2nd
element, 9th element, etc. And what would you do when passing around
arrays?
Simple. In a well-designed language that supports a variable lower bound
and bound checking the bounds are part of the array object, so you could
write something like this:

real a[-5..149];
for int i from a.lwb to a.upb by 1 do
a=0;

An array passed as a procedure argument takes its bounds with it. The
more dimensions the array has the more convenient this becomes. Just
don't ask me to remember the syntax for that - its 30 years since I used
Algol 68!
 
J

John B. Matthews

Martin Gregorie said:
Then you get the problem that foo[1] could be the 1st element, the 2nd
element, 9th element, etc. And what would you do when passing around
arrays?
Simple. In a well-designed language that supports a variable lower bound
and bound checking the bounds are part of the array object, so you could
write something like this:

real a[-5..149];
for int i from a.lwb to a.upb by 1 do
a=0;

An array passed as a procedure argument takes its bounds with it. The
more dimensions the array has the more convenient this becomes. Just
don't ask me to remember the syntax for that - its 30 years since I used
Algol 68!


This versatile approach inspired that found in Ada, as suggested by the
examples near the end:

<http://www.adaic.com/standards/05rm/html/RM-3-6.html>

Like the Java for-each loop, Ada attributes make iteration more readable
and less error-prone:

<http://www.adaic.com/standards/05rm/html/RM-3-6-2.html>

When used for index ranges in higher dimensions, this syntax is
especially convenient:

<http://www.adaic.com/standards/05rm/html/RM-G-3-1.html>
 
A

Arved Sandstrom

Then you get the problem that foo[1] could be the 1st element, the 2nd
element, 9th element, etc. And what would you do when passing around
arrays?
Simple. In a well-designed language that supports a variable lower bound
and bound checking the bounds are part of the array object, so you could
write something like this:

real a[-5..149];
for int i from a.lwb to a.upb by 1 do
a=0;

An array passed as a procedure argument takes its bounds with it. The
more dimensions the array has the more convenient this becomes. Just
don't ask me to remember the syntax for that - its 30 years since I used
Algol 68!


FORTRAN, starting with FORTRAN 77, supports variable lower bounds.
FORTRAN 90 added LBOUND and UBOUND as intrinsic bounds checking functions.

Haskell arrays also make the defined bounds available to the programmer.

AHS
 
A

Arved Sandstrom

On Fri, 23 Jan 2009 21:32:08 +0100, Norbert Ziegler wrote:

[ SNIP ]
By the way: some programming languages allow it to specify the
start-index of an array as you like: e.g. array starts at index -42.
That's just a gimmick and capricious, not really necessary.

Regards, Norbert

It's nothing to do with necessary - every programming language has loads
of "unnecessary" stuff.

Being able to specify array bounds, as opposed to just specifying
dimension extents, is not gimmicky (I don't see how it could be
capricious anyway :)). It can remove one level of indirection between
the indices into the array and the meaning of those indices in the
problem domain.

Indices often enough are just counting numbers, in which case it doesn't
really matter. But frequently they are more along the lines of
coordinates (perhaps integer, perhaps floating point), in which case you
need to at most scale your coordinates to map to indices; you no longer
need to translate...if you can specify arbitrary array bounds.

In languages where this facility is available, _and_ intelligently used,
arrays with explicit bounds specified provide more information than
arrays with only extents. For example, in FORTRAN declaring

COMPLEX :: mat(1:10,1:10)

has the same indices as

COMPLEX :: mat(10,10)

but the fact that you explicitly declared the bounds makes a logical
statement. And clearly

COMPLEX :: mat(-10:10,-10:10)

says quite a bit more than

COMPLEX :: mat(21,21)

AHS (not a datatype)
 
R

Roedy Green

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

In a simple CPU, an array of bytes is indexed by adding the index to
the base address. It then more convenient to use 0 for the first slot
since it is at offset 0 from the base.

When C was designed as a sort of portable assembler, this mode of
thinking was built in. Java copied C.

--
Roedy Green Canadian Mind Products
http://mindprod.com

We are almost certainly going to miss our [global warming] deadline.
We cannot get the 10 lost years back, and by the time a new global agreement to
replace the Kyoto accord is negotiated and put into effect, there will probably
not be enough time left to stop the warming short of the point where we must not
go. ~ Gwynne Dyer
 
R

Roedy Green

I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive. It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

there are some advantages to starting with 0. See
http://mindprod.com/jgloss/array.html#GOTCHAS
--
Roedy Green Canadian Mind Products
http://mindprod.com

We are almost certainly going to miss our [global warming] deadline.
We cannot get the 10 lost years back, and by the time a new global agreement to
replace the Kyoto accord is negotiated and put into effect, there will probably
not be enough time left to stop the warming short of the point where we must not
go. ~ Gwynne Dyer
 
A

Arne Vajhøj

Roedy said:
In a simple CPU, an array of bytes is indexed by adding the index to
the base address. It then more convenient to use 0 for the first slot
since it is at offset 0 from the base.

If array indexes started with 1 then they would need to add to
base address - 1 byte.

Other languages use indexes starting with 1 and they work fine
also on simple CPU's.
When C was designed as a sort of portable assembler, this mode of
thinking was built in.

C has nothing to do with assembler.

But it is a low level high level language.
Java copied C.

Java copied C++. C++ copied C.

Arne
 
A

Arne Vajhøj

Mike said:
IIRC, Pascal allows that, or even indexing on an enumerated type.

The entire Pascal/Modula-2/Ada family has the this feature.

It is one of the features that make those languages real type safe.

Arne
 
A

Arne Vajhøj

Wayne said:
You remember correctly, that's what I was thinking of when
I wrote that. Some modern languages (e.g., PHP) allow this
as well AFAIK.

Not really.

PHP arrays are more like Map<>.

Arne
 
B

Ben Kaufman

I was asked this weekend by one of our trainee programmers, why for a list with a single entry does list.size() report '1' but to access it you need to use list.get(0). Now it's a convention that I have accepted for years without really thinking about it - and he has a point, it is slightly counter-intuitive. It's not always been like this, I remember when I started programming in the mid 19<mumbles> on IBM gear using RPG & RPGII, that array indexing started at 1.

Now I suspect that when Java was designed, an earlier convention was followed. I'm not sure that I would believe that it was a capricious design decision - so does anyone know who decided that indexing from zero was the way to go, when, and most interestingly why?

Regards

I think that this is a very intuitive way of thinking about indexes.
If you ever programmed assembly language you would see that indexed addressing
is an offset from the base. The first element being at the base is accessed
with offset 0.
the size(), also is the location where the next entry index goes.

Ben
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top