Latest Usenet C Argument

SM Ryan · Dec 21, 2007

# Why does C use zero-based indexing?
#
# Many people find zero-based indexing difficult and non-intuitive. Why
# doesn't C start indices from 1 instead? This would be much easier for
# people.

#include <stdio.h>

#define array(lwb,upb,Element,name) \
Element name##_ARRAY[(upb)-(lwb)+1], \
*name = name##_ARRAY-(lwb)

int main(int N,char **P) {
array(1,5,int,twice); int i;
for (i=1; i<=5; i++) twice = 2*i;
for (i=1; i<=5; i++) printf("i=%d twice=%d\n",i,twice);
return 0;
}

Philip Potter · Dec 21, 2007

SM said:
# Why does C use zero-based indexing?
#
# Many people find zero-based indexing difficult and non-intuitive. Why
# doesn't C start indices from 1 instead? This would be much easier for
# people.

#include <stdio.h>

#define array(lwb,upb,Element,name) \
Element name##_ARRAY[(upb)-(lwb)+1], \
*name = name##_ARRAY-(lwb)

int main(int N,char **P) {
array(1,5,int,twice); int i;
for (i=1; i<=5; i++) twice = 2*i;
for (i=1; i<=5; i++) printf("i=%d twice=%d\n",i,twice);
return 0;
}

Undefined behaviour. The offending expression is twice_ARRAY-(1).

n1256 6.5.6p9 states "If both the pointer operand and the result point
to elements of the same array object, or one past the last element of
the array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined."

Phil

Richard Harter · Dec 21, 2007

This might be what you are talking about:
link: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

Probably - it's been a decade or two since I've read the book. I
have the vague impression that that wasn't the only time he
discussed the subject.

SM Ryan · Dec 21, 2007

# n1256 6.5.6p9 states "If both the pointer operand and the result point

That's nice.

It's also cow doo-doo as anyone who has worked on compilers knows. Any compiler
that can't deal with this should flushed down the toilet as the refuse it is.

Any compiler that requires the code displacement portion of a relative address
to be nonnegative is broken by design. This has been known since the fifties
and the first assemblers.

Philip Potter · Dec 21, 2007

SM said:
# n1256 6.5.6p9 states "If both the pointer operand and the result point

That's nice.

It's also cow doo-doo as anyone who has worked on compilers knows. Any compiler
that can't deal with this should flushed down the toilet as the refuse it is.

Perhaps, but the Standard allows such a compiler, and strictly
conforming code will work on such a compiler.

Any compiler that requires the code displacement portion of a relative address
to be nonnegative is broken by design. This has been known since the fifties
and the first assemblers.

C does not place such a requirement. a[-1] is perfectly well designed
provided that a is a pointer to any element but the first in an array.

dj3vande · Dec 21, 2007

Why does C use zero-based indexing?

Many people find zero-based indexing difficult and non-intuitive. Why
doesn't C start indices from 1 instead? This would be much easier for
people.

A. It doesn't. At least, it doesn't have to. See ISO/IEC 9899:1999
6.5.2.1 'Array subscripting', which doesn't even mention zero-based
indexing. The indexing base is implementation-defined, and is often
configurable via a compiler switch, a bit like OPTION BASE 0 / OPTION BASE
1 in various BASICs.
B. An array index is merely another way of describing the number of
objects that sit between the first element in the array and the element we
wish to access. Clearly, if we are trying to access the first element, the
number of objects between is 0, so it makes sense to use 0 as the array
index for the first element.
C. One-based indexing leads to unnecessarily complicated arithmetic,
especially when dealing with multi-dimensional arrays. These complications
just fall away into nothing if zero-based indexing is used.
D. None of the above, because...

It falls naturally out of the definition of array indexing in terms of
pointer arithmetic.
This is close enough to (B) that I'm not going to commit to whether I'm
giving an explanation of (B) or a completion of (D) without more
coffee.

This isn't quite a reason for why array indexing was defined that way,
though. I suspect that the root cause is "because B/BCPL/the PDP-11
did it that way and nobody thought it was worth changing since".

dave

Neil Cerutti · Dec 21, 2007

It falls naturally out of the definition of array indexing in
terms of pointer arithmetic. This is close enough to (B) that
I'm not going to commit to whether I'm giving an explanation of
(B) or a completion of (D) without more coffee.

No, B is actually nonsense. Yes, there are no elements between
the first and the first element. But there are *also* no elements
between the first and the second element.

This isn't quite a reason for why array indexing was defined
that way, though. I suspect that the root cause is "because
B/BCPL/the PDP-11 did it that way and nobody thought it was
worth changing since".

I think the fact that array indexing is syntactic sugar for
pointer arithmetic neatly explains the reasoning. Imagine using
pointer arithmetic if a + 1 equaled a.

Kenneth Brody · Dec 21, 2007

santosh said:
Would they? What if the compilers did the necessary translation. A
recompile would be all that's needed.

Are you sure?

What happens with for-loops starting at zero, and using the loop
variable as a subscript?

If "p[1]" is now the first element, what does "p[0]" mean?

What happens when "*p" and "p[1]" now mean the same thing?

What happens when "*(p+1)" and "p[1]" now mean different things?

So.. no matter how much you complain and how many points you make, it
won't happen, not in C.

Click to expand...

Perhaps you missed the wood for the trees? Richard started this thread
in respose to the other thread asking for regular "quiz" questions.
This is a quiz question. He is very well aware of the answer
~~himself.~~

dj3vande · Dec 21, 2007

[...]

It falls naturally out of the definition of array indexing in
terms of pointer arithmetic. This is close enough to (B) that
I'm not going to commit to whether I'm giving an explanation of
(B) or a completion of (D) without more coffee.

Click to expand...

No, B is actually nonsense. Yes, there are no elements between
the first and the first element. But there are *also* no elements
between the first and the second element.

Ahh. Yes, I was reading "between the first element in the array
and..." as "between the beginning of the array and...".

So it looks like I was at least right about needing more coffee.

dave

Jean-Marc Bourguet · Dec 21, 2007

D - Historical. Whether to use 0 based or 1 based is a language
design question. I can't find my copy of Dijkstra at the moment
but he had some marvelously snarky comments about why 0 based is
better. Fortran used 1 based, though I believe that the later
versions let you use anything you like. I have the impression
that Algol used 0 based and that BCPL was out of the Algol
tradition.

Algol had VLA with index starting where you wanted. Far more sophisticated
than C

See http://www.fh-jena.de/~kleine/history/languages/Algol60-Naur.pdf

Kenneth Brody · Dec 21, 2007

James said:
That's covered by D.

That merely moves the question back one step. Why did BCPL use
zero-based indexing? Keep recursing backwards as needed until you find
an actual reason.

Well, let's all go back to day zero...

In the beginning Brian created the stdio.h and the main().

And the main() was without form, and void; and darkness was
upon the face of the kernel. And the Spirit of Brian moved upon
the face of the kernel.

And Brian said, Let there be int and there was int main().

And Brian saw the int, that it was good: and Brian divided the
int from the void.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>

Flash Gordon · Dec 21, 2007

Richard Heathfield wrote, On 21/12/07 10:05:

santosh said:

Perhaps you missed the wood for the trees? Richard started this thread
in respose to the other thread asking for regular "quiz" questions.
This is a quiz question. He is very well aware of the answer
~~himself.~~

Click to expand...

Thanks, santosh. The question is already archived, btw - at
http://www.cpax.org.uk/prg/portable/c/luca/luca0.php

but Flash's Wiki might be a better home?

I don't claim to own the Wiki, I only provide the server it runs on.
Having said that, yes, I think the Wiki would be a good home. Then
others who want to post more questions can do so.

pete · Dec 21, 2007

Philip said:
a[-1] is perfectly well designed
provided that a is a pointer to any element but the first in an array.

a[-1] is a valid lvalue if (a) points one past the array, also.

christian.bau · Dec 21, 2007

Why does C use zero-based indexing?

You claim that many people find zero-based indexing counter-intuitive.
Many people find any kind of indexing non-intuitive. However, look
back a few years. How many people were convinced that this millennium
started on Jan. 1st 2000 instead of Jan. 1st 2001? They just couldn't
understand that years form a one-based array.

Now seriously. If we wanted to have non-zero based arrays, they could
easily be added to C in a backwards compatible way. I won't try to
write down the syntax, but you could have

int a [1 : 10];

mean an array with ten elements, indexed from 1 to 10. And

int b [1900 : 2100, -5 : 5];

could mean a two-dimensional array of 201 x 11 elements, with the
first index ranging from 1900 to 2100, the second index from -5 to 5.
Such an array could only be used with the array [] operator, and for
multi-dimensional arrays (or all arrays with lower and upper bound),
comma expressions would not be allowed for indices, same as for
function parameters.

It wouldn't be possible to (reasonably) have an assignment

int *p = a;

Either p would point to the first array element, in which case p[0]
and a[1] would refer to the same array element, which would be more
than weird. On the other hand, p cannot be a pointer to int with the
property that p[1] and a[1] point to the same array element, because
you cannot have a pointer p such that p+1 points to the first element
of an array.

You could obviously write

int *p = &a [1];

which would work as everyone thinks. So with the "enhanced" array
syntax, you wouldn't get automatic decay from array to pointer, and
you wouldn't get the automatic decay from parameters looking like
arrays to parameters of pointer type.

Dik T. Winter · Dec 22, 2007

> I have the impression
> that Algol used 0 based and that BCPL was out of the Algol
> tradition.

In Algol 60 you could use any base. Actually, in the array declaration
you had to supply both lower- and upper-bound of the index. Algol 68
was similar, but when the bound pair started with "1:", the "1:" could be
omitted. I do not think Algol 58 (from which BCPL derives) was sufficiently
developed to have any idea about it.

Bart C · Dec 22, 2007

<snip>

Because array access in C is done in terms of pointer arithmetic. If
indexing began with 1, compilers would very likely need to do an
unnecessary subtraction step in implementing the array access.

This is a fallacy. Sometimes, an offset may incur a runtime penalty, but so
what? And it would be 'offset' by not needing a -1 in user code sometimes.

Having a choice of 0- or 1-based wouldn't have been difficult and would have
kept everyone happy except those wanting any base.

C aims to "deal with the same sort of abstractions" as real machines,
and most hardware architectures start memory addressing at zero. C
follows this for the sake of efficiency and logical clarity.

Processors also have means for applying offsets to addresses.

I think answer C is the one I would choose, if I had to do so.

I would have said D (or F on the link), because 1-based (or a mix) would
have upset the special relationship between arrays and pointer arithmetic.

Bart

Spiros Bousbouras · Dec 22, 2007

Richard said:
Richard said:

Why does C use zero-based indexing?

Click to expand...

C. One-based indexing leads to unnecessarily complicated arithmetic,
especially when dealing with multi-dimensional arrays. These complications
just fall away into nothing if zero-based indexing is used.

Click to expand...

I can't imagine how. Here is a loop over a multidimensional array:
for(i = 0; i < isize; i++)
for(j = 0; j < jsize; j++)
for(k = 0; k < ksize; k++)
f(x[j][k]);

In a C where arrays are indexed from one, the same loop is as follows:
for(i = 1; i <= isize; i++)
for(j = 1; j <= jsize; j++)
for(k = 1; k <= ksize; k++)
f(x[j][k]);

Perhaps this answer is referring to multidimensional arrays which are
faked through single-dimension arrays, in a manner such as:

x[i*rowsize+j]

which accesses the cell at (i,j) where 0 <= i < rowsize and 0 <= j <
colsize, and x has size rowsize*colsize. The same index calculation for
one-based arrays is:

x[(i-1)*rowsize+j]

where 1 <= i <= rowsize and 1 <= j <= colsize.

Or in cases where one has to transform from 2-dimensional
to 1-dimensional arrays and vice versa. Once I was helping
a statistician friend who's rather weak at programming to
write some programmes in Matlab, a language I don't actually
know. As far as I could tell Matlab has fully fledged
multi-dimensional arrays , you don't have to fake anything,
but arrays are 1 based. As part of his statistical
manipulations he often had to transform from 2-d to 1-d or
vice versa and these 1-off corrections were driving me crazy.

Paul Hsieh also gave a good example elsethread.

Other languages are able to be one-indexed because their focus is
different; the arithmetic inefficiencies introduced by one-based
indexing either do not arise or are inconsequential compared to making
the language more familiar to those who prefer one-based indexing (such
as mathematicians).

Click to expand...

What makes you think that mathematicians prefer 1-based ?
I'm one and I prefer 0-based.

Old versions of Perl allowed you to change the starting array index
through the $[ variable. It defaulted to 0, but allowed you to change it
to 1 to make it more familiar to awk and Fortran users.

Click to expand...

Arrays in awk start from 1 ? That's news to me , the versions
of awk I'm familiar with use associative arrays so a "start"
cannot be defined in a natural way.

Any discussion on where array indexing should start ought
to mention the well known quote by Stan Kelly-Bootle:

"Should array indices start at 0 or 1? My compromise of
0.5 was rejected without, I thought, proper consideration."

SM Ryan · Dec 22, 2007

# SM Ryan wrote:
# >
# > # n1256 6.5.6p9 states "If both the pointer operand and the result point
# >
# > That's nice.
# >
# > It's also cow doo-doo as anyone who has worked on compilers knows. Any compiler
# > that can't deal with this should flushed down the toilet as the refuse it is.
#
# Perhaps, but the Standard allows such a compiler, and strictly
# conforming code will work on such a compiler.
#
# > Any compiler that requires the code displacement portion of a relative address
# > to be nonnegative is broken by design. This has been known since the fifties
# > and the first assemblers.
#
# C does not place such a requirement. a[-1] is perfectly well designed

a[-1] and a-1 are different expressions. Any competent compiler optimiser
will analyze subscript into a constant and variant part, A[K+V], and allow
the precomputation of A+K on machines where this is a win so the run time
operation is only biasedA[V] adding the constant during compilation or load
instead of runtime. This is elementary compiler theory.

CBFalconer · Dec 23, 2007

SM said:
.... snip ...

a[-1] and a-1 are different expressions. Any competent compiler
optimiser will analyze subscript into a constant and variant part,
A[K+V], and allow the precomputation of A+K on machines where this
is a win so the run time operation is only biasedA[V] adding the
constant during compilation or load instead of runtime. This is
elementary compiler theory.

Actually, this is WRONG. The C standard specifies that array
accesses are to be converted into pointer + offset accesses. This
leaves only one variety to deal with.

Please fix your non-standard quote character. It makes it
impossible to have proper quotation through multiple levels.

David Thompson · Dec 30, 2007

Old versions of Perl allowed you to change the starting array index
through the $[ variable. It defaulted to 0, but allowed you to change it
to 1 to make it more familiar to awk and Fortran users.

Click to expand...

Arrays in awk start from 1 ? That's news to me , the versions
of awk I'm familiar with use associative arrays so a "start"
cannot be defined in a natural way.

A few language-defined ones are integers:
ARGV [1..ARGC]
the result of split()
(in gawk) the result of asort() and asorti()
$1..NF if you count that as an array (it is in perl FWTW)

and (as a result?) IME it is usual (but not required) for user-defined
arrays whose indices _are_ an integer range to be 1-based.

- formerly david.thompson1 || achar(64) || worldnet.att.net

Command Line Arguments	0	Mar 7, 2023
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
jQuery Attribute Summit--Latest Coverage	16	Dec 20, 2009
Argument types	7	Apr 14, 2009
underlying implementation of array and pointer ?	6	Sep 8, 2011
C equivalent of C++ new()?	18	Oct 14, 2010
ccl and stl: design differences	0	Apr 17, 2011
C language now truly universal	0	Jan 1, 2011

Latest Usenet C Argument

SM Ryan

Philip Potter

Richard Harter

SM Ryan

Philip Potter

dj3vande

Neil Cerutti

Kenneth Brody

dj3vande

Jean-Marc Bourguet

Kenneth Brody

Flash Gordon

pete

christian.bau

Dik T. Winter

Bart C

Spiros Bousbouras

SM Ryan

CBFalconer

David Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads