Why index starts in C from 0 and not 1

kapilk

Sir,

I know that array indexes in C start from 0 and not 1. Can anybody
please tell me the reason?

Is it because the subscript can be an unsigned integer, and
these start from 0?

Thanks
 
Allan Bruce

kapilk said:
Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

Thanks

I think it is due to the way that the compilers work. If you have an array
of sometype then the way to access these uses the notation

addressOfStartOfArray + (index * sizeof(sometype))

if the accesses were from 1, then this would add extra computation and
therefore be slower. Also, almost every programming language adopts 0 as
the initial index.

Allan
 
Marco Parrone

kapilk said:
Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

IMHO no, it was an arbitrary decision, it just made sense that way.

You can think of the index like a value to add to the basic pointer.

#include <stdio.h>

int main (void)
{
    /* test is an array of 5 consecutive characters; in most expressions
       its name converts to a pointer to the first one */
    char test [5] = {'t', 'e', 's', 't', '\n'};

    printf ("%c == %c\n", test [0], * (test + 0));
    /* adding 1 points to the next character */
    printf ("%c == %c\n", test [1], * (test + 1));
    printf ("%c == %c\n", test [2], * (test + 2));
    printf ("%c == %c\n", test [3], * (test + 3));
    return 0;
}
 
Does It Matter

Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

I would suspect it has something to do with the fact that C is a
language designed to work closely with the hardware architecture, and most
assembly languages that have an indexed addressing mode start at zero.

On the other hand, the C language originated on a PDP-11. The PDP-11 assembly
language just uses a fixed source and destination for things like
assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
In other words, there is no address+offset mode like the C68000 or more
modern processors.

If you believe this is why C starts at zero you'll have to ask the
question, why does assembly language start at zero? But you'll have to ask
it in an assembly language newsgroup.
 
Thomas Dickey

Does It Matter said:
On the other hand, C language originated on a PDP-11. The PDP-11 assembly
language just uses a fixed source and destination for things like
assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
In other words, there is not address+offset mode like the C68000 or more
modern processors.

That's incorrect (the PDP-11 has 8 addressing modes - including offsets
from a register value).
 
Thomas Dickey

Does It Matter said:
If you believe this is why C starts at zero you'll have to ask the
question, why does assembly language start at zero? But you'll have to ask
it in an assembly language newsgroup.

....or Pascal, or other languages that don't date from 1959.
 
Default User

kapilk said:
Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0


Probably because the array indexing operator is really syntactic sugar
for pointer operations.


ptr[i] == *(ptr + i);


Obviously, when using pointer arithmetic, the first element is at ptr +
0, so the first element when using [] to access it is ptr[0].




Brian Rodenborn
 
Gordon Burditt

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

My answer to this is that C starting from zero is likely to be influenced
by a lot of *MATHEMATICS* starting from zero.

Also, it is more likely that loading or storing an element of an array can
be accomplished with a single machine instruction if you don't have to
deal with the offset of 1.
Gordon L. Burditt
 
Thomas Stegen

kapilk said:
Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

Maybe because the index value is really the offset from the start
of the array...

One never knows though.
 
Lew Pitcher

Thomas said:
Maybe because the index value is really the offset from the start
of the array...

Bingo!

"Rather more surprising, at least at first sight, is the fact that a reference
to a[i] can also be written as *(a+i). In evaluating a[i], C converts it to
*(a+i) immediately; the two forms are completely equivalent. Applying the
operator & to both parts of this equivalence, it follows that &a[i] and a+i are
identical: a+i is the address of the i-th element beyond a." (from Section 5.3
of "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie, (c)
1978)

So, the genesis of C has a+i being the same as &a[i]. If a is an array, then
&a[1] is the same as a+1, and thus a+0 must be the same as &a[0]. This makes
arrays zero based.


This is not to say that the C standard retains this bias. Simply that it came
from the fact that the index value of an array was really the offset of the
specific item from the start of the array.

--
Lew Pitcher
IT Consultant, Enterprise Application Architecture,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
 
E. Robert Tisdale

Lew said:
No.
Maybe because the index value is really
the offset from the start of the array...

Bingo!

"Rather more surprising, at least at first sight,
is the fact that a reference to a[i] can also be written as *(a+i).
In evaluating a[i], C converts it to *(a+i) immediately;
the two forms are completely equivalent.
Applying the operator & to both parts of this equivalence,
it follows that &a[i] and a+i are identical:
a+i is the address of the i-th element beyond a."
(from Section 5.3 of "The C Programming Language"
by Brian W. Kernighan and Dennis M. Ritchie, (c) 1978)

So, the genesis of C has a+i being the same as &a[i].
If a is an array, then &a[1] is the same as a+1,
and thus a+0 must be the same as &a[0]. This makes arrays zero based.

This is not to say that the C standard retains this bias.
Simply that it came from the fact that the index value of an array
was really the offset of the specific item from the start of the array.


You forgot to answer, "Why?"

In order to reference element a[i],
the computer must first calculate its address.
If you use a [one-based] index,
the compiler would be obliged to calculate

(a + i - 1)

Today, good optimizing C compilers
would eliminate the superfluous subtraction
but, when K & R were designing C,
compilers usually didn't have the resources
(fast processors and large memories)
required to perform such optimizations.
 
Martin Ambuhl

Does It Matter wrote:

On the other hand, C language originated on a PDP-11. The PDP-11 assembly
language just uses a fixed source and destination for things like
assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
In other words, there is not address+offset mode like the C68000 or more
modern processors.

This is just silly. Please check the eight addressing modes in the
PDP-11 before posting more (just barely topical) "information."
 
Joe Wright

kapilk said:
Sir,

I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Is it because in the subscript i can have a unsigned integer and
these start from 0

Thanks

Because I like it that way! But really, it's hard to say.

IBM was the first major OEM disk drive maker. IBM numbers tracks
from 0 and sectors from 1. Why? Seagate, Western Digital, Maxtor,
etc. do the same. Why?

Bytes in a record are numbered from 0 while columns on a punch card
number from 1. Go figure.
 
Dan Pop

kapilk said:
I know that the array index starts in C from 0 and not 1 can any
body pls. tell me the reason.

Because the language designers decided to make array[i] an alternate
notation for *(array + i). They could have chosen to make array[i]
an alternate notation for *(array + i - 1), in which case array
indices would have been 1-based, but they didn't.

I don't know if this is an original C feature or merely inherited from
one of its predecessors (CPL, BCPL, B).

To someone with a solid assembly background, 0-based indexing appears as
the most natural option, because this is how indexed addressing modes
work on most processors supporting them. And the processor for which
C was originally designed was no exception.

Dan
 
Dan Pop

Allan Bruce said:
if the accesses were from 1, then this would add extra computation and
therefore be slower. Also, almost every programming language adopts 0 as
the initial index.

The most popular languages at the time C was designed used 1-based
indexing: FORTRAN, BASIC, Pascal.

Dan
 
Richard Tobin

Allan Bruce said:
I think it is due to the way that the compilers work. If you have an array
of sometype then the way to access these uses the notation

addressOfStartOfArray + (index * sizeof(sometype))

if the accesses were from 1, then this would add extra computation and
therefore be slower.

Only if the compilers were particularly stupid.

Real compilers would just produce

(addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))

where the first parenthesized expression is known at compile time.

C arrays start at zero because it's The Right Thing to do.

-- Richard
 
boa

Richard said:
Only if the compilers were particularly stupid.

Real compilers would just produce

(addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))

where the first parenthesized expression is known at compile time.

Always? Even when the "array" is a pointer to dynamically allocated memory?

C arrays start at zero because it's The Right Thing to do.

Agreed. ;-)

boa@home
 
Richard Tobin

Real compilers would just produce

(addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))

where the first parenthesized expression is known at compile time.
Always? Even when the "array" is a pointer to dynamically allocated memory?

True, I was assuming addressOfStartOfArray was supposed to be a constant.

But in many common cases, other optimizations will remove the
overhead. For example, when looping over the array, the index can be
adjusted instead of the base.

-- Richard
 
Keith Thompson

Dan Pop said:
The most popular languages at the time C was designed used 1-based
indexing: FORTRAN, BASIC, Pascal.

Quibble: Pascal allows arrays to be based however the user specifies.
For example (if I remember the syntax correctly):

type
  My_Array = array[37 .. 42] of Integer;
 
kal

True, I was assuming addressOfStartOfArray was supposed to be a constant.

But in many common cases, other optimizations will remove the
overhead. For example, when looping over the array, the index can be
adjusted instead of the base.

Not a satisfactory explanation. Your earlier statement was incorrect.
 
