type of array index?

S

shmartonak

For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?

--
 
J

Jonathan Bartlett

what should be the type of 'k'? Should it be ptrdiff_t?

It should be any type of integer.

Jon
 
J

Jonathan Bartlett

If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h
has its limits. Interestingly, obstack.h thinks that it appears in
stddef.h, too. Perhaps someone should report that to the maintainers.
Using gcc 3.3.1.

Jon
 
M

Michael Mair

For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

Essentially, there are two choices:
1. Use int whenever sufficient, use a wider type when necessary.
Sufficient means sufficient with respect to the minimal requirements
from the standard.
2. Always use size_t.
size_t is sufficient for all arrays and for indexing all storage you
can allocate (bytewise).

Personally, I favor the second approach but I am not religious
about it. Drawbacks: You have to be more careful due to the inherent
nonnegativity of the index type, e.g.
for (i=MAX-1; i>=0; i--)
operation_on(array);
either has to become
for (i=MAX; i>0; i--)
operation_on(array[i-1]);
or
for (i=MAX-1; i!=-1; i--)
operation_on(array);
which I like better.
As for int: It is possible that int indices are "faster" as int usually
is the "natural" integer type of the system but if this is really
the case, one can still optimise where necessary. Undefined behaviour
due to forgetting about "old-fashioned" 16 bit ints and resulting
strange errors are less nice.

However, there have been many discussions about that.
Do as you like and look for range/sanity checks as necessary (in both
cases).

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

Any integer type which suffices for your range requirements.
I would use ptrdiff_t only where appropriate.

If ssize_t had made it into the standard, I'd suggest that.

OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?

That you cannot find it in the header file does not mean that
it is not there by some magic (or simply comes in from another
included file).

Just try it out:
If

#include <stddef.h>
int main (void) { int a[3]; ptrdiff_t b = (a+2)-(a+1); return 0; }

does not compile, file a bug report.
If it does compile without the #include, do the same.


Cheers
Michael
 
M

Michael Mair

Jonathan said:
If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h

malloc.h is not a C standard header, thus offtopic.
has its limits. Interestingly, obstack.h thinks that it appears in

dito for obstack.h
stddef.h, too. Perhaps someone should report that to the maintainers.
Using gcc 3.3.1.

If including <stddef.h> has the effect that size_t or ptrdiff_t
are available when they were not before, everything is fine:
The implementation may do each and everything behind the scenes.

I did not yet have any problems with gcc's compliance in _this_
respect.


Cheers
Michael
 
A

Andrey Tarasevich

For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

It depends.

In concrete (non-generic) code the type is usually dictated by the
natural properties of the application area. You should normally have a
type that designates the total amount of objects of 'MYTYPE' already.
This type is an obvious candidate for the array index type. For example,
if this is an array of, say, file handles and you use 'unsigned short'
object to store the total number of files, then 'unsigned short' would
be a natural choice for the index type for this array. This, of course,
applies only if you don't care about negative indexing. For negative
indexing you'd have to use either 'short' or 'int', depending on the
required range.

In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error. It
might be more elegant to "hide" the 'size_t' behind a typedef name as
follows

typedef size_t pos_ptrdiff_t;

and use 'pos_ptrdiff_t' for generic non-negative array indexing.
 
M

Michael Mair

Andrey said:
It depends.

In concrete (non-generic) code the type is usually dictated by the
natural properties of the application area. You should normally have a
type that designates the total amount of objects of 'MYTYPE' already.
This type is an obvious candidate for the array index type. For example,
if this is an array of, say, file handles and you use 'unsigned short'
object to store the total number of files, then 'unsigned short' would
be a natural choice for the index type for this array. This, of course,
applies only if you don't care about negative indexing. For negative
indexing you'd have to use either 'short' or 'int', depending on the
required range.

In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error. It
might be more elegant to "hide" the 'size_t' behind a typedef name as
follows

typedef size_t pos_ptrdiff_t;

and use 'pos_ptrdiff_t' for generic non-negative array indexing.

Nicely put.
Something for a confessing size_t-user like me to think about :)
Grabbed and stored.

Cheers
Michael
 
K

Keith Thompson

Jonathan Bartlett said:
If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h
has its limits. Interestingly, obstack.h thinks that it appears in
stddef.h, too. Perhaps someone should report that to the
maintainers. Using gcc 3.3.1.

ptrdiff_t is declared in <stddef.h>. <malloc.h> is not a standard
header (the malloc() function is declared in <stdlib.h>), nor is
<obstack.h>. And <stdint.h> is a standard header in C99, but not in
C90, so not all current implementations will provide it (though it's
not too difficult to roll your own).
 
T

those who know me have no need of my name

in comp.lang.c i read:
OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?

there is no requirement by the standard that a file exist -- it's allowed
to be magical, perhaps entirely internal. though in fact it does exist,
it's just that it's not where you are looking.
 
S

Stephen Sprunk

Andrey Tarasevich said:
It depends. ....
In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error.

Since an array is an object, and size_t is defined to be large enough to
represent the size of any object, size_t naturally must be able to represent
any possible array subscript.

In _The Standard C Library_, P. J. Plauger writes:

"Unlike ptrdiff_t, however, size_t is /very/ useful. It is the safest type
to represent any integer data object you use as an array subscript. You
don't have to worry if a small array evolves into a very large one as the
program changes. Subscript arithmetic will never overflow when performed in
type size_t." and "You should make a point of using type size_t /anywhere/
your program performs array subscripting or address arithmetic." (emphasis
in original)

The only reason one might prefer ptrdiff_t for array subscripts is where,
due to looping, you need the ability to count down past zero. In this case,
I would prefer ssize_t (where available) over ptrdiff_t, but first I'd
consider whether it was feasible to restructure the loop condition such that
a signed subscript wasn't needed.

S
 
A

Andrey Tarasevich

Stephen said:
Since an array is an object, and size_t is defined to be large enough to
represent the size of any object, size_t naturally must be able to represent
any possible array subscript.

That's the very reason why I suggest using 'size_t' for array indexing
in generic context. For example, in a generic array-support library.

At application level there's rarely a need to have an array just for the
sake of having an array. What is normally needed at application level is
a container with index-based random-access interface. This could be a
"traditional" array, this could be an associative array, this could be
something like a deque. Today it can be one, tomorrow - another. At this
conceptual level there's no relation between container element count and
object size. The fact that this relation holds for a "traditional" array
is noting more than an accident, a low level detail, which has
absolutely no reason to play any role in the process of choosing the
index type.
In _The Standard C Library_, P. J. Plauger writes:

"Unlike ptrdiff_t, however, size_t is /very/ useful. It is the safest type
to represent any integer data object you use as an array subscript. You
don't have to worry if a small array evolves into a very large one as the
program changes. Subscript arithmetic will never overflow when performed in
type size_t." and "You should make a point of using type size_t /anywhere/
your program performs array subscripting or address arithmetic." (emphasis
in original)

Great, but applies mostly to library-level (generic) code.

If in my program a have a dedicated typename for designating the day of
the week, say

typedef enum DayOfTheWeek { /* ... */ } DayOfTheWeek;

and in some place I need to iterate through an array indexed by the day
of the week, then I'd make a point of using 'DayOfTheWeek' to represent
the index, never the completely irrelevant 'size_t'.
The only reason one might prefer ptrdiff_t for array subscripts is where,
due to looping, you need the ability to count down past zero. In this case,
I would prefer ssize_t (where available) over ptrdiff_t, but first I'd
consider whether it was feasible to restructure the loop condition such that
a signed subscript wasn't needed.

I would do the latter.
 
T

Tor Rustad

Any integer type is allowed. For example main() uses argc:

int main(int argc, char* argv[])

If I'm using pointers to access array elements as *(mptr+k) where
I've declared
[...]

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more
related to the concept of "container element count". These two
concepts are not related and using 'size_t' for array indexing
is a conceptual error.

Not correct, an array is an object and

sizeof array

return the size of that object. Hence, size_t is a good type choice
for an array index . Note that number of array elements

size_t N = sizeof (array) / sizeof (array[0]);

will never overflow with size_t.
 
A

Andrey Tarasevich

Tor said:
...
However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more
related to the concept of "container element count". These two
concepts are not related and using 'size_t' for array indexing
is a conceptual error.

Not correct, an array is an object and

sizeof array

return the size of that object. Hence, size_t is a good type choice
for an array index . Note that number of array elements

size_t N = sizeof (array) / sizeof (array[0]);

will never overflow with size_t.
...

And? This is exactly what I said in my message before the quoted part.
But how is this relevant within the context of paragraph quoted above?

Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.
 
S

Stephen Sprunk

Andrey Tarasevich said:
That's the very reason why I suggest using 'size_t' for array indexing
in generic context. For example, in a generic array-support library.

At application level there's rarely a need to have an array just for
the sake of having an array.

Right, but if you get in the habit of writing code where the index is an
int, sooner or later you'll write code that (after it's been edited by a
dozen other coders) overflows that int but size_t would have worked.

Sure, size_t may be less efficient on some systems, but IMHO rarely enough
to offset the possibility of bugs in the future.
Great, but applies mostly to library-level (generic) code.

IMHO, it applies to nearly all application code as well.
If in my program a have a dedicated typename for designating
the day of the week, say
...

In that case, yes, I might declare the index as an int or even short, but
clear-cut cases like that are exceptions.

In most cases, the original coder has no idea what his code will eventually
evolve into, how other components of the same system will abuse it, or
whether the index will overflow ten years later and crash some
mission-critical system. Why take the chance unless you can prove it's
affecting performance?

S
 
K

Keith Thompson

Andrey Tarasevich said:
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.

If you're writing code that deals with arrays generically (such as the
standard qsort() function), it's probably safest to use size_t. If
you're using something more specific to a given problem domain, and
you happen to know that an array will never have more than INT_MAX
elements, you can use int.
 
A

Andrey Tarasevich

Keith said:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.

That's pretty much what I said in the message quoted by the previous poster.
If you're writing code that deals with arrays generically (such as the
standard qsort() function), it's probably safest to use size_t.

Also, exactly what I said in that message.
If
you're using something more specific to a given problem domain, and
you happen to know that an array will never have more than INT_MAX
elements, you can use int.

Same here.
 
K

Keith Thompson

Andrey Tarasevich said:
Keith said:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.

That's pretty much what I said in the message quoted by the previous poster.

You said that "object size" and "container element count" are not
related. I showed how they are related. They're not identical, of
course, but the nature of their relationship is such that it's
(almost) always safe to use size_t as an array index.

(If you use a pointer into the middle of an array object to provide
negative indices, then size_t won't work, but that's fairly unusual.
If you have a data structure that implements bit arrays, size_t may
not be big enough, but such a data structure can't be implemented with
array syntax in standard C.)
 
A

Andrey Tarasevich

Keith said:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.

That's pretty much what I said in the message quoted by the previous poster.

You said that "object size" and "container element count" are not
related.

I said that they are not related at conceptual level. These are two
different concepts.
I showed how they are related.

Not as concepts.
They're not identical, of
course, but the nature of their relationship is such that it's
(almost) always safe to use size_t as an array index.

As an array index - of course. As an index into a generic container -
no. As I said before, is most cases application does not specifically
need an array. It just needs a container with similar interface. In
situations like this an array can be easily "unplugged" from the client
code and another container can be "plugged-in" instead (assuming that
the interface specifications match). Hardcoding an array-specific index
type in this case will only introduce another unnecessary coupling
between the client code and the concrete type of the container.
 
A

Andrey Tarasevich

Stephen said:
Right, but if you get in the habit of writing code where the index is an
int, sooner or later you'll write code that (after it's been edited by a
dozen other coders) overflows that int but size_t would have worked.

Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One day one
of those other coders decides to replace the built-in array with a
custom implementation of some "disjoint" array, like 'deque', for
example. Size of that container is no longer limited by the range of
'size_t'.
Sure, size_t may be less efficient on some systems, but IMHO rarely enough
to offset the possibility of bugs in the future.


IMHO, it applies to nearly all application code as well.


In that case, yes, I might declare the index as an int or even short, but
clear-cut cases like that are exceptions.

In my opinion, a closer look at most practical cases usually reveals
that in essence they are indeed such clear-cut cases.
In most cases, the original coder has no idea what his code will eventually
evolve into, how other components of the same system will abuse it, or
whether the index will overflow ten years later and crash some
mission-critical system. Why take the chance unless you can prove it's
affecting performance?

That's exactly what I'm talking about. The code can easily evolve from
using an array to using a different type of container, with element
count is no longer restrained by the range of 'size_t'. In situations
like this one has to look through the entire code and carefully replace
'size_t' with some other type. This usually involves a lot of chance-taking.
 
S

Stephen Sprunk

Andrey Tarasevich said:
Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works.

Per the standard, size_t CANNOT overflow when used as an array index.
One day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array, like
'deque', for example. Size of that container is no longer limited by
the range of 'size_t'.

A 'deque' is not an array and therefore any problem indexing them with
size_t is moot.

[ Rest of post snipped because it doesn't apply to arrays. ]

The OP specifically asked about the type to use for an array index.
Hijacking the thread to generalize to other non-standard container types is
irrelevant at best.

S
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,227
Latest member
Daniella65

Latest Threads

Top