Array indexing

jacob navia · Nov 17, 2006

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

When indexing an array

array[integer expression]

this would mean that arrays are limited to 2GB. To overcome this,
is an implementation allowed to cast array indexes to long long?

Thanks in advance for your attention.

References:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the
other expression shall have integer type, and the result has type
‘‘type’’.

jacob

Walter Roberson · Nov 17, 2006

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

When indexing an array

array[integer expression]

this would mean that arrays are limited to 2GB.

No it wouldn't.

To overcome this,
is an implementation allowed to cast array indexes to long long?

No need for the implementation to cast them.

References:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the
other expression shall have integer type, and the result has type
‘‘type’’.

Notice that's "the other expression shall have integer type", not
"the other expression shall have int type". Any integer type is allowed.
The expression is equivalent to pointer arithmetic followed by a
dereference, and any integral type can be added to a pointer,
as long as the result points within the same object.
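
The distinction between "integer type" and "int" can be checked with a short sketch. The helper names below (get_int, get_llong, get_size, get_ptr_arith, subscripts_agree) are made up for illustration and do not come from the thread. Each one reads the same element through a subscript of a different integer type, and the last uses the equivalent pointer-arithmetic form.

```c
#include <stddef.h>

/* Hypothetical helpers: same element, different subscript types. */
static int get_int(const int *a, int i)             { return a[i]; }
static int get_llong(const int *a, long long i)     { return a[i]; }
static int get_size(const int *a, size_t i)         { return a[i]; }

/* The subscript written out as pointer arithmetic plus a dereference. */
static int get_ptr_arith(const int *a, long long i) { return *(a + i); }

/* Returns 1 when all four forms read the same element of a small array. */
int subscripts_agree(void)
{
    static const int data[5] = {10, 20, 30, 40, 50};
    return get_int(data, 3) == 40
        && get_llong(data, 3LL) == 40
        && get_size(data, (size_t)3) == 40
        && get_ptr_arith(data, 3LL) == 40;
}
```

None of the calls needs a cast at the point of the subscript; the subscript operand keeps whatever integer type it was declared with.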

jacob navia · Nov 17, 2006

Walter Roberson wrote:

jacob navia said:

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

When indexing an array

array[integer expression]

this would mean that arrays are limited to 2GB.

No it wouldn't.

To overcome this,
is an implementation allowed to cast array indexes to long long?

No need for the implementation to cast them.

References:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the
other expression shall have integer type, and the result has type
‘‘type’’.

Notice that's "the other expression shall have integer type", not
"the other expression shall have int type". Any integer type is allowed.
The expression is equivalent to pointer arithmetic followed by a
dereference, and any integral type can be added to a pointer,
as long as the result points within the same object.

RIGHT!!!

Damn, I just didn't see it.

Thanks a lot for your quick answer.

Walter Roberson · Nov 17, 2006

Minor correction about array indexing:

The expression is equivalent to pointer arithmetic followed by a
dereference, and any integral type can be added to a pointer,
as long as the result points within the same object.

The dereference would not necessarily be there; the dereference
would be inferred in contexts in which a value was required.
In particular, not in the context of & or sizeof, or of the array
indexing appearing as an lvalue.
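
A minimal sketch of the three contexts named above, in which the element's value is never fetched. The function names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* &a[2] yields the address a + 2; the element itself is not read. */
int address_matches(void)
{
    int a[4] = {0, 1, 2, 3};
    return &a[2] == a + 2;
}

/* sizeof does not evaluate its operand here (no variable-length array
   is involved), so this is safe to call even with a null pointer. */
size_t element_size(const int *a, size_t i)
{
    return sizeof a[i];
}

/* a[1] used as an lvalue: the store targets the element without first
   reading its old value. */
int store_through_subscript(void)
{
    int a[4] = {0, 1, 2, 3};
    a[1] = 42;
    return a[1];
}
```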

Chris Torek · Nov 17, 2006

jacob navia said:

array[integer expression] ...
To overcome this,
is an implementation allowed to cast array indexes to long long?

(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)
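
As a rough sketch of that advice, the loop below uses size_t for both the element count and the index, so the index can cover any object the implementation is able to create. The name sum_elements is hypothetical.

```c
#include <stddef.h>

/* Sums n elements of a; the index has type size_t, so it is not limited
   to the range of int on implementations where size_t is wider. */
long long sum_elements(const int *a, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```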

No need for the implementation to cast them.

Moreover, if an implementation does an internal conversion of whatever
integral type is being added to a pointer value, so that:

ptr + i

and:

*(ptr + i) /* aka ptr[i] */

convert the value in "i" to (signed or unsigned) "long long" before
doing the addition, this is still an "internal conversion", *not* a
"cast". A cast is the syntactic construct in which a type-name
enclosed in parentheses is used to force an explicit conversion.

(Note that most implementations do indeed convert "i", if it is
plain, signed, or unsigned char; plain or signed short; or unsigned
short. That is:

unsigned char i;
...
use(ptr[i]);

has the same effect as a version with a cast:

use(ptr[(int)i]);

It is tempting to call internal conversions "implicit casts", but
ever since the original ANSI C standard -- which defined "cast" as
the explicit conversion, and nothing else -- came out, this temptation
should be resisted.)
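
The difference between an internal conversion and a cast can be sketched as follows. The function names are invented for this example, and whether an implementation widens the narrow index internally is, as the post says, typical rather than guaranteed.

```c
/* No cast appears in the source; an implementation typically widens the
   unsigned char index internally before the address computation. */
int read_implicit(const int *table, unsigned char i)
{
    return table[i];
}

/* Same read with a cast: the parenthesized type name (int) is the
   syntactic construct that the standard calls a cast. */
int read_with_cast(const int *table, unsigned char i)
{
    return table[(int)i];
}

/* Returns 1 when both forms read the same element. */
int conversions_agree(void)
{
    static const int table[4] = {7, 8, 9, 10};
    unsigned char i = 2;
    return read_implicit(table, i) == 9 && read_with_cast(table, i) == 9;
}
```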

Frederick Gotham · Nov 17, 2006

jacob navia:

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

When indexing an array

array[integer expression]

this would mean that arrays are limited to 2GB.

No, for two reasons:

(1) CHAR_BIT is 797525.
(2) The implementation may provide an integer type bigger than "long long"
(or at least I think so).

SM Ryan · Nov 17, 2006

# Suppose an implementation where
# sizeof int == 4
# sizeof void * == 8
# sizeof long long == 8
#
# When indexing an array
#
# array[integer expression]
#
# this would mean that arrays are limited to 2GB. To overcome this,
# is an implementation allowed to cast array indexes to long long?

It's not worth having more than a 32 bit index if you only have
a 32 bit address. And while 64 bit addresses date back to the Star 100,
they are becoming widely available in the last few years, opening
a whole new world of debugging.

May you live in interesting times.

(Once the 64-bit barrier is crossed, then we get to deal with the
Y2K38 bug.)

Malcolm · Nov 17, 2006

jacob navia said:
Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

When indexing an array

array[integer expression]

this would mean that arrays are limited to 2GB. To overcome this,
is an implementation allowed to cast array indexes to long long?

Thanks in advance for your attention.

References:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the
other expression shall have integer type, and the result has type
‘‘type’’.

An int should be a natural register type.
So you've got to ask how the implementation can address more memory than
will fit in a register, ignoring the signed/unsigned issue.

Keith Thompson · Nov 17, 2006

Malcolm said:
jacob navia said:

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

[...]
An int should be a natural register type.
So you've got to ask how the implementation can address more memory than
will fit in a register, ignoring the signed/unsigned issue.

The implementer has to think about that, but a programmer doesn't.

On many systems, int and size_t happen to be the same size, so the
question doesn't even come up. But if size_t is bigger than int, then
presumably the implementation will do whatever is necessary to make
that work. As a programmer, though I might be curious, I don't need
to know how it's done, merely that it works.

(On a number of systems I use every day, int is 32 bits, but pointers
are 64 bits.)

Malcolm · Nov 18, 2006

Keith Thompson said:
Malcolm said:

jacob navia said:

Suppose an implementation where
sizeof int == 4
sizeof void * == 8
sizeof long long == 8

[...]
An int should be a natural register type.
So you've got to ask how the implementation can address more memory than
will fit in a register, ignoring the signed/unsigned issue.

The implementer has to think about that, but a programmer doesn't.

On many systems, int and size_t happen to be the same size, so the
question doesn't even come up. But if size_t is bigger than int, then
presumably the implementation will do whatever is necessary to make
that work. As a programmer, though I might be curious, I don't need
to know how it's done, merely that it works.

I'm programming MPI. The code is mainly in C but has to call Fortran
subroutines.
So size_t's are a big no-no. I'm not convinced the MPI will serialise them
correctly and portably, and they certainly are in danger of breaking if
passed to the Fortran routines.

My own view is that portability, reusability, readability etc is improved if
we reduce the number of types. Basically the programmer wants characters and
numbers. However you need to split numbers into integers and reals as a
concession to efficiency, and also because many types are naturally integral.
The C needs pointers for its internals. So that would be four types -
characters, integers, reals and pointers. char, int, double and the * types.
Finally you also need an 8 bit "byte" as a concession to efficient layout of
patterns in memory. So, sadly, my proposal has ended up adding a keyword
after all.

Malcolm · Nov 18, 2006

Chris Torek said:
array[integer expression] ...
To overcome this,
is an implementation allowed to cast array indexes to long long?

(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)

If there is a problem using an integer to index an array, in this case that
array sizes can go beyond the range, then really the implementation needs to
be fixed, or the standard.

There are lots of problems with size_t. It looks ugly, it is unsigned, it
doesn't interface with other languages, it is a misnomer when not used to
describe a size of memory, it adds another type to the pool and complicates
the language.

Frederick Gotham · Nov 18, 2006

Malcolm:

There are lots of problems with size_t. It looks ugly,

Subjective.

it is unsigned,

Which is brilliant.

Keith Thompson · Nov 18, 2006

Malcolm said:
Chris Torek said:

To overcome this,
is an implementation allowed to cast array indexes to long long?

(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)

If there is a problem using an integer to index an array, in this case that
array sizes can go beyond the range, then really the implementation needs to
be fixed, or the standard.

You said "integer". Did you mean "int"?

There are lots of problems with size_t. It looks ugly, it is unsigned, it
doesn't interface with other languages, it is a misnomer when not used to
describe a size of memory, it adds another type to the pool and complicates
the language.

size_t is an alias for another type that exists anyway. I don't find
the name ugly at all, and it's unsigned because it needs to be.

Reality isn't as simple as you seem to want C to be.

Malcolm · Nov 19, 2006

Keith Thompson said:
Malcolm said:

Chris Torek said:

To overcome this,
is an implementation allowed to cast array indexes to long long?

(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)

If there is a problem using an integer to index an array, in this case that
array sizes can go beyond the range, then really the implementation needs to
be fixed, or the standard.

You said "integer". Did you mean "int"?

Due to this size_t nonsense integers are no longer ints, which is the heart
of the problem.

Flash Gordon · Nov 19, 2006

Malcolm said:
Keith Thompson said:

Malcolm said:

To overcome this,
is an implementation allowed to cast array indexes to long long?
(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)

If there is a problem using an integer to index an array, in this case that
array sizes can go beyond the range, then really the implementation needs to
be fixed, or the standard.

You said "integer". Did you mean "int"?

Due to this size_t nonsense integers are no longer ints, which is the heart
of the problem.

Ever since any of char, short, long or unsigned was first added to the
language an integer has not been an int. I think you will find that
pre-dates size_t by just a little bit.

In any case, nothing stops you from using an int as an array index or
size parameter when that makes sense in your opinion. Unless, of course,
you need an object larger than an int can represent in which case the
fact that size_t is allowed to represent larger numbers (if it makes
sense on the implementation) means you might be able to do it instead of
being stuffed.

Not all implementations use the same size registers for addresses as for
integers so it makes sense to be able to use a large enough type for
address type operations, such as array indexing, without the
implementation having to make int too large to be sensible.

Keith Thompson · Nov 19, 2006

Malcolm said:
Keith Thompson said:

Malcolm said:

To overcome this,
is an implementation allowed to cast array indexes to long long?

(This, of course, is why the "best" type for array indexing is
generally size_t, or a signed variant of size_t: size_t can [and
should] be "unsigned long long" on implementations with memory
sizes exceeding UINT_MAX.)

If there is a problem using an integer to index an array, in this
case that array sizes can go beyond the range, then really the
implementation needs to be fixed, or the standard.

You said "integer". Did you mean "int"?

Due to this size_t nonsense integers are no longer ints, which is
the heart of the problem.

I don't see that it's a problem.

If you want a language in which "int" is the only integer type,
C is not that language, and I don't believe it ever has been.
The earliest C reference manual I can find, which predates K&R1,
has both char (8 bits) and int (16 bits).

Once again, the term "integer" refers to a number of distinct types:
(signed|unsigned|plain) char, (signed|unsigned) (short|int|long|long long),
and zero or more extended types. size_t is merely an alias for one of
them. That's just not going to change. If you dislike it, that's
certainly your right (though I'm at a loss to understand why you think
it's a problem), but it's how the language works.

Perhaps you'd prefer B or BCPL?

You're free to use a different language that meets your needs if you
can find one, or design and implement one, or get someone else to do
so, if you're technically and/or financially able. The result is
likely to be something that *I* wouldn't want to use, but that's
perfectly fine.

But if you're going to post here, I suggest that keeping the
distinction between "int" and "integer" clear is going to make
communication much easier.

Frederick Gotham · Nov 19, 2006

Keith Thompson:

Once again, the term "integer" refers to a number of distinct types:
(signed|unsigned|plain) char, (signed|unsigned) (short|int|long|long long),
and zero or more extended types. size_t is merely an alias for one of
them.

A conforming implementation can provide an extra integer type, right?
Something like:

__uint128 i = 5;

Can it then go on to make "size_t" an alias of it?

typedef __uint128 size_t;

If so, the existence of this extra integer type could be transparent to the
programmer, yielding a conformant C implementation. The important thing about
this, however, is that the following expression could be true:

(size_t)-1 > (long long unsigned)-1

Keith Thompson · Nov 19, 2006

Frederick Gotham said:
Keith Thompson:

A conforming implementation can provide an extra integer type, right?
Something like:

__uint128 i = 5;

Yes. I mentioned "zero or more extended types"; __uint128 could be
one of them. See C99 6.2.5.

Can it then go on to make "size_t" an alias of it?

typedef __uint128 size_t;

Yes, but I think there's a DR recommending (but not requiring) that
size_t be no wider than unsigned long. In C90, a portable way to
print a size_t value is:

size_t s;
printf("s = %lu\n", (unsigned long)s);

In C99, this works *unless* the implementation chooses to make size_t
wider than unsigned long. (C99 introduces a new format for size_t,
and another for intmax_t, but any code that uses them will not be
portable to C90 implementations; the point of the recommendation is to
make valid C90 code remain valid in C99 as much as possible.)
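
The two printing styles can be sketched side by side. The function names and the 64-character buffer requirement are assumptions of this example; sprintf and snprintf write into a buffer here only so that the result can be inspected, and printf takes the same conversion specifications.

```c
#include <stddef.h>
#include <stdio.h>

/* C90 style: convert to unsigned long and format with %lu.
   buf must have room for at least 64 characters (assumed by this sketch).
   The value is truncated if size_t is wider than unsigned long. */
int format_size_c90(char *buf, size_t s)
{
    return sprintf(buf, "s = %lu", (unsigned long)s);
}

/* C99 style: the z length modifier matches size_t, so no cast is needed. */
int format_size_c99(char *buf, size_t buflen, size_t s)
{
    return snprintf(buf, buflen, "s = %zu", s);
}
```

Both functions return the number of characters produced, as sprintf and snprintf do.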

If so, the existence of this extra integer type could be transparent
to the programmer, yielding a conformant C implementation. The
important thing about this, however, is that the following
expression could be true:

(size_t)-1 > (long long unsigned)-1

Right, which means it's not entirely transparent.
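
A program can observe the difference with the C99 limit macros. This is only a sketch; the function name is hypothetical. Because unsigned long long has at least 64 value bits, the function returns 0 on any implementation whose size_t is 64 bits or narrower.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if size_t can represent a value larger than every
   unsigned long long, which is only possible when size_t is an
   extended integer type wider than unsigned long long. */
int size_t_wider_than_ullong(void)
{
    return SIZE_MAX > ULLONG_MAX;
}
```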

But practically speaking, unsigned long long is required to be at
least 64 bits, which should be more than big enough for size_t for the
foreseeable future. Even assuming a continuation of exponential growth
in memory sizes, 64-bit address spaces will last longer than 32-bit
address spaces did -- and we *might* run into some serious physical
limits before we hit 64 bits.

Note that there are two separate issues here. Keeping size_t no
bigger than unsigned long maintains compatibility with C90 -- but
unsigned long can be as small as 32 bits, and we have memory spaces
bigger than 4 gigabytes already. Keeping size_t no bigger than unsigned
long long will be easy until we reach 16 exabytes. (Even where I
work, our biggest archives are just a few petabytes, and they're not
linearly addressable.)
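
The 4 gigabyte and 16 exabyte figures follow from the address widths, taking the units as binary (2^30 and 2^60 bytes). A small sketch of the arithmetic, with a hypothetical function name:

```c
#include <stdint.h>

/* Number of units of 2^unit_shift bytes that addr_bits-bit addresses can
   reach. Requires unit_shift <= addr_bits and addr_bits - unit_shift < 64. */
uint64_t addressable_units(unsigned addr_bits, unsigned unit_shift)
{
    return (uint64_t)1 << (addr_bits - unit_shift);
}
```

With these definitions, addressable_units(32, 30) is 4 (units of 2^30 bytes) and addressable_units(64, 60) is 16 (units of 2^60 bytes).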

Old Wolf · Nov 20, 2006

Malcolm said:
There are lots of problems with size_t. It looks ugly,

Introspection?

it is unsigned,

When was the last time you had a chunk of memory with
a negative size?

it doesn't interface with other languages,

C doesn't interface with other languages.

it is a misnomer when not used to describe a size of memory,

You should use it for describing a size of memory. Would you
also criticise 'char' because it is a misnomer when not used to
refer to a character?

it adds another type to the pool

Usually it is an alias for an existing type

and complicates the language.

How would you implement malloc without size_t ?

0.5 out of 6 for you (at best)

Malcolm · Nov 20, 2006

Old Wolf said:
Introspection?

No. It is ugly. And that isn't a trivial issue. When code looks like line
noise it becomes unreadable, and then it gets hard to maintain, and you get
bugs.
As happened to Perl and C++. size_t is admittedly only a small step in that
direction, but it's a step in the wrong direction.

When was the last time you had a chunk of memory with
a negative size?

When was the last time you had a negative amount of money in your pocket?
However you can have a negative amount in your account. Intermediate
calculations of the sizes of memory objects may give negative results.
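
A small sketch of what happens to such an intermediate result. The function names are hypothetical, and the signed version assumes both values fit in long long.

```c
#include <stddef.h>

/* Unsigned difference: when used > capacity the result wraps around
   modulo SIZE_MAX + 1 instead of going negative. */
size_t remaining_unsigned(size_t capacity, size_t used)
{
    return capacity - used;
}

/* Signed difference: a shortfall shows up as a negative number.
   Assumes both arguments are representable as long long. */
long long remaining_signed(size_t capacity, size_t used)
{
    return (long long)capacity - (long long)used;
}
```

For capacity 4 and used 6, the unsigned version yields SIZE_MAX - 1 while the signed version yields -2.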

C doesn't interface with other languages.

Yes it does. C frequently has to call non-C code, or be called by it.
Anything that makes that process easier is a good thing.

You should use it for describing a size of memory. Would you
also criticise 'char' because it is a misnomer when not used to
refer to a character?

Yes I would. The fact that char and byte are the same thing in C is a major
headache to anyone who has had to use a non-English language. However that
is something we are stuck with. size_t is a newcomer which can still, just,
be suppressed if we are determined enough to squeeze it out of our nice C
code.

Usually it is an alias for an existing type

It is another type swilling about. The fact that it probably has the same
number of bits as another type is neither here nor there.
Now Bloggs writes

setpixels(size_t *x, size_t *y, size_t N)

because size_t is the right thing for an index, right?

Muggins writes
int *xvals;
int *yvals;
int N;

Because his pixel co-ordinates are integers.
Now he calls Bloggs's setpixels() routine. Oh dear. The code becomes a mass
of casts and type conversions. It's the two standards problem. We need
one standard way of representing integers in the machine.
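
The kind of glue code being described might look like the sketch below. The helper widen_coords is hypothetical; it only shows the conversion step a caller holding int arrays would need before passing them to a routine that takes size_t pointers.

```c
#include <stddef.h>

/* Copies n int co-ordinates into a size_t array.
   Returns 0 on success, or -1 if a co-ordinate is negative and so has
   no meaningful size_t value. */
int widen_coords(const int *src, size_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0)
            return -1;
        dst[i] = (size_t)src[i];
    }
    return 0;
}
```

An int array cannot simply be cast to size_t * and passed along, because the two element types may differ in size; the copy is what makes the call well defined.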

How would you implement malloc without size_t ?

Standard:
malloc() takes a numerical type as its argument which must be an integer.

Normal application

void *malloc(int N)
{
    assert(N >= 0);
}

Because on normal machines there simply isn't enough memory to overflow the
range of a signed integer, so this isn't a problem.

squeezed compiler, maybe on an embedded machine

void *malloc( unsigned long long N)
{
}

weird compiler (why not do this?)

void *malloc(double N)
{
    assert(N == floor(N));
}

0.5 out of 6 for you (at best)

Don't be arrogant. Maybe you don't have the intellectual capacity to
appreciate the strength of the point being made.
