array subscripting

A

Andrey Tarasevich

buda said:
Hi,
I've been wondering for a while now (and always forgot to ask :) what is the
exact quote from the Standard that forbids the use of (&array[0][0])[x]
(when x >= number_of_columns) as stated in the FAQ 6.19
(http://www.eskimo.com/~scs/C-faq/q6.19.html).
...

See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.
 
B

buda

Andrey Tarasevich said:
See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.

--
Best regards,
Andrey Tarasevich

P.S. BTW, '(&array[0][0])[x]' is equivalent to 'array[0][x]'.
OK, this is perfectly clear and logical to me, but I still see this kind of
code somewhat often (even the FAQ uses it, and only mentions as a side note
that it is not strictly conforming). The use being passing a "flattened"
array to a function and then doing manual subscripting like a[i*NCOL+j]
where a is the "flattened array" passed to the function like
f(&array[0][0]). AFAIK, the Standard requires that multidimensional arrays
are stored in row major order (last subscript varies fastest), so people
expect this to work. Doesn't the above mean that multidimensional arrays are
stored (are required to be stored by the Standard) in one continuous block
of memory like [[row0][row1]...[rowN-1]] or is that my misconception ?
 
M

Michael Mair

buda said:
See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.

--
Best regards,
Andrey Tarasevich

P.S. BTW, '(&array[0][0])[x]' is equivalent to 'array[0][x]'.

OK, this is perfectly clear and logical to me, but I still see this kind of
code somewhat often (even the FAQ uses it, and only mentions as a side note
that it is not strictly conforming). The use being passing a "flattened"
array to a function and then doing manual subscripting like a[i*NCOL+j]
where a is the "flattened array" passed to the function like
f(&array[0][0]). AFAIK, the Standard requires that multidimensional arrays
are stored in row major order (last subscript varies fastest), so people
expect this to work. Doesn't the above mean that multidimensional arrays are
stored (are required to be stored by the Standard) in one continuous block
of memory like [[row0][row1]...[rowN-1]] or is that my misconception ?

Hint: Read comp.lang.c ;-)

Use this link:
http://groups.google.com/groups?thl...94027,883915280,883890862,883862102,883726282
or search for the thread "contiguity of arrays" to find a
discussion/debate about that issue.

In short: Some people think it is as you think, some people
say that the wording of the standard permits that between
array[0][NCOLUMNS-1] and array[1][0] padding or debug info
can be inserted, in short: the memory for the array is contiguous
but the effectively used parts of it are not necessarily.
By playing tricks with pointers in a way not permitted by the
standard, you might hide the access to not used parts from the
compiler, ending up with strange effects in your program.

However, nobody did give an example of a conforming compiler
which does not work with the flattened access.

Personally, I tell my students: Do not do this (if you can
avoid it ;-)).


Cheers
Michael
 
A

Andrey Tarasevich

buda said:
See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.
...
OK, this is perfectly clear and logical to me, but I still see this kind of
code somewhat often (even the FAQ uses it, and only mentions as a side note
that it is not strictly conforming). The use being passing a "flattened"
array to a function and then doing manual subscripting like a[i*NCOL+j]
where a is the "flattened array" passed to the function like
f(&array[0][0]). AFAIK, the Standard requires that multidimensional arrays
are stored in row major order (last subscript varies fastest), so people
expect this to work. Doesn't the above mean that multidimensional arrays are
stored (are required to be stored by the Standard) in one continuous block
of memory like [[row0][row1]...[rowN-1]] or is that my misconception ?
...

You are absolutely right. If one takes into account all the requirements
the standard imposes on the array layout, one has to come to the
conclusion that the layout of a multi-dimensional array is the same as
the layout of a single-dimensional array of appropriate size. I.e. the
common sense says that it should work.

The only problem here is that formally the standard seems to prohibit
this method of access, meaning that implementations are allowed to take
more or less deliberate steps to prevent it from working, if they please
to do so. Although I don't know of any implementation that does that.
 
L

Lawrence Kirby

On Wed, 03 Nov 2004 11:57:44 +0100, Michael Mair wrote:

Since I'm replying to a fairly old article I won't snip.
buda said:
See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.

--
Best regards,
Andrey Tarasevich

P.S. BTW, '(&array[0][0])[x]' is equivalent to 'array[0][x]'.

OK, this is perfectly clear and logical to me, but I still see this kind of
code somewhat often (even the FAQ uses it, and only mentions as a side note
that it is not strictly conforming). The use being passing a "flattened"
array to a function and then doing manual subscripting like a[i*NCOL+j]
where a is the "flattened array" passed to the function like
f(&array[0][0]). AFAIK, the Standard requires that multidimensional arrays
are stored in row major order (last subscript varies fastest), so people
expect this to work. Doesn't the above mean that multidimensional arrays are
stored (are required to be stored by the Standard) in one continuous block
of memory like [[row0][row1]...[rowN-1]] or is that my misconception ?

Hint: Read comp.lang.c ;-)

Use this link:
http://groups.google.com/groups?thl...94027,883915280,883890862,883862102,883726282
or search for the thread "contiguity of arrays" to find a
discussion/debate about that issue.

In short: Some people think it is as you think, some people
say that the wording of the standard permits that between
array[0][NCOLUMNS-1] and array[1][0] padding or debug info
can be inserted, in short: the memory for the array is contiguous
but the effectively used parts of it are not necessarily.

That doesn't make much sense to me. The standard requires there to be no
padding between array elements. When you have an array of arrays this
applies at all levels. Consider this:

ELTYPE (*p)[NCOLUMNS] = malloc(sizeof(ELTYPE) * NROWS * NCOLUMNS);

If the malloc() succeeds (and the byte size of the desired array is
representable as a size_t) is it guaranteed to allocate enough space
to hold the array? I would be worried if not. If it is then there
is exactly enough space for NROWS * NCOLUMNS elements and no
room for extra padding. Don't worry about the possibility of malloc()
being able to allocate more than the requested amount, there are similar
examples using memcpy(), fread()/fwrite() etc. as well as accessing
an object as an array of unsigned char. Essentially, given

ELTYPE array[NELEMS];

then sizeof array == sizeof(ELTYPE) * NELEMS

which applies to every level of an array of arrays.
By playing tricks with pointers in a way not permitted by the
standard, you might hide the access to not used parts from the
compiler, ending up with strange effects in your program.

The question is whether "not used parts" can exist at all. There
can certainly be padding in elements but I would suggest not
between them anywhere in the array.
However, nobody did give an example of a conforming compiler
which does not work with the flattened access.

A compiler that implements strict bounds checking would catch this.
There is a bounds checking gcc, I'm not sure if it does however.
It is a legitimate thing to want to catch: bounds overflow
resulting in data in a different row being corrupted would
be as much a problem as any other bounds overflow.

There are also code generation/optimisation issues. Let's say you
have an architecture that can calculate shorter offsets more
efficiently. An example of this might be the "huge" memory model
used on 16 bit x86 architectures. This permits objects larger
than 64K but since registers are 16 bits wide offset calculations
on large objects i.e. >= 64K take more effort. If you have an array
of arrays such as

int values[100][1000];

then given that sizeof(int) is 2 on the implementation the overall
object is 200000 bytes but each subarray is 2000 bytes. So when
calculating the address of

values[x][y]

it needs internally to evaluate effectively (char *)&values + 2000*x +
2*y. 2*y can be evaluated using 16 bit arithmetic because the result must
always be in the range 0-2000 (allowing for 1 past the end pointers). So
if I write

values[0][40000]

this will not evaluate as values[40][1000] because 2*40000 i.e. 80000 is
not representable in 16 bits.

(This is a description of a potentially conforming implementation,
not of real-word huge memory models which are much weirder and generally
not conforming).
Personally, I tell my students: Do not do this (if you can
avoid it ;-)).

It is always avoidable. The next question then is given

int *p = (int *)values; /* or (int *)&values */

whether p[40000] is valid C.

Lawrence
 
M

Michael Mair

Lawrence said:
On Wed, 03 Nov 2004 11:57:44 +0100, Michael Mair wrote:

Since I'm replying to a fairly old article I won't snip.

Thanks. Did you read the thread I suggested? Because this
covers most of your questions/answers and my response might
lead to just a reiteration of all arguments and counter-
arguments -- which is exactly what I wanted to avoid.

[snip]
Use this link:
http://groups.google.com/groups?thl...94027,883915280,883890862,883862102,883726282
or search for the thread "contiguity of arrays" to find a
discussion/debate about that issue.

In short: Some people think it is as you think, some people
say that the wording of the standard permits that between
array[0][NCOLUMNS-1] and array[1][0] padding or debug info
can be inserted, in short: the memory for the array is contiguous
but the effectively used parts of it are not necessarily.

That doesn't make much sense to me. The standard requires there to be no
padding between array elements. When you have an array of arrays this
applies at all levels. Consider this:

ELTYPE (*p)[NCOLUMNS] = malloc(sizeof(ELTYPE) * NROWS * NCOLUMNS);

If the malloc() succeeds (and the byte size of the desired array is
representable as a size_t) is it guaranteed to allocate enough space
to hold the array? I would be worried if not. If it is then there
is exactly enough space for NROWS * NCOLUMNS elements and no
room for extra padding. Don't worry about the possibility of malloc()
being able to allocate more than the requested amount, there are similar
examples using memcpy(), fread()/fwrite() etc. as well as accessing
an object as an array of unsigned char. Essentially, given

ELTYPE array[NELEMS];

then sizeof array == sizeof(ELTYPE) * NELEMS

which applies to every level of an array of arrays.

Well, the thing is that the question is answered differently
for allocated memory and automatic/static versions.
The compiler (except for VLAs) can provide the value of
an expression sizeof variable or sizeof(cast-expression)
at compile-time.
Please see the discussion in the suggested thread.

The question is whether "not used parts" can exist at all. There
can certainly be padding in elements but I would suggest not
between them anywhere in the array.

Once again: It depends on what kind of storage duration we have.
The short of the aforementioned discussion is that
yes, an array has to be contiguous, and
yes, an array of arrays uses contiguous memory, too,
but the standard does not force you to use every byte of it.
Example: int a[3][3]
a[0][0] a[0][1] a[0][2] ---- a[1][0] a[1][1] a[1][2] ---- a[2][0] ...
leaves in memory int-sized holes between a[2] and a[i+1][0]
but the "rows" are stored one after the other and the "columns",
too, and everything is contiguous and the compiler only needs to
tweak sizeof a to be 3*3*sizeof(int).
The question is not whether this is sensible or helps much for
an internal bounds checking compiler but whether this is permitted.
One issue also is the question what happens at access by unsigned
char (or unsigned char *). It should be legal to have a look at
padding elements (if there are any).
I once more refer to the original discussion.

A compiler that implements strict bounds checking would catch this.
There is a bounds checking gcc, I'm not sure if it does however.
It is a legitimate thing to want to catch: bounds overflow
resulting in data in a different row being corrupted would
be as much a problem as any other bounds overflow.

AFAIK, it catches only if you "miss" the object at all but
not if, in an array-of-arrays, you run across "row" boundaries
which is really a pretty nasty error source in and by itself.


[snip: Code generation/optimisation]
Well, I used the same arguments/suggestions in the discussion :)

Personally, I tell my students: Do not do this (if you can
avoid it ;-)).


It is always avoidable. The next question then is given

int *p = (int *)values; /* or (int *)&values */

whether p[40000] is valid C.

*g* The thing is: I at least once show them the (usual) memory
layout by a program using a pointer to access arrays of arrays
(and at least have a look at the actual representation of
doubles and ints and arrays and arrays of arrays thereof using
unsigned char *).
So the "Do not do this at home, children" is necessary but
seeing the actual "pointers" / addresses printed out helps some
people enormously in their struggle to understand pointers.
I could use pointers to "array rows" but this was, in my
experience, at this stage, not helpful. When revisiting the
topic later on, I do it right.


Cheers
Michael
 
C

Christian Staudenmayer

That doesn't make much sense to me. The standard requires there to be no
padding between array elements. When you have an array of arrays this
applies at all levels. Consider this:

ELTYPE (*p)[NCOLUMNS] = malloc(sizeof(ELTYPE) * NROWS * NCOLUMNS);


Perhaps he confused it with structures, where, if i recall correctly, padding may occur.

Greetings, Chris
 
M

Michael Mair

Christian said:
That doesn't make much sense to me. The standard requires there to be no
padding between array elements. When you have an array of arrays this
applies at all levels. Consider this:

ELTYPE (*p)[NCOLUMNS] = malloc(sizeof(ELTYPE) * NROWS * NCOLUMNS);

Perhaps he confused it with structures, where, if i recall correctly, padding may occur.

I did not -- why don't you just read up on the suggested thread?
Then you can come back and remark on the error of my ways or
what I might or might not have meant.

Argh.

-Michael
 
P

pete

Michael said:
See 6.5.6/8.

'&array[0][0]' is a pointer to the first element of the array
'array[0]'. This array has NCOLUMNS elements in it.

'(&array[0][0])[x]' is equivalent to '*(&array[0][0] + x)'. According to
the rules of pointer arithmetics (6.5.6/8), the expression '&array[0][0]
+ x' is only allowed to form pointers to elements of array 'array[0]'
and one imaginary element immediately after the last one. Otherwise, the
behavior is undefined. I.e. in this case negative values of 'x' and
values that are greater than NCOLUMNS lead to UB.

--
Best regards,
Andrey Tarasevich

P.S. BTW, '(&array[0][0])[x]' is equivalent to 'array[0][x]'.

OK, this is perfectly clear and logical to me, but I still see this kind of
code somewhat often (even the FAQ uses it, and only mentions as a side note
that it is not strictly conforming). The use being passing a "flattened"
array to a function and then doing manual subscripting like a[i*NCOL+j]
where a is the "flattened array" passed to the function like
f(&array[0][0]). AFAIK, the Standard requires that multidimensional arrays
are stored in row major order (last subscript varies fastest), so people
expect this to work. Doesn't the above mean that multidimensional arrays are
stored (are required to be stored by the Standard) in one continuous block
of memory like [[row0][row1]...[rowN-1]] or is that my misconception ?

Hint: Read comp.lang.c ;-)

Use this link:
http://groups.google.com/groups?thl...94027,883915280,883890862,883862102,883726282
or search for the thread "contiguity of arrays" to find a
discussion/debate about that issue.

In short: Some people think it is as you think, some people
say that the wording of the standard permits that between
array[0][NCOLUMNS-1] and array[1][0] padding or debug info
can be inserted, in short: the memory for the array is contiguous
but the effectively used parts of it are not necessarily.
By playing tricks with pointers in a way not permitted by the
standard, you might hide the access to not used parts from the
compiler, ending up with strange effects in your program.

However, nobody did give an example of a conforming compiler
which does not work with the flattened access.

Personally, I tell my students: Do not do this (if you can
avoid it ;-)).

When you try to get into the specific mechanisms of undefined behavior,
you should begin to see the appeal of just saying
"It's undefined, forget about it."
 
L

Lawrence Kirby

Thanks. Did you read the thread I suggested? Because this
covers most of your questions/answers and my response might
lead to just a reiteration of all arguments and counter-
arguments -- which is exactly what I wanted to avoid.

Yes, I've read it. AFAICS it discusses how an array can be accessed
and what can be accessed as an array. The issue here is different,
it is how an array can be laid out in memory (from the
program's viewpoint).
[snip]
Use this link:
http://groups.google.com/groups?thl...94027,883915280,883890862,883862102,883726282
or search for the thread "contiguity of arrays" to find a
discussion/debate about that issue.

In short: Some people think it is as you think, some people
say that the wording of the standard permits that between
array[0][NCOLUMNS-1] and array[1][0] padding or debug info
can be inserted, in short: the memory for the array is contiguous
but the effectively used parts of it are not necessarily.

That doesn't make much sense to me. The standard requires there to be no
padding between array elements. When you have an array of arrays this
applies at all levels. Consider this:

ELTYPE (*p)[NCOLUMNS] = malloc(sizeof(ELTYPE) * NROWS * NCOLUMNS);

If the malloc() succeeds (and the byte size of the desired array is
representable as a size_t) is it guaranteed to allocate enough space
to hold the array? I would be worried if not. If it is then there
is exactly enough space for NROWS * NCOLUMNS elements and no
room for extra padding. Don't worry about the possibility of malloc()
being able to allocate more than the requested amount, there are similar
examples using memcpy(), fread()/fwrite() etc. as well as accessing
an object as an array of unsigned char. Essentially, given

ELTYPE array[NELEMS];

then sizeof array == sizeof(ELTYPE) * NELEMS

which applies to every level of an array of arrays.

Well, the thing is that the question is answered differently
for allocated memory and automatic/static versions.

Absolutely not. The layout of arrays must be consistent irrespective
of how they are allocated. For example when I pass a pointer to an
array to a function that function does not know how the array was
allocated. Assuming valid code and no issues such as constness it must
able to access the array correctly.

If the layout is the same the number of bytes the array takes up will be
the same.
The compiler (except for VLAs) can provide the value of an expression
sizeof variable or sizeof(cast-expression) at compile-time.
Please see the discussion in the suggested thread.

True, but that doesn't affect array layout.
Once again: It depends on what kind of storage duration we have. The
short of the aforementioned discussion is that yes, an array has to be
contiguous, and yes, an array of arrays uses contiguous memory, too, but
the standard does not force you to use every byte of it.

It means you can't have gaps between elements, or else they wouldn't be
contiguous.
Example: int
a[3][3]
a[0][0] a[0][1] a[0][2] ---- a[1][0] a[1][1] a[1][2] ---- a[2][0] ...
leaves in memory int-sized holes between a[2] and a[i+1][0] but the
"rows" are stored one after the other and the "columns", too, and
everything is contiguous and the compiler only needs to tweak sizeof a
to be 3*3*sizeof(int).


You have put a gap between each row which means the rows are no longer
contiguous. a is an array of 3 elements, each of those elements is an
array of 3 ints. sizeof(a[0]) is 3*sizeof(int), The last byte of element
a[0] is at byte offset 3*sizeof(int)-1 from the start of a. If the array a
is to be contiguous a[1] must start at the next byte i.e. at offset
3*sizeof(int).
The question is not whether this is sensible or
helps much for an internal bounds checking compiler but whether this is
permitted. One issue also is the question what happens at access by
unsigned char (or unsigned char *). It should be legal to have a look at
padding elements (if there are any).
I once more refer to the original discussion.

That was more along the lines of "if I can prove that there isn't any
padding between structure elements, can I consider them an array". The
principle that there is no padding between array elements is implicit
there.

....

Lawrence
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top