contiguity of arrays

Keith Thompson

Joe Wright said:
James Kuyper wrote: [...]
The elements of the elements of 'a' must be stored in the same way
as the elements of an array of four integers. As a result, on most
implementations of C, code like this works exactly as you
expect. However, because this code has undefined behavior, an
implementation is free to implement pointers in such a fashion that
it can keep track of the limits beyond which they can't be
dereferenced, and abort the program if those limits are violated. In
particular, when a[0] decays to a pointer, it's legal for the
compiler to give that pointer dereferencing limits of a[0] and
a[0]+1.

Name one compiler that enforces such limits.

There may or may not be such a compiler. The point (especially in
comp.std.c) is that any compiler is allowed to enforce such limits.
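
To make the idea of a pointer that carries "dereferencing limits" concrete, here is a minimal sketch that models one in ordinary C. The names `checked_ptr`, `checked_from_row` and `checked_read` are hypothetical and are not taken from any real compiler; a checking implementation would presumably keep this bookkeeping hidden behind plain pointer syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bounds-tracking pointer: the address plus
   the limits of the array it was derived from. */
struct checked_ptr {
    const int *p;    /* current position */
    const int *lo;   /* first element of the originating array */
    const int *hi;   /* one past the last element of that array */
};

/* Model of a row such as a[0] decaying to a pointer: the limits are
   those of the row itself, not of the enclosing array of rows. */
static struct checked_ptr checked_from_row(const int *row, size_t len)
{
    struct checked_ptr cp = { row, row, row + len };
    return cp;
}

/* Read element i through the checked pointer. Returns 1 and stores the
   value in *out when i is inside the limits, 0 otherwise (a checking
   implementation might abort the program at that point instead). */
static int checked_read(struct checked_ptr cp, size_t i, int *out)
{
    size_t len = (size_t)(cp.hi - cp.lo);
    size_t pos = (size_t)(cp.p - cp.lo);
    assert(pos <= len);
    if (i >= len - pos)
        return 0;
    *out = cp.p[i];
    return 1;
}
```

With `int a[2][2]` and `cp = checked_from_row(a[0], 2)`, a read at index 1 succeeds and a read at index 3 is refused, which is the behaviour the standard is said to permit.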
 
E. Robert Tisdale

Keith said:
Joe said:
James Kuyper wrote:
[...]
The elements of the elements of 'a' must be stored in the same way
as the elements of an array of four integers. As a result, on most
implementations of C, code like this works exactly as you
expect. However, because this code has undefined behavior, an
implementation is free to implement pointers in such a fashion that
it can keep track of the limits beyond which they can't be
dereferenced, and abort the program if those limits are violated. In
particular, when a[0] decays to a pointer, it's legal for the
compiler to give that pointer dereferencing limits of a[0] and
a[0]+1.

Name one compiler that enforces such limits.

There may or may not be such a compiler.

Meaning that there *is* no such compiler.
The point (especially in comp.std.c) is that
any compiler is allowed to enforce such limits.

Correct.
The standard de jure allows compiler developers to commit suicide.
We expect them to have more sense than that.
 
James Kuyper

pete said:
If it was int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead, there wouldn't be a problem accessing b[3], would there?

The validity limits of pointer arithmetic are defined in terms of the
containing array. There is one and only one array of int that
contains the int object pointed at by 'b'. That array is a[0], and it
only contains two integers. 'a' itself is not an array of 'int', but
rather is an array of 'int[2]', and therefore is not capable of
determining the validity limits for arithmetic on an 'int*'.
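
As an illustration of the reading given above, the following sketch keeps every `int *` inside the row it was derived from. The helper names `sum_row` and `row_end_meets_next_row` are made up for this example. The second helper relies on the rule that a one-past-the-end pointer may be formed and compared for equality, but not dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Sum one row by walking an int pointer from the row's first element
   up to its one-past-the-end position; the pointer never leaves the row. */
static int sum_row(const int *row, size_t len)
{
    assert(row != NULL);
    const int *end = row + len;   /* may be formed, must not be dereferenced */
    int sum = 0;
    for (const int *p = row; p != end; ++p)
        sum += *p;
    return sum;
}

/* a[0] + 2 is the one-past-the-end pointer of the first row. It may be
   compared for equality with a[1], the start of the row that
   immediately follows it; the result is 1. */
static int row_end_meets_next_row(int (*a)[2])
{
    return a[0] + 2 == a[1];
}
```

Note that comparing equal to `a[1]` does not, on this reading, make `a[0] + 2` a pointer that may be dereferenced.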
 
Keith Thompson

E. Robert Tisdale said:
Keith said:
Joe Wright writes: [...]
Name one compiler that enforces such limits.
There may or may not be such a compiler.

Meaning that there *is* no such compiler.

No, meaning that there may or may not be such a compiler.

If you're assuming that I'm familiar with all existing C compilers,
and that if there were one that does strict bounds checking I would
know about it, your confidence is misplaced.
 
Ivan A. Kosarev

James Kuyper said:
pete wrote in message
...
If it was int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead, there wouldn't be a problem accessing b[3], would there?

The validity limits of pointer arithmetic are defined in terms of the
containing array. There is one and only one only array of int that
contains the int object pointed at by 'b'. That array is a[0], and it

I believe the wording is for stand-alone arrays only, and a[0] is part of a
contiguous object.

The array above is a single object. Since a[1] + 2 can be pointed to, even b
+ 4 can be pointed to, and b[3] can certainly be addressed.
 
James Kuyper

Joe Wright wrote:
....
At a, there are four int objects, one after the other, having values 1,
2, 3 and 4 respectively. Looks like an array to me, even if undeclared.

Key word: undeclared. If it is not declared as such in the C program, it
doesn't count as such for purposes of determining what the C program is
allowed/required to do with it.
Name one compiler that enforces such limits.

I don't know whether there are any, though I have vague memories of a compiler
that provided such checking in a special debug mode. It would certainly
be too expensive for use in the default mode.

However, I don't care whether any compiler actually does this; for the
purposes of comp.std.c, all I care about is whether compilers are
allowed to do this. This is cross-posted to comp.lang.c, where the
relevant concerns are different.

....
If I couldn't access a[1][0] as b[2] I would be surprised, and annoyed.

Your surprise and annoyance wouldn't render such a compiler non-conforming.
 
Michael Mair

Hi pete,

If it was
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead,
there wouldn't be a problem accessing b[3], would there?

There would. The same arguments hold.


No, the same arguments don't hold.

I am sorry but I am not sure I understand you.

As a is an array of arrays of two ints and not an int **,
we effectively have a being a starting address and not
an address of a starting address. In other words: a does not
decay into an (int **).
This does not change by your applying the typecast.

Please explain your reasoning so we can work with it.

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Of course not. You still point b to the address of a[0][0],
thus b+3 would point to the address of "a[0][3]". As arrays
are stored contiguously, you could access a[0][3] if a was
declared a[2][X], where X>=4.
But as it is, you try to access an address outside of the
object (int *)a[0] is pointing to. The difference between
a[1] and a[0] might not be (ptrdiff_t) 2*sizeof(int).
Imagine a system with sizeof(int)*CHAR_BIT==32 and optimal
access speed at 96-Bit aligned addresses where the compiler
thinks it fun to 96-Bit align every array "row", giving you
here a "padding" int but being able to get a[X][0] and a[X][1],
X in the appropriate range for a, very quickly to wherever they
are needed.

The problem originally was that b had the address of the first
element of a[0]. a[0] has two elements, a[0][0] and a[0][1].
The address of b[3] is outside of a[0][1],
which is to say that b[3] is beyond the boundary of a[0].

For
b = (int *)a;
b has the address of the first element of a, converted to int *.
a has two elements, a[0] and a[1].
The address of b[3] is not outside of a[1],
which is to say that b[3] is not beyond the boundary of a.

Umh, once again: a does not decay into an (int **).
What you want is really
int (*b)[2];
b = a;
Then b == &a[0] and b+1 == &a[1].

So ka?


--Michael
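
A minimal sketch of the pointer-to-array approach suggested above, for readers who want to see it compile. The function name `element_at` and its parameters are hypothetical. It steps an `int (*)[2]` across rows and uses a plain index only inside the selected row.

```c
#include <assert.h>
#include <stddef.h>

/* Return element k (counting row by row) of an array of rows of two
   ints, stepping a pointer-to-row rather than an int pointer. */
static int element_at(int (*rows)[2], size_t nrows, size_t k)
{
    assert(k < nrows * 2);
    int (*b)[2] = rows;      /* b points at the first row, a[0] */
    b += k / 2;              /* step whole rows: stays inside the outer array */
    return (*b)[k % 2];      /* index within that one row only */
}
```

Called as `element_at(a, 2, 3)` on the 2x2 array from the thread, this reads `a[1][1]` without ever indexing an `int *` past the end of a row.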
 
Richard Bos

E. Robert Tisdale said:
Joe said:
If I couldn't access a[1][0] as b[2] I would be surprised, and annoyed.

But you can.

The *de facto* standard is as you describe it.
No compiler developer would implement the standard *de jure*
unless they were suicidal.

Or designing a _deliberately_ strict implementation, for example a
debugging compiler.

Richard
 
Richard Bos

pete said:
Michael said:
There is special wording that allows any object,
including an array of arrays,
to be accessed completely using pointers to unsigned char.
This is what makes memcpy() usable.
However, for any other type this is an issue.


If it was
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead,
there wouldn't be a problem accessing b[3], would there?

There would. The same arguments hold.

No, the same arguments don't hold.

Actually, yes, they do. Observe - fit the first:
int a[2][2] = {{1, 2}, {3, 4}},

You now have an array of two arrays of int.
*b = a[0];

You get the address of the first of these int arrays, convert it to an
int pointer, and assign it to b.
printf("%d\n", *(b + 3));

You invoke undefined behaviour by increasing _that int pointer_ beyond
its legal boundary.

Fit the second:
int a[2][2] = {{1, 2}, {3, 4}},

You now have an array of two arrays of int.
*b = (int *)a;

You get the address of the entire array, convert it to an int pointer,
and assign it to b.
printf("%d\n", *(b + 3));

You invoke undefined behaviour by increasing _that int pointer_ beyond
its legal boundary.

Note that:
- the address of an array and the address of its first member are
identical.
- the entire array is properly aligned for ints, so a conversion of its
base address (or the address of its first member, which is the same
except for type) to int * must succeed and give the address of the
first int in the array.
- once a pointer has been converted to another pointer type, there is
nothing in the Standard that allows you to deduce the original type.

IOW, if you have

int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a, *c = a[0];

then b==c _must_ evaluate to 1. The two pointers you get from those two
conversions have exactly the same value, and exactly the same
requirements.
Note that this is _not_ true of a and a[0]; but this is because a has a
different type than a[0]. Once you convert them both to int *, this
distinction is, obviously, lost.

Richard
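
The two points made in this post can be sketched in code: the claim that `(int *)a` and `a[0]` compare equal, and the quoted remark that byte-wise access (as done by `memcpy()`) may cover the whole object. The helper names `same_start` and `flatten` are invented for the illustration. `flatten` takes a pointer to the entire `int[2][2]`, so the copy is expressed in terms of the whole object.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the two conversions discussed above yield pointers
   that compare equal. */
static int same_start(int (*a)[2])
{
    int *b = (int *)a;   /* pointer to the first row, converted to int * */
    int *c = a[0];       /* first row decayed to a pointer to its first int */
    return b == c;
}

/* Copy the 2x2 array into a flat array of four ints. memcpy copies the
   bytes of the whole object, so no int pointer is indexed past a row. */
static void flatten(int flat[4], int (*whole)[2][2])
{
    assert(sizeof *whole == 4 * sizeof(int));
    memcpy(flat, whole, sizeof *whole);
}
```

After `flatten(flat, &a)`, `flat[3]` is an ordinary in-bounds access to a genuine array of four ints.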
 
Douglas A. Gwyn

James said:
Joe said:
If I couldn't access a[1][0] as b[2] I would be surprised, and annoyed.
Your surprise and annoyance wouldn't render such a compiler non-conforming.

James has correctly identified the issue: an "array" is
more than just a contiguous sequence of similar objects;
it also has a declared size. It is entirely possible
that a compiler can generate more efficient code if it
takes advantage of the declared size; for example, if an
architecture has a 64KB limit on segment size and the
array is declared smaller than that, then it will not be
necessary to generate code that copes with segment
boundaries (e.g. loading different segment base
addresses for different parts of the array). An array
of arrays is guaranteed to have the storage contiguously
allocated (without extra padding), but not all elements
of each array can be accessed by indexing off a pointer
"based on" a pointer to a given element in a particular
array.

However, on many architectures there is no noticeable
speed penalty involved in supporting that usage, due to
a uniform, large memory space and wide-enough pointers.
Thus, some programmers have been getting away with this
nonportable practice on the platforms they have used so
far.
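
The "without extra padding" guarantee can be checked with `sizeof` alone, without any pointer arithmetic. This is a small sketch; `padding_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Number of padding bytes in an int[2][2]: its total size minus the
   size of the four ints it holds. The size of an array is the element
   count times the element size, so this is 0. */
static size_t padding_bytes(void)
{
    assert(sizeof(int[2]) == 2 * sizeof(int));
    return sizeof(int[2][2]) - 4 * sizeof(int);
}
```

Contiguity is therefore not in dispute. The disagreement in the thread is only about which pointers may be used to walk across that storage.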
 
Ivan A. Kosarev

James Kuyper said:
Ivan A. Kosarev wrote:
... and a[0] is part of a
contiguous object.

Agreed. If the validity limits of pointer arithmetic cared about
contiguity, that would be a relevant argument. They don't. They're
defined entirely in terms of the elements of a single array of the
pointed-at type.

None of types designates objects. :)

Instead, objects are memory areas which are interpreted according to their
types. Given that, it's not important how we get a pointer pair to compare
it with relational operators; if they point to a single array (that is an
*object*, not a *type*), they can be compared with a defined result.
The array above is a single object. Since a[1] + 2 can be pointed to,

a[1]+2 is valid pointer value, which can be compared for equality with
any valid pointer value, and compared for relative order with any other
pointer that points into or one past the end of a[1]. However, it cannot

Again, since the array is a single object, a[1] + 2 and b + 4 are values
that point to the same object of the same type.
 
E. Robert Tisdale

Douglas said:
Thus, some programmers have been getting away with this
nonportable practice
on the platforms they have used so far.

If they are "getting away with" it, it's portable.
 
Randy Howard

If they are "getting away with" it, it's portable.

Not at all. It simply means they haven't tried enough platforms
yet. By your usage, anyone using envp as an argument to main
is writing portable code, as long as they haven't tried it on
a platform where it doesn't work yet.
 
E. Robert Tisdale

Randy said:
Not at all.
It simply means they haven't tried enough platforms yet.
> cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    const int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
    const size_t n = sizeof(a)/sizeof(a[0][0]);
    /* b[2] and b[3] index past the end of a[0]: the disputed access */
    for (size_t j = 0; j < n; ++j)
        fprintf(stdout, "b[%zu] = %d\t", j, b[j]);  /* %zu matches size_t */
    fprintf(stdout, "\n");
    return EXIT_SUCCESS;
}
> gcc -Wall -std=c99 -pedantic -o main main.c
> ./main
b[0] = 1 b[1] = 2 b[2] = 3 b[3] = 4

This program ports to every platform
with a C99 compliant compiler.
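
For comparison, here is a sketch of a variant that produces the same text using only `a[i][j]` indexing, so it does not depend on the disputed behaviour. The function `format_elements` and its parameters are assumptions made for this example; it writes into a caller-supplied buffer rather than to stdout so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Format the elements in row-major order into buf, using only a[i][j]
   indexing so no pointer is stepped across a row boundary. Returns the
   number of characters written, or -1 if buf is too small. */
static int format_elements(char *buf, size_t size, int (*a)[2], size_t nrows)
{
    assert(buf != NULL && size > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < nrows; ++i) {
        for (size_t j = 0; j < 2; ++j) {
            int n = snprintf(buf + used, size - used, "b[%zu] = %d\t",
                             i * 2 + j, a[i][j]);
            if (n < 0 || (size_t)n >= size - used)
                return -1;
            used += (size_t)n;
        }
    }
    return (int)used;
}
```

The flat index `i * 2 + j` is computed only for the label; the access itself goes through `a[i][j]`.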
 
pete

James said:
pete said:
If it was int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead, there wouldn't be a problem accessing b[3], would there?

The validity limits of pointer arithmetic are defined in terms of the
containing array. There is one and only one array of int that
contains the int object pointed at by 'b'. That array is a[0], and it
only contains two integers. 'a' itself is not an array of 'int', but
rather is an array of 'int[2]', and therefore is not capable of
determining the validity limits for arithmetic on an 'int*'.

I don't see what difference it makes whether or not object a
even contains an int type. As long as object a is as big as an int
and also aligned for type int, I can access the object as (*(int*)&a)
regardless of whether a was declared as a structure or an array of floats.

With
int *b = (int *)&a;
b[3] isn't accessing an element of a,
b[3] is accessing the memory at (int*)&a + 3,
and treating it as an object of type int.
 
Ivan A. Kosarev

Douglas A. Gwyn said:
James said:
Joe said:
If I couldn't access a[1][0] as b[2] I would be surprised, and
annoyed.
Your surprise and annoyance wouldn't render such a compiler
non-conforming.

James has correctly identified the issue: an "array" is
more than just a contiguous sequence of similar objects;
it also has a declared size. It is entirely possible
that a compiler can generate more efficient code if it
takes advantage of the declared size; for example, if an

Does this mean that pointers that are results of array-to-pointer conversion
and any other pointers somehow differ?

If they don't, how does the Standard allow such optimizations with the first and
forbid them with the second? (Hopefully, we will keep in mind that an
abstract machine cannot refer to any optimization, including any kind of
folding and propagation.)
 
Michael Mair

Hi pete

I don't see what difference it makes whether or not object a
even contains an int type. As long as object a is as big as an int
and also aligned for type int, I can access the object as (*(int*)&a)
regardless of whether a was declared as a structure or an array of floats.

With
int *b = (int *)&a;
b[3] isn't accessing an element of a,
b[3] is accessing the memory at (int*)&a + 3,
and treating it as an object of type int.

Assuming a in this case is not an array of any flavour
(or that you would have used the appropriate &a[0]...[0]):

That is exactly the point! b[3] or b+3 accesses this address
but it is not guaranteed that it may do so!
You just might try to access memory which you do not have
access to as it does not belong to the object you pointed b
to...


--Michael
 