contiguity of arrays

S

Steve Kobes

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

--Steve
 
A

Al Bowers

Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

--Steve

Yes it is legal and will print 4.
This array describes a contiguously allocated nonempty set of objects of
type int.
b = a[0] positions the int *b to the beginning of the array object as
would b = &a[0][0].

#include <stdio.h>

int main(void)
{
int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];

printf("a[2][2] = ((1,2},{3,4}};\nint *b;\n"
"Using b = a[0]. *(b + 3) = %d\n", *(b + 3));
b = &a[0][0];
printf("Using b = &a[0][0]. *(b + 3) = %d\n",*(b+3));
printf("The pointer values &a[0][0] and a[0] are%sequal\n",
(&a[0][0] == a[0])?" ":" not ");
return 0;
}
 
C

Chris Torek

Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Yes it is legal and will print 4.

I believe there are those in comp.std.c, at least, who will disagree
with you.
This array describes a contiguously allocated nonempty set of objects of
type int.
b = a[0] positions the int *b to the beginning of the array object as
would b = &a[0][0].

In particular, "b" points to the first of the two consecutive "int"s
{1, 2}. These two consecutive "int"s are indeed followed immediately
in memory by the two consecutive "int"s {3, 4} -- but a compiler
is allowed, through arbitrarily magical means, to insert bounds-checking
on array subscripts and pointer arithmetic. (Remember that these
two are quite similar: subscripting is simply address arithmetic
followed by pointer indirection, and pointer indirection is simply
subscripting -- *p and p[0] are identical.[%] Note that we are allowed
to compute &array[N] but not follow the resulting pointer, so there
is a slight hitch in using the same code for both -- you also need
a boolean flag "this address arithmetic will be followed by
indirection, or not", to handle this in any bounds-checking code.)

Suppose that the compiler does in fact do this magical bounds-checking,
so that it "knows" that b points to the first of only *two* ints stored
in the array a[0]. In this case, b[0] and b[1] are allowed, but b[2]
is rejected with an "out of bounds array subscript" error.

(Personally, I find it all quite unlikely, but it does seem to be
allowed. I also think that, if one writes a compiler with the
decidedly non-magical bounds-checking done via either "fat pointers"
or some kind of runtime table lookups, the bounds for b should run
from 0 to 3 inclusive, in this case, rather than 0 to 1. But these
are my opinions of "the way it ought to work" rather than "what
the letter of the C standard says".)

In practice, I do not believe anyone has yet found an implementation
where b[3] (or equivalently, *(b + 3)) is caught as an error. The
most likely place to find this kind of runtime[%%] checking is in
C compilers for Lisp machines, and I have not had access to one of
those in years.

[% As a result, you can always replace (*p) with p[0], including
in expressions like (*p).field, which is more usually written as
p->field. Hence the -> operator is replaceable with "[0].", e.g.,
instead of:

if (a->b->c == 42)

you can write:

if (a[0].b[0].c == 42)

This example just goes to show that C has a plethora of operators. :)]

[%% This particular case can be caught at compile time, but in
general one would want to test the subscript in b, or the offset
in (b+i), at runtime, both because "i" might be difficult or
impossible to predict, and also because "b" might also be difficult
or impossible to predict:

int a1[10], a2[2];
int *b;
...
if (user_input == 42)
b = a1;
else if (user_input == 6 * 9)
b = a2;
...
do_something(b[3]);

Here, a general-purpose bounds-checker might do something vaguely
like this:

struct bounds *bp;

bp = bounds_lookup(b);
if (bp == NULL)
__runtime_error("invalid pointer");
if (bp->upper_limit < 3)
__runtime_error("out of bounds array subscript");
tmp = b[3];
do_something(tmp);

to handle the call to do_something(). The bounds_lookup() function
compares the supplied pointer value against a table of all active
arrays -- all "global variables" that have array type, all outstanding
malloc()s, and all local variables that are still live on the current
stack frame, for instance -- and finds the corresponding array
bounds information. We can do this without concerning ourselves
with ordinary simple pointers (&localvar, where localvar is *not*
an array) because we have a nonzero subscript. If this were using
b the code would have to do:

if (i != 0) {
struct bounds *bp;
... as before ...
} else
tmp = *b; /* cannot do bounds checking on zero subscripts */
do_something(tmp);

becaues each ordinary variable "acts like" an array of size 0.
Alternatively, bounds_lookup() might actually have bounds-table
information for *all* variables whose address is taken, instead
of just all arrays.

The bounds-checking above needs to handle pointer arithmetic as
well, so that if we do "b++", the upper limit is reduced by one
and the lower limit goes from 0 to -1. (There are other methods
for handling this that are probably more practical in real
bounds-checkers; this version is just for illustration.)

Bounds-checking tends to slow down programs quite a bit, and is an
area in which good compiler-level optimization is important. If
a loop is guaranteed to run from 0 to N-1 inclusive, a single
bounds-check on N-1 before entering the loop at all replaces many
runtime tests. If the bounds-check can be done at compile time,
then omitted entirely from the runtime code, that is even better.]
 
B

Barry Schwarz

Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Yes it is legal and will print 4.

I believe there are those in comp.std.c, at least, who will disagree
with you.

I could not follow your discussion which followed so I don't know
exactly which point I'm about to disagree with. Instead I will try to
present an argument that I believe shows the original question about
legality must be answered in the affirmative.

For an array q (not declared as a function parameter),
sizeof q / sizeof *q must evaluate to the number of elements in b.

For a pointer p to type T, the expression p+1 must evaluate to the
same value as (T*)((char*)p + sizeof(T))

Therefore

sizeof a must evaluate to the same value as 2*sizeof a[0]

sizeof a[0] and sizeof a[1] both must evaluate to the same value
as 2*sizeof a[0][0].

sizeof a[0][0] must evaluate to the same value as sizeof(int)

sizeof a must evaluate to the same value as 4*sizeof(int)

Since a contains 4 int and is exactly large enough to contain 4 int,
these 4 int must be located at b, b+1, b+2, and b+3.

snip


<<Remove the del for email>>
 
M

Mark McIntyre

Steve Kobes wrote:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Yes it is legal and will print 4.

I believe there are those in comp.std.c, at least, who will disagree
with you.

I could not follow your discussion which followed so I don't know
exactly which point I'm about to disagree with. Instead I will try to
present an argument that I believe shows the original question about
legality must be answered in the affirmative.

(snip argument which is based on the size of the objects, and 1-d array
arithmetic. ) The size argument is spurious. The compiler is allowed to
tell you the object is size 4, even if all 4 members were in separate
memory arrays in different solar systems. Theres no limit to the internal
magic the compiler can perform.
 
P

pete

Barry said:
Steve Kobes wrote:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Yes it is legal and will print 4.

I believe there are those in comp.std.c, at least, who will disagree
with you.

I could not follow your discussion which followed so I don't know
exactly which point I'm about to disagree with. Instead I will try to
present an argument that I believe shows the original question about
legality must be answered in the affirmative.

For an array q (not declared as a function parameter),
sizeof q / sizeof *q must evaluate to the number of elements in b.

For a pointer p to type T, the expression p+1 must evaluate to the
same value as (T*)((char*)p + sizeof(T))

Therefore

sizeof a must evaluate to the same value as 2*sizeof a[0]

sizeof a[0] and sizeof a[1] both must
evaluate to the same value as 2*sizeof a[0][0].

sizeof a[0][0] must evaluate to the same value as sizeof(int)

sizeof a must evaluate to the same value as 4*sizeof(int)

Since a contains 4 int and is exactly large enough to contain 4 int,
these 4 int must be located at b, b+1, b+2, and b+3.

The problem is that b, with (b = a[0])
is pointed at the first element of a two element array.

If you have

int a = 0, b = 0;

then, you can't do this

if (a + 1 == b) { /* The condition check is valid */
printf("%d\n", a[1]); /* Then printf call isn't valid */
}

Whether or not there is an int object at (a + 1) is not the point.
Overunning the bounds of the object, is the point.
 
P

pete

pete said:
Barry said:
Steve Kobes wrote:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

I believe there are those in comp.std.c, at least, who will disagree
with you.

I could not follow your discussion which followed so I don't know
exactly which point I'm about to disagree with. Instead I will try to
present an argument that I believe shows the original question about
legality must be answered in the affirmative.

For an array q (not declared as a function parameter),
sizeof q / sizeof *q must evaluate to the number of elements in b.

For a pointer p to type T, the expression p+1 must evaluate to the
same value as (T*)((char*)p + sizeof(T))

Therefore

sizeof a must evaluate to the same value as 2*sizeof a[0]

sizeof a[0] and sizeof a[1] both must
evaluate to the same value as 2*sizeof a[0][0].

sizeof a[0][0] must evaluate to the same value as sizeof(int)

sizeof a must evaluate to the same value as 4*sizeof(int)

Since a contains 4 int and is exactly large enough to contain 4 int,
these 4 int must be located at b, b+1, b+2, and b+3.

The problem is that b, with (b = a[0])
is pointed at the first element of a two element array.

If you have

int a = 0, b = 0;

then, you can't do this

if (a + 1 == b) { /* The condition check is valid */
printf("%d\n", a[1]); /* Then printf call isn't valid */
}

Major typos there.

Should be:


if (&a + 1 == &b) { /* The condition check is valid */
printf("%d\n", (&a)[1]); /* Then printf call isn't valid */
}
 
E

E. Robert Tisdale

Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.
 
M

Michael Mair

Hiho,
(snip argument which is based on the size of the objects, and 1-d array
arithmetic. ) The size argument is spurious. The compiler is allowed to
tell you the object is size 4, even if all 4 members were in separate
memory arrays in different solar systems. Theres no limit to the internal
magic the compiler can perform.

True. But I have yet to see the compiler that will correctly wrap _all_
calls to memcpy() and all other functions getting size_t and pointer
arguments for this case.
It is much easier to build non-broken programs the other way.


Cheers
Michael
 
J

Joe Wright

E. Robert Tisdale said:
Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));


Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.

Nonsense. b is an int* pointing to an array of four int's. b[2] and
b[3] are absolutely legal.
 
E

E. Robert Tisdale

Joe said:
E. Robert Tisdale said:
Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.

Nonsense. b is an int* pointing to an array of four int's.
b[2] and b[3] are absolutely legal.
 
P

pete

Joe said:
E. Robert Tisdale said:
Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));


Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.

Nonsense. b is an int* pointing to an array of four int's.
b[2] and b[3] are absolutely legal.

For

int myobject[4];

myobject, is an object of type array of four int.
You don't have an object of that type in the previous code.
 
R

Richard Bos

Joe Wright said:
E. Robert Tisdale said:
Steve said:
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));


Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.

Nonsense. b is an int* pointing to an array of four int's. b[2] and
b[3] are absolutely legal.

No, it's not. b is an int * pointing to the first of to subsequent
arrays of two ints.
E. is quite correct that a. this is not the same thing as your version,
b. strictly speaking this means that this code is not correct, and c.
this difference is theoretical; you'd be hard put to find an
implementation where this fails, perhaps excepting very strict debugging
implementations.

Richard
 
J

James Kuyper

E. Robert Tisdale said:
Joe said:
E. Robert Tisdale said:
Steve Kobes wrote:

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));


Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.


Nonsense. b is an int* pointing to an array of four int's.
b[2] and b[3] are absolutely legal.

How can it point to such an array? None has been defined in this
program. C doesn't have multidimensional arrays, it has arrays of
arrays, and this is one of the subtle differences between those two
concepts. The named array 'a' contains two elements, which are
themselves arrays. The unnamed array identified by a[0] is an array of
only two elements, which are ints. The rules for the limits of pointer
arithmetic are defined in terms of arrays, and according to the rules of
the C language, 'a' is not an array of four integers.

The elements of the elements of 'a' must be stored in the same way as
the elements of an array of four integers. As a result, on most
implementations pf C code like this works exactly as you expect.
However, because this code has undefined behavior, an implementation is
free to implement pointers in such a fashion that it can keep track of
the limits beyond which they can't be dereferenced, and abort the
problem if those limits are violated. In particular, when a[0] decays to
a pointer, it's legal for the compiler to give that pointer
dereferencing limits of a[0] and a[0]+1.

There is special wording that allows any object, including an array of
arrays, to be accessed completely using pointers to unsigned char. This
is what makes memcpy() usable. However, for any other type this is an issue.
 
P

pete

James said:
E. Robert Tisdale said:
Joe said:
E. Robert Tisdale wrote:

Steve Kobes wrote:

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));
There is special wording that allows any object,
including an array of arrays,
to be accessed completely using pointers to unsigned char.
This is what makes memcpy() usable.
However, for any other type this is an issue.

If it was
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead,
there wouldn't be a problem accessing b[3], would there?
 
M

Michael Mair

There is special wording that allows any object,
including an array of arrays,
to be accessed completely using pointers to unsigned char.
This is what makes memcpy() usable.
However, for any other type this is an issue.


If it was
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead,
there wouldn't be a problem accessing b[3], would there?

There would. The same arguments hold. If you want to be
sure that you use the right type of pointer, you have
to work with something along the lines of int (*b)[2].
 
J

Joe Wright

James said:
E. Robert Tisdale said:
Joe said:
E. Robert Tisdale wrote:

Steve Kobes wrote:

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));



Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned
about what the standard says.



Nonsense. b is an int* pointing to an array of four int's.
b[2] and b[3] are absolutely legal.


How can it point to such an array? None has been defined in this
program. C doesn't have multidimensional arrays, it has arrays of
arrays, and this is one of the subtle differences between those two
concepts. The named array 'a' contains two elements, which are
themselves arrays. The unnamed array identified by a[0] is an array of
only two elements, which are ints. The rules for the limits of pointer
arithmetic are defined in terms of arrays, and according to the rules of
the C language, 'a' is not an array of four integers.

At a, there are four int objects, one after the other, having values
1, 2, 3 and four respectively. Looks like an array to me, even if
undeclared.
The elements of the elements of 'a' must be stored in the same way as
the elements of an array of four integers. As a result, on most
implementations pf C code like this works exactly as you expect.
However, because this code has undefined behavior, an implementation is
free to implement pointers in such a fashion that it can keep track of
the limits beyond which they can't be dereferenced, and abort the
problem if those limits are violated. In particular, when a[0] decays to
a pointer, it's legal for the compiler to give that pointer
dereferencing limits of a[0] and a[0]+1.

Name one compiler enforces such limits.
There is special wording that allows any object, including an array of
arrays, to be accessed completely using pointers to unsigned char. This
is what makes memcpy() usable. However, for any other type this is an
issue.

If I couldn't access a[1][0] as b[2] I would be surprised, and annoyed.
 
E

E. Robert Tisdale

Joe said:
If I couldn't access a[1][0] as b[2] I would be surprised, and annoyed.

But you can.

The *de facto* standard is as you describe it.
No compiler developer would implement the standard *de jure*
unless they were suicidal.
The "legal" restrictions of the standard de jure are pure sophistry.
 
P

pete

Michael said:
There is special wording that allows any object,
including an array of arrays,
to be accessed completely using pointers to unsigned char.
This is what makes memcpy() usable.
However, for any other type this is an issue.


If it was
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
instead,
there wouldn't be a problem accessing b[3], would there?

There would. The same arguments hold.

No, the same arguments don't hold.

Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

The problem originally was that b had the address of the first
element of a[0]. a[0] has two elements, a[0][0] and a[0][1].
The address of b[3] is outside of a[0][1],
which is to say that b[3] is beyond the boundary of a[0].

For
b = (int *)a;
b has the address of the first element of a, converted to int *.
a has two elements, a[0] and a[1].
The address of b[3] is not outside of a[1],
which is to say that b[3] is not beyond the boundary of a.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Staff online

Members online

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,007
Latest member
obedient dusk

Latest Threads

Top