# i think i found

#### Joe keane

```c
int f(int x)
{
    int a[2];
    int *p;

    p = &a[0];
    *p = 2;
    ++p;
    *p = 3;
    /* printf("%d %d\n", a[0], a[1]); */
}
```

That pretty much has to work, right?

So what's the deal with arrays of arrays?

#### Ian Collins

```c
int f(int x)
{
    int a[2];
    int *p;

    p = &a[0];
    *p = 2;
    ++p;
    *p = 3;
    /* printf("%d %d\n", a[0], a[1]); */
}
```

That pretty much has to work, right?

Why shouldn't it?

So what's the deal with arrays of arrays?

Two for \$10.

#### Guest

```c
int f(int x)
{
    int a[2];
    int *p;

    p = &a[0];
    *p = 2;
    ++p;
    *p = 3;
    /* printf("%d %d\n", a[0], a[1]); */
}
```

That pretty much has to work, right?

So what's the deal with arrays of arrays?

what?

#### Barry Schwarz

```c
int f(int x)
{
    int a[2];
    int *p;

    p = &a[0];
    *p = 2;
    ++p;
    *p = 3;
    /* printf("%d %d\n", a[0], a[1]); */
}
```

That pretty much has to work, right?

No, it does not. The function fails to return an int as promised in
the declaration. I don't remember whether this is undefined behavior
or a constraint violation.

So what's the deal with arrays of arrays?

Since your sample code does not have any arrays of arrays, we have no
idea what your question is. Did you perchance mean something like

```c
int f(int x)
{
    int a[4][3];
    int (*p)[3];

    p = &a[0];
    (*p)[1] = 2;
    ++p;
    (*p)[2] = 3;
    /* printf("%d %d\n", a[0][1], a[1][2]); */
    return 0;
}
```
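The code below is a small sketch of what the 'int (*p)[3]' pointer buys you. Each '++p' moves by one whole row, and '(*p)[c]' only ever indexes within the row 'p' points at. The helper names 'row_step_bytes' and 'fill_by_rows' are illustrative only and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* How far, in bytes, ++ moves a pointer to one row of int a[4][3]. */
static ptrdiff_t row_step_bytes(void)
{
    int a[4][3];
    int (*p)[3] = &a[0];                  /* points at the whole first row */
    int (*q)[3] = p + 1;                  /* points at the second row */
    assert((void *)q == (void *)&a[1]);   /* p + 1 is exactly row 1 */
    return (char *)q - (char *)p;         /* one row: 3 * sizeof(int) bytes */
}

/* Store r * 10 + c into a[r][c] through a pointer to each row in turn.
   (*p)[c] indexes only within the row p points at, so no access crosses
   from one row into the next. */
static void fill_by_rows(int a[4][3])
{
    for (int (*p)[3] = a; p != a + 4; ++p) {
        int r = (int)(p - a);             /* index of the current row */
        for (int c = 0; c < 3; ++c)
            (*p)[c] = r * 10 + c;
    }
}
```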

#### James Kuyper

Barry Schwarz wrote: ....
Prior to C99, these words:

ISO/IEC 9899: 1990
6.6.6.4 The return statement

A function may have any number of return statements,
with and without expressions.

If a return statement without an expression is executed,
and the value of the function call is used by the caller,
the behavior is undefined.
Reaching the } that terminates a function is
equivalent to executing a return statement without an expression.

suggest that the use of the above function was defined in C90,
as long as the code didn't use the return value.

Both C99 and n1570 only have this much to say on this matter:

A function may have any number of return statements.

They didn't change that aspect of C in C99; they only changed the way it
was expressed:

6.9.1p12: "If the } that terminates a function is reached, and the value
of the function call is used by the caller, the behavior is undefined."
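Under the wording quoted above, falling off the end of f is only a problem if a caller uses the result. The sketch below shows the two usual ways to avoid the question altogether: declare the function void, or return a value on every path. The names 'fill_pair' and 'fill_pair_sum' are illustrative and do not appear in the thread.

```c
#include <assert.h>

/* Declared void: there is no return value for any caller to use. */
static void fill_pair(int a[2])
{
    int *p = &a[0];
    *p = 2;
    ++p;
    assert(p == a + 1);   /* ++p moved exactly one int, still inside a */
    *p = 3;
}

/* Or keep the int return type and return a value on every path. */
static int fill_pair_sum(int a[2])
{
    fill_pair(a);
    return a[0] + a[1];
}
```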

#### Joe keane

Since your sample code does not have any arrays of arrays, we have no

OK

```c
int f(int x)
{
    int a[2][2];
    int *p;

    p = &a[0][0];
    *p = 1;
    p += 2;
    *p = 2;
    --p;
    *p = 3;
    p += 2;
    *p = 4;
    /* printf(...); */
    return 0;
}
```

#### Keith Thompson

Since your sample code does not have any arrays of arrays, we have no

OK

```c
int f(int x)
{
    int a[2][2];
    int *p;

    p = &a[0][0];
    *p = 1;
    p += 2;
    *p = 2;
    --p;
    *p = 3;
    p += 2;
    *p = 4;
    /* printf(...); */
    return 0;
}
```

Great, you've posted some more code. Did you want to say something
you think you've found?

I'm sure the point you're making is clear to you, but you're not
conveying it well to the rest of us.

As for the code you posted, its behavior is undefined. "p" is an "int*"
that's initialized to point to element 0 of a 2-element array a[0]
(which happens to be an element of a larger array). "p += 2" causes it
to point just past the last element of the array a[0], which is ok. But
the "*p = 2;" assigns a value past the end of that array, which has
undefined behavior.

N1570 section J.2 has a (non-normative) list of undefined behaviors, one
of which is:

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
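Here is one way to perform the same four stores as the 'int a[2][2]' example while keeping every access inside the row it names. This is a minimal sketch. It assumes you want to keep thinking in flat offsets 0..3: offset k maps to row k / 2, column k % 2. The names 'store_at_offset' and 'example_stores' are hypothetical.

```c
#include <assert.h>

/* Flat offset k in an int[2][2] names row k / 2, column k % 2. */
static void store_at_offset(int a[2][2], int k, int value)
{
    assert(k >= 0 && k < 2 * 2);   /* k must name an element of the whole array */
    a[k / 2][k % 2] = value;       /* the column index never runs past a row's end */
}

/* The four stores from the example, at offsets 0, 2, 1, 3. */
static void example_stores(int a[2][2])
{
    store_at_offset(a, 0, 1);
    store_at_offset(a, 2, 2);      /* a[1][0]; the original reached it one past the end of a[0] */
    store_at_offset(a, 1, 3);
    store_at_offset(a, 3, 4);
}
```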

#### Joe keane

Prior to C99, these words: [...]
suggest that the use of the above function was defined in C90,
as long as the code didn't use the return value.

I slip into C78.

#### Joe keane

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

I didn't use 'array subscript' at all!

It is fine that with "a[i][j]" syntax people can check both subscripts
rather than the offset. But it's rather oblique about whether you can
increment an 'int *' pointer, e.g. from &a[0][4] to &a[1][0]. It would
be kind of sad if this didn't work right. It doesn't overtly address
the point.

#### Keith Thompson

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

I didn't use 'array subscript' at all!

It is fine that with "a[i][j]" syntax people can check both subscripts
rather than the offset. But it's rather oblique about whether you can
increment an 'int *' pointer, e.g. from &a[0][4] to &a[1][0]. It would
be kind of sad if this didn't work right. It doesn't overtly address
the point.

The array subscript operation is defined in terms of pointer
arithmetic; E1[E2] is by definition identical to (*((E1)+(E2))).
It follows from that that your example's behavior is undefined.

In practice, it's likely to "work", but a compiler is free to generate
code assuming that the indices do not exceed the declared bounds of the
array.
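The identity E1[E2] == (*((E1)+(E2))) can be checked directly. The sketch below asserts it for a one-dimensional array, including the odd-looking '2[a]' that the definition permits, and for a two-dimensional one, where 'b[1][2]' unfolds to '*(*(b + 1) + 2)'. The function names are illustrative only.

```c
#include <assert.h>

/* E1[E2] is defined as (*((E1)+(E2))), so all three spellings below name
   the same element; even 2[a] is valid for that reason. */
static int third_element(const int a[3])
{
    assert(a[2] == *(a + 2));
    assert(*(a + 2) == 2[a]);
    return a[2];
}

/* For int b[4][5], b[1][2] unfolds to *(*(b + 1) + 2): step over one row,
   then two ints within that row. */
static int row1_col2(int b[4][5])
{
    assert(b[1][2] == *(*(b + 1) + 2));
    return b[1][2];
}
```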

#### Joe keane

The array subscript operation is defined in terms of pointer
arithmetic; E1[E2] is by definition identical to (*((E1)+(E2))).

What is 'array subscript'?

An array type is contiguous. What does that mean?

```c
int f(int t)
{
    int a[3];
    int *p;

    p = a;
    ++p;
    *p = 1;
}
```

As we expect, this stores in a consistent location.

An array of arrays is contiguous. It doesn't say only the last
dimension is contiguous; all of them are. So then the whole thing is.
And we have 'sizeof' to prove it.

```c
int g(int t)
{
    int b[4][5];
    int *p;

    p = *a;
    p += 4;
    *p = 4;
    p += 2;
    *p = 6;
}
```

If it breaks on the second one, then the array is not 'contiguous'.

The compiler can't change it to 'int b[4][8];' because that's not
'contiguous' the way the programmer wanted.

I'm just using an 'int *' here. It has to work the same as any other
'int *', pointing to the stack, from malloc, or whatever. I don't think
it can 'remember' a bunch of extra stuff.

It follows from that that your example's behavior is undefined.

unclear

#### Eric Sosman

The array subscript operation is defined in terms of pointer
arithmetic; E1[E2] is by definition identical to (*((E1)+(E2))).

What is 'array subscript'?

The subject of Section 6.5.2.1 of a certain International
Standard.

An array type is contiguous. What does that mean?

The word "contiguous" is given no special meaning by the
International Standard mentioned above, so we must assume that
the Standard uses the word in accordance with its meaning in
English. There is no formal definition of English, nor any
authority that can be called "authoritative" (all dictionaries,
lexicons, grammars, and the like being merely derivative works),
but one studious observer of English offers

con-tig-u-ous adj [L /contiguus/, fr /contingere/ to
have contact with] 1 : being in actual contact :
TOUCHING 2 : ADJOINING 3 : next or near in time or

For arrays as described in Section 6.2.5 paragraph 20 of the
aforementioned aforementioned Standard, I think we can discard
meaning (1) and imagine that meanings (2) and/or (3) are intended.

As we expect, this stores in a consistent location.

An array of arrays is contiguous. It doesn't say only the last
dimension is contiguous; all of them are. So then the whole thing is.
And we have 'sizeof' to prove it.

```c
int g(int t)
{
    int b[4][5];
    int *p;

    p = *a;
    p += 4;
    *p = 4;
    p += 2;
    *p = 6;
}
```

If it breaks on the second one, then the array is not 'contiguous'.

It "breaks" because it doesn't even compile. More often
than I'd like, my own incomplete editing of what started as an
English sentence turns it into garbage; has something similar
happened here?

You betcha.

#### Tim Rentsch

The array subscript operation is defined in terms of pointer
arithmetic; E1[E2] is by definition identical to (*((E1)+(E2))).

What is 'array subscript'?

An array type is contiguous. What does that mean?

```c
int f(int t)
{
    int a[3];
    int *p;

    p = a;
    ++p;
    *p = 1;
}
```

As we expect, this stores in a consistent location.

An array of arrays is contiguous. It doesn't say only the last
dimension is contiguous; all of them are. So then the whole
thing is. And we have 'sizeof' to prove it.

```c
int g(int t)
{
    int b[4][5];
    int *p;

    p = *b; // [corrected from p = *a;]
    p += 4;
    *p = 4;
    p += 2;
    *p = 6;
}
```

If it breaks on the second one, then the array is not 'contiguous'.

The compiler can't change it to 'int b[4][8];' because that's not
'contiguous' the way the programmer wanted.

Yes, the individual elements are contiguous. However, that
fact alone is not enough to guarantee that this sort of
"cross-subarray-bounds" indexing will work.

I'm just using an 'int *' here. It has to work the same as any
other 'int *', pointing to the stack, from malloc, or whatever.

In fact that's not right, according to remarks in the Standard
and also, IIRC, some Defect Reports. Sad but true.

I don't think it can 'remember' a bunch of extra stuff.

It can, or at least act as though it does in some ways. The
low-order bits are somewhat murky, but the high-order bit is ONE.

The Standard does make it clear that some cases of indexing
outside the bounds of a single subarray transgress into
undefined behavior, and it is likely that the code above
falls under the cases intended for that.

The Standard is _not_ clear about exactly where the boundaries
are as to what is defined and what is undefined in such
cases, and that has led to a lot of arguments in the
newsgroups about which is which. However, for simple
cases like the one shown above, I believe the evidence
that they are meant to be undefined behavior is pretty
convincing.
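Tim's point is that contiguity is real but does not settle the question. The sketch below shows what contiguity does let you check. There is no padding between rows, and a pointer one past the end of row 0 compares equal to the start of row 1. Forming and comparing that pointer is fine; dereferencing it is the disputed part. The sketch assumes C11 for 'static_assert', and 'rows_meet' is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* No padding between rows: an int[4][5] is exactly 20 ints. */
static_assert(sizeof(int[4][5]) == 4 * 5 * sizeof(int), "rows are adjacent");

/* A pointer one past the end of row 0 compares equal to the start of row 1,
   because that row immediately follows it in memory. This only compares;
   it never reads or writes through end_of_row0. */
static bool rows_meet(void)
{
    int b[4][5];
    int *end_of_row0 = &b[0][0] + 5;   /* one past b[0]: may be formed and compared */
    return end_of_row0 == &b[1][0];
}
```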

If you care, an assignment (or initialization) like

```c
int *p = (int*) &b;
```

is more likely to work as you expect (i.e., and be defined
behavior) for the "cross-subarray-bounds" indexing like
that shown in the example. So that might be a good way
to respond to this issue.
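The sketch below walks the whole of an 'int[4][5]' through one 'int *' obtained from '&b', as suggested above. It is hedged in the same way Tim hedges it. This is the spelling more likely to be defined for cross-row walking, not something the Standard plainly guarantees. 'flat_sum', 'ROWS' and 'COLS' are illustrative names.

```c
#include <assert.h>

enum { ROWS = 4, COLS = 5 };

/* Sum all elements of an int[ROWS][COLS] through a single int *. The
   pointer is converted from the address of the whole array, the spelling
   suggested above; treat it as the more conservative choice, not a
   guarantee. */
static long flat_sum(int (*whole)[ROWS][COLS])
{
    const int *p = (const int *)whole;   /* same address as (*whole)[0][0] */
    /* the last flat index lands on the last element of the 2-D array */
    assert(&p[ROWS * COLS - 1] == &(*whole)[ROWS - 1][COLS - 1]);
    long sum = 0;
    for (int k = 0; k < ROWS * COLS; ++k)
        sum += p[k];
    return sum;
}
```

Note that it takes 'int (*)[ROWS][COLS]', so a caller passes '&b'. A parameter declared as 'int b[ROWS][COLS]' would be adjusted to 'int (*)[COLS]', and '&b' inside the function would then be the address of that pointer parameter, not of the array.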

#### Joe keane

If you care, an assignment (or initialization) like

```c
int *p = (int*) &b;
```

is more likely to work as you expect (i.e., and be defined
behavior) for the "cross-subarray-bounds" indexing like
that shown in the example.

I debated how to write this.

Is it possible the following are different:

```c
p = (int *) &b;
p = (int *) &b[0];
p = /int */ &b[0][0];
```

Maybe some work and others don't?

It seems like we're just guessing here.

I try to follow the standard, even the parts that contradict the other
parts.

I've been programming C for a long time, and I just 'knew' this is
perfectly valid C (for some reason), so I was a bit shocked.

I would just say that an 'int *' (not an 'int (*)[N]') can be
incremented, and that this is valid as long as you stay within the
*main* array, whatever it is declared as (global, on the stack, or from
malloc), and not just within the first 'row'. The array is contiguous
in all dimensions, so we should have &a[1][0] == &a[0][5], and it is
nonsensical for them to work differently.

That seems like a basic property: first, they are contiguous, and
second, you can actually use that in your code.

If the standard doesn't say so, or says it is and it isn't at the same
time, I think it's defective.

#### Keith Thompson

If you care, an assignment (or initialization) like

```c
int *p = (int*) &b;
```

is more likely to work as you expect (i.e., and be defined
behavior) for the "cross-subarray-bounds" indexing like
that shown in the example.

I debated how to write this.

Is it possible the following are different:

```c
p = (int *) &b;
p = (int *) &b[0];
p = /int */ &b[0][0];
```

The definition of b, which you haven't quoted, is

```c
int b[4][5];
```

You wanted parentheses on that third line.

Maybe some work and others don't?

I believe all of these "work" in the sense that the conversion yields a
pointer to the int object b[0][0] (though I'm not 100% certain that
that's guaranteed). It doesn't follow that you can safely increment
that pointer to point to elements outside b[0].
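The sketch below applies the three conversions from the question to an 'int b[4][5]' and checks that each yields the address of 'b[0][0]'. On common implementations they compare equal, though as noted above that may not be strictly guaranteed. Equality also says nothing about how far each pointer may later be advanced. 'three_spellings' is a hypothetical helper.

```c
#include <assert.h>

/* The three spellings from the question, applied to the array *b.
   out[0..2] receive the converted pointers. */
static void three_spellings(int (*b)[4][5], int *out[3])
{
    out[0] = (int *)b;           /* from &b, the whole array */
    out[1] = (int *)&(*b)[0];    /* from &b[0], the first row */
    out[2] = &(*b)[0][0];        /* from &b[0][0], the first int */
    assert(out[0] == out[2] && out[1] == out[2]);
}
```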

It seems like we're just guessing here.

I try to follow the standard, even the parts that contradict the other
parts.

I'm not aware of any contradiction in this area.

I've been programming C for a long time, and I just 'knew' this is
perfectly valid C (for some reason), so I was a bit shocked.

I would just say that an 'int *' (not an 'int (*)[N]') can be
incremented, and that this is valid as long as you stay within the
*main* array, whatever it is declared as (global, on the stack, or from
malloc), and not just within the first 'row'. The array is contiguous
in all dimensions, so we should have &a[1][0] == &a[0][5], and it is
nonsensical for them to work differently.

That seems like a basic property: first, they are contiguous, and
second, you can actually use that in your code.

If the standard doesn't say so, or says it is and it isn't at the same
time, I think it's defective.

Given "int b[4][5];", the standard says that the elements of b are
adjacent. It also says that the expression "b[0][7]" has undefined
behavior. That doesn't mean that it must fail; it simply means that the
standard doesn't define its behavior. It certainly *can* do what you
expect it to do. The point is that the standard gives compilers latitude
to do something else.
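If you want one 'int *' that may freely cross from one "row" into the next, one option avoids the subarray question entirely. Make the storage a single one-dimensional array and do the row/column arithmetic yourself. A minimal sketch follows; 'ROWS', 'COLS' and 'at' are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Element (r, c) of a grid stored as one int[ROWS * COLS]. Because the whole
   grid is a single array object, an int * into it may step from the end of
   one "row" to the start of the next without leaving that object. */
static int *at(int grid[ROWS * COLS], size_t r, size_t c)
{
    assert(r < ROWS && c < COLS);
    return &grid[r * COLS + c];
}
```

Here '++p' from 'at(grid, 0, 4)' lands on 'at(grid, 1, 0)', and dereferencing it is ordinary access within one array.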

#### Joe keane

I'm not aware of any contradiction in this area.

I'd go more with 'unclear'.

An array is pretty well defined: the compiler can't add padding in
between rows, or at the start or end. The size is what you expect
from a naive idea of 'array'. The sequence is defined; the compiler
can't store the rows 'upside-down' or 'backwards'.

So we have elements b[0][0], b[0][1], ..., b[1][0], ..., b[3][4]. We
can take addresses &b[0][0], &b[0][1], ..., &b[1][0], ..., &b[3][4], and
of course those have type 'int *'. And we have rows b[0], b[1], ...,
b[3]. We can take addresses &b[0], &b[1], ..., &b[3], and those have
type 'int (*)[5]'.

We can convert between the pointer types using a cast. What guarantees
do we have here? What things have to compare equal? Maybe things
compare equal but don't work the same? And we can do pointer
arithmetic. Maybe we do a lot of pointer arithmetic and convert between
a bunch of types, before a dereference.

We need the standard to specify these, not use vague hand-wavy phrases,
leaving everyone to form their own opinion about what it looks like when
you connect all the puzzle pieces. We need it to say what is defined or
undefined.

6.5.6 is weird because it talks about 'array subscript'. Everyone knows
the brackets operator is shorthand for pointer addition and dereference.
So it should talk about pointer addition and dereference. And then we
have cases where there is pointer addition independent of this, or many
steps before the dereference.

```c
int sethi(char *c)
{
    c[0] = 'h';
    c[1] = 'i';
    c[2] = 0;
}
```

We don't know that 'c[1]' is valid. The type here is 'char *', a
pointer to a character. So we just have to trust things here.

```c
int boom()
{
    char c;

    /* here comes the 'boom' */
    sethi(&c);
    /* printf("%c\n", c); */
}
```

But this is a caller/callee disagreement. If it were pointer to array,
we could point our fingers better.
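Here are two ways the callee could state its requirement, sketched under the assumption that 'sethi' only ever needs room for "hi" plus the terminator. With a pointer-to-array parameter, passing '&c' for a lone 'char c' is an incompatible-pointer constraint violation that the compiler must diagnose. With an explicit size parameter, the callee can check the requirement at run time. 'sethi_arr' and 'sethi_n' are hypothetical variants, not code from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to array: the required size is part of the parameter type. */
static void sethi_arr(char (*c)[3])
{
    (*c)[0] = 'h';
    (*c)[1] = 'i';
    (*c)[2] = '\0';
}

/* Pointer plus size: works for any buffer, including &c[1] inside a
   larger array, and lets the callee check the requirement. */
static void sethi_n(char *c, size_t n)
{
    assert(n >= 3);   /* room for 'h', 'i' and the terminator */
    c[0] = 'h';
    c[1] = 'i';
    c[2] = '\0';
}
```

The pointer-to-array form is rigid: with 'char c[20]', '&c' has type 'char (*)[20]', which does not match 'char (*)[3]' either. For the beam() case, 'sethi_n(&c[1], sizeof c - 1)' is the better fit.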

```c
int beam()
{
    char c[20];

    c[0] = '*';
    sethi(&c[1]);
    strcat(&c[2], "\n");
    /* fputs(c, stdout); */
}
```

Conversely, there's also the question of subtracting pointers where
arrays are involved. 6.5.6 doesn't appear to touch on this. When is it
defined and does what is fairly obvious? When is it undefined?

```c
int sub(int x)
{
    int b[4][5];
    int *p;
    int *q;
    int d;

    p = &b[0][4];
    q = &b[1][0];
    d = (int) (q - p);
    /* printf("%d\n", d); */
    return d;
}
```
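The sketch below computes the same distance as 'sub' using only subtractions whose operands point into one array object. Row pointers are subtracted within 'b', and element pointers only within a single row, so the strict reading of the subarray rules never comes into play. It assumes 5 columns as in the example; 'flat_distance' is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

enum { COLS = 5 };

/* Distance in ints from b[r0][c0] to b[r1][c1]. Each subtraction has both
   operands pointing into the same array: rows into b, elements into b[r1]. */
static ptrdiff_t flat_distance(int (*b)[COLS], int r0, int c0, int r1, int c1)
{
    assert(c0 >= 0 && c0 < COLS && c1 >= 0 && c1 < COLS);
    ptrdiff_t rows = &b[r1] - &b[r0];            /* both point into b */
    ptrdiff_t cols = &b[r1][c1] - &b[r1][c0];    /* both point into row b[r1] */
    return rows * COLS + cols;
}
```

For the example's 'p = &b[0][4]' and 'q = &b[1][0]', 'flat_distance(b, 0, 4, 1, 0)' gives 1.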

The point is that the standard gives compilers latitude to do something
else.

A rationale is not required, but it would be helpful. Is it that
they think most uses that violate that rule are probably erroneous? Is
it that you may produce better code? Do they think it is a case where
the programmer can do what he intends, but he should express it clearly?
Is it a case where they're being conservative, because the implications
of the opposite are not well understood?

#### Tim Rentsch

If you care, an assignment (or initialization) like

```c
int *p = (int*) &b;
```

is more likely to work as you expect (i.e., and be defined
behavior) for the "cross-subarray-bounds" indexing like
that shown in the example.

I debated how to write this.

Is it possible the following are different:

```c
p = (int *) &b;
p = (int *) &b[0];
p = /int */ &b[0][0];
```

Yes. I believe there is a general consensus that the first two
forms have the same "span", i.e. all of 'b', and the third form has
a more restricted "span", i.e. just the elements of b[0]. Some
people advocate using '&b' to span all of b, just to be safe.
(Disclaimer: "general consensus" does not mean unanimous.)

Maybe some work and others don't?

It seems like we're just guessing here.

That's probably true, but not as much as you might think. There
is additional information in some of the Defect Reports.
Basically, how much can be addressed through a given pointer
depends on where the pointer value comes from (IIRC this is
referred to as the "provenance" of the pointer).

I try to follow the standard, even the parts that contradict the other
parts.

I've been programming C for a long time, and I just 'knew' this is
perfectly valid C (for some reason), so I was a bit shocked.

Me too, and I had a very similar reaction when I first
encountered it. However, after digging in and investigating
different corners of the various official documents, I came
around to the understanding I have been explaining.

I would just say that an 'int *' (not an 'int (*)[N]') can be
incremented, and that this is valid as long as you stay within the
*main* array, whatever it is declared as (global, on the stack, or from
malloc), and not just within the first 'row'. The array is contiguous
in all dimensions, so we should have &a[1][0] == &a[0][5], and it is
nonsensical for them to work differently.

I agree that's a reasonable expectation. However, there is
pretty good evidence that the people who defined the Standard
things should work.

That seems like a basic property: first, they are contiguous, and
second, you can actually use that in your code.

If the standard doesn't say so, or says it is and it isn't at the same
time, I think it's defective.

I agree the Standard is defective in this area, although my
reasons are probably different from yours. First, I understand
why it might be desirable to limit the span of &b[0][0], namely,
to provide better opportunities for compiler optimization. So
I'm okay with some 'int *' values having a more restricted range
than others, even though they point to the same place. My
complaint is that exactly how different cases are supposed to
behave is not spelled out either clearly or unambiguously. I'm
willing for the rules to be different from what I might otherwise
expect, but whatever they are they need to be spelled out; IMO
the Standard doesn't do that as well as it should, and that's why
I say it's defective.
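Tim's optimization point, and Keith's earlier remark that a compiler may assume indices stay within declared bounds, can be sketched as follows. Because 'a[0][i]' with 0 <= i < 4 can only name an element of row 0, a compiler is permitted to assume the store leaves 'a[1][0]' unchanged. It may therefore reuse the earlier load. This illustrates what the rule allows; it is not a claim that any particular compiler performs this transformation. 'reload' is a hypothetical name.

```c
#include <assert.h>

/* For 0 <= i < 4, a[0][i] stays in row 0, so the store cannot change
   a[1][0]. A compiler may rely on that and compute the result as
   2 * before; reaching row 1 through a[0][4] is what such an
   optimization would break. */
static int reload(int a[2][4], int i)
{
    assert(i >= 0 && i < 4);
    int before = a[1][0];
    a[0][i] = 99;
    return before + a[1][0];
}
```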