Pointer arithmetics in 2D arrays

A

Army1987

Is this code legal (according to the strictest possible sane[1]
interpretation of the standard, regardless of wheter it does work
on all implementations in the known universe)?

#include <stdio.h>
void print_square_matrix(int *ptr, int order)
{
int i, j;
for (i = 0; i < order; i++) {
for (j = 0; j < order; j++)
printf("%3d ", *ptr++);
putchar('\n');
}
}

int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]
print_square_matrix(*matrix, 3);
return 0;
}

I never directly overflow an array. After 13 is printed, ptr points
to an element past the end of the first row, which is explicitly
allowed. That element happens to exist and is matrix[1][0]. So,
after the newline, the printf prints " 0 " and then *ptr is -16.
The question is "Am I missing something, or (p + 3) + 1 is ok and
points to -16, while p + (3 + 1) causes UB?"

[1] "sane" here means "excluding those who say int main(void)
{ return 0; } is conforming but not strictly conforming because an
implementation-defined form of the status successful termination is
returned to the host environment".
 
R

Richard Heathfield

Army1987 said:
Is this code legal (according to the strictest possible sane[1]
interpretation of the standard, regardless of wheter it does work
on all implementations in the known universe)?

Er, no. It doesn't compile. You missed a semicolon. But let's take a
closer look anyway...
#include <stdio.h>
void print_square_matrix(int *ptr, int order)
{
int i, j;
for (i = 0; i < order; i++) {
for (j = 0; j < order; j++)
printf("%3d ", *ptr++);
putchar('\n');
}
}

int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]

Here's where you missed your semicolon. Incidentally, what is p for?
print_square_matrix(*matrix, 3);

matrix has type int[3][3], which decays to int(*)[3] when used in a
value context. So *matrix has type int[3] - but you use this in a value
context too, so it becomes int *. That's legal to pass to
print_square_matrix, which takes int *. But in print_square_matrix, you
are, strictly speaking, breaking the rules. Why? Well, you pass a
pointer that points to the first element in an array of three ints, but
you roll through nine ints in your function. Therefore, you're
overstepping the bounds of an array. The fact that you know perfectly
well that you have six more ints immediately following is what has
tempted you to do this, and of course you are overwhelmingly likely to
get away with it, but I believe that the behaviour is nevertheless,
strictly speaking, undefined for the same reason that any array bounds
violation leads to undefined behaviour.

<snip>
 
A

Army1987

Richard Heathfield said:
Army1987 said: [snip]
int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]

Here's where you missed your semicolon. Incidentally, what is p for?

What's worse is that I missed *two* semicolons...
(p is for the example below.)

Suppose I have int *p, *q; p = &matrix[0][0]; q = &matrix[1][0];
p += 3;
Now p and q are equal. PROOF: p + 3 equals (int *)((unsigned char *)p + 3 *
sizeof *p);
q equals &*(matrix[1] + 0) = matrix[1] = *(matrix + 1) = (int *)((unsigned
char *)matrix + sizeof *matrix)
= (int *)((unsigned char *)matrix + sizeof (int [3])) = (int *)((unsiged
char *)matrix + 3 * sizeof(int))
But, IIUC, q++ now is legal and p++ isn't.
Right?
 
R

Richard Heathfield

Army1987 said:
Richard Heathfield said:
Army1987 said: [snip]
int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]

Here's where you missed your semicolon. Incidentally, what is p for?

What's worse is that I missed *two* semicolons...

So you did.
(p is for the example below.)

Suppose I have int *p, *q; p = &matrix[0][0]; q = &matrix[1][0];
p += 3;

Um, yeah, you can do that, and you're now pointing one beyond the end of
matrix[0], which is legal as long as you don't dereference.
Now p and q are equal.

I don't think you're allowed to compare them, are you? After all, one of
them is pointing into-or-one-past-the-end-of matrix[0], whereas the
other is pointing into-or-one-past-the-end-of matrix[1], and these are
different objects. (I must say I find the Standard rather silly at this
point, given that arrays are guaranteed to be contiguous.)
 
B

Bart van Ingen Schenau

Army1987 said:
Richard Heathfield said:
Army1987 said: [snip]
int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]

Here's where you missed your semicolon. Incidentally, what is p for?

What's worse is that I missed *two* semicolons...
(p is for the example below.)

Suppose I have int *p, *q; p = &matrix[0][0]; q = &matrix[1][0];
p += 3;
Now p and q are equal.

No, they are not guaranteed to be equal. This just happens to be the
case if the implementation does not keep bounds information in pointer
values.
Suppose I have an implementation that DOES store bounds information, the
array matrix is store in location 0x100, and we have 2-byte ints.
Initially we have:
p = &matrix[0][0]; /* valueof(p) = {.address=0x100, .low=0, .high=3} */
q = &matrix[1][0]; /* valueof(q) = {.address=0x106, .low=0, .high=3} */
p += 3; /* valueof(p) = {.address=0x106, .low=-3, .high=0} */

As you can see, p and q refer to the same address, but with different
bounds information. Therefor p != q.
PROOF: p + 3 equals (int *)((unsigned char *)p
+ 3 * sizeof *p);
q equals &*(matrix[1] + 0) = matrix[1] = *(matrix + 1) = (int
*)((unsigned char *)matrix + sizeof *matrix)
= (int *)((unsigned char *)matrix + sizeof (int [3])) = (int
*)((unsiged char *)matrix + 3 * sizeof(int))
But, IIUC, q++ now is legal and p++ isn't.
Right?

Your proof is not, as it assumes an implementation without bounds
information. The C standard does not make such an assumption.

Bart v Ingen Schenau
 
P

Pierre Asselin

Richard Heathfield said:
Army1987 said:
matrix has type int[3][3], which decays to int(*)[3] when used in a
value context. So *matrix has type int[3] - but you use this in a value
context too, so it becomes int *. That's legal to pass to
print_square_matrix, which takes int *. But in print_square_matrix, you
are, strictly speaking, breaking the rules. Why? Well, you pass a
pointer that points to the first element in an array of three ints, but
you roll through nine ints in your function. Therefore, you're
overstepping the bounds of an array.

Such a strict interpretation would cripple the use of C in numerical
computations. I don't think that was the intent, based in part on a
a similar example I noticed in the C99 Rationale v5.10, end of
section 6.7.5.3 .

<quote>
void g(double *ap, int n)
{
double (*a)[n] = (double (*)[n]) ap;
/* ... */ a[1][2] /* ... */
}

[ ... ] the function g can be called as in

{
double x[10][10];
g(&x[0][0], 10);
}
</quote>

The cited example has "&x[0][0]" as opposed to "*matrix" in the
orignal post, but that seems like a very subtle distinction. In
the Rationale code, "x[0][0]" is a mere scalar and "&x[0][0]" is
its address, but it is clear that g() is allowed to treat this
address as the start of an array of 100 doubles. (The example
goes on to show how to view said array as a two-dimensional object
by means of a VLA, but that's beside the point here.)

So in my view, the OP's code is de-facto valid C (C89 since he
doesn't use a VLA). If the OP needs to survive a code review, he
could write

print_square_matrix(&matrix[0][0], 3);

and cite the C99 rationale 6.7.5.3 in a comment.
 
E

Eric Sosman

Richard said:
Army1987 said:
Richard Heathfield said:
Army1987 said: [snip]
int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]
Here's where you missed your semicolon. Incidentally, what is p for?
What's worse is that I missed *two* semicolons...

So you did.
(p is for the example below.)

Suppose I have int *p, *q; p = &matrix[0][0]; q = &matrix[1][0];
p += 3;

Um, yeah, you can do that, and you're now pointing one beyond the end of
matrix[0], which is legal as long as you don't dereference.
Now p and q are equal.

I don't think you're allowed to compare them, are you? After all, one of
them is pointing into-or-one-past-the-end-of matrix[0], whereas the
other is pointing into-or-one-past-the-end-of matrix[1], and these are
different objects.

It's all right to compare them for equality or inequality,
using == or !=. p==q is true, but p-q is undefined behavior!
> (I must say I find the Standard rather silly at this
> point, given that arrays are guaranteed to be contiguous.)

I imagine that the Standard is trying to leave some leeway
for implementations that are able to check array bounds. At
least one implementation has done such checking, essentially by
giving each pointer value a "pedigree" describing the legitimate
bounds within which it points. When you derive a new pointer from
an old one (as in p+=3), the pedigree is carried along unchanged.
Thus, although p and q compare equal they have different pedigrees:
p is derived from matrix[0] and q from matrix[1], and the allowed
ranges of the two pointers are different even though their values
are identical.
 
J

Joe Wright

CBFalconer said:
Army1987 said:
Is this code legal (according to the strictest possible sane[1]
interpretation of the standard, regardless of wheter it does work
on all implementations in the known universe)?

#include <stdio.h>
void print_square_matrix(int *ptr, int order)
{
int i, j;
for (i = 0; i < order; i++) {
for (j = 0; j < order; j++)
printf("%3d ", *ptr++);
putchar('\n');
}
}

int main(void)
{
int matrix[3][3] = { {157, 64, 13},
{ 0, -16, 128},
{ 54, 42, -23} }
int *p = &matrix[0][0]
print_square_matrix(*matrix, 3);
return 0;
}

Looks OK to me.
Even without the semicolons?
 
A

Army1987

Pierre Asselin said:
Richard Heathfield said:
Army1987 said:
matrix has type int[3][3], which decays to int(*)[3] when used in a
value context. So *matrix has type int[3] - but you use this in a value
context too, so it becomes int *. That's legal to pass to
print_square_matrix, which takes int *. But in print_square_matrix, you
are, strictly speaking, breaking the rules. Why? Well, you pass a
pointer that points to the first element in an array of three ints, but
you roll through nine ints in your function. Therefore, you're
overstepping the bounds of an array.

Such a strict interpretation would cripple the use of C in numerical
computations. I don't think that was the intent, based in part on a
a similar example I noticed in the C99 Rationale v5.10, end of
section 6.7.5.3 .

<quote>
void g(double *ap, int n)
{
double (*a)[n] = (double (*)[n]) ap;
/* ... */ a[1][2] /* ... */
}

[ ... ] the function g can be called as in

{
double x[10][10];
g(&x[0][0], 10);
}
</quote>

The cited example has "&x[0][0]" as opposed to "*matrix" in the
orignal post, but that seems like a very subtle distinction. In
the Rationale code, "x[0][0]" is a mere scalar and "&x[0][0]" is
its address, but it is clear that g() is allowed to treat this
address as the start of an array of 100 doubles. (The example
goes on to show how to view said array as a two-dimensional object
by means of a VLA, but that's beside the point here.)

So in my view, the OP's code is de-facto valid C (C89 since he
doesn't use a VLA). If the OP needs to survive a code review, he
could write

print_square_matrix(&matrix[0][0], 3);

and cite the C99 rationale 6.7.5.3 in a comment.
Here the Standard quietly whispers what the behaviour is, but officially
says it is undefined. Annex J, under "Undefined behaviour":
An array subscript is out of range, even if an object is apparently
accessible with the

given subscript (as in the lvalue expression a[1][7] given the declaration
int

a[4][5]) (6.5.6).

(I wasn't aware that there has been an implementation which actually checks
array bounds, since I thought that would defeat the purpose why array
overflow was declared UB in the first place (i.e. speeding up run time by
trusting the program). But if there is one, I understand why even
overflowing the innermost array is UB.)
 
R

Richard Heathfield

Army1987 said:

Here the Standard quietly whispers what the behaviour is, but
officially says it is undefined. Annex J, under "Undefined behaviour":
An array subscript is out of range, even if an object is apparently
accessible with the

given subscript (as in the lvalue expression a[1][7] given the
declaration int

a[4][5]) (6.5.6).

(I wasn't aware that there has been an implementation which actually
checks array bounds, since I thought that would defeat the purpose why
array overflow was declared UB in the first place (i.e. speeding up
run time by trusting the program).

The point is that the implementor gets to *choose* - and may even pass
that choice on to the programmer, who may opt for bounds-checking
during development, switching it off for performance reasons once he's
satisfied that no bounds are being violated.
But if there is one, I understand
why even overflowing the innermost array is UB.)

Right.
 
D

David Thompson

Such a strict interpretation would cripple the use of C in numerical
computations. I don't think that was the intent, based in part on a

I mostly agree that preventing this would be bad for numerics, but I
believe it was indeed the intent to allow that. C wasn't originally
intended for numerics and for quite a while couldn't dream of
displacing Fortran. Only fairly recently has it even made an effort,
and only in C99 were some features standardized (complex/imag,
restrict, more mathlib and generics) to help this, but C99 did NOT
change the pointer arithmetic and hence subscripting rules. In
practice all reasonable implementations do allow it, but I think it's
clear the standard writers were careful not to actually require it.
a similar example I noticed in the C99 Rationale v5.10, end of
section 6.7.5.3 .

<quote>
void g(double *ap, int n)
{
double (*a)[n] = (double (*)[n]) ap;
/* ... */ a[1][2] /* ... */
}

[ ... ] the function g can be called as in

{
double x[10][10];
g(&x[0][0], 10);
}
</quote>

The cited example has "&x[0][0]" as opposed to "*matrix" in the
orignal post, but that seems like a very subtle distinction. In
the Rationale code, "x[0][0]" is a mere scalar and "&x[0][0]" is
its address, but it is clear that g() is allowed to treat this
address as the start of an array of 100 doubles. (The example
goes on to show how to view said array as a two-dimensional object
by means of a VLA, but that's beside the point here.)
But this example doesn't help your argument. It very carefully does
NOT access the 2D as 1D; the temporary 1D pointer is carefully
described as just a method to get around the lexical ordering of
parameter declaration to get from one 2D pointer to another
(correctly-sized) 2D pointer which is actually used. And is better
style in C99 because it more accurately shows in the source the actual
intent of what you are doing.

(Aside from the Rationale not being normative anyway.)
So in my view, the OP's code is de-facto valid C (C89 since he
doesn't use a VLA). If the OP needs to survive a code review, he
could write

print_square_matrix(&matrix[0][0], 3);

and cite the C99 rationale 6.7.5.3 in a comment.

I would just not cite anything and rely on the fact that 'everyone
knows' 2D arrays are contiguous and you will not in practice run into
any problems -- and if you do, it will be on an implementation so
weird that plenty of other things will be broken as well and this one
will get lost in the noise.

- formerly david.thompson1 || achar(64) || worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,262
Messages
2,571,043
Members
48,769
Latest member
Clifft

Latest Threads

Top