Multi-dimensional arrays as a one-dimensional array

vippstar

The subject might be misleading.
Regardless, is this code valid:

#include <stdio.h>

void f(double *p, size_t size) { while(size--) printf("%f\n", *p++); }
int main(void) {
double array[2][1] = { { 3.14 }, { 42.6 } };
f((double *)array, sizeof array / sizeof **array);
return 0;
}

Assuming casting double [2][1] to double * is implementation-defined
or undefined behavior, replace the cast with (void *).

Since arrays are not allowed to have padding bytes between
elements, it seems valid to me.
 
vippstar

The subject might be misleading.
Regardless, is this code valid:
#include <stdio.h>
void f(double *p, size_t size) { while(size--) printf("%f\n", *p++); }
int main(void) {
double array[2][1] = { { 3.14 }, { 42.6 } };
f((double *)array, sizeof array / sizeof **array);
return 0;
}
Assuming casting double [2][1] to double * is implementation-defined
or undefined behavior, replace the cast with (void *).
Since arrays are not allowed to have padding bytes between
elements, it seems valid to me.

Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.

So as I understand it, you are saying my code invokes undefined
behavior.
On which hypothetical (or real, if such an implementation exists)
implementation would my code fail, and why?
 
Barry Schwarz

The subject might be misleading.
Regardless, is this code valid:

#include <stdio.h>

void f(double *p, size_t size) { while(size--) printf("%f\n", *p++); }
int main(void) {
double array[2][1] = { { 3.14 }, { 42.6 } };
f((double *)array, sizeof array / sizeof **array);
return 0;

}

Assuming casting double [2][1] to double * is implementation-defined
or undefined behavior, replace the cast with (void *).

Changing the cast to void* only creates more work for the compiler.
Unless it is smart enough to optimize away the first step, it must
first convert array (which in this context is identical to &array[0]
and has type "double (*)[1]") to void* to satisfy the cast and then
convert that to double* to satisfy the prototype.

The only issue with pointer conversions is whether the value is properly
aligned for the resulting type. The value of the expression array
is guaranteed to be properly aligned for a double, since that value
(disregarding type) is exactly the same as &array[0][0].

Since arrays are not allowed to have padding bytes between
elements, it seems valid to me.

If your system has built-in hardware assist for bounds checking, it
would be reasonable for the "bounds registers" to contain the start
and end addresses of array[0]. Eventually your p++ would be outside
this range (even though it is still within array as a whole). While
this is a perfectly valid value, attempts to dereference it would be
trapped by the bounds checking logic in the hardware.

Whether such a system exists is a practical issue.
 
James Kuyper

The subject might be misleading.
Regardless, is this code valid:
#include <stdio.h>
void f(double *p, size_t size) { while(size--) printf("%f\n", *p++); }
int main(void) {
double array[2][1] = { { 3.14 }, { 42.6 } };
f((double *)array, sizeof array / sizeof **array);
return 0;
}
Assuming casting double [2][1] to double * is implementation-defined
or undefined behavior, replace the cast with (void *).
Since arrays are not allowed to have padding bytes between
elements, it seems valid to me.
Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.

So as I understand it, you are saying my code invokes undefined
behavior.
On which hypothetical (or real, if such an implementation exists)
implementation would my code fail, and why?

The key point is the pointer conversion. At the point where that
conversion occurs, the compiler knows that (double*)array == array[0].
It's undefined if any number greater than 1 is added to that pointer
value, and also if that pointer is dereferenced after adding 1 to it.

Because the behavior is undefined, it's possible for a conforming
implementation to provide a run-time bounds check. This would require
the use of some mechanism such as a "fat" pointer, which contains not
only the information about where it points, but also information about
the upper and lower limits on where it can point. The implementation can
set the limits on the pointer value created by that conversion, based
upon the limits for array[0].

Implementations with support for run-time bounds checking are rare; the
performance cost is significant, it's almost always optional, and
usually not the default option, so you're unlikely to run into this
issue by accident.

The more tricky possibility is also something you're far more likely to
run into. Because the behavior is undefined in those cases, a compiler
is allowed to generate code which is optimized in a way that assumes
that those cases don't actually come up; because it makes that
assumption, such code can fail catastrophically if that assumption is
violated.

This is most likely to occur as the result of anti-aliasing assumptions.
The compiler is allowed to generate code which assumes that array[0][i]
is never an alias for array[1][j], regardless of the values of i and j.
As a result, it can drop anti-aliasing checks from code that would need
such checks if that assumption could be violated without making the
behavior undefined.

I don't see any way for that to come up in this particular example, but
in more complicated cases it can come up. If 'p' and 'array' were both
in scope at the same place somewhere in the code, the compiler would not
be required to prevent the problems that might occur if p[j] and
array[1][i] happened to refer to the same location in memory. It is
required to cover the possibility that p[j] and array[0][i] refer to the
same object, but only if 'i' and 'j' are both 0. It's also required to
handle the possibility that p+j refers to the same location as
array[0]+i, but only if 'i' and 'j' are both in the range [0,1].
 
vippstar

On Aug 29, 6:15 am, (e-mail address removed) wrote:

Thanks everyone.
 
James Tursa

Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.

I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))? But
at the same time if double was replaced with char, i.e.,

#include <stdio.h>
void f(char *p, size_t size) { while(size--) printf("%c\n", *p++); }
int main(void) {
char array[2][1] = { { 'a' }, { 'b' } };
f((char *)array, sizeof array / sizeof **array);
return 0;
}

then for this particular case the row slices are required by the
standard to be next to each other in memory so the individual stepping
will work in the called function? Or are you saying that for the char
case there may still be padding between the row slices but the
individual stepping will work because the padding will always be a
multiple of sizeof(char) (i.e., 1), and that the stepping in the
called function will just include the padded characters if they are
present?

i.e., in either case the called function may not be doing what you
intended if there is padded memory present between the rows, but in
the case of double or other non-character type it may even bomb.

Do I understand your answer correctly?

James Tursa
 
James Tursa

The key point is the pointer conversion. At the point where that
conversion occurs, the compiler knows that (double*)array == array[0].
It's undefined if any number greater than 1 is added to that pointer
value, and also if that pointer is dereferenced after adding 1 to it.

Trying to understand your answer as it relates to the original post. I
don't see how the original function gets an address 2 beyond the end,
or 1 beyond the end and attempts to dereference it, as you seem to be
saying. Can you point this out? Did I misunderstand you?

James Tursa
 
vippstar

Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.

I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))? But
at the same time if double was replaced with char, i.e.,

#include <stdio.h>
void f(char *p, size_t size) { while(size--) printf("%c\n", *p++); }
int main(void) {
char array[2][1] = { { 'a' }, { 'b' } };
f((char *)array, sizeof array / sizeof **array);
return 0;

}

then for this particular case the row slices are required by the
standard to be next to each other in memory so the individual stepping
will work in the called function? Or are you saying that for the char
case there may still be padding between the row slices but the
individual stepping will work because the padding will always be a
multiple of sizeof(char) (i.e., 1), and that the stepping in the
called function will just include the padded characters if they are
present?

i.e., in either case the called function may not be doing what you
intended if there is padded memory present between the rows, but in
the case of double or other non-character type it may even bomb.

Do I understand your answer correctly?

That code of yours invokes undefined behavior, if char is signed.
You have to change the type of p to unsigned char.
Also, it would only be meaningful if you did not divide by sizeof
**array.

'pete' did not really answer my question. Instead he spoke about object
representations.
What pete really meant is that you can treat any pointer to object as
an array of unsigned char, to observe its representation.
 
Keith Thompson

James Tursa said:
I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))?

No. There cannot be padding between array elements; in particular,
given:

double arr[10][10];

the size of arr is guaranteed to be exactly 100*sizeof(double).

Padding isn't the issue. The issue is that the standard doesn't
require implementations to support indexing past the end of an array.
So if I write

arr[0][15]

I'm trying to refer to an element of arr[0] that doesn't exist.
There's a valid object, accessible as arr[1][5], at the intended
location in memory -- and *most* C compilers will let you access that
object either as arr[0][15] or as arr[1][5]. But arr[1][5] is
guaranteed to work, and arr[0][15] isn't, because it attempts to index
beyond the end of the double[10] array arr[0].

In other words, implementations are allowed, but not required, to
perform bounds checking.
 
James Tursa

Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.

I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))? But
at the same time if double was replaced with char, i.e.,

#include <stdio.h>
void f(char *p, size_t size) { while(size--) printf("%c\n", *p++); }
int main(void) {
char array[2][1] = { { 'a' }, { 'b' } };
f((char *)array, sizeof array / sizeof **array);
return 0;

}

then for this particular case the row slices are required by the
standard to be next to each other in memory so the individual stepping
will work in the called function? Or are you saying that for the char
case there may still be padding between the row slices but the
individual stepping will work because the padding will always be a
multiple of sizeof(char) (i.e., 1), and that the stepping in the
called function will just include the padded characters if they are
present?

i.e., in either case the called function may not be doing what you
intended if there is padded memory present between the rows, but in
the case of double or other non-character type it may even bomb.

Do I understand your answer correctly?

That code of yours invokes undefined behavior, if char is signed.
You have to change the type of p to unsigned char.
Also, it would only be meaningful if you did not divide by sizeof
**array.

What don't you like about sizeof **array ?

'pete' did not really answer my question. Instead he spoke about object
representations.
What pete really meant is that you can treat any pointer to object as
an array of unsigned char, to observe its representation.

OK, that's fine for objects, but that doesn't answer my question. What
is it about 2-dimensional (or multi-dimensional) arrays of double that
does not allow them to be stepped through with a double* ? And
ultimately, I would also ask if it is safe/conforming to use memcpy or
the like to copy values from/to such an array wholesale. e.g., is it
OK to have the following and be guaranteed to get all of the values
copied correctly and get at them with dp[0], dp[1], etc.:

double x[2][3];
double *dp;
dp = malloc(6*sizeof(double));
(some code to fill in values of x)
memcpy(dp,x,6*sizeof(double));

James Tursa
 
vippstar

Stepping through a one dimensional array
and on through a consecutive one dimensional array
is only defined for elements of character type
and only because
any object can be treated as an array of character type.
I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))? But
at the same time if double was replaced with char, i.e.,
#include <stdio.h>
void f(char *p, size_t size) { while(size--) printf("%c\n", *p++); }
int main(void) {
char array[2][1] = { { 'a' }, { 'b' } };
f((char *)array, sizeof array / sizeof **array);
return 0;
}
then for this particular case the row slices are required by the
standard to be next to each other in memory so the individual stepping
will work in the called function? Or are you saying that for the char
case there may still be padding between the row slices but the
individual stepping will work because the padding will always be a
multiple of sizeof(char) (i.e., 1), and that the stepping in the
called function will just include the padded characters if they are
present?
i.e., in either case the called function may not be doing what you
intended if there is padded memory present between the rows, but in
the case of double or other non-character type it may even bomb.
Do I understand your answer correctly?
That code of yours invokes undefined behavior, if char is signed.
You have to change the type of p to unsigned char.
Also, it would only be meaningful if you did not divide by sizeof
**array.

What don't you like about sizeof **array ?

The object representation of an object is an array of unsigned char,
indexed from [0] to [sizeof object - 1].
If you divide by sizeof **array, you won't get all of the
representation (unless sizeof(double) == 1).

'pete' did not really answer my question. Instead he spoke about object
representations.
What pete really meant is that you can treat any pointer to object as
an array of unsigned char, to observe its representation.

OK, that's fine for objects, but that doesn't answer my question. What
is it about 2-dimensional (or multi-dimensional) arrays of double that
does not allow them to be stepped through with a double* ? And
ultimately, I would also ask if it is safe/conforming to use memcpy or
the like to copy values from/to such an array wholesale. e.g., is it
OK to have the following and be guaranteed to get all of the values
copied correctly and get at them with dp[0], dp[1], etc.:

double x[2][3];
double *dp;
dp = malloc(6*sizeof(double));
(some code to fill in values of x)
memcpy(dp,x,6*sizeof(double));

mem* uses unsigned char. What is wrong is explained in the previous
posts.
 
Harald van Dijk

OK, that's fine for objects, but that doesn't answer my question. What
is it about 2-dimensional (or multi-dimensional) arrays of double that
does not allow them to be stepped through with a double* ?

The fact that double[2][3] doesn't have elements such as x[0][5]. There
must be a valid double, 5*sizeof(double) bytes into x. However, x[0][5]
doesn't mean just that. x[0][5] (or ((double*)x)[5]) means you're looking
5*sizeof(double) bytes into x[0]. x[0] doesn't have that many elements.

The machine will almost certainly let you get away with it, unless the
compiler specifically inserts instructions to stop this (bounds checking
implementations, as has been mentioned). The optimiser is less likely to
let you get away with it.
The optimiser may assume, for example, that storing a value in x[0][5]
won't alter the value of x[1][2], or vice versa, and may re-order code
based on that assumption. If I recall correctly, there are situations
where at least gcc does this.

And
ultimately, I would also ask if it is safe/conforming to use memcpy or
the like to copy values from/to such an array wholesale. e.g., is it OK
to have the following and be guaranteed to get all of the values copied
correctly and get at them with dp[0], dp[1], etc.:

double x[2][3];
double *dp;
dp = malloc(6*sizeof(double));
(some code to fill in values of x)
memcpy(dp,x,6*sizeof(double));

That should be fine. memcpy doesn't try to access x[0][5] in the same way
that the expression x[0][5] would, and the way memcpy does it is allowed.
 
James Tursa

James Tursa said:
I am trying to understand your answer. Are you saying that the
original code will not necessarily work in a conforming compiler
because there is no guarantee in the standard that the row slices will
be exactly next to each other in memory (i.e., there may be padding
added to each row that may not be a multiple of sizeof(double))?

No. There cannot be padding between array elements; in particular,
given:

double arr[10][10];

the size of arr is guaranteed to be exactly 100*sizeof(double).

Padding isn't the issue.

Well, I didn't really believe that padding was an issue, but that's
what seemed to be implied by the response.

The issue is that the standard doesn't
require implementations to support indexing past the end of an array.
So if I write

arr[0][15]

I'm trying to refer to an element of arr[0] that doesn't exist.
There's a valid object, accessible as arr[1][5], at the intended
location in memory -- and *most* C compilers will let you access that
object either as arr[0][15] or as arr[1][5]. But arr[1][5] is
guaranteed to work, and arr[0][15] isn't, because it attempts to index
beyond the end of the double[10] array arr[0].

In other words, implementations are allowed, but not required, to
perform bounds checking.

Well, I am still trying to understand how that argument applies to the
original OP posted code. Your argument is based on using arr directly.
But you seem to be saying that OP can't do this:

double *dp = (double *) arr;

and then traverse the entire array using dp. Is that what you are
saying?

James Tursa
 
James Tursa

OK, that's fine for objects, but that doesn't answer my question. What
is it about 2-dimensional (or multi-dimensional) arrays of double that
does not allow them to be stepped through with a double* ?

The fact that double[2][3] doesn't have elements such as x[0][5]. There
must be a valid double, 5*sizeof(double) bytes into x. However, x[0][5]
doesn't mean just that. x[0][5] (or ((double*)x)[5]) means you're looking
5*sizeof(double) bytes into x[0]. x[0] doesn't have that many elements.

So you are saying that x[0][5] means exactly the same thing to the
compiler as ((double*)x)[5] ? I thought x[0] would be an array of 3
doubles, whereas (double*)x would be a plain double*. These are not
the same to me, and I would think also not the same to the compiler. I
guess I remain unconvinced that using the (double *)x method invokes
undefined behavior as long as the underlying data is contiguous. You
are telling the compiler exactly what behavior you want from the
pointer, aren't you? I don't get it yet.

The optimiser may assume, for example, that storing a value in x[0][5]
won't alter the value of x[1][2], or vice versa, and may re-order code
based on that assumption. If I recall correctly, there are situations
where at least gcc does this.

Now that's an interesting side effect I hadn't thought of.

James Tursa
 
James Tursa

double x[2][3];
double *dp;
dp = malloc(6*sizeof(double));
(some code to fill in values of x)
memcpy(dp,x,6*sizeof(double));

mem* uses unsigned char. What is wrong is explained in the previous
posts.

See Harald's posted reply ... he thinks this is OK, and I tend to
agree with him.

James Tursa
 
Keith Thompson

James Tursa said:
[...]
No. There cannot be padding between array elements; in particular,
given:

double arr[10][10];

the size of arr is guaranteed to be exactly 100*sizeof(double).

Padding isn't the issue.

Well, I didn't really believe that padding was an issue, but that's
what seemed to be implied by the response.

The issue is that the standard doesn't
require implementations to support indexing past the end of an array.
So if I write

arr[0][15]

I'm trying to refer to an element of arr[0] that doesn't exist.
There's a valid object, accessible as arr[1][5], at the intended
location in memory -- and *most* C compilers will let you access that
object either as arr[0][15] or as arr[1][5]. But arr[1][5] is
guaranteed to work, and arr[0][15] isn't, because it attempts to index
beyond the end of the double[10] array arr[0].

In other words, implementations are allowed, but not required, to
perform bounds checking.

Well, I am still trying to understand how that argument applies to the
original OP posted code. Your argument is based on using arr directly.
But you seem to be saying that OP can't do this:

double *dp = (double *) arr;

and then traverse the entire array using dp. Is that what you are
saying?

Close. I'm saying that you most likely *can* get away with that
(treating an array of array of double as if it were an array of
double), but the standard doesn't require an implementation to make it
work. The most likely ways it can fail are if an implementation
performs run-time or compile-time bounds checking, or if an optimizer
assumes (as it's permitted to do) that you're not doing something like
this, causing the generated code not to do what you expected it to do.

Harald's explanation elsewhere in this thread makes the point more
clearly than I did, I think:

The fact that double[2][3] doesn't have elements such as
x[0][5]. There must be a valid double, 5*sizeof(double) bytes into
x. However, x[0][5] doesn't mean just that. x[0][5] (or
((double*)x)[5]) means you're looking 5*sizeof(double) bytes into
x[0]. x[0] doesn't have that many elements.
 
Harald van Dijk

So you are saying that x[0][5] means exactly the same thing to the
compiler as ((double*)x)[5] ?

Yes, because (double*)x is a pointer to x[0][0].

(Actually, since the behaviour is undefined, it is allowed and
realistically possible for the compiler to treat the two differently, for
many possible reasons, but I'm not aware of any relevant specific details.)

I thought x[0] would be an array of 3
doubles, whereas (double*)x would be a plain double*.

Well, (double*)x is a plain double*, just not one into the whole array.
The only way I can think of to get overlapping arrays in the way you're
looking for is by using a union:

union {
double singledim[6];
double multidim[2][3];
} x;

but this is only possible if you know the length of the array at compile
time. Do you?
 
