Can this code invoke Undefined Behaviour?


Sharath A.V

I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
    for (unsigned j = 0; j < 9; ++j)
        p[j] = 0;
}

int main()
{
    int arr[3][3];
    fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and why?

Sharath A.V
 

Phlip

Sharath said:
void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}

I'm going to guess no. But don't do it anyway.

I suspect that the contiguity of arrays of arrays is well-defined. The
expression arr[3] is synonymous with *(arr + 3), hence arr[2][2] is *(*(arr
+ 2) + 2), which algebraically converts to *(*arr + 2 * 3 + 2).

Now the question is how well-formed and well-defined my transformation is.
Then, is it the same question, or a proof of the answer?
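A minimal sketch of the two steps in that transformation, with hypothetical helper names (`flat_index`, `subscripts_match`) that are not from the thread. The first identity, arr[i][j] being *(*(arr + i) + j), holds by definition of subscripting, and every addition in it stays inside one row. The rewrite to *(*arr + 2 * 3 + 2) is the step under dispute: it adds 8 to an int* derived from arr[0], so the sketch only computes the flat index and does not form that pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps (row, col) of an int[3][3] to a flat index.
constexpr std::size_t flat_index(std::size_t row, std::size_t col)
{
    return row * 3 + col;
}

// Returns true if arr[i][j] and *(*(arr + i) + j) name the same object
// for every valid i and j. Here arr + i steps in units of int[3], and
// the inner + j never leaves the row it started in.
inline bool subscripts_match()
{
    int arr[3][3] = {};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            if (&arr[i][j] != *(arr + i) + j)
                return false;
    return true;
}
```

Whether &arr[0][0] + flat_index(2, 2) is a valid way to reach the same element is exactly what the rest of the thread argues about.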
 

Victor Bazarov

Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and
why?

AFAIK, the Standard guarantees that in an array all elements are placed
one after another, without gaps. That's true for multi-dimensional arrays
as well. So, int arr[3][3] consists of 3x3 (9) tightly-packed elements of
type 'int', and treating them as a single array of 9 ints is OK.
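The layout half of this claim can be checked at compile time. The sketch below uses hypothetical names (`int_count`, `occupies_nine_ints`) and relies only on the rule that the size of an array of n elements is n times the size of one element. It says nothing about whether an int* obtained from one row may be advanced into the next, which is a separate question about pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Layout facts only: an array has no padding between its elements.
static_assert(sizeof(int[3]) == 3 * sizeof(int), "a row is 3 packed ints");
static_assert(sizeof(int[3][3]) == 9 * sizeof(int), "3 packed rows");

// Hypothetical helper: how many ints fit exactly in an object of type T.
template <typename T>
constexpr std::size_t int_count()
{
    return sizeof(T) / sizeof(int);
}

// Runtime restatement of the same fact for a concrete object.
// sizeof does not evaluate its operand, so arr is never read.
inline bool occupies_nine_ints()
{
    int arr[3][3];
    return sizeof arr == 9 * sizeof arr[0][0];
}
```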

V
 

ScorpionChief

Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and why?

Sharath A.V

I think this code might throw an exception (or something like that) on
a bounds-checked implementation.

The relevant C++ standard paragraph is 5.7-5
[expr.add]

"If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined."

First, p[4] is equivalent to *(p+4). The type of the pointer is int*.
The type of the element pointed-to by the pointer is int

It is clear, that:
"If both the pointer operand and the result point to elements of the
same array object."
The pointer points to an *element* of an array object. The pointer
points to int, and thus the array object is of type int[3].

The array is not an int[3][3], otherwise it would not make sense... an
int* pointing to an int[3]...So the array object to which it refers is
the first int[3] object of this array of 3 int[3].

And, p+1 is ok, p+2 is ok, p+3 is ok (pointer one past the last
element), but p+4 has undefined behavior.
If the implementation is bounds-checked (and WG21 deliberately tried
to allow such implementations), it might throw a runtime exception.
A bounds-checked implementation would simply use pointers containing
not only the raw memory address, but also information about the bounds
of the array directly containing this element.

Similarly, even if you have an implementation where the ABI says that, in
struct X {
    int x;
    int y;
    int z;
} s;

x, y and z are contiguous in memory,

it doesn't mean that you'll be able to do (&s.x)[2] to access s.z.

Indeed, the standard says (5.7-4 [expr.add])
"
4 For the purposes of these operators, a pointer to a nonarray
object behaves the same as a pointer to the first element of an
array of length one with the type of the object as its element type."

Well, here it is very clear that &s.x is a pointer to a nonarray
object, and thus a bounds-checked implementation will use, as bounds
info, that it lives in an array of size 1.
So, (&s.x)+1 is still ok... But with a bounds-checked implementation,
(&s.x)+1 != &s.y (the bounds-checked implementation will see that they
don't live in the "same array"). And (&s.x)+2 has UB.

Well, I think that the problem is similar with int[3][3].
With a bounds-checked implementation, &arr[0][3] should be different from
&arr[1][0].
For example, if the underlying implementation of pointers is:
struct {raw_pointer start, ptr, end;};
start and end will be different for &arr[0][3] than for &arr[1][0].
In a sense, they live in different memory spaces.
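A rough sketch of what such a start/ptr/end pointer could look like, written as an ordinary C++ class. Everything here (`CheckedIntPtr`, `decay_row`, `add_is_rejected`) is hypothetical and only models the idea described above; it is not how any particular compiler implements bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat pointer": an address plus the bounds of the array
// it was derived from.
struct CheckedIntPtr {
    int* start;  // first element of the array
    int* ptr;    // current position
    int* end;    // one past the last element

    // Pointer addition that refuses to leave [start, end].
    CheckedIntPtr operator+(std::ptrdiff_t n) const
    {
        if (n < start - ptr || n > end - ptr)
            throw std::out_of_range("pointer arithmetic left its array");
        return CheckedIntPtr{start, ptr + n, end};
    }

    // Dereference that refuses the one-past-the-end position.
    int& operator*() const
    {
        if (ptr == end)
            throw std::out_of_range("dereferenced one past the end");
        return *ptr;
    }
};

// Models the decay of one row (an int[3]) to a bounded pointer.
inline CheckedIntPtr decay_row(int (&row)[3])
{
    return CheckedIntPtr{row, row, row + 3};
}

// Returns true if adding n to a pointer into arr[0] is rejected.
inline bool add_is_rejected(std::ptrdiff_t n)
{
    int arr[3][3] = {};
    try {
        (void)(decay_row(arr[0]) + n);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Under this model, adding 3 to a pointer into arr[0] is accepted (one past the end) and adding 4 is rejected, which matches the p+3 versus p+4 distinction drawn above.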
 

Sharath A.V

Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and why?

Sharath A.V

The other person I was referring to is (e-mail address removed), who is
now in this thread and has given the reply below:

I think this code might throw an exception (or something like that) on
a bounds-checked implementation.

The relevant C++ standard paragraph is 5.7-5
[expr.add]

"If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined."

First, p[4] is equivalent to *(p+4). The type of the pointer is int*.
The type of the element pointed-to by the pointer is int

It is clear, that:
"If both the pointer operand and the result point to elements of the
same array object."
The pointer points to an *element* of an array object. The pointer
points to int, and thus the array object is of type int[3].

The array is not an int[3][3], otherwise it would not make sense... an
int* pointing to an int[3]...So the array object to which it refers is
the first int[3] object of this array of 3 int[3].

And, p+1 is ok, p+2 is ok, p+3 is ok (pointer one past the last
element), but p+4 has undefined behavior.
If the implementation is bounds-checked (and WG21 deliberately tried
to allow such implementations), it might throw a runtime exception.
A bounds-checked implementation would simply use pointers containing
not only the raw memory address, but also information about the bounds
of the array directly containing this element.

Similarly, even if you have an implementation where the ABI says that, in
struct X {
    int x;
    int y;
    int z;
} s;

x, y and z are contiguous in memory,

it doesn't mean that you'll be able to do (&s.x)[2] to access s.z.

Indeed, the standard says (5.7-4 [expr.add])
"
4 For the purposes of these operators, a pointer to a nonarray
object behaves the same as a pointer to the first element of an
array of length one with the type of the object as its element type."

Well, here it is very clear that &s.x is a pointer to a nonarray
object, and thus a bounds-checked implementation will use, as bounds
info, that it lives in an array of size 1.
So, (&s.x)+1 is still ok... But with a bounds-checked implementation,
(&s.x)+1 != &s.y (the bounds-checked implementation will see that they
don't live in the "same array"). And (&s.x)+2 has UB.

Well, I think that the problem is similar with int[3][3].
With a bounds-checked implementation, &arr[0][3] should be different from
&arr[1][0].
For example, if the underlying implementation of pointers is:
struct {raw_pointer start, ptr, end;};
start and end will be different for &arr[0][3] than for &arr[1][0].
In a sense, they live in different memory spaces.
 

Old Wolf

Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.

void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}

This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

However we all would agree that this code is fine:
fill( (int *)&arr );
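Independently of whether the cast settles the matter, the question can be sidestepped by never letting an int* leave the row it came from. A minimal sketch, with hypothetical names (`fill_rows`, `all_zero_after_fill`) and the 3x3 size hard-coded as in the original example:

```cpp
#include <cassert>
#include <cstddef>

// Fills every element by indexing each row separately, so no int* is
// ever moved past the end of the int[3] it was derived from.
inline void fill_rows(int (&arr)[3][3])
{
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            arr[i][j] = 0;
}

// Hypothetical check used for the demonstration: sets one element to a
// non-zero value, fills, then verifies that every element reads as 0.
inline bool all_zero_after_fill()
{
    int arr[3][3];
    arr[1][1] = 1;
    fill_rows(arr);
    for (const auto& row : arr)
        for (int v : row)
            if (v != 0)
                return false;
    return true;
}
```

Taking the array by reference keeps its full type visible inside the function, so both the compiler and a reader can see that every access is in range.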
 

Jack Klein

I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and why?

Sharath A.V

It is as simple as this: the standards for both C and C++ make it
illegal to construct a pointer more than one past the end of an array,
and to dereference a pointer more than zero past the end of an array.

I do not know of one single implementation where this will fail, but
if there is such an implementation somewhere that, for example, checks
array bounds at run time and traps and terminates your program, I
would be as surprised as you. But you will have no grounds, at all,
under the C++ language standard to complain, as the result does not
violate the standard. Since it is directly stated to be undefined
behavior, the standard places no requirements at all on the results.
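For a single row, the rule looks like this in practice. The sketch uses hypothetical names (`sum_row`, `demo_sum`): a one-past-the-end pointer is formed and compared against, but never dereferenced.

```cpp
#include <cassert>

// Sums one row using a one-past-the-end pointer as the loop limit.
// Forming row + 3 is allowed; dereferencing it would not be.
inline int sum_row(const int (&row)[3])
{
    const int* end = row + 3;  // one past the last element: may be formed and compared
    int total = 0;
    for (const int* p = row; p != end; ++p)
        total += *p;           // p points at row[0], row[1] or row[2] here
    return total;
}

// Hypothetical demonstration: sums the middle row of a 3x3 array.
inline int demo_sum()
{
    int arr[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    return sum_row(arr[1]);
}
```

The disagreement in this thread is about which array "the array" is when the pointer comes from &arr[0][0]: the row arr[0], or the whole of arr.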
 

Phlip

Jack said:
It is as simple as this: the standards for both C and C++ make it
illegal to construct a pointer more than one past the end of an array,
and to dereference a pointer more than zero past the end of an array.

Exactly. Therefore the pointer to arr[2][2], no matter how you get it, is
legal. It could be from &arr[0][0] + 2*3 + 2. It points into the array,
hence it is legal.
 

Julián Albo

Old said:
Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.

void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}

This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int. How can fill
check bounds that it doesn't know about? From a practical point of view,
handling this kind of thing as an error or exception, or even emitting a
meaningful warning, would require the compiler or the runtime code to save
a bunch of information under the hood and check it in many places. From a
standards point of view, I think the best solution would be to rewrite the
pointer arithmetic rules to explicitly mention arrays of arrays.
 

Kai-Uwe Bux

Julián Albo said:
Old said:
Sharath said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.

void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}

This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int.

True, it is the pointer to the first int in an array of three, namely
arr[0].
How can fill check bounds that it doesn't know about?

Why wouldn't it know? The standard certainly allows pointers to be
implemented in a much fancier way than we imagine. For instance, a pointer
object could consist of an address and a type id, and the latter could
indicate that this particular int is the start of an array of 3.
From a practical point of view,
handling this kind of thing as an error or exception, or even emitting a
meaningful warning, would require the compiler or the runtime code to save
a bunch of information under the hood and check it in many places.

True. But I see that we agree it is possible.
From a standards
point of view, I think the best solution would be to rewrite the pointer
arithmetic rules to explicitly mention arrays of arrays.

Maybe, but as of now, it appears to be UB.


Best

Kai-Uwe
 

Old Wolf

Julián Albo said:
Old said:
I think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int. How can fill
check bounds that it doesn't know about?

It knows that it is a pointer into an array of 3 ints. For example, it
could be implemented such that each pointer contains the beginning and end
of the object it's pointing to.

&arr[0][0] is equivalent to (arr[0] + 0), according to the C standard.
arr[0] decays to a pointer to the first element of an array of 3 ints,
and that pointer stores its bounds (e.g. it cannot go back from where it
is, and it can go forward up to 3 ints).

Then, when you add 4 to it, it throws a bounds exception.
From a practical point of view, handling this kind of thing as an error or
exception, or even emitting a meaningful warning, would require the
compiler or the runtime code to save a bunch of information under the
hood and check it in many places.

Just a couple of addresses or offsets with each pointer.
From a standards point of view, I think the best solution would be to
rewrite the pointer arithmetic rules to explicitly mention arrays
of arrays.

What did you have in mind?
 

Guest

Jack said:
I had an argument with someone on whether this piece of code can
invoke undefined behaviour.
I think it does not invoke any undefined behaviour, since there is
sufficient memory for 9 int elements starting from the
address passed, but the other person insisted that it would invoke
undefined behaviour (for whatever reasons he had).


void fill(int *p)
{
for(unsigned j=0;j<9;++j)
p[j]=0;
}

int main()
{
int arr[3][3];
fill(&arr[0][0]);
}


So please let me know if this code invokes undefined behaviour and why?

Sharath A.V

It is as simple as this: the standards for both C and C++ make it
illegal to construct a pointer more than one past the end of an array,
and to dereference a pointer more than zero past the end of an array.

I do not know of one single implementation where this will fail,

A slightly modified version fails with GCC 4.1.1.

#include <iostream>

int main()
{
    int arr[3][3];
    arr[1][1] = 1;
    for (unsigned j = 0; j < 9; ++j)
        (&arr[0][0])[j] = 0;
    std::cout << arr[1][1] << '\n';
}

When optimisations are enabled, GCC assumes arr[1][1] is not changed,
and prints 1. The original example happens to work at the moment, but
future versions of GCC may very well inline fill(). (I doubt anyone
thinks that the original example has no UB, but my example does. Either
they both have UB, or neither does.)
 

Julián Albo

Kai-Uwe Bux said:
This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int.

True, it is the pointer to the first int in an array of three, namely
arr[0].
How can fill check bounds that it doesn't know about?

Why wouldn't it know? The standard certainly allows pointers to be
implemented in a much fancier way than we imagine. For instance, a pointer
object could consist of an address and a type id, and the latter could
indicate that this particular int is the start of an array of 3.

It is possible, but it is against the spirit of C++ to expect all
implementations to handle this information. And in this case it is far more
convenient, IMHO, to 'legalize' the usage that actually works in the majority
of compilers (does anyone know of an actual exception?) than to maintain a
difficult-to-diagnose undefined behavior.

But this is not an important point; I don't recommend that anybody write
code like that.
 

Julián Albo

Old said:
Just a couple of addresses or offsets with each pointer.

Well, whether this is a bunch or not is a matter of opinion, but consider that
many people find that a pointer to a vtable in every instance of a class is
too much overhead in many cases.
What did you have in mind?

Something like considering the "array object" to be the complete
multidimensional array may be enough, but I'm not an expert in Standard details.
 

SuperKoko

Julián Albo said:
Kai-Uwe Bux said:
This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int.

True, it is the pointer to the first int in an array of three, namely
arr[0].
How can fill check bounds that it doesn't know about?

Why wouldn't it know? The standard certainly allows pointers to be
implemented in a much fancier way than we imagine. For instance, a pointer
object could consist of an address and a type id, and the latter could
indicate that this particular int is the start of an array of 3.

It is possible, but it is against the spirit of C++ to expect all
implementations to handle this information. And in this case it is far more
convenient, IMHO, to 'legalize' the usage that actually works in the majority
of compilers (does anyone know of an actual exception?) than to maintain a
difficult-to-diagnose undefined behavior.
What's the spirit of C++? AFAIK C++ has many cases of UB (e.g. with the
sequence point rules), and it is not that much of a problem.
Not all implementations are required to handle this information... But
they're permitted to.
It might be useful for a "safe" implementation... The performance impact
might be small if the implementation is already slow (an interpreted
implementation, for example).
Personally, I think that bounds checking is a good idea for an
interpreted implementation.


It works on "a majority of compilers", but GCC 4.1.1 is an "exception",
as "Harald van Dijk" said:
A slightly modified version fails with GCC 4.1.1.

#include <iostream>

int main()
{
    int arr[3][3];
    arr[1][1] = 1;
    for (unsigned j = 0; j < 9; ++j)
        (&arr[0][0])[j] = 0;
    std::cout << arr[1][1] << '\n';

}
But this is not an important point; I don't recommend that anybody write
code like that.
Anyway, it is broken...

I think it is not a good idea to allow that stuff in the standard... If
you want to avoid subtle bugs, simply compile with a low level of
optimization.
I know that the code produced is slower... But it can't be as fast...
So, instead of inhibiting optimizations for everybody... It is better
to allow stronger optimizations for guys who know the standard, and
lower optimization levels for guys like you.
 

SuperKoko

Julián Albo said:
Kai-Uwe Bux said:
This has been discussed many times on comp.lang.c (without
a solid conclusion). Some people think that it is absurd for it to
fail, others (such as me and ScorpionCh) think that it is a
bounds exception: arr[0] is an array of 3 ints, so if you take a
pointer to its first element and increment it 4 times, you have
overflowed arr[0].

But &arr[0][0] is a pointer to int, not to an array of 3 int.

True, it is the pointer to the first int in an array of three, namely
arr[0].
How can fill check bounds that it doesn't know about?

Why wouldn't it know? The standard certainly allows pointers to be
implemented in a much fancier way than we imagine. For instance, a pointer
object could consist of an address and a type id, and the latter could
indicate that this particular int is the start of an array of 3.

It is possible, but it is against the spirit of C++ to expect all
implementations to handle this information. And in this case it is far more
convenient, IMHO, to 'legalize' the usage that actually works in the majority
of compilers (does anyone know of an actual exception?) than to maintain a
difficult-to-diagnose undefined behavior.
What's the spirit of C++? AFAIK C++ has many cases of UB (e.g. with the
sequence point rules), and it is not that much of a problem.
Not all implementations are required to handle this information... But
they're permitted to.
Personally, I think that bounds checking is a good idea for an
interpreted implementation where performance doesn't really matter, or
is already bad.

It works on "a majority of compilers", but GCC 4.1.1 is an "exception",
as "Harald van Dijk" said:
A slightly modified version fails with GCC 4.1.1.
#include <iostream>
int main()
{
    int arr[3][3];
    arr[1][1] = 1;
    for (unsigned j = 0; j < 9; ++j)
        (&arr[0][0])[j] = 0;
    std::cout << arr[1][1] << '\n';
}

Julián Albo said:
But this is not an important point; I don't recommend that anybody write
code like that.

Fortunately you don't recommend it; it is broken...

I think it is not a good idea to forbid bounds checking or
bounds-related optimizations in the standard...
If you want to avoid subtle bugs, simply compile your code with a low
level of optimization.
I know that the code produced is slower... But it can't be as fast...
So, instead of inhibiting optimizations for everybody... It is better
to allow stronger optimizations for guys who know the standard, and
lower optimization levels for guys like you.
 

Old Wolf

Phlip said:
Jack said:
It is as simple as this: the standards for both C and C++ make it
illegal to construct a pointer more than one past the end of an array,
and to dereference a pointer more than zero past the end of an array.

Exactly. Therefore the pointer to arr[2][2], no matter how you get it, is
legal. It could be from &arr[0][0] + 2*3 + 2. It points into the array,
hence it is legal.

It doesn't point into the array arr[0], which is what the pointer was
constructed from. It's not relevant what other arrays it points into.
Would you also say this is legal:

struct {
    int x[1];
    int y;
} s[10];

s[0].x[3] = 1;

because s[0].x + 3 points somewhere in the array s[] ?
There's no conceptual difference between the cases.
 

Phlip

Old said:
It doesn't point into the array arr[0], which is what the pointer was
constructed from.

How can the pointer know that?

This is why a[2] is defined as *(a + 2).
It's not relevant what other arrays it points into.
Would you also say this is legal:

struct {
    int x[1];
    int y;
} s[10];

s[0].x[3] = 1;

because s[0].x + 3 points somewhere in the array s[] ?

If the struct has no padding, then s[0].x + 3 points to s[1].y, which is an
int, so it's legal.
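The "no padding" premise and the resulting distance of 3 ints can be checked from the layout alone, without forming the disputed pointer. A small sketch, assuming a named version of the struct (`Pair`) and hypothetical helpers (`pair_is_packed`, `ints_to_next_y`):

```cpp
#include <cassert>
#include <cstddef>

// Named version of the struct from the example (the original is anonymous).
struct Pair {
    int x[1];
    int y;
};

// True when Pair has no padding, i.e. it is exactly two ints wide
// and y sits immediately after x[0].
constexpr bool pair_is_packed()
{
    return sizeof(Pair) == 2 * sizeof(int) && offsetof(Pair, y) == sizeof(int);
}

// Distance in ints from s[0].x[0] to s[1].y, derived from the layout
// alone. Nothing here forms the pointer s[0].x + 3.
constexpr std::size_t ints_to_next_y()
{
    return (sizeof(Pair) + offsetof(Pair, y)) / sizeof(int);
}
```

This only shows that the two addresses coincide when the struct is packed. Whether arithmetic on a pointer into s[0].x is allowed to arrive there is the point in dispute.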

Now I know why a[2] is defined as *(a + 2). Otherwise, someone could use the
other pointer rules to infer that a pointer can somehow bond with one array,
and somehow cause trouble when it indexes into another array.

I suspect that legacy C code often flattened multidimensional arrays, and
passed them into functions expecting one long array, so these rules preserve
that behavior.
 

Old Wolf

Phlip said:
Old said:
It doesn't point into the array arr[0], which is what the pointer was
constructed from.

How can the pointer know that?

By having each pointer also contain the bounds of the object it
points to (as discussed elsewhere on the thread).
This is why a[2] is defined as *(a + 2).

Yes, that's right. 'a' decays to a pointer with bounds information.
The pointer addition checks that the new value (a+2) hasn't
exceeded its bounds.
It's not relevant what other arrays it points into.
Would you also say this is legal:

struct {
    int x[1];
    int y;
} s[10];

s[0].x[3] = 1;

because s[0].x + 3 points somewhere in the array s[] ?

If the struct has no padding, then s[0].x + 3 points to s[1].y, which is an
int, so it's legal.

This is undefined according to C99 6.5.6#8 . In fact, J.2 says
the behaviour is explicitly undefined in this case:

Addition or subtraction of a pointer into, or just beyond, an array
object and an integer type produces a result that does not point
into, or just beyond, the same array object (6.5.6).

s[0].x decays to a pointer into the array s[0].x[], and it has 3 added to it,
but the result does not point into s[0].x[].
Now I know why a[2] is defined as *(a + 2). Otherwise, someone could use the
other pointer rules to infer that a pointer can somehow bond with one array,
and somehow cause trouble when it indexes into another array.

I'm not sure what you're trying to say. If 'a' is a pointer resulting
from a decayed array, then it _is_ bound to that array.
 
