Simple assignment operator and object overlap

Mike S

I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only conceivable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof(int) >= 2 */

int main(void)
{
    int i = 10;
    int *ip = &i;

    i = *ip; /* although pointless, this seems to be allowed
              * because although the storage of i and *ip
              * overlaps, the overlap is exact */

    ++ip;    /* get ready to (potentially) cause some UB :) */

    i = *ip; /* from my understanding, this is UB
              * because the storage of i and *ip
              * overlaps (ip is pointing at a location
              * occupied by the storage of i)
              * but ip is -not- pointing at the lowest
              * addressable byte of i, thus the
              * overlap is not exact and it's undefined */
    return 0;
}

Did I read the Standard correctly, and is my example illustrative of
what 6.5.16.1p3 intends to communicate? If not, please set me
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).
 
Dingo

Mike said:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only conceivable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

++ip; /* get ready to (potentially) cause some UB :) */

i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */
return 0;
}

After the increment, the storage of i and *ip do not overlap,
but you've still managed to invoke undefined behavior. I think
declaring ip as a pointer to an unsigned char would be better
suited to your query.
Did I read the Standard correctly, and is my example illustrative of
what 6.5.16.1p3 intends to communicate? If not, please set me
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).

A union perhaps?
 
Keith Thompson

Mike S said:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only conceivable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

So far, so good, I think.
++ip; /* get ready to (potentially) cause some UB :) */

This causes ip to point to an int object just after i; if
sizeof(int)==4, it causes ip to advance by 4 bytes.
i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */

No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

You could make i an array of 2 ints, and do some pointer conversions
to increment ip by 1 byte rather than by sizeof(int) bytes, but then
dereferencing ip would invoke undefined behavior if it's not aligned
properly. If the implementation allows 1-byte alignment for ints, and
if sizeof(int) >= 2 as you mentioned above, then this would be a
(rather contrived) example of what 6.5.16.1p3 is talking about.
return 0;
}

Did I read the Standard correctly, and is my example illustrative of
what 6.5.16.1p3 intends to communicate? If not, please set me
straight (and also, if there are other instances where overlap can
occur in an assignment besides trying to assign an object a value from
its own storage, that would be interesting to know, as my example is a
shamelessly contrived one).

You can build a less contrived example using unions. Here's what I've
come up with:

#include <stddef.h>  /* offsetof */
#include <stdio.h>
#include <string.h>  /* memset */

int main(void)
{
    struct big_type {
        int arr[32];
    };
    struct s1 {
        char c;
        struct big_type bt1;
    };
    struct s2 {
        long long x;
        struct big_type bt2;
    };
    union u {
        struct s1 sub1;
        struct s2 sub2;
    } obj;

    printf("obj.sub1.bt1, offset = %d, size = %d\n",
           (int)offsetof(union u, sub1.bt1),
           (int)sizeof obj.sub1.bt1);
    printf("obj.sub2.bt2, offset = %d, size = %d\n",
           (int)offsetof(union u, sub2.bt2),
           (int)sizeof obj.sub2.bt2);

    /*
     * Initialize bt1 so we can refer to it without invoking UB.
     */
    memset(&obj.sub1.bt1, 0, sizeof obj.sub1.bt1);
    /*
     * Now assign its value to bt2. If bt1 and bt2 overlap, this
     * invokes undefined behavior.
     */
    obj.sub2.bt2 = obj.sub1.bt1;

    return 0;
}

The output I get is:

obj.sub1.bt1, offset = 4, size = 128
obj.sub2.bt2, offset = 8, size = 128

so obj.sub1.bt1 and obj.sub2.bt2 overlap, but not completely, and
they're both properly aligned. (An implementation could assign them
both the same offset, so the program doesn't *unconditionally* invoke
undefined behavior.)

Assignment can be done using the equivalent of memcpy(); it doesn't
have to use the equivalent of memmove().
 
Mike S

Keith said:
So far, so good, I think.


This causes ip to point to an int object just after i; if
sizeof(int)==4, it causes ip to advance by 4 bytes.

Oops...Of course it does. My brain was thinking in assembly language
when I wrote that little gem ;-) As Dingo mentioned, I should have used
an unsigned char* to make my case. Even in doing so, since *ip might
afterwards not be aligned properly, the resulting undefined behavior
from that also invalidates my experiment, now that I think about it...

[...]

Again, as Dingo mentioned and as you show here, unions are probably the
best way to illustrate the kind of scenario the Standard is referring
to. That should've been obvious, seeing as how the entire concept of
union relies on objects overlapping one another ;-)
 
lovecreatesbeauty

Keith said:
No, i and *ip don't overlap, but dereferencing ip invokes undefined
behavior, since ip doesn't point to an object.

After the increment, ip refers to some unknown object. Writing to *ip
at that point causes undefined behavior, but reading the value of *ip
just uses some garbage data of the type of *ip and will not cause
undefined behavior. Is that right?

lovecreatesbeauty
 
Peter Nilsson

Mike said:
I came across the following paragraph in the "Semantics" section for
simple assignment in N1124 (C99 draft) and I'm wondering if I'm
interpreting it right:

6.5.16.1p3:

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior
is undefined.

The only conceivable way for the storage of two objects to overlap in
an assignment context (that I can think of) is in the case where the
new value being stored to an object of type T is the result of the *
operator on an object of type 'pointer to T' (or a pointer to a
different type cast to 'pointer to T'), where the pointer points to a
valid location within the object being assigned to. That's a lot of
verbiage on my part, so hopefully the following code snippet
illustrates what I mean:

/* assume sizeof int >=2 */

Why would you want or need to assume that?!
int main(void)
{
int i = 10;
int *ip = &i;

i = *ip; /* although pointless, this seems to be allowed \
* because although the storage of i and *ip \
* overlaps, the overlap is exact */

++ip; /* get ready to (potentially) cause some UB :) */

i = *ip; /* from my understanding, this is UB \
* because the storage of i and *ip \

No, it's UB because ip is not guaranteed to point to an object.
* overlaps (ip is pointing at a location \
* occupied by the storage of i) \
* but ip is -not- pointing at the lowest \
* addressable byte of i, thus the \
* overlap is not exact and it's undefined */
return 0;
}

Did I read the Standard correctly, and is my example illustrative of
what 6.5.16.1p3 intends to communicate?

The more likely scenario is along the lines...

struct X { int x; };
struct Y { struct X x; int y; };

void foo(struct Y *yp, const struct X *xp)
{
yp->x = *xp;
}

Potentially, xp points to yp's struct X member. But the assignment
is valid in the same sense that i = i; is generally a valid assignment
(so long as i has a legitimate value).

The invalid case is something like...

struct X { unsigned char x[3]; };
unsigned char data[1024]; /* data read from a binary file, perhaps */
struct X *x1 = (struct X *) &data[1];
struct X *x2 = (struct X *) &data[0];
*x1 = *x2;

Here the assignment is not valid because x2 partially overlaps x1.
Of course, there are alignment issues with this example, but that's a
separate matter.

The point is that implementations have greater freedom to optimise
assignments of larger objects more aggressively if they can assume
that objects either overlap exactly, or are mutually exclusive.
 
Peter Nilsson

lovecreatesbeauty said:
ip refers to an unknown object after self-increasing.

No, ip needn't refer to any object.
To write to *ip at that time causes undefined behavior,
Yes.

to use the value of the *ip is using some garbage data of type of *ip,
and will not cause undefined behavior. Is it?

No. If there is an object at ip, then it may be a trap representation.
But there needn't be _any_ object at ip.

C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

Consider a 32-bit processor on a machine that doesn't actually
have 4GB of memory. The C implementations may put an object
on the edge of a real memory boundary. Moving the pointer one
byte beyond the object is usually safe on such machines because
it's just simple arithmetic. Dereferencing such a pointer though may
crash the system because the processor will attempt to retrieve
memory that doesn't exist, usually causing a hardware interrupt.
 
Barry Schwarz

No, ip needn't refer to any object.


Yes.

Not only storing in *ip but any attempt to evaluate *ip (reading
also).
No. If there is an object at ip, then it may be a trap representation.
But there needn't be _any_ object at ip.

It doesn't matter whether there is an object at that address or not or
what value the object may have. It is a constraint violation to
evaluate *ip. From n1123, para 6.5.6-8: "If the result points one
past the last element of the array object, it shall not be used as the
operand of a unary * operator that is evaluated." Footnote 89
provides the intuitive extension of "last element" to a scalar object.
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability. It is a constraint violation.



Remove del for email
 
Peter Nilsson

Barry said:
It's not a question of portability.

Yes it is.
It is a constraint violation.

Quite so Barry, but the fact that it's a constraint violation does not
say _why_ it's a constraint violation. I was trying to illustrate how,
on real machines, it may be possible to reference one-past-the-end
pointers, but not to dereference them. Thus, the standard allows
flexibility but provides a constraint against one form of misuse.
 
Barry Schwarz

Yes it is.

You said such a pointer could not be portably dereferenced. This
implies that such a dereference is no worse than assuming integers are
little-endian, which is in fact something that cannot be done
portably.

But it is worse. It is a constraint violation and therefore invokes
undefined behavior. It is in the same vein as referencing allocated
memory after freeing it.
Quite so Barry, but the fact that it's a constraint violation does not
say _why_ it's a constraint violation. I was trying to illustrate how,
on real machines, it may be possible to reference one-past-the-end
pointers, but not to dereference them. Thus, the standard allows
flexibility but provides a constraint against one form of misuse.

The standard does not allow you to reference one past the end. It
allows you to calculate that address and use that address in the
intuitively obvious manner. But any attempt, or if you prefer all
attempts, to reference the memory is/are undefined.



Remove del for email
 
Guest

It doesn't matter whether there is an object at that address or not or
what value the object may have. It is a constraint violation to
evaluate *ip. From n1123, para 6.5.6-8: "If the result points one
past the last element of the array object, it shall not be used as the
operand of a unary * operator that is evaluated."

Did you mean n1124? 6.5.6#8 is in the Semantics section, not the
Constraints section, so no, it's "just" undefined behaviour. A
constraint violation is stronger than that: it requires a diagnostic
(which is pretty much impossible for this in the general case), and may
cause the program to fail to compile even if the offending code would
otherwise never be reached.
 
Keith Thompson

Barry Schwarz said:
On 7 Jul 2006 22:20:28 -0700, "Peter Nilsson" wrote: [...]
C allows pointers to point to one byte past the end of an object or
array. But you cannot portably dereference such pointers since
the prior object may be on the edge of a real memory boundary.

It's not a question of portability. It is a constraint violation.

No, it's not a constraint violation; it's undefined behavior.

An implementation is required to issue a compile-time diagnostic when
a constraint is violated (e.g., for a type mismatch in an assignment
statement). Attempting to dereference a pointer just past the end of
an object cannot in general be detected at compile time.
 
Peter Nilsson

Barry said:
You said such a pointer could not be portably dereferenced.
This implies that such a dereference is no worse than assuming
integers are little-endian, which is in fact something that cannot
be done portably.

In some cases, dereferencing one past the end pointers (and beyond)
is considerably less worse than assuming little-endian.

[Think struct hack.]
 
Barry Schwarz

Barry said:
You said such a pointer could not be portably dereferenced.
This implies that such a dereference is no worse than assuming
integers are little-endian, which is in fact something that cannot
be done portably.

In some cases, dereferencing one past the end pointers (and beyond)
is considerably less worse than assuming little-endian.

[Think struct hack.]

But the struct hack requires you to allocate more space than sizeof
struct so you are still in the area of memory allocated for you.


Remove del for email
 
