Is the aliasing rule symmetric?

  • Thread starter Johannes Schaub (litb)

Joshua Maurice

I think I'm missing something. This last simplification does not seem to be
valid according to the intent. In the unsimplified code, before executing
the "return a->y" you have for read access to "*a_y":

I think that you need to look at it again. "a_y" originally held the
result of "& a->y ", but I slowly transformed it to hold the result of
"& b->y ".

The longer version of that one-step simplification is:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
int* b_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
int* a_y = & ((T2*)p)->y;
*a_y = 2;
return ((T1*)p)->y;
simplifies to:
((T2*)p)->y = 2;
return ((T1*)p)->y;
 
Johannes Schaub (litb)

Joshua said:
I think that you need to look at it again. "a_y" originally held the
result of "& a->y ", but I slowly transformed it to hold the result of
"& b->y ".

I think "& a->y" and "& b->y" are exactly equivalent.
The longer version of that one-step simplification is:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
int* b_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
int* a_y = & ((T2*)p)->y;
*a_y = 2;
return ((T1*)p)->y;
simplifies to:
((T2*)p)->y = 2;
return ((T1*)p)->y;

The last simplification in this longer version is invalid, I think. Prior to
the "return", in the second-to-last version, you access one object, and that
object only by an lvalue of type "int". In the last version, prior to the
return (where you do a write access), you access two objects, the first of
which with an lvalue of type "T2" (which changes the effective type to that)
and the second of which by an lvalue of type int.
 
Ben Bacarisse

Johannes Schaub (litb) said:
Joshua Maurice wrote:
for the return access you have

object 1: lvalue T1, address X, sizeof(T1), effective type: T1
object 2: lvalue int, address X, sizeof(int), effective type: int

The effective type in the access to object 1 was taken from the type of the
"lvalue" used for the access.

I can't see any lvalue of type T1 in the return expression. The whole
malloced object never gets an effective type as far as I can see. I
note the "scare quotes" so maybe you have some slightly different
meaning for lvalue here.

There are only two things here that are lvalue expressions: 'p' and
'((T1*)p)->y'. One has type void * and the other has type int. Only
this second lvalue expression is used to access the object in question
(access to the pointer object 'p' is not at issue).

Just to clarify, a cast expression is not a lvalue and even if it were
the type of (T1 *)p is T1 * not T1. Also, in C, E->M is not defined to
be the same as (*E).M or there would certainly be an access via an
lvalue expression of type T1.

<snip>
 
Wojtek Lerch

Also, in C, E->M is not defined to
be the same as (*E).M or there would certainly be an access via an
lvalue expression of type T1.

Why? Are you saying that whenever an lvalue expression such as S.M is
evaluated, it counts not only as an access to the member but also an
access to the whole structure? (Except, I assume, in a context where it
does not access an object at all, such as in &S.M?)

Or do you have something more subtle in mind, maybe along the lines that
the expression S.M accesses only the member, but it accesses it "via" an
lvalue expression of the structure type, without accessing the whole
structure, because the struct lvalue is a subexpression of the lvalue
designating the object actually accessed?
 
Joshua Maurice

Why?  Are you saying that whenever an lvalue expression such as S.M is
evaluated, it counts not only as an access to the member but also an
access to the whole structure?  (Except, I assume, in a context where it
does not access an object at all, such as in &S.M?)

Or do you have something more subtle in mind, maybe along the lines that
the expression S.M accesses only the member, but it accesses it "via" an
lvalue expression of the structure type, without accessing the whole
structure, because the struct lvalue is a subexpression of the lvalue
designating the object actually accessed?

Specifically, with regards to POSIX pthreads race conditions, and the
volatile rules, is there a difference between
a->x = 1;
and
(*a).x = 1;
?

It would be kind of funny if there were a difference, where one would
cause more volatile reads or writes than the other, or where one
could have a race condition but the other could not.
 
Joshua Maurice

I think "& a->y" and "& b->y" are exactly equivalent.




The last simplification in this longer version is invalid, I think. Prior to
the "return", in the second last version, you access one object and that
object only by an lvalue of type "int". In the second version prior to the
return (where you do a write access), you access two objects, the first of
which with an lvalue of type "T2" (and changes the effective type to that)
and the second of which by an lvalue of type int.

To be clear, you think that there's a difference between
a->x = 2;
and
int* x = & a->x;
*x = 2;
?

It would take me a long time to buy that.
 
Tim Rentsch

Johannes Schaub (litb) said:
[snip]

In particular, I think the committee intends the spec to say that a struct
or union access expression involves an access with the struct or union
lvalue.

T1 *p = malloc(sizeof *p);
p->x = 0;

In this case, I think the committee's intent is that the object pointed to
by "p" is accessed by an lvalue of type T1, and so the effective type of the
object containing the int changes to T1. So a later cast and access by an
lvalue of T2 will be undefined behavior.

I'm not aware of any evidence that supports this theory (ie,
that using '.' or '->' is also an access for the left operand).
Furthermore it seems to be in conflict with the definitions the
Standard gives for access, value, etc.

Do you have any such evidence to offer? Or are you simply
stating an unsupported opinion?
 
Tim Rentsch

Joshua Maurice said:
On Feb 6, 8:55 am, "Johannes Schaub (litb)"
[snip]

In particular, I think the committee intends the spec to say that a struct
or union access expression involves an access with the struct or union
lvalue.

    T1 *p = malloc(sizeof *p);
    p->x = 0;

In this case, I think the committee's intent is that the object pointed to
by "p" is accessed by an lvalue of type T1, and so the effective type of the
object containing the int changes to T1. So a later cast and access by an
lvalue of T2 will be undefined behavior.

I think this is also the only sensible interpretation of the
committee's intent. [snip elaboration]

What I think you're trying to say is that this interpretation is
the only one that makes sense, and therefore must be what the
committee intended. (We don't know what the committee intended,
so there is no way to judge whether a particular interpretation
is the only sensible one, or indeed whether there is _any_
sensible meaning for what they intended.)

Regardless of what the committee might or might not have
intended, there certainly are alternative ways of reading
the standard that make as much sense as this one.
 
Tim Rentsch

Ben Bacarisse said:
[snip]
Also, in C, E->M is not defined to be the same as (*E).M

It's true that they aren't defined to be the same (and as you
point out a cast expression is not an lvalue), but there is a
sequence of equivalences (using '===' to mean "equivalent"):

(&E)->MOS === E.MOS // by footnote 83
(&(*P))->MOS === (*P).MOS // substituting (*P) for E
P->MOS === (*P).MOS // 6.5.3.2p3
 
Ben Bacarisse

Wojtek Lerch said:
Why? Are you saying that whenever an lvalue expression such as S.M is
evaluated, it counts not only as an access to the member but also an
access to the whole structure? (Except, I assume, in a context where
it does not access an object at all, such as in &S.M?)

Hmm... I did not want to derail the discussion of the pointer access
but I think you are right. I should just have said "there would
certainly be an lvalue expression of type T1" without saying if there is
an access "via" it or not.
Or do you have something more subtle in mind, maybe along the lines
that the expression S.M accesses only the member, but it accesses it
"via" an lvalue expression of the structure type, without accessing
the whole structure, because the struct lvalue is a subexpression of
the lvalue designating the object actually accessed?

I think I used to, but I am really not sure anymore! One certainly can't
read the definition of . as implying an access of the whole structure,
but then I am puzzled by the purpose of the second to last of the access
rules (6.5 p7). I see that the rules don't use the term "via" but
simply "by". I think it might help if I start a new thread asking a
naive question about 6.5 p7.
 
Johannes Schaub (litb)

Joshua said:
[snipped]
To be clear, you think that there's a difference between
a->x = 2;
and
int* x = & a->x;
*x = 2;
?

It would take me a long time to buy that.

Yes I think there is a difference between the two. The first uses the struct
for the access. The second does not.
 
Joshua Maurice

Joshua said:
[snipped]
To be clear, you think that there's a difference between
  a->x = 2;
and
  int* x = & a->x;
  *x = 2;
?
It would take me a long time to buy that.

Yes I think there is a difference between the two. The first uses the struct
for the access. The second does not.

I never really considered this beyond a first glance.

Again, to be crystal clear, consider:
/* 1 */
a -> x = 2;
and
/* 2 */
* ( & ( a -> x )) = 2;
and
/* 3 */
int* x = & a->x;
*x = 2;

You really think there's a difference? Really? Where's the difference?
Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
be entirely equivalent, or I'm really losing it.

As a naive understanding for the difference between 1 and 2: The
addressof operator (&) simply returns the address of the object
referred to the lvalue, and then the dereference operator (*) simply
takes that pointer value and returns back the same lvalue (which
refers to the same object). This isn't operator overloading in C++. I
would think that it ought to be a noop. If there is any difference at
all between any of 1, 2, and 3 above in this post, then I have a
fundamental misunderstanding of the language.
 
Wojtek Lerch

Again, to be crystal clear, consider:
/* 1 */
a -> x = 2;
and
/* 2 */
* (& ( a -> x )) = 2;
and
/* 3 */
int* x =& a->x;
*x = 2;

You really think there's a difference? Really? Where's the difference?
Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
be entirely equivalent, or I'm really losing it.

As a naive understanding for the difference between 1 and 2: The
addressof operator (&) simply returns the address of the object
referred to the lvalue, and then the dereference operator (*) simply
takes that pointer value and returns back the same lvalue (which
refers to the same object). This isn't operator overloading in C++. I
would think that it ought to be a noop. If there is any difference at
all between any of 1, 2, and 3 above in this post, then I have a
fundamental misunderstanding of the language.

I don't think a C pointer is simply just the address of an object. If
you consider the rules of pointer arithmetic and DR260, a pointer value
carries some extra properties that decide what operations on it are
defined. Two pointers may compare equal and be represented by identical
bit patterns, but depending on their "provenance", one of them may be
safe to dereference or increment but not decrement, while the other may
be safe to decrement but not increment or dereference. The standard
tells us that every object can be considered an array element, and every
pointer to an object has a range of integers that can be legitimately
added to it, based on the object's "arrayness"; but the standard rarely
bothers explaining how to determine what that array is, and all we can
do is rely on obvious guesses where they're obvious, and in less-obvious
cases we can hope that the guess is even harder for a compiler, forcing
it to generate code that does the "naive" thing regardless of what the
limit would be if the standard didn't neglect to specify it.

Since this "arrayness" is not exactly on topic here, I don't want to go
too deep into it now; but maybe pointers are also supposed to remember
their "structness", and your &a->x (and also your x) are not just
pointers to an int that is known not to be an array element, but
pointers to an int that is known not to be an array element but is also
known to be the "x" member of a struct T1? If that were the case, then
maybe a simple assignment to *x could still impose an effective type of
struct T1 on the object surrounding the int that x points to. But of
course none of that is actually discussed in the standard, just like the
transformations of "arrayness" are not discussed for most of the
operations where they apparently happen.
 
Joshua Maurice

I don't think a C pointer is simply just the address of an object.  If
you consider the rules of pointer arithmetic and DR260, a pointer value
carries some extra properties that decide what operations on it are
defined.  Two pointers may compare equal and be represented by identical
bit patterns, but depending on their "provenance", one of them may be
safe to dereference or increment but not decrement, while the other may
be safe to decrement but not increment or dereference.  The standard
tells us that every object can be considered an array element, and every
pointer to an object has a range of integers that can be legitimately
added to it, based on the object's "arrayness"; but the standard rarely
bothers explaining how to determine what that array is, and all we can
do is rely on obvious guesses where they're obvious, and in less-obvious
cases we can hope that the guess is even harder for a compiler, forcing
it to generate code that does the "naive" thing regardless of what the
limit would be if the standard didn't neglect to specify it.

Since this "arrayness" is not exactly on topic here, I don't want to go
too deep into it now; but maybe pointers are also supposed to remember
their "structness", and your &a->x (and also your x) are not just
pointers to an int that is known not to be an array element, but
pointers to an int that is known not to be an array element but is also
known to be the "x" member of a struct T1?  If that were the case, then
maybe a simple assignment to *x could still impose an effective type of
struct T1 on the object surrounding the int that x points to.  But of
course none of that is actually discussed in the standard, just like the
transformations of "arrayness" are not discussed for most of the
operations where they apparently happen.

Indeed. This is exactly what I meant when I was saying "data
dependency analysis". I think your way is clearer. (It could be that)
pointer values carry with them some semantic information, in this case
it remembers that it came from a memberof expression on a T1 lvalue.
I'll have to check out that DR. Are there any other spots in the C
standard which you suggest that I look at regarding this "arrayness"?
 
Johannes Schaub (litb)

Johannes Schaub (litb) said:
[snip]

In particular, I think the committee intends the spec to say that a struct
or union access expression involves an access with the struct or union
lvalue.

T1 *p = malloc(sizeof *p);
p->x = 0;

In this case, I think the committee's intent is that the object pointed to
by "p" is accessed by an lvalue of type T1, and so the effective type of the
object containing the int changes to T1. So a later cast and access by an
lvalue of T2 will be undefined behavior.

I'm not aware of any evidence that supports this theory (ie,
that using '.' or '->' is also an access for the left operand).
Furthermore it seems to be in conflict with the definitions the
Standard gives for access, value, etc.

Do you have any such evidence to offer? Or are you simply
stating an unsupported opinion?

The committee argues that way in the union DR. See
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm .
 
Johannes Schaub (litb)

Joshua said:
On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
[snipped]
To be clear, you think that there's a difference between
a->x = 2;
and
int* x =& a->x;
*x = 2;
?
It would take me a long time to buy that.

Yes I think there is a difference between the two. The first uses the struct
for the access. The second does not.

I never really considered this beyond a first glance.

Again, to be crystal clear, consider:
/* 1 */
a -> x = 2;
and
/* 2 */
* (& ( a -> x )) = 2;
and
/* 3 */
int* x =& a->x;
*x = 2;

You really think there's a difference? Really? Where's the difference?
Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
be entirely equivalent, or I'm really losing it.

Yes, I think /* 1 */ is different from /* 2 */ in that /* 1 */ involves
the type of a's struct in the access. /* 2 */ is equivalent to /* 3 */ I
think.

Anyway, the committee says in the union-DR that this is UB:

union A { int a; float b; } u;
u.a = 0;
float *b = &u.b;
*b = 0.f;
// *&u.b = 0.f; // i think this is equivalent

The only way I can use the aliasing rule to get to UB is: The object at "u"
has effective type "union A" (with a sizeof union A) and effective type
int (with a sizeof int). If you access it with merely "int", you access
the object whose effective type is A with an lvalue of type int. And
have undefined behavior.

But I don't think that this makes sense. It would mean the following is
UB too:

struct A { int a; } b;
b.a = 0;
*&b.a = 0;

Same situation. We access an object whose effective type is struct A by
an lvalue of type int. So I can't follow the committee's intent here
anyway. I.e whatever you might think about /* 1 */ and /* 2 */ having
apparently different semantics, I can neither explain nor understand the
extent of it.
As a naive understanding for the difference between 1 and 2: The
addressof operator (&) simply returns the address of the object
referred to the lvalue, and then the dereference operator (*) simply
takes that pointer value and returns back the same lvalue (which
refers to the same object). This isn't operator overloading in C++. I
would think that it ought to be a noop. If there is any difference at
all between any of 1, 2, and 3 above in this post, then I have a
fundamental misunderstanding of the language.

I thought we agreed that "a.b = ..." and "*x = ..." are different in
that the type of "a" has some influence on the access, in order to deem
the following UB.

typedef struct A { int a; } A;
typedef struct B { int a; } B;
A *x = malloc(sizeof *x);
x->a = 0; // access with effective type A and int
((B*)x)->a = 0; // I thought we agreed this is UB
// and committee intent.

I think *I* am misunderstanding the matter rather than you :(
 
Tim Rentsch

Johannes Schaub (litb) said:
Johannes Schaub (litb) said:
[snip]

In particular, I think the committee intends the spec to say that a struct
or union access expression involves an access with the struct or union
lvalue.

T1 *p = malloc(sizeof *p);
p->x = 0;

In this case, I think the committee's intent is that the object pointed to
by "p" is accessed by an lvalue of type T1, and so the effective type of the
object containing the int changes to T1. So a later cast and access by an
lvalue of T2 will be undefined behavior.

I'm not aware of any evidence that supports this theory (ie,
that using '.' or '->' is also an access for the left operand).
Furthermore it seems to be in conflict with the definitions the
Standard gives for access, value, etc.

Do you have any such evidence to offer? Or are you simply
stating an unsupported opinion?

The committee argues that way in the union DR. See
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm .

Actually they don't. You might infer that's what they are thinking,
but no such position is stated, nor is it necessary to reach the
conclusions they reach.
 
Johannes Schaub (litb)

Johannes Schaub (litb) said:
On Feb 7, 8:09 am, "Johannes Schaub (litb)"
Joshua Maurice wrote:
On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
[snipped]
To be clear, you think that there's a difference between
a->x = 2;
and
int* x =& a->x;
*x = 2;
?

It would take me a long time to buy that.

Yes I think there is a difference between the two. The first uses the struct
for the access. The second does not.

I never really considered this beyond a first glance.

Again, to be crystal clear, consider:
/* 1 */
a -> x = 2;
and
/* 2 */
* (& ( a -> x )) = 2;
and
/* 3 */
int* x =& a->x;
*x = 2;

You really think there's a difference? Really? Where's the difference?
Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
be entirely equivalent, or I'm really losing it.

Yes, I think /* 1 */ is different from /* 2 */ in that /* 1 */
involves the type of a's struct in the access. /* 2 */ is equivalent to
/* 3 */ I think.

Anyway, the committee says in the union-DR that this is UB:

union A { int a; float b; } u;
u.a = 0;
float *b =&u.b;
*b = 0.f;
// *&u.b = 0.f; // i think this is equivalent

Assuming you're talking about DR 236, they say no such thing.

Then I encourage you to tell us what else they say by:
 
Johannes Schaub (litb)

Then I encourage you to tell us what else they say by:

Hm, it seems I may have misunderstood what they say. They actually seem
to say that a write like "*qd = 0" does *not* change the effective type
of the accessed object.

But that seems wrong, because the aliasing rule says that a write
changes the effective type *for that access* and for all further read
accesses. So WTF does the committee say!?
 
