About casts (and pointers)

Christian Bau

Tim Rentsch said:
I believe that whether or not the compiler can deduce that no function
outside the single-translation-unit program is called is irrelevant.
First, 6.2.7 p1 doesn't say translation units in the same executable;
it just says translation units. Second, there are lots of ways a
structure value might be transmitted between different executables,
eg,

- writing bytes out to a file and another executable reading
them in;

- storing a value in shared memory;

- transmitting bytes over a pipe or a socket;

- just leaving a value in memory somewhere and expecting the
next program to pick it up;

- for a more subtle example - the value of an 'offsetof'
macro call might be stored by one executable and used
by another.

The C standard makes (some) guarantees about program execution, but it
seems silly to think it makes guarantees *only* about program
execution. There also are guarantees about what representations are
used in (some) data types. I think most people reading sections
6.2.5, 6.2.6, 6.2.7 and 6.3 would conclude that guarantees are made
about the representations of compatible types (of structs) in
different translation units, regardless of whether they were ever
bound into a single executable. In response to that, do you have
anything to offer other than just an assertion to the contrary?
Saying "... it is clearly allowed ..." without giving any supporting
statements isn't very convincing.
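
The first item in the list above (writing bytes out to a file and reading them back in) can be sketched roughly as follows. The type `struct record` and the function `round_trip` are hypothetical names invented for this illustration, and the sketch stays inside one executable, where the round trip is uncontroversial. Whether a separately compiled reader is guaranteed to see the same layout is exactly the point under dispute in this thread.

```c
#include <stdio.h>

/* Hypothetical record type, used only for this illustration. */
struct record { int id; double value; };

/* Writes *in to a temporary file as raw bytes and reads the bytes back
   into *out.  Returns 0 on success, -1 on any I/O failure. */
int round_trip(const struct record *in, struct record *out)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    int ok = fwrite(in, sizeof *in, 1, fp) == 1;
    rewind(fp);
    ok = ok && fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

A caller would pass two distinct `struct record` objects and compare the members afterwards.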

The C Standard gives a guarantee that under certain circumstances
modifying an object through a pointer to some struct modifies another
object in a defined way. Example:

typedef struct { int x; double y; char z; } s1;
typedef struct { int a; double b; short c; } s2;

typedef union { s1 x; s2 y; } u;

int main (void) {
s1 x;
((s2 *) &x) -> b = 3.7;
return 0;
}

The assignment is guaranteed to set x.y to 3.7. How the compiler does
this is none of anyone's business. A simple strategy to achieve this is
for the compiler to use the same layout for all initial sequences of all
structs. That means the offset of any one struct member depends only on
the type of that member and all preceding members, but not on the type
of any following members.

In my tiny example, it is clear that the compiler can achieve what is
guaranteed by the standard without using the same layout for s1 and s2.
All it needs to do is replace the left hand side of the assignment ((s2
*) &x)->b with x.y. That will achieve exactly what is guaranteed by the
C Standard. Yes, the simplest implementation will use the same offsetof
for s1.y and s2.b, but _that_ is not guaranteed by the C Standard.
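
For contrast with the cast used in the example, here is a minimal sketch of the form that the common-initial-sequence rule covers without dispute: both structs are accessed through an object of the union type itself. The function name `store_as_s1_read_as_s2` is made up for this illustration.

```c
typedef struct { int x; double y; char z; } s1;
typedef struct { int a; double b; short c; } s2;
typedef union { s1 x; s2 y; } u;

/* Stores through the s1 member of a union object, then inspects the
   common initial sequence (int, double) through the s2 member. */
double store_as_s1_read_as_s2(double value)
{
    u obj;
    obj.x.x = 1;
    obj.x.y = value;
    return obj.y.b;   /* b lies in the common initial sequence */
}
```

Calling `store_as_s1_read_as_s2(3.7)` yields the stored value, however the implementation chooses to lay out the two structs.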

So the situation is: The C Standard guarantees X. A simple method to
achieve this in an implementation is to do Y. A compiler would in fact
have to work hard to achieve X without doing Y. This does _not_
guarantee Y in any way, it guarantees X.

If the C Standard had wished to guarantee same layout for structures, it
would have been easy to add this.
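
Whether a particular implementation actually does "Y" can be checked rather than assumed. The sketch below compares member offsets with `offsetof`. The function name `common_prefix_offsets_match` is hypothetical, and a result of 1 describes only the implementation that compiled it, not a guarantee of the Standard.

```c
#include <stddef.h>

typedef struct { int x; double y; char z; } s1;
typedef struct { int a; double b; short c; } s2;

/* Returns 1 if the members of the common initial sequence sit at the
   same offsets in s1 and s2 on this implementation, 0 otherwise. */
int common_prefix_offsets_match(void)
{
    return offsetof(s1, x) == offsetof(s2, a)
        && offsetof(s1, y) == offsetof(s2, b);
}
```

On the usual compilers this is expected to return 1, which is the "simple method" described above.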
 
S.Tobias

Peter Nilsson said:
Christian Bau wrote:
[snip]
However, the compiler is free to assume that two pointers to different
struct types cannot access the same memory without undefined behavior.

Again, why?

The wording is a little colloquial, but I think he is right.
These assignments are made through int lvalues. The struct types are
incidental.

Not quite incidental:

# [#4] A postfix expression followed by the -> operator and an
# identifier designates a member of a structure or union
# object. The value is that of the named member of the object
# to which the first expression points, and is an lvalue.69)

If `p1' points to an object of type `struct s1', and p2 points at
the same location, then the expression `p2->b' raises UB, because
the object simply doesn't have a `b' member (I think it's undefined,
because the Std doesn't specify it).

If `p1' points to an object without an effective type (allocated),
then I believe the first assignment impresses the new type on that
object.

There's no way here you can access the same struct object through
both "->y" and "->b" without UB.


[snip]
Consider the following implementation...
short: 2 bytes, 2 byte alignment
int: 4 bytes, 4 byte alignment
double: 16 bytes, 16 byte alignment
...and consider the layouts (where . is padding)...
struct s1: |xx..yyyy|
struct s2: |aa......bbbb....cccccccccccccccc|
Can you cite chapter and verse where an implementation cannot adopt
this choice of layouts?

It can.
Can you cite why this cannot be done "in practice"?

"In practice" means binary compatibility.
int main (void) {
void* p = malloc (sizeof (struct s1) + sizeof (struct s2));
if (p) f (p, p);
return 0;
}
[snip]

I can't see a distinction between this and the usual pointer aliasing
problems. If you want the compiler to make the assumption that p1
and p2 point to different locations, then you have to make them
restrict qualified. [That's what restrict is for, after all.]
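
A minimal sketch of the restrict-qualified variant mentioned here, reusing the struct definitions quoted elsewhere in the thread. The name `f_restrict` and the added return value are assumptions made for illustration. With `restrict`, the caller promises that the objects accessed through `p1` and `p2` do not overlap, so the implementation may return 0 without re-reading `p1->y`.

```c
struct s1 { short x; int y; };
struct s2 { short a; int b; double c; };

/* The caller promises that *p1 and *p2 do not overlap for the
   duration of the call; passing overlapping objects is undefined. */
int f_restrict(struct s1 *restrict p1, struct s2 *restrict p2)
{
    p1->y = 0;
    p2->b = 1;
    return p1->y;   /* may be folded to 0 by the implementation */
}
```

The sketch is only meaningful when called with two distinct objects.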

It's not that they cannot point to the same object (they can), it's
that you cannot legally access the object members by using member-access
from another struct.

Consider this:

void f(short *ps, double *pd)
{
    *ps = 0;
    *pd = 1; /* could this change `*ps'? - no */
    *ps;
}


typedef struct { int i; short s; } Sshort;
typedef struct { int i; double d; } Sdouble;

void f(Sshort *ps, Sdouble *pd)
{
    ps->i = 0;
    pd->i = 1; /* could this change `*ps'? - no */
    ps->i;
    /* what counts here is "ps->" and "pd->"; also note that "->i" means
       two different things, i.e. members in two different name spaces */
}

union { Sshort s; Sdouble d; };

void g(Sshort *ps, Sdouble *pd)
{
    ps->i = 0;
    pd->i = 1; /* could this change `*ps'? - yes! ps and pd *can* point
                  to the same object and the Std provides an exception
                  for such access */
    ps->i; /* must re-read the object */
}
 
S.Tobias

Christian Bau said:
The C Standard gives a guarantee that under certain circumstances
modifying an object through a pointer to some struct modifies another
object in a defined way. Example:
typedef struct { int x; double y; char z; } s1;
typedef struct { int a; double b; short c; } s2;
typedef union { s1 x; s2 y; } u;
int main (void) {
s1 x;
((s2 *) &x) -> b = 3.7;
return 0;
}
The assignment is guaranteed to set x.y to 3.7.

Basically I agree with what you say here and earlier, but in this
example we have a specific situation and the compiler may produce
specific code.

# One special guarantee is made in
# order to simplify the use of unions: If a union contains
# several structures that share a common initial sequence (see
# below), and if the union object currently contains one of
# these structures, it is permitted to inspect the common
# initial part of any of them anywhere that a declaration of
# the completed type of the union is visible.

The exception is made for the struct objects which are part of a union.
In your example the compiler may notice that `x' object is not part
of a union and is not obliged by this exception.

Otherwise, when the compiler processes separate translation units,
and when it can't trace the origin of lvalues (passed through pointers),
then yes, it must produce a general code which is aware of
the special guarantee.

How the compiler does
this is none of anyone's business. A simple strategy to achieve this is
for the compiler to use the same layout for all initial sequences of all
structs. That means the offset of any one struct member depends only on
the type of that member and all preceding members, but not on the type
of any following members.

"In practice" yes. But now I'm not sure if it can be actually
strictly proved. Suppose an implementation ("Made in Hell ;-)")
in which every struct object is "observed" and before each
member access a special code is generated that synchronizes
the resulting value of the expression with the previous write
access - then I think there would be no need to have the same
layout. But I don't really know, there are lots of other issues
to consider.
 
Tim Rentsch

Christian Bau said:
The C Standard gives a guarantee that under certain circumstances
modifying an object through a pointer to some struct modifies another
object in a defined way. Example:

typedef struct { int x; double y; char z; } s1;
typedef struct { int a; double b; short c; } s2;

typedef union { s1 x; s2 y; } u;

int main (void) {
s1 x;
((s2 *) &x) -> b = 3.7;
return 0;
}

The assignment is guaranteed to set x.y to 3.7. How the compiler does
this is none of anyone's business. A simple strategy to achieve this is
for the compiler to use the same layout for all initial sequences of all
structs. That means the offset of any one struct member depends only on
the type of that member and all preceding members, but not on the type
of any following members.

In my tiny example, it is clear that the compiler can achieve what is
guaranteed by the standard without using the same layout for s1 and s2.
All it needs to do is replace the left hand side of the assignment ((s2
*) &x)->b with x.y. That will achieve exactly what is guaranteed by the
C Standard. Yes, the simplest implementation will use the same offsetof
for s1.y and s2.b, but _that_ is not guaranteed by the C Standard.

The argument given tacitly assumes that the guarantees of 6.5.2.3 p5
are the only guarantees that affect what the compiler can do in this
case. In other words it assumes the very thing it purports to show.
Circular reasoning.

So the situation is: The C Standard guarantees X. A simple method to
achieve this in an implementation is to do Y. A compiler would in fact
have to work hard to achieve X without doing Y. This does _not_
guarantee Y in any way, it guarantees X.

A more accurate statement is: The C Standard guarantees X; it also
guarantees X', X'', X''', .... Now the question is, Does the union of
the guarantees X, X', X'', ..., imply Y?

What was given was an argument that "not (X implies Y)". That may be
true, but it's irrelevant to the question. Any reasoning that tries
to answer that question (and give the answer "No") needs to take into
account all the guarantees that are present in the C Standard, not
just those in 6.5.2.3.
 
Peter Nilsson

S.Tobias said:
Peter Nilsson said:
Christian Bau wrote:
[snip]
However, the compiler is free to assume that two pointers to
different struct types cannot access the same memory without
undefined behavior.
Again, why?

The wording is a little colloquial, but I think he is right.

Apart from 'compiler', for which you can simply substitute the
term 'implementation', I can't see a colloquialism being used.
These assignments are made through int lvalues. The struct types are
incidental.

Not quite incidental:

# [#4] A postfix expression followed by the -> operator and an
# identifier designates a member of a structure or union
# object. The value is that of the named member of the object
# to which the first expression points, and is an lvalue.69)

If `p1' points to an object of type `struct s1', and p2 points at
the same location, then the expression `p2->b' raises UB, because
the object simply doesn't have a `b' member (I think it's
undefined, because the Std doesn't specify it).

But there is a member b in struct s2, to which p2 points, so I
have no idea why you think that section adds anything relevant.
If `p1' points to an object without an effective type (allocated),
then I believe the first assignment impresses the new type on that
object.

The only objects being modified are p1->y and p2->b, both have the
same type and potentially the same address. The type of any enclosing
object is irrelevant.
There's no way here you can access the same struct object through
both "->y" and "->b" without UB.

Chapter and verse please. I can't see how your cited section adds
anything relevant.
[snip]
Consider the following implementation...
short: 2 bytes, 2 byte alignment
int: 4 bytes, 4 byte alignment
double: 16 bytes, 16 byte alignment
...and consider the layouts (where . is padding)...
struct s1: |xx..yyyy|
struct s2: |aa......bbbb....cccccccccccccccc|
Can you cite chapter and verse where an implementation cannot adopt
this choice of layouts?

It can.
Can you cite why this cannot be done "in practice"?

"In practice" means binary compatibility.

Are you saying an int is not binary compatible (whatever that
means) with an int?
int main (void) {
void* p = malloc (sizeof (struct s1) + sizeof (struct s2));
if (p) f (p, p);
return 0;
}
[snip]

I can't see a distinction between this and the usual pointer aliasing
problems. If you want the compiler to make the assumption that p1
and p2 point to different locations, then you have to make them
restrict qualified. [That's what restrict is for, after all.]

It's not that they cannot point to the same object (they can), it's
that you cannot legally access the object members by using
member-access from another struct.

Where does the standard state this?

How is the situation any different to...

struct s { int i; };

void foo(struct s *sp, int *ip)
{
    sp->i = 0;
    *ip = 1;
}

...
struct s s = { 42 };
foo(&s, &s.i);
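
To make the aliasing in this example observable, the sketch below adds a return value (an addition for illustration, not part of the code as posted). Both lvalues have type `int`, so `ip` may point at `sp->i`, and the implementation has to re-read `sp->i` after the store through `ip`.

```c
struct s { int i; };

/* ip may legitimately point at sp->i, since both lvalues have type int. */
int foo(struct s *sp, int *ip)
{
    sp->i = 0;
    *ip = 1;
    return sp->i;   /* yields 1 when ip == &sp->i, otherwise 0 */
}
```

With `struct s v = { 42 };`, the call `foo(&v, &v.i)` returns 1, while passing the address of an unrelated `int` returns 0.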
Consider this:

void f(short *ps, double *pd)
{
*ps = 0;
*pd = 1; /* could this change `*ps'? - no */

It *could* change the representation.

Now you invoke UB because *ps could be a trap representation (if
*ps and *pd overlap memory.) But if *ps and *pd were to point
to objects of the same type, then there is no UB.
}


typedef struct { int i; short s; } Sshort;
typedef struct { int i; double d; } Sdouble;

void f(Sshort *ps, Sdouble *pd)
{
ps->i = 0;
pd->i = 1; /* could this change `*ps'? - no */

Yes it can.
*ps->i;
/* what counts here is "ps->" and "pd->"; ...

Why does it count? Your quoted section _only_ states that 'i' must
be a member of the struct pointed to by the given pointer expression
to the left of ->, which it is in both cases.
 
S.Tobias

Peter Nilsson said:
S.Tobias said:
Peter Nilsson said:
Christian Bau wrote:
So if you only have

#include <stdio.h>
#include <stdlib.h>

struct s1 { short x; int y; };
struct s2 { short a; int b; double c; };

void f (struct s1* p1, struct s2* p2) {
p1->y = 0;
p2->b = 1;
These assignments are made through int lvalues. The struct types are
incidental.

Not quite incidental:

# [#4] A postfix expression followed by the -> operator and an
# identifier designates a member of a structure or union
# object. The value is that of the named member of the object
#                                        ^^^^^^^^^^^^^^======
# to which the first expression points, and is an lvalue.69)

If `p1' points to an object of type `struct s1', and p2 points at
the same location, then the expression `p2->b' raises UB, because
the object simply doesn't have a `b' member (I think it's
undefined, because the Std doesn't specify it).
But there is a member b in struct s2, to which p2 points, so I
have no idea why you think that section adds anything relevant.

I was thinking of a special situation here (sure, you can't read my
mind...), where the call is:
struct s1 s;
f(&s, (struct s2*)&s);

Of course `struct s2' (and lvalue `*p2') has a member `b',
but not the *object* that p2 points to.

The Standard does not define what happens when you apply `->' operator
to an lvalue which designates an object that doesn't have the specified
member.
The only objects being modified are p1->y and p2->b, both have the
same type and potentially the same address.

I have to correct myself again: the first assignment impresses `struct s1'
type, and the second impresses `struct s2'. The UB arises when you
try to access it with `p1->y' again (below).
The type of any enclosing
object is irrelevant.

It is, for the "p1->" part: object `*p1' *must* have the specified member,
or else there's UB. (IOW: `p1', to which `->' is applied must point
to an object that has that member.)
Chapter and verse please. I can't see how your cited section adds
anything relevant.

I can't give you a c&v, because it's not specified that way. Show
me that you can, and then I'll give you c&v that you can't.


[snip]
Consider the following implementation...
short: 2 bytes, 2 byte alignment
int: 4 bytes, 4 byte alignment
double: 16 bytes, 16 byte alignment
...and consider the layouts (where . is padding)...
struct s1: |xx..yyyy|
struct s2: |aa......bbbb....cccccccccccccccc|
Can you cite chapter and verse where an implementation cannot adopt
this choice of layouts?

It can.
Can you cite why this cannot be done "in practice"?

"In practice" means binary compatibility.
Are you saying an int is not binary compatible (whatever that
means) with an int?

What I meant to say was that a "practical" compiler must make sure
that "yyyy" and "bbbb" are aligned. I don't see any other
way to satisfy all requirements if both structs are to be
defined in different translation units, and the linker is ignorant
of C. Christian has given enough arguments for that.


int main (void) {
void* p = malloc (sizeof (struct s1) + sizeof (struct s2));
if (p) f (p, p);
return 0;
} [snip]

I can't see a distinction between this and the usual pointer aliasing
problems. If you want the compiler to make the assumption that p1
and p2 point to different locations, then you have to make them
restrict qualified. [That's what restrict is for, after all.]

It's not that they cannot point to the same object (they can), it's
that you cannot legally access the object members by using
member-access from another struct.
Where does the standard state this?
How is the situation any different to...
struct s { int i; };
void foo(struct s *sp, int *ip)
{
sp->i = 0;
*ip = 1;
}

This is okay, `*ip' may alias `sp->i'.
...
struct s s = { 42 };
foo(&s, &s.i);
It *could* change the representation.

It could and it would raise UB either here (object with declared type) or ..

... here (object without a declared type, where the effective type
is impressed by the latest write access, or lvalue type, in that order).
This would break aliasing rules, even before accessing trap representation.

The conclusion is that the compiler may assume that *ps and *pd don't
alias the same object.

See 6.5, and the Rationale has many explanations.
Now you invoke UB because *ps could be a trap representation (if
*ps and *pd overlap memory.)

UB arises even before that, and even if there're no trap representations.
But if *ps and *pd were to point
to objects of the same type, then there is no UB.

Yes, it means they may alias the same object.
Yes it can.

Yes it can, leading to UB at some place, etc.

It's the same argument as above. What only differs is this:
The conclusion is that the expression `ps->i' doesn't alias `pd->i'
(subject to the exception that Christian talked about).

(If there was a third argument `int *pi', then of course `*pi' may
alias either ps->i or pd->i).
Why does it count? Your quoted section _only_ states that 'i' must
be a member of the struct pointed to by the given pointer expression
to the left of ->, which it is in both cases.

No! As I said above: when you apply `->', the *object* must have
the member `i'. If the lvalue doesn't have it, it's a constraint violation.
If the object doesn't have it, it's UB.
 
Peter Nilsson

S.Tobias said:
Peter Nilsson said:
S.Tobias said:
Christian Bau wrote:
So if you only have

#include <stdio.h>
#include <stdlib.h>

struct s1 { short x; int y; };
struct s2 { short a; int b; double c; };

void f (struct s1* p1, struct s2* p2) {
p1->y = 0;
p2->b = 1;

These assignments are made through int lvalues. The struct types
are incidental.

Not quite incidental:

# [#4] A postfix expression followed by the -> operator and an
# identifier designates a member of a structure or union
# object. The value is that of the named member of the object
#                                        ^^^^^^^^^^^^^^======

But there is a member b in struct s2, to which p2 points, so I
have no idea why you think that section adds anything relevant.

I was thinking of a special situation here (sure, you can't read my
mind...), where the call is:
struct s1 s;
f(&s, (struct s2*)&s);

Of course `struct s2' (and lvalue `*p2') has a member `b',
but not the *object* that p2 points to.

The Standard does not define what happens when you apply `->' operator
to an lvalue which designates an object that doesn't have the specified
member.

Then the following code, which is in very common practice, is
undefined...

struct x { int a; double b; };
struct y { struct x x; int c; };

struct y y = { 0 };
struct x *xp = (struct x *) xp;
xp->a = 42;
xp->b = 4 * atan(1);

Your rule states that the object to which xp points must have
a and b members. But the object being pointed to only has x and c
members.

But let's instead suppose I do something like...

struct x *xp = malloc(sizeof (struct y));
struct y *yp = (struct y *) xp;
xp->a = 42;
yp->c = 42;

Is the second assignment undefined behaviour?

I think I understand your arguments, but I'm not convinced that the
interpretation is what the committee intended.

Note that 6.5.2.3p3 (the one before the one you quote) uses different
wording...

A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is
that of the named member, and is an lvalue if the first expression
is an lvalue. If the first expression has qualified type, the
result has the so-qualified version of the type of the designated
member.

So, in the original code by Christian, 'p2->b = 1;' is UB, but it
seems that '(*p2).b = 1;' is okay!

Like I said, I don't think that's the committee's intent.
 
S.Tobias

Peter Nilsson said:
S.Tobias wrote:

Then the following code, which is in very common practice, is
undefined...
struct x { int a; double b; };
struct y { struct x x; int c; };
struct y y = { 0 };
struct x *xp = (struct x *) xp;
ITYM: (struct x *) &y;
xp->a = 42;
xp->b = 4 * atan(1);
Your rule states that the object to which xp points to must have
a and b members. But the object being pointed to only has x and c
members.

Everything is all right: `y' contains a subobject `x' which contains
members `a' and `b', and the pointer `xp' points to it by virtue of
6.7.2.1#12 (n869.txt, "suitably converted").
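
The conversion described here can be sketched as below, assuming the corrected initializer `(struct x *) &y`. The function name `set_inner` is hypothetical. A pointer to a structure, suitably converted, points to its initial member, so the writes land in `yp->x`.

```c
struct x { int a; double b; };
struct y { struct x x; int c; };

/* Converts a pointer to struct y into a pointer to its initial member
   and writes the members of that subobject through it. */
void set_inner(struct y *yp, int a, double b)
{
    struct x *xp = (struct x *)yp;   /* points at yp->x */
    xp->a = a;
    xp->b = b;
}
```

After `set_inner(&v, 42, 2.5)` the values are visible as `v.x.a` and `v.x.b`, and `v.c` is untouched.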
But lets instead suppose I do something like...
struct x *xp = malloc(sizeof (struct y));
struct y *yp = (struct y *) xp;
xp->a = 42;
yp->c = 42;
Is the second assignment undefined behaviour?

I don't know, but I think it's not UB. The question here is when
and how an allocated object becomes a struct.

I believe (I can't give you any references to support this right now) that
with allocated objects (a) the object has a (effective) type such
as you (the programmer) want it to have (by the rules 6.5#6); and
(b) the access semantics are meant to be the same as those for
the declared objects, subject to (a).

I think, when you assign a member, the allocated object (or its part)
becomes that structure (with that member), and remaining members
become indeterminate.
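
For scalar types the effective-type rule for allocated storage (6.5#6) is easier to illustrate than for the struct case discussed here, which remains uncertain. The following is only a sketch of that simpler case, with a made-up function name: a store through a non-character lvalue gives the allocated object that type for the store and for later reads that do not modify it, and a later store through a different type changes it again.

```c
#include <stdlib.h>

/* Returns 1 if both stores read back as written, 0 if not,
   and -1 if the allocation fails. */
int reuse_allocated_storage(void)
{
    size_t n = sizeof(double) > sizeof(int) ? sizeof(double) : sizeof(int);
    void *p = malloc(n);
    if (p == NULL)
        return -1;

    int *ip = p;
    *ip = 42;                     /* effective type is now int */
    int first_ok = (*ip == 42);

    double *dp = p;
    *dp = 1.5;                    /* this store makes it double */
    int second_ok = (*dp == 1.5);

    free(p);
    return first_ok && second_ok;
}
```

Reading `*ip` after the store through `dp` would be the problematic access. The sketch deliberately avoids it.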

I believe that the second assignment in your example augments the
structure, but I don't really know. Perhaps a more tricky example
is worth looking at:

struct y2 { struct x x; int d; } *y2p = (struct y2 *)xp;
xp->a = 42;
yp->c = 42;        //augments?
y2p->d = 54;       //changes type
xp->a; y2p->x.a;   //UB?
I think I understand your arguments, but I'm not convinced that the
interpretation is what the committee intended.

Then the simplest thing to do is to ask them in csc; I won't do it
now, because I'm not prepared for the discussion yet. In fact,
I was hoping that someone of the Elders of the C Tribe would add
their comments in this discussion, so that I could become more
convinced (either way), too.

+++

I think that the issues here are similar to those in the
discussion "contiguity of arrays" last year (about "int a[2][2];
a[0][2];"). The problem is not what is where, but what the
compiler believes is where. Answering to my post (Message-ID:
<[email protected]>) Douglas Gwyn informally agreed to
my supposition that in order to access a subobject, you have to
explicitly give the full "path" to it, referring to it, and its
containing objects (see the post, I might be bending the interpretation
here); values of pointers that you get by some miracle
"out of nowhere" will cause UB at some point or another. I feel
that this intention might equally apply to accessing members in a
structure or union (ie. you can't access a member without referring
to its containing struct at some place), but I wouldn't vouch for its truth.

+++
Note that 6.5.2.3p3 (the one before the one you quote) uses different
wording...
A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is
that of the named member, and is an lvalue if the first expression
is an lvalue. If the first expression has qualified type, the
result has the so-qualified version of the type of the designated
member.
So, in the original code by Christian, 'p2->b = 1;' is UB, but it
seems that '(*p2).b = 1;' is okay!

I'm sorry, but you have to give me more of a clue as to what the difference is.
Both specifications refer to an "object". The only substantial
difference I see is that "->" expression is always lvalue (pointers
cannot point to temporaries), whereas "." expression is either
lvalue or rvalue. I think both forms above are equivalent.
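
A small sketch of the claimed equivalence, using `struct s2` as defined earlier in the thread. The function names are invented for this illustration. When `p` points to an actual `struct s2` object, `p->b` and `(*p).b` designate the same member.

```c
struct s2 { short a; int b; double c; };

/* Returns 1 if p->b and (*p).b designate the same object. */
int same_member(struct s2 *p)
{
    return &p->b == &(*p).b;
}

/* Writes through one spelling and reads through the other. */
int arrow_then_dot(struct s2 *p, int value)
{
    p->b = value;
    return (*p).b;
}
```

Both functions are meant to be called with a pointer to a real `struct s2` object. The disputed case of a pointer obtained by casting from another struct type is not exercised here.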
 
