Can *common* struct-members of 2 different struct-types, that are the same for the first common membe


John Reye

Assume identical common top struct members:

struct a {
int i1;
char c1;
short sa1[3];
};

struct b {
int i2;
char c2;
short sa2[3];

int differ_here;
};


struct a tmp;



Are the following two lines always equivalent (as in: yielding the same
lvalue) and allowed:

tmp.c1
((struct b*)&tmp)->c1

Thanks.
- John.
 

James Kuyper

No, the behavior is undefined. It's not because the members might be at
different locations in the two structs; that's possible but unlikely.
The real reason is the anti-aliasing rules (6.5p7). Because those rules
make the behavior of such code undefined, the implementation is not
obligated to consider the possibility that an lvalue referring to a
"struct a" object refers to the same object as one that refers to a
"struct b" object. As a result, when implementing code such as the
following:

tmp.c1 = 1;
printf("%d\n", ((struct b*)&tmp)->c1);

An implementation is not required to notice that the printf() is
referring to the same object as the assignment statement. As a result,
it could, for instance, defer writing the new value to tmp.c1 until
after executing the printf() call. That's pretty unlikely in this simple
case, but gets more likely in more complicated code when aggressive
optimization is turned on.

There's a special exception that allows you to access the "common
initial sequence" of any two struct types, using either struct type,
when they are both members of the same union (6.5.2.3p6):

union ab
{
struct a one;
struct b two;
} pair;
pair.one.c1 = 1;
printf("%d\n", pair.two.c1);
 

John Reye

Ah thanks James. Your replies are always much appreciated.

By the way, does the union trick also work when the first part is
common but the two structs diverge afterwards?
Example:

struct a {
int i1;
char c1;
short sa1[3];

char u;
};

struct b {
int i2;
char c2;
short sa2[3];

int differ_here;
};



union ab
{
struct a one;
struct b two;
} pair;


The sizeof(union ab) will be at least the size of the largest member, of course.
So accessing the union through a smaller member (e.g. the shorter struct) does
not give me access to the whole union (as described by the largest struct). Right?
 

James Kuyper

Assume identical common top struct members:
struct a {
int i1;
char c1;
short sa1[3];
};
struct b {
int i2;
char c2;
short sa2[3];
int differ_here;
};
struct a tmp;
Are the following 2 line always equivalent (as in: yielding the same
lvalue) and allowed:
tmp.c1
((struct b*)&tmp)->c1

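Correction: that should be ->c2, presumably, since struct b has no member
named c1.

The same correction applies to the printf() call in my earlier example.

Correction: in my union example, pair.two.c1 should have been pair.two.c2.
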
Ah thanks James. Your replies are always much appreciated.

By the way does the union trick also work, when the first part is
common, but the 2 diverge afterwards.

It works for the entire initial common sequence, no matter how many
additional members either struct type has after the common part.
Corresponding members of the common sequence must have compatible types;
for bit-fields, they must also have the same width.
Example:

struct a {
int i1;
char c1;
short sa1[3];

char u;
};

struct b {
int i2;
char c2;
short sa2[3];

int differ_here;
};



union ab
{
struct a one;
struct b two;
} pair;


The sizeof(union ab) will be the maximum value, of course.
So some union access-members (e.g. short structs), do not give me
access to the whole union (as described by the maximum struct). Right?

Correct.
 

Barry Schwarz

Assume identical common top struct members:

struct a {
int i1;
char c1;
short sa1[3];
};

struct b {
int i2;
char c2;
short sa2[3];

int differ_here;
};


struct a tmp;



Are the following 2 line always equivalent (as in: yielding the same
lvalue) and allowed:

tmp.c1
((struct b*)&tmp)->c1

Since struct b does not contain a member c1, the second line should
produce a diagnostic.

I appear to be in the minority. If you change the second line to
((struct b*)&tmp)->c2
and if the two structures are guaranteed to have the same alignment,
then I believe the requirement in 6.5.2.3-5 (which technically only
applies if the two structures are members of a union) would force the
compiler to generate the appropriate code to yield the same lvalue.
This would probably work everywhere except the DS9000.
 

James Kuyper

Assume identical common top struct members:

struct a {
int i1;
char c1;
short sa1[3];
};

struct b {
int i2;
char c2;
short sa2[3];

int differ_here;
};


struct a tmp;



Are the following 2 line always equivalent (as in: yielding the same
lvalue) and allowed:

tmp.c1
((struct b*)&tmp)->c1

Since struct b does not contain a member c1, the second line should
produce a diagnostic.

I appear to be in the minority. If you change the second line to
((struct b*)&tmp)->c2
and if the two structures are guaranteed to have the same alignment,
then I believe the requirement in 6.5.2.3-5 (which technically only
applies if the two structures are members of a union) would force the
compiler to generate the appropriate code to yield the same lvalue.

It might generate a retrieval using the same offset from the base of the
struct, but the key issue is whether the value it retrieves from that
location is the one that would otherwise be considered the "current" value.
This would probably work everywhere except the DS9000.

So the DS9000 is the only platform that would aggressively optimize
based upon the fact that a "struct a" lvalue could never, with defined
behavior, refer to the same object as a "struct b" lvalue?
I'd thought that the best modern optimizers were more aggressive than
that, at least at their highest levels.
 

John Reye

the implementation is not
obligated to consider the possibility that an lvalue referring to a
"struct a" object refers to the same object as one that refers to a
"struct b" object. As a result, when implementing code such as the
following:

        tmp.c1 = 1;
        printf("%d\n", ((struct b*)&tmp)->c1);

An implementation is not required to notice that the printf() is
referring to the same object as the assignment statement.


Hmmm... I think that would be one heck of a rubbish compiler (or more
precisely *optimizing* compiler)! ;)
If the standard allows that kind of stuff, then it simply is not
bullet-proof enough.

Because tmp occurs in both lines. Every compiler should notice that!

Even if I "obfuscate" like this:
tmp.c1 = 1;
struct b *cp = (struct b*)&tmp;
printf("%d\n", cp->c2);

I'd expect any compiler that gets it wrong, to be complete rubbish.
Why? Because it's an optimizer BUG.
Why?
Because any simple compiler that does not optimize... gets it right!
And if any simple compiler gets it right, then any optimization must
guarantee to get it right as well.

I mean: if the C standard allows one to create optimizers that result
in such ... ummm... surprises (read: "rubbish"), then the standard is
faulty in my eyes.
 

John Reye

In fact...

then any use of pointers at all would fail.
char a;
char *p;

a = 1;
*p = 2;
printf("%d", a);

This will never print 1, unless the compiler is buggy.


In the same spirit... any compiler that gets the following wrong is
buggy:
tmp.c1 = 1;
printf("%d\n", ((struct b*)&tmp)->c2);

An implementation is not required to notice that the printf() is
referring to the same object as the assignment statement. As a result,
it could, for instance, defer the writing the new value to tmp.c1 until
after executing the printf() call. That's pretty unlikely in this simple
case, but gets more likely in more complicated code when aggressive
optimization is turned on.

It's not complicated. Rather... I suspect it would be an
optimizing-compiler bug.
 

John Reye

Ahh on the other hand, I might have gotten carried away here.
tmp.c1 = 1;
printf("%d\n", ((struct b*)&tmp)->c1);

I would always avoid something like this (not because of aliasing
rules, and compiler optimization), but because of the reasons given by
the mysterious 2nd poster (copied below) and because it's completely
unnecessary!

There is simply no need to do something like this. One can always
introduce an inner struct for the common part.
So in my above arguments, I forgot that I was arguing for something
that is very, very bad style anyway: casting from one struct type
to a different type like that. So what I said about "rubbish
compilers" is probably completely out of context. Sorry.

The guarantee only applies to the first member. The workaround is to
make each first field itself the same struct:

struct common {
int i; char c;
};
struct a {
struct common x1;
short sa1[3];
};
struct b {
struct common x2;
short sa2[3];
int differ_here;
};
tmp.x1.c
((struct b*)&tmp)->x2.c

&struct a = &struct a.x1
&struct b = &struct b.x2
so if &struct a = &struct b
&struct a.x1 = &struct b.x2
and typeof struct a.x1 = typeof struct b.x2 = typeof struct common
therefore for each f in struct common
&struct a.x1.f = &struct b.x2.f

--
My name Indigo Montoya. | R'lyeh 38o57'6.5''S 102o51'16''E.
You flamed my father. | I'm whoever you want me to be.
Prepare to be spanked. | Annoying Usenet one post at a time.
Stop posting that! | At least I can stay in character.
 

John Reye

(where did my message go? OK I'll repost)


Ah I think I got carried away above. Sorry.

The main problem was that I was putting down possible optimizing
compilers, while the reality is, that this statement itself is bloody
bad, and should be avoided at all costs.

struct a tmp;
((struct b*)&tmp)->c2

Reason: one casts from one type to a completely different, unrelated
type.
Rather, one should use a common inner struct for the common parts
within structs. That's a simple way of not getting bitten. ;)
 

John Reye

In fact...

then any use of pointers at all would fail.
char a;
char *p;

a = 1;
*p = 2;
printf("%d", a);

This will never print 1, unless the compiler is buggy.

Correction in 2nd line of code:
char *p = &a;
 

Nobody

Hmmm... I think that would be one heck of a rubbish compiler (or more
precisely *optimizing* compiler)! ;)

If the standard allows that kind of stuff, then it simply is not
bullet-proof enough.

Because tmp occurs in both lines. Every compiler should notice that!

The compiler is free to lose track of that information during
optimisation, and may well do so.
Why? Because it's an optimizer BUG.

A bug is a failure to behave as documented, not a failure to behave
according to the intuition of some guy on usenet.

If it were otherwise, all compilers would have bugs, and those bugs would
be impossible to fix, as different people's intuitions are different and
often contradictory.
Why?
Because any simple compiler that does not optimize... gets it right!
And if any simple compiler gets it right, then any optimization must
guarantee to get it right as well.

The behaviour of a "simple" compiler does not form a part of the standard.
I mean: if the C standard allows one to create optimizers that result
in such ... ummm... surprises (read: "rubbish"), then the standard is
faulty in my eyes.

Whether or not it is faulty "in your eyes" is irrelevant. The standard
is what it is.

The standard explicitly and intentionally facilitates optimisation, rather
than forcing the majority of real-world code to be suboptimal for the sake
of pathological cases. This is also why "const" and "volatile" were added
to C89 and "restrict" to C99, in spite of being completely unnecessary for
"simple" (i.e. non-optimising) compilers.
 

John Reye

The compiler is free to lose track of that information during
optimisation, and may well do so.

char a;
char *p = &a;

a = 1;
*p = 2;
printf("%d\n", a);

Does the C standard guarantee, that the value 2 will get printed in
the above code??
Thanks.
 

Jens Gustedt

On 05/03/2012 08:45 PM, John Reye wrote:
char a;
char *p = &a;

a = 1;
*p = 2;
printf("%d\n", a);

Does the C standard guarantee, that the value 2 will get printed in
the above code??

no

for that you'd have to declare them

char volatile a;
char volatile *p = &a;

Jens
 

John Reye

no

for that you'd have to declare them

char volatile a;
char volatile *p = &a;

Jens


Thanks!!!
So it would seem that good C coders sprinkle a lot of volatile. ;)
 

John Reye

Assume no volatile:

char a;
char *p = &a;

Does the C standard guarantee that the following will print 2? ->

a = 1, *p = 2, printf("%d\n", a);

Thanks.
 

James Kuyper

On 05/03/2012 08:45 PM, John Reye wrote:

no

Why not? In the abstract machine those statements must be executed in
sequence. There's a sequence point at the end of each statement, which
means that side effects such as the change in value of a in the second
statement, must be complete by the time the value of 'a' is read by the
third statement.
for that you'd have to declare them

char volatile a;
char volatile *p = &a;

Why should that be necessary?
 

James Kuyper

char a;
char *p = &a;

a = 1;
*p = 2;
printf("%d\n", a);

Does the C standard guarantee, that the value 2 will get printed in
the above code??

Yes.

The key point is that *p and a have the same type, so there's no
violation of the anti-aliasing rules. There's also a special exemption
in those rules for character types, so the behavior would remain
well-defined even if 'a' had the type 'int'.
 

James Kuyper

In fact...

then any use of pointers at all would fail.
char a;
char *p;

a = 1;
*p = 2;
printf("%d", a);

How do you reach that conclusion? That use of pointers doesn't violate
the anti-aliasing rules.
 

John Reye

Yes.

The key point is that *p and a have the same type, so there's no
violation of the anti-aliasing rules. There's also a special exemption
in those rules for character types, so the behavior would remain
well-defined even if 'a' had the type 'int'.

Great, thanks.
(I'm quite relieved, otherwise I might have started sprinkling
unnecessary "volatiles"!!)
 
