Nick said:
Eric Sosman said:
My copy of the Standard is silent on all points, since I lack
a text-to-speech program that will read it aloud to me. If I had such
a program, though, I'd request it to perform 6.2.7p1.
Grossly unfair. That's a perfectly normal (slightly legalistic) term
for "the document doesn't express[1] anything about this."
I believe Mr Sosman was only making a little joke. I find he has a dry
wit that can sometimes pass you by.
However, this reference is interesting. Let's think about these two
translation units.
/* begin TU1 */
struct foo { int a; int b; } foo;
extern void fn2(void *);
void fn1(void)
{
fn2(&foo);
}
/* end TU1 */
/* begin TU2 */
struct bar { int x; int y; };
void fn2(void *vp)
{
struct bar *bp = vp;
bp->x++;
}
/* end TU2 */
As I read that paragraph, this code is not Standard-conforming: to make
it conforming, struct bar would have to be called struct foo, and the
fields would also need to have the same names.
Right.
This doesn't make much sense to me, because TUs are compiled
independently. So when TU2 is compiled, how could the compiler ever know
what the corresponding struct and its fields happened to be called in
TU1? Maybe by that point the source file for TU1 has been deleted and
only a stripped object file remains!
As you say, the compiler doesn't "know" about the content of one
TU while translating another. That's why the behavior is "undefined,"
rather than an error the compiler is required to diagnose.
Why should it be an error? Because the two struct types are not
the same, even though they look the same. This happens with other
types, too: `char*' and `void*' look the same but are different types;
on many systems `int' and `long' look the same but are different; on
many systems `double' and `long double' look the same but are different;
and so on. If a programmer writes `char *cp; void *vp;' he states his
intent to create two variables of two distinct types, even though they
happen to share the same representation. If he writes `struct foo u;
struct bar v;', he similarly asks for the variables to have different
types even if they happen to look the same.
Strictly speaking, it is not even required that `struct foo' and
`struct bar' have the same representation! No sane compiler will make
them different, but the Standard doesn't forbid it. There's just no
pressing reason to require identical representations for types that
the programmer obviously intends to be different (if he'd wanted them
to be the same, he'd have created only one type).
Now let's look at optimization and efficiency. Take the two struct
types as given, and ponder this (silly) function:
int f(struct foo *pf, struct bar *pb) {
int n = 0;
for (pf->a = 0; pf->a < 10; pf->a++) {
pb->x++;
n++;
}
return n;
}
Can the compiler optimize this into the equivalent of
int f(struct foo *pf, struct bar *pb) {
pf->a = 10;
pb->x += 10;
return 10;
}
? Yes, it can. Since the parameters point to different types (and
since there's no union in sight that could hold both), the compiler
is allowed to assume that they point to distinct objects, and that
modifications to `*pf' and `*pb' don't interfere with each other
(see 6.5p7). If the parameters could point at the same object, the
optimization would not be valid (and the returned value would not
necessarily be ten).
Another reason it "should" be an error to mix types that just
happen to look alike is that their resemblance may be transitory.
The programmer has gone to the trouble of creating two distinct types,
suggesting two distinct purposes. They happen to look alike today,
but tomorrow the programmer may decide to add a `z' element to
`struct bar' -- and suddenly all the code that blithely assumed the
two structs were interchangeable stops working. More subtly, he
might leave the structs with the same elements, but (for some reason)
decide to switch the order in one of them:
struct foo { int a; int b; };
struct bar { int y; int x; }; // y is now first
If the structs were interchangeable (and interchanged), the program
would still compile -- but it would now have an entirely different
meaning: Altering the `a' element of a `struct foo' would now change
the `y' of an aliased `struct bar' instead of the `x'. Is this an
outcome you think would be a useful language feature?
When you think about it, the real anomaly in C is that different
TUs can utter independent declarations of the same type and somehow
have those independent declarations agree. This is weird! It's at
odds, in a way, with good software engineering practice: In a big
program, there should be one and only one "authoritative" description
for each type, object, and function, some kind of "data dictionary"
and not a bunch of free-floating independent declarations that you
sort of hope you'll keep synchronized as the program mutates. But
C doesn't have the meta-linguistic machinery to support such things,
so we're forced to rely on textual similarities. 6.2.7 describes how
much we can and cannot rely on.
A final thought: More than the users of other languages I know
of, C programmers seem obsessed with the representations of the values
and objects their programs manipulate. There are in truth times when
representations must be dealt with directly, but much C code would be
improved if the programmers could just forget about representation for
a moment or two and think about the values instead. I almost never
care how many bits are in a `double'; instead, I care about range and
precision and accumulated round-off error. Same thing with structs:
I almost never care where the padding is or isn't, nor even about the
order of their elements; I care about what the elements are, what they
mean, and what values are stored in them. You'll be a better architect
if you think more about the building and less about the bricks.