An example of unions not being type safe?

Chad · Aug 15, 2007

Okay, so like recently the whole idea of using a Union in C finally
sank into my skull. Seriously, I think it probably took me 2 years to
catch on to what a Union really is. Belatedly, I mentioned this to my
ultra smart friend who just quit working as a CTO of a wireless
company so he could go complete his PhD in particle physics. Anyhow, he
mentioned that Unions in C are not typesafe.

Now, how is it possible to violate type safety in Unions?

Chad

Scott Fluhrer · Aug 15, 2007

Chad said:
Okay, so like recently the whole idea of using a Union in C finally
sank into my skull. Seriously, I think it probably took me 2 years to
catch on to what a Union really is. Belatedly, I mentioned this to my
ultra smart friend who just quit working as a CTO of a wireless
company so he could go complete his PhD in particle physics. Anyhow, he
mentioned that Unions in C are not typesafe.

Now, how is it possible to violate type safety in Unions?

#include <stdio.h>

int main(void) {
    union {
        int i;
        double d;
    } z;

    z.d = 42.7;
    printf( "%d\n", z.i ); /* Oops */
    return 0;
}

Old Wolf · Aug 16, 2007

z.d = 42.7;
printf( "%d\n", z.i ); /* Oops */

Well, you aren't allowed to do that. Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

? If you follow the rules as to what you are
permitted to do with unions, then I don't see
any violation of type safety.

Keith Thompson · Aug 16, 2007

Old Wolf said:
Well, you aren't allowed to do that. Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

? If you follow the rules as to what you are
permitted to do with unions, then I don't see
any violation of type safety.

You aren't allowed to do it, but the prohibition is not enforced; it
invokes undefined behavior.

You can play similar games with pointer conversions, but storing a
value in one union member and then reading the value of another is an
easier mistake to make (or an easier thing to do deliberately if you
don't mind undefined behavior). Some programmers have the attitude
that that's just what unions are for (and I'm not convinced that
they're entirely wrong).

Richard Heathfield · Aug 16, 2007

Old Wolf said:

Well, you aren't allowed to do that.

False. The behaviour is implementation-defined in C90, and undefined in
C99. Taking advantage of either behaviour is *not* forbidden, although
it does render your program non-portable. I'm not condoning the above
by any means, but it is not true that "you aren't allowed to do that",
any more than it is true that you are not allowed to define an array
with 1 << CHAR_BIT elements (implementation-defined behaviour) or
dereference a pointer to video memory under MS-DOS (undefined
behaviour).

Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

Yes. What makes it possible to break the type system is the existence of
multiple types and the ability to convert between them.

<snip>

Keith Thompson · Aug 16, 2007

Richard Heathfield said:
Old Wolf said: [...]

Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

Yes. What makes it possible to break the type system is the existence of
multiple types and the ability to convert between them.

Not quite. Being able to convert between types doesn't necessarily
break the type system. For example, a hypothetical 100% typesafe
language might freely allow conversions among different numeric types.
This:
int x = 10;
printf("%f", (double)x);
doesn't break type safety.

What does break type safety in C is the ability to treat an object of
one type as if it were an object of a different type, *without*
converting the value of the object (i.e., type-punning). Unions and
pointer conversions make this possible.

Old Wolf · Aug 16, 2007

Richard Heathfield said:

False. The behaviour is implementation-defined in C90, and undefined in
C99.

C90 does not have aliasing rules? (Notwithstanding
clauses specific to unions).

Richard Heathfield · Aug 16, 2007

Keith Thompson said:

Richard Heathfield said:

Old Wolf said: [...]

Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

Yes. What makes it possible to break the type system is the existence
of multiple types and the ability to convert between them.

Not quite. Being able to convert between types doesn't necessarily
break the type system.

I didn't say it did. I said that it was what makes it *possible* to
break the type system. Now, I will accept that it may not be
sufficient, but it is certainly a prerequisite. Without the ability to
convert between types, there is no mechanism for breaking the type
system; and without multiple types, there isn't a type system to break.

The existence of a mechanism for breaking the type system does not imply
that that mechanism must be used. Therefore, the ability to convert
between types doesn't of itself break the type system. Such a mechanism
does, however, remove a significant barrier to such breakage.

For example, a hypothetical 100% typesafe
language might freely allow conversions among different numeric types.
This:
int x = 10;
printf("%f", (double)x);
doesn't break type safety.

What does break type safety in C is the ability to treat an object of
one type as if it were an object of a different type, *without*
converting the value of the object (i.e., type-punning). Unions and
pointer conversions make this possible.

You're making the same mistake again. That ability does not break type
safety. It merely opens the door to such breakage. Nor are unions and
pointer conversions *necessary* for breaking type safety, since C
thoughtfully provides at least one other type-safety-breaking
mechanism: the ability to write values to, and subsequently read those
values from, a file.

Richard Heathfield · Aug 16, 2007

Old Wolf said:

C90 does not have aliasing rules?

It's hard to prove a negative. Can you show that C90 /does/ have
aliasing rules that disallow the specific code given?

For "disallow", I will on this occasion accept the following meanings:

* the code violates a constraint
* the code contains a syntax error
* the behaviour of the code is undefined [1]

Note that 3.3.2.3 of C89 renders the behaviour implementation-defined,
so any answer has to be able to trump that rendering. C90 will have
different numbering - possibly 6.3.2.3 - but it is well-known that C89
and C90 have basically the same text, except for the insertion of three
"noise" sections.

(Notwithstanding
clauses specific to unions).

The *code* is specific to unions!

[1] In fact, there is no prohibition on taking advantage of undefined
behaviour, but I want to give you every chance to back up your claim
that "you aren't allowed to do that". What I won't do is agree that
implementation-defined behaviour is not allowed.

Old Wolf · Aug 16, 2007

Richard Heathfield said:

It's hard to prove a negative. Can you show that C90 /does/ have
aliasing rules that disallow the specific code given?

I don't have a copy of C90, hence why I'm asking you.

Note that 3.3.2.3 of C89 renders the behaviour implementation-defined,
so any answer has to be able to trump that rendering.

Yes. For example, in C99 the behaviour is undefined
because it is explicitly undefined to alias a float
as an int (regardless of whether a union is involved).
It seems a reasonable assumption that C90 has similar
aliasing rules, although I'm unable to look it up.

Robert Gamble · Aug 16, 2007

Richard Heathfield said:

False. The behaviour is implementation-defined in C90, and undefined in
C99.

It is actually undefined in C90 (the fact that the Standard states it
is implementation-defined is a defect). In C99 it is only undefined
behavior if the result is a trap representation.

Robert Gamble

Richard Heathfield · Aug 16, 2007

Robert Gamble said:

On Aug 15, 8:35 pm, Richard Heathfield wrote:

[Context: union type-punning]

It is actually undefined in C90 (the fact that the Standard states it
is implementation-defined is a defect).

No, it is actually implementation-defined in C90. The fact that the C90
Standard states it is implementation-defined means that it is
implementation-defined in the C90 Standard. If the ISO C Committee
decide that it constitutes a defect in that Standard, they can issue a
TC - and presumably they did. But that doesn't affect the C90 Standard.
It only affects a Standard that we might reasonably call "C90 + TCs 1
to n", where n is the number of the TC in which the correction was
made.

Not even the ISO C Committee can change the past.

<snip>

Keith Thompson · Aug 16, 2007

Richard Heathfield said:
Keith Thompson said:

Richard Heathfield said:

Old Wolf said: [...]
Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

Yes. What makes it possible to break the type system is the existence
of multiple types and the ability to convert between them.

Not quite. Being able to convert between types doesn't necessarily
break the type system.

I didn't say it did. I said that it was what makes it *possible* to
break the type system. Now, I will accept that it may not be
sufficient, but it is certainly a prerequisite. Without the ability to
convert between types, there is no mechanism for breaking the type
system; and without multiple types, there isn't a type system to break.

The existence of a mechanism for breaking the type system does not imply
that that mechanism must be used. Therefore, the ability to convert
between types doesn't of itself break the type system. Such a mechanism
does, however, remove a significant barrier to such breakage.

[...]

I think we're saying the same thing in different words.

I'm thinking of "breaks the type system" and "makes it possible to
break the type system" as being essentially synonymous. Being able to
convert between types is not sufficient to make it possible to break
the type system (if, for example, only conversions among numeric types
are allowed).

I think the difference is that I'm talking about language features
breaking the type system, and you're talking about programs actually
*using* those features to break the type system. A language that
allows arbitrary assignments between distinct types has a broken type
system, even if it's possible to write programs in that language that
don't take advantage of that brokenness.

Note that "broken" isn't necessarily a bad thing. Sometimes you want
to bypass type checking.

Richard Heathfield · Aug 16, 2007

Keith Thompson said:

I think we're saying the same thing in different words.

No, we *are*!

Keith Thompson · Aug 16, 2007

Richard Heathfield said:
Robert Gamble said:

On Aug 15, 8:35 pm, Richard Heathfield wrote:

[Context: union type-punning]

It is actually undefined in C90 (the fact that the Standard states it
is implementation-defined is a defect).

No, it is actually implementation-defined in C90. The fact that the C90
Standard states it is implementation-defined means that it is
implementation-defined in the C90 Standard. If the ISO C Committee
decide that it constitutes a defect in that Standard, they can issue a
TC - and presumably they did. But that doesn't affect the C90 Standard.
It only affects a Standard that we might reasonably call "C90 + TCs 1
to n", where n is the number of the TC in which the correction was
made.

Not even the ISO C Committee can change the past.

Strictly speaking, the committee corrected the C90 standard by issuing
the C99 standard. They're not going to issue corrections to a
standard that's officially obsolete. (I know that C90 is still a de
facto current standard, but the committee isn't going to act on that
basis.)

Richard Heathfield · Aug 16, 2007

Keith Thompson said:

Strictly speaking, the committee corrected the C90 standard by issuing
the C99 standard. They're not going to issue corrections to a
standard that's officially obsolete. (I know that C90 is still a de
facto current standard, but the committee isn't going to act on that
basis.)

All absolutely true, of course. Nevertheless, C90 remains topical here,
and in C90 (sans TCs) the behaviour in question is, and will always
remain, implementation-defined.

Keith Thompson · Aug 16, 2007

Richard Heathfield said:
Keith Thompson said:

All absolutely true, of course. Nevertheless, C90 remains topical here,
and in C90 (sans TCs) the behaviour in question is, and will always
remain, implementation-defined.

True, but the flaw is that there are implementations on which the
behavior cannot reasonably be defined (unless the definition says
"This can yield arbitrary results that can blow up in your face").

Richard Tobin · Aug 16, 2007

z.d = 42.7;
printf( "%d\n", z.i ); /* Oops */

Well, you aren't allowed to do that. Would you
also say that an int is not typesafe because
you can write:

int x = 10;
printf("%f", *(double *)&x); /* oops */

No, I would say that casts aren't type-safe. Casts and unions are two
of the ways to have objects treated as the wrong type.

-- Richard

CBFalconer · Aug 16, 2007

Richard Tobin said:
No, I would say that casts aren't type-safe. Casts and unions
are two of the ways to have objects treated as the wrong type.

Apart from variadic function parameters, such as printf, most casts
are errors. I count an unnecessary cast as an error too.

Mark McIntyre · Aug 16, 2007

Richard Heathfield said:

False.

Too strong.

The behaviour is implementation-defined in C90, and undefined in
C99. Taking advantage of either behaviour is *not* forbidden,

In C99, it's forbidden in the same sense that much of what we discuss
here as 'forbidden' or 'illegal' is.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
