Standard-compliant bit-casting

zr

Hi,

How can a value of type A be bit-cast to a value of type B? By
bit-casting I mean that both values will have the same machine bit
representation. Let's assume that A and B both have the same size in
bits.

Here are a few methods which I can think of:
1)
A a;
B b;
b = *static_cast<B*>(&a)

2)
Using a union

3)
memcpy(&b, &a, sizeof(A));

Which are standard compliant? Is there another way?

TIA
 
Rolf Magnus

zr said:
Hi,

How can a value of type A be bit-casted to value of type B? By bit-
casting i mean that both values will have the same machine bit
representation.
reinterpret_cast.

Let's assume that A and B both have the same size in bits.

Here are a few methods which i can think of:
1)
A a;
B b;
b = *static_cast<B*>(&a)

2)
Using a union

3)
memcpy(&b, &a, sizeof(A));

Which are standard compliant?

2) is explicitly marked as invoking undefined behaviour. You are not allowed
to read any union member other than the one you have last written to.

I think the other ones are pretty much the same. I'm not sure if it's
undefined or only unspecified, but it's definitely compiler-specific. Of
course, you have to ensure that your A value is not a trap representation
for B. And A and B should be POD types.
Is there another way?

b = reinterpret_cast<B&>(a);
 
Joshua Maurice

Hi,

How can a value of type A be bit-casted to value of type B? By bit-
casting i mean that both values will have the same machine bit
representation. Let's assume that A and B both have the same size in
bits.

Here are a few methods which i can think of:
1)
A a;
B b;
b = *static_cast<B*>(&a)

static_cast is not what you want. It may do conversions.
2)
Using a union

Although technically undefined behavior according to the standard's
intent and a particular reading of the standard, (nearly?) all C and
C++ compilers support type punning through a union as an extension. I
would strongly suggest having the union in scope when accessing any of
its members, though, due to a defect in the C and C++ standards.

(The defect is: separate compilation units + allowances to do strict
aliasing + unions = contradiction. If you define a union in one
translation unit, and let pointers to its members go to another
translation unit, then that other translation unit has no way to know
if those pointers alias or not. They might because they might both
point to the same union, but the strict aliasing rules are there for
the compiler to assume they do not. Ergo: bug in the C and C++
standards.)
3)
memcpy(&b, &a, sizeof(A));

I think this is the most standard compliant way of doing it.
Which are standard compliant? Is there another way?

Note that the C++ standard strongly suggests, but does not spell out
anywhere literally, the allowance to read from or write to any POD
type through a char pointer or unsigned char pointer (obtained using
reinterpret_cast, or static_cast through a void pointer). memcpy of POD
types seems to be more strongly allowed, whereas the char pointer and
unsigned char pointer approach is not quite so guaranteed. (Also, if
you write a trap representation, you're on your own.)
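
A minimal sketch of the character-pointer access described above: it
copies out the bytes of an object by reading them through an unsigned
char pointer. The helper name object_bytes is hypothetical (it is not
from this thread), and the order of the bytes you get back depends on
the implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of the bytes that make up obj,
// read one at a time through an unsigned char pointer.
template <typename T>
std::vector<unsigned char> object_bytes(const T& obj)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(p, p + sizeof(T));
}
```

For example, object_bytes(1u) yields sizeof(unsigned int) bytes, one of
which is 1 and the rest 0 on typical implementations; which position
holds the 1 depends on the byte order.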

Also note that this is a very black art, as the standard is not the
most clear about it. All of this is entirely implementation dependent,
and thus not portable. You had better know what you're doing.
 
SG

Rolf said:
2) is explicitly marked as invoking undefined behaviour. You are not allowed
to read any union member other than the one you have last written to.

Can you quote the standard on that one? Is it implied by some of the
other rules? There is 3.10/15 which seems relevant. But I didn't find
anything else specific to unions except that at most one member of a
union can be "active". According to 3.10/15 you seem to exclude some
valid uses with your statement.

Cheers,
SG
 
Joshua Maurice

Can you quote the standard on that one? Is it implied by some of the
other rules? There is 3.10/15 which seems relevant. But I didn't find
anything else specific to unions except that at most one member of a
union can be "active". According to 3.10/15 you seem to exclude some
valid uses with your statement.

The reading I always had was that the C++ standard seemed to hint in
that very passage on strict aliasing that you could type pun through a
union, but through discussions I've learned that the intent of the C
standard was such type punning is not allowed, or so random people X
say. The standard itself is pretty vague on the subject, and the C++
standard itself almost seems to allow such things according to
3.10/15.
 
SG

How can a value of type A be bit-casted to value of type B? By bit-
casting i mean that both values will have the same machine bit
representation.

FYI: In C++ standard terminology we have "object representation" and
"value representation" where the latter is a subset (in bits) of the
former. The value representation is the set of bits that determines
the value. The object representation may include padding bits.
Let's assume that A and B both have the same size in
bits.
OK.

Here are a few methods which i can think of:
1)
A a;
B b;
b = *static_cast<B*>(&a)

In case the types A and B are compatible with respect to 3.10/15
(which I quote below) you can do this with a reinterpret_cast:

b = reinterpret_cast<B&>(a);

(You don't need pointers here). In case B is a base class of A, you
don't need a reinterpret_cast, of course. In case A and B are not
"compatible" (w.r.t. 3.10/15) but both are PODs (plain old data
structures) you still have the option to use memcpy.
2)
Using a union

In practice it may work with your compiler. But the standard doesn't
seem to be really clear on that one. My understanding is that if the
reinterpret_cast thing "works" (in the sense that it's guaranteed by
3.10/15) then the union version should also work. But others keep
telling us that the standard's intent is to restrict read access to
the only union member that is "active" (the last one that has been
written to).

C++ standard, 3.10/15:

"If a program attempts to access the stored value of an object
through an lvalue of other than one of the following types the
behaviour is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the
  object,
- a type that is the signed or unsigned type corresponding to
  the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to
  a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the
  aforementioned types among its members (including, recursively,
  a member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of
  the dynamic type of the object,
- a char or unsigned char type."
3)
memcpy(&b, &a, sizeof(A));

Should be okay if A and B are PODs and the object representation of a
is a valid object representation for the type B. Otherwise, it's
probably undefined behaviour (or at least implementation-defined, not
sure).
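
A minimal sketch of option 3 wrapped in a function, under the
conditions stated above (both types are PODs of the same size, and the
copied bits are valid for the destination type). The name
bit_cast_via_memcpy is hypothetical, and static_assert assumes a C++11
or later compiler.

```cpp
#include <cstring>

// Hypothetical helper: copies the object representation of `from` into a
// fresh object of type To. Both types are assumed to be POD types.
template <typename To, typename From>
To bit_cast_via_memcpy(const From& from)
{
    // Refuse to compile when the sizes differ, since the copy would
    // otherwise read or write past the end of one of the objects.
    static_assert(sizeof(To) == sizeof(From), "A and B must have the same size");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

Because memcpy works on two distinct objects, no lvalue of the wrong
type is ever used to access a or b, which is why the aliasing rule in
3.10/15 does not come into play here.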

If you provide more details about your problem we could probably give
a better answer. Otherwise there are a lot of cases to consider.

Cheers,
SG
 
James Kanze

Which shouldn't compile, of course.
2) is explicitly marked as invoking undefined behaviour. You
are not allowed to read any union member other than the one
you have last written to.
I think the other ones are pretty much the same. I'm not sure
if it's undefined or only unspecified, but it's definitely
compiler-specific. Of course, you have to ensure that your A
value is not a trap representation for B. And A and B should
be POD types.

Anything you can do to make the bits of one type be interpreted
as another type has to be undefined behavior, since the
standard can't define what might happen. (Interpreting the bits
of a long as if they were a double might result in a trapping
NaN, for example.) In the end, you're necessarily playing with
implementation defined behavior at best in such cases.

Having said that, the authors of the standard also realized that
there is (generally very low level) code where such games are
necessary. That's why they provided reinterpret_cast.

Note that you still have to be very, very careful, however,
because if you end up with two pointers of different types,
unless one of the types is a character type, the compiler is
free to assume that they cannot be aliases to the same memory.
As long as the reinterpret_cast is freely visible, from a QoI
point of view, at least, you should be safe (but I think g++ may
have problems in this regard), but beyond that, all bets are
off.
 
James Kanze

Can you quote the standard on that one? Is it implied by some
of the other rules?

It's clearer in the C standard (which I don't have accessible
here), but even in C++, "In a union, at most one of the data
members can be active at any time, that is, the value of at most
one of the data members can be stored in a union at any time."
That pretty much indicates that in fact, the union has the type
of its "active" member, and accessing it through any other
member is undefined behavior. (IIRC, the C standard says this
explicitly.)

IMHO, it's an issue that the standards committee should address
(although maybe the C standards committee, rather than the C++,
since C and C++ should really be compatible in this regard).
Historically, I think the union was the preferred solution for
type punning, and from a compiler writer's point of view, it
should be the preferred solution. The C committee explicitly
forbade this use of unions, however, but didn't make it really
clear that casting pointers should work. The C++ committee
introduced reinterpret_cast, doubtlessly to support specific
uses of C casts (that one didn't want to accidentally get), with
a very vague suggestion that reinterpret_cast should be used for
this, but without explicitly offering the necessary guarantees.
So in fact, you're very much at the mercy of the implementers:
g++, for example, does guarantee the use of unions in such
cases, and takes all the liberties which the standard allows for
reinterpret_cast.
 
Joshua Maurice

Anything you can do to make the bits of one type be interpreted
as another type has to be undefined behavior, since the
standard can't define what might happen.  (Interpreting the bits
of a long as if they were a double might result in a trapping
NaN, for example.)  In the end, you're necessarily playing with
implementation defined behavior at best in such cases.

Having said that, the authors of the standard also realized that
there is (generally very low level) code where such games are
necessary.  That's why they provided reinterpret_cast.

Note that you still have to be very, very careful, however,
because if you end up with two pointers of different types,
unless one of the types is a character type, the compiler is
free to assume that they cannot be aliases to the same memory.
As long as the reinterpret_cast is freely visible, from a QoI
point of view, at least, you should be safe (but I think g++ may
have problems in this regard), but beyond that, all bets are
off.

A lot of my understanding of these issues comes from
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

It's my understanding from the above document and various other
sources that reinterpret_cast does not make the strict aliasing
problem go away, so I will have to disagree with your assessment
James. reinterpret_cast works to convert from values of different
types, so you could convert a float* to an int*, but it does nothing
to alleviate the requirement that accessing an int object through a
float& lvalue is undefined behavior.

James, when he mentions that the reinterpret_cast should be "in
scope", reminds me greatly of how I caution people about using unions
to type pun. (See my earlier post else-thread.) Specifically, a union
tells the compiler that its member types may alias for the scope of
the union. Note that this is not the intent of "union", merely an
extension supported by basically all compilers. However, it's my
operating assumption that, for most / all compilers, a
reinterpret_cast was not intended to, and will not in practice, tell
the compiler that in its "scope" that the unrelated pointer types may
alias. Your suggestion to the contrary is the first that I've heard.
Is this just a guess, or do you have any experience to back this up?
Admittedly, I haven't done any tests either.

Also, I know g++ actually uses the strict aliasing allowance to
optimize, but the Visual Studio compiler does not. I know nothing
about other compilers in this regard. I've heard that Visual Studio
does not because it would break too much Windows code and code written
for Windows. (The gcc people recognized this as well for its own
situation and provided the -fno-strict-aliasing option.)
 
Joshua Maurice

That's called "type casting", or just "casting".
As much as possible, it should be avoided.



Needs reinterpret_cast, actually.




I think they are all "standard compliant" in that
they're valid C++.

The "static_cast" / "reinterpret_cast" method is the
preferred way for C++, of the methods you list.  (The
other methods are leftovers from C.)

However, the C++ standard cannot guarantee that code which
uses such casts will actually perform as you intend,
because the standard has no way of knowing which machines
you will be executing your code on, or how those machines
represent objects in memory.

Hence any code that makes assumptions about how objects
are represented in memory will always have the following
flaws:

1. Fragile.  (The code may break at any time due to
   compiler updates, OS changes, CPU changes, or for
   other reasons.)

2. Unclear.  (Maintenance programmers will have a hard time
   understanding what you are doing, and may make changes
   which seem harmless to them, but end up breaking your
   program.)

3. Not portable.  (Ask yourself what would happen if you port
   your code to a machine which uses a little-endian
   representation for type A, but a big-endian representation
   for type B?  The values will get screwed up.  Or what if
   you port the code to a machine where type A uses 17 bits
   but type B uses 37 bits?  Your object b will now contain
   20 extra bits of garbage.)

That being said, casting does come in handy sometimes.
Eg, I use the following in a program of mine to get the
0-255 numerical value of a character:

   // Put character 'H' in variable A:
   char A = 'H';

   // Explicit cast to unsigned char, followed by implicit conversion to int:
   int Value = static_cast<unsigned char>(A);

   // Print the decimal ASCII code for character 'H':
   std::cout << Value << std::endl;

But it does make assumptions about how types char and
unsigned char are stored in memory.  If those assumptions
ever become invalid, the code will break.

Thread resurrection!

And no. You are not correct on many points.

reinterpret_cast was not added to the language to support type
punning. Do not use it to access an object through an lvalue of an
incorrect type. reinterpret_cast was meant to be an improvement over
some usage of C-style casts, like the other 3 named casts.
Specifically, the new casts are not context dependent in their effect,
unlike the C-style cast, and they are more easily grep-able, unlike
the C-style cast. reinterpret_cast was just intended to clean up usage
of the C-style cast, not to allow new usages such as bypassing the
strict aliasing rule. Repeating to emphasize: C-style casts and
reinterpret_casts will not bypass the strict aliasing rule.

You can type pun using std::memcpy between POD types. The standard is
quite clear that this should produce the expected results.

Then we have char and unsigned char. The intent of the standard seems
to be to allow reading or writing any object through a char lvalue or
unsigned char lvalue. However, it's not explicitly stated as allowed.

Finally we have type punning through unions. While explicitly not
supported by the standard(s), it's supported as a compiler extension
by basically every C and C++ compiler. (Confirmation anyone?)

When doing any low level bit hackery, however, I would strongly
suggest looking at the generated assembly to confirm expectations.
Compiler bugs (and incorrect expectations) tend to be a little more
prevalent when doing such things.

Might I again suggest reading:
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
This is my authoritative link on the subject, and will remain so until
someone provides me good evidence that the article is incorrect.
Anyone doing low level hackery such as type punning should read this
article and understand the nuances of the strict aliasing rule.

Also, there are plenty of good reasons to do type punning. For
example, games. I'm pretty sure that using a portable UTF8 text format
for ints for their network packets would result in rather unacceptable
performance. Yes, it carries the problem that a newer compiler, OS,
etc., may render the communication not compatible, but in certain
contexts this is an acceptable drawback. (Next I'll be hearing that a
first person shooter should send its data over the network in XML
format and should utilize a full XML parser on the receiver side. /
sigh)
 
Stefan Ram

Robbie Hatley said:
I think they are all "standard compliant" in that
they're valid C++.

In C, there is an assertion:

»When a value is stored in a member of an object of union type,
the bytes of the object representation that do not
correspond to that member but do correspond to other
members take unspecified values«

ISO/IEC 9899:1999 (E), 6.2.6.1#7

I can not find a corresponding assertion in ISO/IEC
14882:2003(E), but I also can not find a specification of
the value of that »other members« either.

Does anyone know, whether ISO/IEC 14882:2003(E) says anything
about this question?
 
Joshua Maurice

Interesting.  So, what's it for?  I can't see any way to
use it that wouldn't be some form of "type punning".

Again. The C++ standard writers considered C-style casts "bad" and
"ugly", and rightfully so.

The first problem is the C-style cast is context dependent.
A* a = (A*)b;  // where b has type B*
Depending on if A and B are complete types at the point of the cast,
the cast will do different things. If they're incomplete types, it's a
reinterpret_cast, always. This will generally break when multiple
inheritance or virtual inheritance is involved because casting in such
a type hierarchy actually can change the bit value of the pointer,
change the offset into the object, but reinterpret_cast will never do
that, so you'll have an A* pointing to the wrong offset, the wrong
virtual function pointer, etc. I've hit this in production code, where
someone did a C-style cast with MI, but it broke because the types
were just forward declared. By providing 4 different named casts, we
can avoid this potential problem. Generally, you'll want a static_cast
or dynamic_cast, and both of these will fail to compile when the types
are incomplete.
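
A minimal sketch of the pointer adjustment described above, using
hypothetical types Left, Right and Both that are not from this thread.
With complete types, static_cast to each base yields different
addresses, so at least one of the conversions must have changed the
pointer value. reinterpret_cast, by contrast, leaves the address alone
in practice.

```cpp
// Hypothetical class hierarchy with multiple inheritance.
struct Left  { int l; };
struct Right { int r; };
struct Both : Left, Right { };

// True when converting a Both* to each base yields different addresses,
// i.e. at least one static_cast had to adjust the pointer.
bool bases_have_distinct_addresses(Both* p)
{
    void* as_left  = static_cast<Left*>(p);
    void* as_right = static_cast<Right*>(p);
    return as_left != as_right;
}

// True when reinterpret_cast keeps the original address unchanged.
bool reinterpret_keeps_address(Both* p)
{
    void* original = p;
    void* as_right = reinterpret_cast<Right*>(p);
    return original == as_right;
}
```

Since the two base subobjects live at different addresses but
reinterpret_cast always returns the address of the whole object, the
reinterpret_cast result is wrong for at least one of the bases. That is
the situation a C-style cast silently falls into when the types are
only forward declared.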

The next problem is again one of vagueness. The C-style cast can
1- cast between unrelated types ala reinterpret_cast
2- do implicit casts, like casting to a base class
3- downcast, like a static_cast
4- (And it can also cast to an inaccessible base class, but the 4
named casts cannot)
By separating these different functions into different named casts,
the code becomes clearer as the intent is more easily ascertained, and
there's less chance of mistakes due to what's in scope, if the types
are complete types, etc.

Finally, it's relatively hard to grep for c-style casts, but it's
quite easy to grep for the 4 named casts. It also makes the code
clearer IMHO that a cast is going on with a quick glance. Casts should
be rare, and they should stand out more than the C-style cast stands
out.
 
James Kanze

That's called "type casting", or just "casting". As much as
possible, it should be avoided.

Actually, it's called type punning. In C++ (and in C), a "cast"
is an explicit type conversion. Any explicit type conversion:
int to double, for example, or even just removing const.

But you're right that type punning should be avoided in general.
It has its place in some very low level software, but unless
you're implementing a garbage collector, or something along
those lines, you probably shouldn't be using it.
Needs reinterpret_cast, actually.

Yes. reinterpret_cast is the cast for type punning in C++.

Involves undefined behavior.
I think they are all "standard compliant" in that they're
valid C++.
The "static_cast" / "reinterpret_cast" method is the preferred
way for C++, of the methods you list. (The other methods are
leftovers from C.)
However, the C++ standard cannot guarantee that code which
uses such casts will actually perform as you intend, because
the standard has no way of knowing which machines you will be
executing your code on, or how those machines represent
objects in memory.

The C++ standard also allows the compiler to assume that
pointers to different types never point to the same object (with
an exception for pointers to character types), which means that
even reinterpret_cast can be dangerous if the compiler is
aggressively optimizing. The safest way is the memcpy, because
it involves two different objects. Used correctly, however,
reinterpret_cast should be fairly safe.

[...]
That being said, casting does come in handy sometimes.

Most of the more useful conversions are implicit, but it's still
relatively frequent to use things like:

    int a;
    int b;
    double percent = (double)a / (double)b * 100.0;

(I'd write that last line:

    double percent = 100.0 * a / b;

and let the implicit type promotions do the job, but it's not
always that simple.)

Of course, that's not a bitwise conversion; not type punning.
Eg, I use the following in a program of mine to get the
0-255 numerical value of a character:
// Put character 'H' in variable A:
char A = 'H';
// Explicit cast to unsigned char, followed by implicit conversion to int:
int Value = static_cast<unsigned char>(A);
// Print the decimal ASCII code for character 'H':
std::cout << Value << std::endl;
But it does make assumptions about how types char and unsigned
char are stored in memory. If those assumptions ever become
invalid, the code will break.

What assumptions? Except for the actual encoding of the letter
'H', the code above is fully defined.

It is, in fact, one of the more frequent uses of casts: you
can't portably call any of the functions declared in <ctype.h>
with a char; you have to explicitly convert the char to unsigned
char first.
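
A minimal sketch of that conversion in use. The wrapper name
to_upper_char is hypothetical; std::toupper is the <cctype> counterpart
of the <ctype.h> function.

```cpp
#include <cctype>

// Hypothetical wrapper: upper-cases one char. The cast to unsigned char
// ensures that std::toupper never receives a negative value, which would
// be undefined behavior for anything other than EOF.
char to_upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Without the cast, a plain char holding a value above 127 may be
negative on implementations where char is signed.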
 
James Kanze

reinterpret_cast was not added to the language to support type
punning.

Why was it added to the language, then?

[...]
Finally we have type punning through unions. While explicitly
not supported by the standard(s), it's supported as a compiler
extension by basically every C and C++ compiler. (Confirmation
anyone?)

The only compiler I've seen that documents it as being supported
is g++ (but I've not really looked at all of the documentation).
And even with g++, it depends on the context---there are cases
where it will fail.

From a QoI point of view: if the union or the reinterpret_cast
is visible, I would expect the code to give the expected
results. Any accesses elsewhere, and all bets are off, e.g.:

    #include <iostream>

    int f(int* pi, double* pd)
    {
        int retval = *pi;
        *pd = 3.14159;
        return retval;
    }

    int
    main()
    {
        union U { int i; double d; } x;
        x.i = 42;
        std::cout << f(&x.i, &x.d) << std::endl;
        return 0;
    }

I would not count on this code outputting 42, regardless of any
guarantees the compiler might give.

[...]
Also, there are plenty of good reasons to do type punning. For
example, games. I'm pretty sure that using a portable UTF8
text format for ints for their network packets would result in
rather unacceptable performance.

So you use any one of a number of binary formats. You still
don't need (nor want) type punning to implement them.
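
A minimal sketch of what writing an integer to a binary format without
type punning can look like. The helper names put_u32_be and get_u32_be
are hypothetical, the big-endian byte order is an assumption chosen for
illustration, and <cstdint> assumes a C++11 or later compiler.

```cpp
#include <cstdint>

// Hypothetical helper: writes value into out[0..3], most significant
// byte first, using only shifts and masks on the value itself.
void put_u32_be(std::uint32_t value, unsigned char* out)
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Hypothetical helper: reads back a value written by put_u32_be.
std::uint32_t get_u32_be(const unsigned char* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}
```

The code operates on values rather than on the object representation,
so the result is the same regardless of the host's byte order.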
Yes, it carries the problem that a newer compiler, OS, etc.,
may render the communication not compatible, but in certain
contexts this is an acceptable drawback. (Next I'll be hearing
that a first person shooter should send its data over the
network in XML format and should utilize a full XML parser on
the receiver side. / sigh)

What's wrong with XDR?
 
Joshua Maurice

Why was it added to the language, then?

See my previous post else-thread for my understanding.

The only compiler I've seen that documents it as being supported
is g++ (but I've not really looked at all of the documentation).
And even with g++, it depends on the context---there are cases
where it will fail.

From a QoI point of view: if the union or the reinterpret_cast
is visible, I would expect the code to give the expected
results.  Any accesses elsewhere, and all bets are off, e.g.:

    int f(int* pi, double* pd)
    {
        int retval = *pi;
        *pd = 3.14159;
        return retval;
    }

    int
    main()
    {
        union U { int i; double d; } x;
        x.i = 42;
        std::cout << f(&x.i, &x.d) << std::endl;
        return 0;
    }

I would not count on this code outputting 42, regardless of any
guarantees the compiler might give.

//Start code for foo.cpp
#include <iostream>
using namespace std;

int main()
{
    cout << sizeof(int) << " " << sizeof(short) << endl;
    {
        int x = 1;
        short* s = reinterpret_cast<short*>(&x);
        s[0] = 2;
        s[1] = 3;
        cout << x << endl;
    }
    {
        int x = 1;
        union { int u_int; short u_short_array[2]; };
        u_int = x;
        u_short_array[0] = 2;
        u_short_array[1] = 3;
        x = u_int;
        cout << x << endl;
    }
}
//End code

//Start prompt copy
bash-3.2$ g++ --version
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

bash-3.2$ g++ -O3 foo.cpp -Wall
foo.cpp: In function ‘int main()’:
foo.cpp:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules
bash-3.2$ ./a.out
4 2
1
196610
bash-3.2$
//end prompt copy

So, when the union is in scope, gcc "does the right thing", and when
reinterpret_cast is in scope, gcc does not "do the right thing". Now,
I don't know any other compilers offhand which optimize with the
strict aliasing allowance besides newer gcc versions, but I would
suggest you revise your understanding of the QoI implications of
reinterpret_cast. My previous argument succinctly: C-style casts were
never intended to get around strict aliasing. reinterpret_cast was
never intended to be more powerful than C-style casts. (The named
casts were each intended to fulfill a specific role of the several
roles of C-style casts to remove potential ambiguity to the code
writer and readers.) Thus reinterpret_cast was never intended to get
around strict aliasing.

Perhaps I am wrong about the original intent. However, I'm at least
right on the questions of fact, at least if we count gcc as a good
example.
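
For comparison, a minimal sketch of the same experiment done with
memcpy instead of a cast or a union. The helper name
int_from_two_shorts is hypothetical, and the static_assert (C++11 or
later) encodes the assumption from the run above that an int is exactly
two shorts wide.

```cpp
#include <cstring>

// Hypothetical helper: builds an int whose bytes are those of two
// consecutive shorts, copied with memcpy rather than written through
// a short pointer into the int.
int int_from_two_shorts(short first, short second)
{
    static_assert(sizeof(int) == 2 * sizeof(short),
                  "sketch assumes an int is exactly two shorts wide");
    short parts[2] = { first, second };
    int result;
    std::memcpy(&result, parts, sizeof(result));
    return result;
}
```

On a little-endian machine with a 4-byte int and 2-byte short, such as
the one in the run above, int_from_two_shorts(2, 3) would be expected to
give 196610, the same value the union version printed.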

So you use any one of a number of binary formats.  You still
don't need (nor want) type punning to implement them.

You would need to type pun somewhere as the OS network calls probably
only work in terms of char pointers. So either the game code is type
punning, or the network library on top of the OS is type punning, or
the device driver is type punning (or written in assembly). Someone is
probably type punning in C or C++.
 
Joshua Maurice

Also, I started re-reading
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
just now. It appears as though it's not perfectly accurate either. It
ignores the allowance that you can cast between POD types with common
leading parts and access the common leading parts as expected. The
article's very point about structs Foo and Bar is actually inaccurate,
though I support the article on its intent to say "No. It doesn't work
naively. The compiler is allowed to do non-obvious and surprising
things because of the strict aliasing rule."
 
James Kanze

See my previous post else-thread for my understanding.

I didn't see any real explanation, other than to provide a new
style cast for certain C style casts.

Note that my statement above is based on how compiler optimizers
work. The motivation behind the anti-aliasing rule (e.g. that
two pointers to different types cannot refer to the same object)
is to allow certain optimizations. Optimizations which are
important in some code. But this only works if the above is not
guaranteed to work.

As the C++ standard is currently written, the above *is*
guaranteed to work. I don't think that this was the intent,
however. C has (or had) similar rules. I know that the issue
was discussed in the C committee, but I don't know the exact
status of the proposed resolution.
//Start code for foo.cpp
#include <iostream>
using namespace std;
int main()
{
    cout << sizeof(int) << " " << sizeof(short) << endl;
    {
        int x = 1;
        short* s = reinterpret_cast<short*>(&x);
        s[0] = 2;
        s[1] = 3;
        cout << x << endl;
    }
    {
        int x = 1;
        union { int u_int; short u_short_array[2]; };
        u_int = x;
        u_short_array[0] = 2;
        u_short_array[1] = 3;
        x = u_int;
        cout << x << endl;
    }
}
//End code
//Start prompt copy
bash-3.2$ g++ --version
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
bash-3.2$ g++ -O3 foo.cpp -Wall
foo.cpp: In function ‘int main()’:
foo.cpp:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules
bash-3.2$ ./a.out
4 2
1
196610
bash-3.2$
//end prompt copy
So, when the union is in scope, gcc "does the right thing",
and when reinterpret_cast is in scope, gcc does not "do the
right thing".

Which is, from a QoI point of view, an error.

In fact, I don't know whether this is an error in the coding,
a problem in the way that the optimizer works which would make
it difficult (and perhaps not worth it) to fix, or simply a bit
of stubbornness on the part of g++ developers. It does mean that
reinterpret_cast is useless, which is certainly not the intent
of the committee.

The issue isn't simple. Historically (pre-ISO C), the union was
the preferred solution, at least from what I understood. For
whatever reasons, the ISO C committee (or at least the parts of
it which formalized the wording in this regard) decided to
make type punning via a union undefined behavior, which means
that only casting remains. The C++ committee simply followed
the C committee in this respect---I'm 100% sure that the intent
of the C++ committee is that a reinterpret_cast, when legal,
behave exactly the same as a cast in C.

In addition, there is a note (non-normative) in the C++ standard
(§5.2.10/3) concerning the mapping done by reinterpret_cast: "it
is intended to be unsurprising to those who know the addressing
structure of the underlying machine". Although this note is
directly attached to the pointer to integer conversion, in the
absence of any other indications, it seems reasonable to me to
apply it to the other uses of reinterpret_cast as well.

In any case, the current standard is very unclear with regards
to type punning---with the exception of character types. And I
don't think that this has changed in the more recent drafts---in
a very real sense, I think that it is more a C problem; that the
C++ committee should simply wait, and adopt whatever the C
committee finally decides.
Now, I don't know any other compilers offhand which optimize
with the strict aliasing allowance besides newer gcc versions,

I don't know of any that don't. It's a common optimization. It
was present in Microsoft C 1.0, for example (in which your union
example would break).
but I would suggest you revise your understanding of the QoI
implications of reinterpret_cast.

Why? My understanding of the QoI implications is based on the
actual words in the standard, and the various discussions that
I've followed in the standardization committees.

My previous argument succinctly: C-style casts were never
intended to get around strict aliasing.

Yes and no. An optimizer is expected to use the knowledge at
its disposal. If it can see that there is aliasing, whether
from a union or a cast, it should take it into account.

That is, by the way, the direction the C committee was going the
last time I looked.

reinterpret_cast was never intended to be more powerful than
C-style casts.

Certainly not. But since the C style cast should behave as
expected, when visible, so should the reinterpret_cast.

(The named casts were each intended to fulfill a specific role
of the several roles of C-style casts to remove potential
ambiguity to the code writer and readers.) Thus
reinterpret_cast was never intended to get around strict
aliasing.

You seem to be misunderstanding the motivation behind the strict
aliasing rule. It is to allow the compiler to assume no
aliasing in cases where it otherwise couldn't. There was never
any intent to allow the compiler to totally ignore aliasing that
it can clearly see.

Perhaps I am wrong about the original intent. However, I'm at
least right on the questions of fact, at least if we count gcc
as a good example.

I don't think you can count any single compiler as a
"reference".

You would need to type pun somewhere as the OS network calls
probably only work in terms of char pointers.

No. You do need to convert value types, but that's all.

For integral types, there's really no need for any type punning
whatsoever. In the case of floating point, the issue is more
complex, since the code necessary to portably convert a string
of bytes of a given format into a floating point value is
relatively complex, and more expensive than type punning a
uint64_t to a double, in the case where you know that the
external format and the internal format are the same (e.g.
IEEE).
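
As a sketch of the fast path described above (not code from the
thread): when the external and internal formats are both known to be
IEEE 754 binary64 with the same byte order, the bits of a 64-bit
integer can be copied into a double with memcpy. That is method 3
from the original question, and it needs neither a pointer cast nor a
union. The helper names bits_to_double and double_to_bits are
hypothetical, and the static_asserts spell out the assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper, not from the thread: copy the object
// representation of a 64-bit unsigned integer into a double.
// Assumption: double is 64 bits wide and uses IEEE 754 (binary64).
inline double bits_to_double(std::uint64_t bits)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    static_assert(std::numeric_limits<double>::is_iec559,
                  "sketch assumes IEEE 754 doubles");
    double d;
    std::memcpy(&d, &bits, sizeof d);  // byte-wise copy, no aliasing
    return d;
}

// The reverse direction: recover the bit pattern of a double.
inline std::uint64_t double_to_bits(double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On such a platform, assert(bits_to_double(0x3FF0000000000000ULL) == 1.0)
should hold, since that is the binary64 encoding of 1.0. Any byte
swapping for a foreign byte order would have to happen on the integer
before the copy.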
So either the game code is type punning, or the network
library on top of the OS is type punning, or the device driver
is type punning (or written in assembly). Someone is probably
type punning in C or C++.

I've written a lot of network code in which there was no type
punning. I have an implementation of an xdrstream which does no
type punning, even for floating point. For integral types, it's
about as fast as implementations which do type pun (but it is
far more portable); for floating point, it's measurably slower
(but has the advantage that it works regardless of the machine
floating point format), but not nearly as much as I initially
expected.
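
To illustrate what "no type punning" can look like for integral types,
here is a minimal sketch (not the xdrstream code mentioned above, which
is not shown in the thread). It assembles a 32-bit value from four
bytes in big-endian order, the order XDR uses, with shifts only. The
names decode_be32 and encode_be32 are made up for this example, and
8-bit bytes are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 32-bit unsigned value from four bytes
// in big-endian (network) order. Only arithmetic on the byte values is
// used, so the result does not depend on the host's byte order.
inline std::uint32_t decode_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) <<  8)
         |  static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: write a 32-bit unsigned value as four bytes in
// big-endian order.
inline void encode_be32(std::uint32_t v, unsigned char* p)
{
    p[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
    p[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
    p[2] = static_cast<unsigned char>((v >>  8) & 0xFF);
    p[3] = static_cast<unsigned char>( v        & 0xFF);
}
```

Given the bytes 0x12 0x34 0x56 0x78, decode_be32 yields 0x12345678 on
any host, which is the portability advantage claimed above.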
 
J

James Kanze

Also, I started re-reading
http://cellperformance.beyond3d.com/articles/2006/06/understanding-st...
just now. It appears as though it's not perfectly accurate either.

And how. Or rather, it seems to be discussing in detail what
g++ actually does, rather than anything based on the standard.
(Note that there's also a problem with terminology. There is a
statement "Pointers to aggregate or union types with different
tags do not alias", but the example doesn't have any tags, and
the pointers in it are, in fact, allowed to alias in C, because
in C, the structs Foo and Bar are the same type.)

It ignores the allowance that you can cast between POD types
with common leading parts and access the common leading parts
as expected. The article's very point about structs Foo and
Bar is actually inaccurate, though I support the article on
its intent to say "No. It doesn't work naively. The compiler is
allowed to do non-obvious and surprising things because of the
strict aliasing rule."

That is, of course, the crux of the matter. If you have a
function:
void f(int* pi, double* pd)
The compiler will assume that pi and pd don't refer to the same
element. The standard clearly intends to give this guarantee,
although there are a very few cases where it in fact doesn't.
And the guarantee is important for optimizing purposes. Beyond
that, the standard is far from clear: from a QoI point of view,
I would expect the compiler to recognize visible aliasing, and
take it into account; if nothing else, not doing so is being
intentionally perverse. From discussions in the C committee,
prior to the formalizing of C90, I conclude that the *intent* is
1) that a checking compiler is allowed to somehow "discriminate"
unions, and detect cases where the accessed entry is not the
last written (modulo the few cases where this is guaranteed to
work), and 2) that the intent is that type punning be done by
casting. This conclusion is, of course, based on my memory and
my interpretation of discussions which occurred a long time ago.
But practically speaking, support for pointer casts in C doesn't
make sense otherwise.
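
A small sketch of the uncontroversial case, with the same parameter
types as f above. The body and the name store_both are invented for
illustration; the post gives only the signature.

```cpp
#include <cassert>

// Illustrative body only. Because an int object may not be accessed
// through an lvalue of type double, the compiler is allowed to assume
// that pi and pd do not refer to the same object.
inline int store_both(int* pi, double* pd)
{
    *pi = 1;
    *pd = 2.0;   // cannot legally modify the int that pi points to
    return *pi;  // so the compiler may return 1 without reloading *pi
}
```

Called with two distinct objects this is well defined and returns 1.
The disagreement in this thread is about what should happen when the
caller produces one of the two pointers by casting the other.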

Practically speaking, from a QoI point of view: if the compiler
sees a reinterpret_cast (or a pointer cast in C), it should be
clear that the programmer is doing something tricky at a very
low level, and that there *is* aliasing. Not taking that into
account is simply perverse. From a practical point of view,
too, unless the compiler is generating extensive debugging code
and actually discriminating unions, in order to detect errors,
the compiler should also make unions work as expected (but the
earliest versions of Microsoft C didn't); even in a debugging
compiler, I would expect some sort of option or pragma to allow
this common and traditional, albeit illegal, use of unions.
 
J

Joshua Maurice

//Start code for foo.cpp
#include <iostream>
using namespace std;
int main()
{
  cout << sizeof(int) << " " << sizeof(short) << endl;
  {
    int x = 1;
    short* s = reinterpret_cast<short*>(&x);
    s[0] = 2;
    s[1] = 3;
    cout << x << endl;
  }
  {
    int x = 1;
    union { int u_int; short u_short_array[2]; };
    u_int = x;
    u_short_array[0] = 2;
    u_short_array[1] = 3;
    x = u_int;
    cout << x << endl;
  }
}
//End code
//Start prompt copy
bash-3.2$ g++ --version
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
bash-3.2$ g++ -O3 foo.cpp -Wall
foo.cpp: In function ‘int main()’:
foo.cpp:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules
bash-3.2$ ./a.out
4 2
1
196610
bash-3.2$
//end prompt copy
So, when the union is in scope, gcc "does the right thing",
and when reinterpret_cast is in scope, gcc does not "do the
right thing".

Which is, from a QoI point of view, an error.

It's an error if you believe that those are the desired semantics.

In fact, I don't know whether this is an error in the coding,
a problem in the way that the optimizer works which would make
it difficult (and perhaps not worth it) to fix, or simply a bit
of stubbornness on the part of g++ developers.  It does mean that
reinterpret_cast is useless, which is certainly not the intent
of the committee.

I would argue that it's the gcc team's stubbornness to follow the
standard as written. I cannot speak to the intent of the committee,
nor can most users of C++. However, we can speak to what the standard
clearly says. That said, it is somewhat silly to provide type punning
when the union is in scope but not allow type punning when a cast is
in scope. I think it makes a little more sense if we say that they're
simply following current practice, and this is how most other
compilers do it. (Again, confirmation or evidence to the contrary
anyone?)
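
For comparison, here is a sketch of the same experiment written with
memcpy, the third method in the original question. It involves neither
a pointer cast nor a union, so neither rule discussed here applies to
it. It assumes sizeof(int) == 2 * sizeof(short), as in the output
above; the function name is made up.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: overwrite the bytes of an int with two shorts,
// as the test program does, but by copying bytes rather than by
// accessing the int through a short lvalue.
inline int overwrite_with_shorts(int x, short first, short second)
{
    static_assert(sizeof(int) == 2 * sizeof(short),
                  "sketch assumes int is exactly two shorts wide");
    const short parts[2] = { first, second };
    std::memcpy(&x, parts, sizeof x);  // replaces every byte of x
    return x;
}
```

On a little-endian machine with 16-bit shorts, like the one above,
overwrite_with_shorts(1, 2, 3) should give the same 196610 as the
union version; a big-endian machine would give 131075 instead.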

The issue isn't simple.  Historically (pre-ISO C), the union was
the preferred solution, at least from what I understood.  For
whatever reasons, the ISO C committee (or at least the parts of
it which formalized the wording in this regard) decided to
make type punning via a union undefined behavior, which means
that only casting remains.  The C++ committee simply followed
the C committee in this respect---I'm 100% sure that the intent
of the C++ committee is that a reinterpret_cast, when legal,
behave exactly the same as a cast in C.

Repeating for emphasis:
I'm 100% sure that the intent
of the C++ committee is that a reinterpret_cast, when legal,
behave exactly the same as a cast in C.

I agree with that. I cannot speak to the intent of the committee(s) as
you can, but I can speak to what they wrote, and the standard is quite
clear that the C-style cast does not get around the strict aliasing
rule, and thus reinterpret_cast does not get around the strict
aliasing rule.

In addition, there is a note (non-normative) in the C++ standard
(§5.2.10/3) concerning the mapping done by reinterpret_cast: "it
is intended to be unsurprising to those who know the addressing
structure of the underlying machine".  Although this note is
directly attached to the pointer to integer conversion, in the
absence of any other indications, it seems reasonable to me to
apply it to the other uses of reinterpret_cast as well.

There is no such absence in the C++ standard. It is very clear that
accessing an object through an lvalue of a sufficiently different type
is undefined behavior (except for the char and unsigned char exception, and
the common leading part of POD exception).
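
A minimal sketch of the character-type exception (my own example, with
a made-up helper name): the bytes of any object may be read through an
unsigned char pointer, and here the reinterpret_cast is uncontroversial.
It assumes 8-bit bytes for the hex formatting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the object representation of value as a
// hex string, one pair of digits per byte, in memory order. Reading
// the bytes through unsigned char is allowed by the aliasing rule.
template <typename T>
std::string hex_bytes(const T& value)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(&value);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out += digits[(p[i] >> 4) & 0x0F];  // high nibble
        out += digits[p[i] & 0x0F];         // low nibble
    }
    return out;
}
```

The byte order of the result for multi-byte types depends on the
machine, but the access itself is well defined.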

The section you cite, including the non-normative note, is a very narrow
exception which states that a reinterpret_cast on a pointer will
produce an rvalue whose value should not be surprising to those who
know the addressing structure of the underlying machine. This in no
way is an exception to the strict aliasing rule. Instead, in this
context reinterpret_cast takes one value of a certain type and casts
that value to another type. It does not tell the compiler that two
different pointers alias or in any way affect the strict aliasing
rule.

In any case, the current standard is very unclear with regards
to type punning---with the exception of character types.  And I
don't think that this has changed in the more recent drafts---in
a very real sense, I think that it is more a C problem; that the
C++ committee should simply wait, and adopt whatever the C
committee finally decides.


I don't know of any that don't.  It's a common optimization.  It
was present in Microsoft C 1.0, for example (in which your union
example would break).

Really? I was under the impression that basically no Microsoft
compiler actually optimized with the strict aliasing allowance, that
too much Windows code would break if it did by default. Very simple
testing like that above seems to show that the Microsoft compiler does
not.

Why?  My understanding of the QoI implications is based on the
actual words in the standard, and the various discussions that
I've followed in the standardization committees.


Yes and no.  An optimizer is expected to use the knowledge at
its disposal.  If it can see that there is aliasing, whether
from a union or a cast, it should take it into account.

That is, by the way, the direction the C committee was going the
last time I looked.


Certainly not. But since the C style cast should behave as
expected, when visible, so should the reinterpret_cast.

I cannot speak to your private discussions with the committees. It's
just that that's not what's in the current standards.

You seem to be misunderstanding the motivation behind the strict
aliasing rule.  It is to allow the compiler to assume no
aliasing in cases where it otherwise couldn't.  There was never
any intent to allow the compiler to totally ignore aliasing that
it can clearly see.

Perhaps I was too strong. However, I don't think it's right to be so
dismissive of that position. It is a reasonable one. Many times I hear
"The compiler should just be smart enough", but many times this is not
the case, for various reasons, such as too hard to implement, or the
semantics would be too vague or not well defined, or it would be bad
style and confusing to the coders. I think all kinds of "type punning
but only in certain scopes [unions and casts]" qualify.
 
