Standard compliant bit-casting


Joshua Maurice

Practically speaking, from a QoI point of view: if the compiler
sees a reinterpret_cast (or a pointer cast in C), it should be
clear that the programmer is doing something tricky at a very
low level, and that there *is* aliasing.  Not taking that into
account is simply perverse.  From a practical point of view,
too, unless the compiler is generating extensive debugging code
and actually discriminating unions, in order to detect errors,
the compiler should also make unions work as expected (but the
earliest versions of Microsoft C didn't); even in a debugging
compiler, I would expect some sort of option or pragma to allow
this common and traditional, albeit illegal, use of unions.

I'll just make a quick note that not all reinterpret_casts are trying
to alias pointers. A reinterpret_cast from a pointer to int, or int to
pointer, does not "try" to make any two differently typed pointers
alias. It just converts one value of a type to a value of the other
type.

Also, if they did intend for C-style casts to be the correct way
to type pun, why include rules expressly disallowing it, and not make
any exceptions for when the cast is in scope? Again, intent, and
intent specifically contrary to what's written.
 

Stefan Ram

Joshua Maurice said:
A reinterpret_cast from a pointer to int (...) just converts
one value of a type to a value of the other type.

Substitute »int« by »integral type large enough to hold it«.
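A minimal sketch of the conversion under discussion, assuming an implementation that provides the optional std::uintptr_t from <cstdint>. The helper names to_integer, to_pointer and round_trips are made up for illustration and do not come from the thread. The cast only turns a pointer value into an integer value and back; no two differently typed pointers are involved.

```cpp
#include <cstdint>

// Hypothetical helpers, not from the thread.
// std::uintptr_t is optional in the standard; where it exists, it is
// an integral type large enough to hold a converted object pointer.
std::uintptr_t to_integer(int* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

int* to_pointer(std::uintptr_t v)
{
    return reinterpret_cast<int*>(v);
}

// Round trip: pointer -> integer -> pointer of the same type.
// The result compares equal to the original pointer.
bool round_trips(int* p)
{
    return to_pointer(to_integer(p)) == p;
}
```

Calling round_trips(&x) for some int x should yield true; the integer value itself is implementation-defined.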
 

James Kanze

On Mar 12, 2:18 am, James Kanze <[email protected]> wrote:

[...]
I'll just make a quick note that not all reinterpret_casts are
trying to alias pointers. A reinterpret_cast from a pointer to
int, or int to pointer, does not "try" to make any two
differently typed pointers alias. It just converts one value
of a type to a value of the other type.

Yes. That's another use of reinterpret_cast. It's also
"undefined behavior", but from a QoI point of view, it should
behave in some sort of rational manner.

Also, if they did intend for C-style casts to be the
correct way to type pun, why include rules expressly
disallowing it, and not make any exceptions for when the cast
is in scope? Again, intent, and intent specifically contrary
to what's written.

The standard doesn't disallow it; it says it's undefined
behavior. There are two motivations for undefined behavior in
the standard: the first is that the standard doesn't expect
anyone to write such code, and so places no constraints on the
compiler if they do; the second is that any reasonable behavior
will depend on the implementation, in ways the standard cannot
foresee or delimit, and undefined behavior leaves the
implementation free to implement whatever is reasonable on that
platform, supposing that there is something reasonable. Thus,
for example, the standard doesn't want to get into the issues of
what happens if you reinterpret_cast an unsigned long* to a
double*, then access memory through the double*, since that
would mean considering things like trapping NaN's and such. So
it says "undefined behavior", and leaves it up to the
implementation to do something appropriate for that
implementation.
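
For comparison, here is a sketch of one way to look at the bits of a double without dereferencing a cast pointer at all. It uses std::memcpy, which is not proposed anywhere in this thread; the helper names bits_of and double_from are hypothetical, and the code assumes a 64-bit double. Because memcpy copies the object representation byte by byte, no object is accessed through an lvalue of a different type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers; the thread does not propose this code.
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Copy the object representation of a double into an integer.
std::uint64_t bits_of(double d)
{
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Copy the bits back into a double. Which bit patterns are
// meaningful (NaNs and so on) still depends on the platform.
double double_from(std::uint64_t u)
{
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}
```

A round trip such as double_from(bits_of(3.14159)) gives back the original value; what the intermediate integer looks like is left to the implementation.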

Which is a somewhat different issue than the aliasing issue:
there is a definite intention to allow the anti-aliasing rules
to be used for optimization. IMHO, the intent is clear: if the
compiler cannot see aliasing otherwise, it should be free to
assume that pointers to different types (modulo explicit
exceptions like pointers to character types, or the initial
identical sequences of structs) do not alias.
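
The character-type exception mentioned above can be sketched as follows. The helper name nonzero_bytes is hypothetical. Reading an object's bytes through an unsigned char pointer is one of the accesses the aliasing rule explicitly permits, so the compiler may not assume that p and obj are unrelated.

```cpp
#include <cstddef>

// Hypothetical helper: count the non-zero bytes in the object
// representation of obj. Access through unsigned char* is one of
// the explicit exceptions to the aliasing rule.
template <typename T>
std::size_t nonzero_bytes(const T& obj)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (p[i] != 0) {
            ++n;
        }
    }
    return n;
}
```

The same loop written with a short* or an int* over a double would fall outside the exception.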

I'll admit that I'm reading a lot into the standard: part of
that is based on discussions I followed during the normalization
of C, and part is based on what I consider common sense: if it
is clear that there is aliasing, the compiler shouldn't assume
that there isn't, simply because of some arbitrary and perhaps
misinterpreted rule. Still, it seems like a reasonable set of
expectations to me.
 

James Kanze

On Mar 12, 1:56 am, James Kanze <[email protected]> wrote:

[...]
It's an error if you believe that those are the desired semantics.

Error may not be the best word in this case, since it implies
something unintentional. My argument is, precisely, that from a
QoI point of view, the desired semantics are the only ones which
make sense.

I would argue that it's the gcc team's stubbornness to follow the
standard as written.

I suspect that you're right. Which is, IMHO, an error from a
QoI point of view: the standard gives implementations a lot of
leeway, but from a QoI point of view, some common sense is to be
expected.

I cannot speak to the intent of the committee, nor can most
users of C++. However, we can speak to what the standard
clearly says. That said, it is somewhat silly to provide type
punning when the union is in scope but not allow type punning
when a cast is in scope.

Exactly. The standard makes both undefined behavior. In the
case of unions, this is intentional (if I recall and interpret
correctly discussions I followed during the standardization of
C); the goal is to allow hidden discriminators. Again, if I
recall and interpret such discussions correctly, the motivation
for undefined behavior in the case of reinterpret_cast (or a
pointer cast in C) is that the "reasonable" behaviors in the
case of dereferencing such a cast are in fact impossible to
specify in a portable way, but that the committee expected the
implementations to do what is reasonable for whatever the
platform was.

Keeping in mind in all of this that the committee also wanted to
allow the compiler to deduce as much as possible with regards to
anti-aliasing. We have a definite conflict in the goals here,
and the standard is, regretfully, not really as clear as we'd
like with regards to how to resolve this conflict.

I think it makes a little more sense if we say that they're
simply following current practice, and this is how most other
compilers do it. (Again, confirmation or evidence to the
contrary anyone?)

As I mentioned, Microsoft C 1.0 ignored any aliasing due to
unions, but did respect that resulting from casts.

That g++ does provide additional guarantees for unions is, IMHO,
a positive point, since in fact, before the C standard, that was
the traditional solution, and it is still widespread.

Repeating for emphasis:
I agree with that. I cannot speak to the intent of the
committee(s) as you can, but I can speak to what they wrote,
and the standard is quite clear that the C-style cast does not
get around the strict aliasing rule, and thus reinterpret_cast
does not get around the strict aliasing rule.

See above. There are two motivations for undefined behavior.

There is no such absence in the C++ standard. It is very clear that
accessing an object through an lvalue of a sufficiently different type
is undefined behavior (except for the char and unsigned char exception, and
the common leading part of POD exception).

It is also clear that dereferencing the result of an arbitrary
int, converted to a pointer, is undefined behavior. I think it
reasonable to apply the same text to both: "it is intended that
the results be unsurprising to those who know the addressing
structure of the underlying machine."

Note that while these words are only used for the conversions
between integral types and pointers in the standard, it is easy
to extend them (for most implementations, anyway) by using an
intermediate cast:
    double d;
    short* ps = reinterpret_cast<short*>(
        reinterpret_cast<long long>(&d));

This is with regards to the cast, and its immediate use. I do
think that the intent was that in a function:
    void f(int* pi, double* pd)
the compiler should be allowed to assume no aliasing between
pi and pd (even though it is possible to construct cases where
the standard doesn't allow this).

The section you cite, including the normative note, is a very
narrow exception which states that a reinterpret_cast on a
pointer will produce an rvalue whose value should not be
surprising to those who know the addressing structure of the
underlying machine.

Exactly. Any use of reinterpret_cast must be considered
unportable (except to and from character type pointers).

This in no way is an exception to the strict aliasing rule.
Instead, in this context reinterpret_cast takes one value of a
certain type and casts that value to another type. It does not
tell the compiler that two different pointers alias or in any
way affect the strict aliasing rule.

No. And any use of the resulting pointers is undefined behavior
according to the standard. There's no disagreement there. The
question is what the motivation for this undefined behavior is,
why it is there, and what we should expect from a QoI point of
view.

Really? I was under the impression that basically no Microsoft
compiler actually optimized with the strict aliasing
allowance,

Current compilers certainly don't do the same optimizations that
Microsoft C 1.0 did. In some cases, they do a lot more, but in
a few special cases, they do less. (There is no common code
from 1.0 in the current compilers.) However...

that too much windows code would break if it did by default.
Very simple testing like that above seems to show that the
Microsoft compiler does not.

I've not done too much testing with regards to how the Microsoft
compiler does aliasing analysis, but the fact that it does
optimize better than g++ (for Windows platforms) leads me to
think that it does use the anti-aliasing rule. Or that the
anti-aliasing rule doesn't really buy much in practice, and
should perhaps be dropped. (I've not tried special test cases,
but from what little I've looked at, I suspect that most of the
aliasing which causes problems for optimization involves
pointers of the same type, so ignoring the anti-aliasing rule
actually has no impact with regards to optimization. In which
case, from a QoI point of view, the compiler should ignore it,
and suppose that all pointers may alias.)
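
To illustrate the same-type case, here is a small sketch (the function name store_then_reload is hypothetical). Both parameters are int*, so the type-based rule gives the compiler no license to assume they point to different objects, and the result depends on whether they do.

```cpp
// Hypothetical function: a and b have the same type, so they may
// legitimately point to the same int.
int store_then_reload(int* a, int* b)
{
    *a = 1;
    *b = 2;     // overwrites *a when a == b
    return *a;  // cannot be folded to 1 unless a != b is proven
}
```

With two distinct objects the call returns 1; with the same object passed twice it returns 2. Both calls are well defined, which is why this kind of aliasing constrains the optimizer regardless of the rule about differently typed pointers.
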
I cannot speak to your private discussions with the
committees. It's just that's not what's in the current
standards.

Just for the record, it's not private discussions, in the sense
of just me and the committee. Any member of the committee can
view them.

As for the current standards, the "anti-aliasing" rule doesn't
allow any optimizations, because of the following case:

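    #include <iostream>

    int f(int* pi, double* pd)
    {
        int retval = *pi;   // read through the int pointer first
        *pd = 3.14159;      // then write through the double pointer
        return retval;
    }

    int main()
    {
        union { int i; double d; } u;
        u.i = 42;
        // print the value that f read through pi
        std::cout << f(&u.i, &u.d) << std::endl;
        return 0;
    }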

According to the strict words of the current standard, this is
guaranteed to output 42. In practice, if the compiler applies
the anti-aliasing rule in f, it may assign *pd before actually
reading *pi, which will result in a wrong output.

The current wording of the standard guarantees this. IMHO, this
is not the intent, and the discussions in the C committee lead
me to believe that there will be a clarification in this
respect. In the meantime, however, we are left speculating with
regards to the intent of the standard.

But regardless of the words in the standard, common sense says
that if I explicitly say that there is aliasing (and that is
what reinterpret_cast says), then the compiler shouldn't assume
that there isn't.

Perhaps I was too strong. However, I don't think it's right to
be so dismissive of that position. It is a reasonable one.
Many times I hear "The compiler should just be smart enough",
but many times this is not the case, for various reasons, such
as too hard to implement, or the semantics would be too vague
or not well defined, or it would be bad style and confusing to
the coders. I think all kinds of "type punning but only in
certain scopes [unions and casts]" qualify.

The issue is, obviously, not a simple one for compiler
implementers. There is a definite motivation to generate the
fastest code possible, and a number of "undefined behavior" in
the standard are present precisely to allow the compiler
implementer the most freedom possible to do so. My argument is
simply that reinterpret_cast is, or should be, a red flag: the
programmer is effectively telling the compiler that he knows
something that the compiler doesn't. And that the compiler
should respond in consequence, and not ignore what the
programmer is telling it. (And IMHO, reinterpret_cast should be
rare enough that even turning off all optimization in a function
that uses it shouldn't matter.)
 
