Re: Aliasing in C99

Discussion in 'C Programming' started by Tim Rentsch, May 31, 2012.

  1. Tim Rentsch

    Tim Rentsch Guest

    David Brown <> writes:

    > I am trying to figure out how to get aliasing to work correctly according
    > to the C99 rules. For example, converting between a float and its binary
    > representation.
    >
    > float negPCast(float x) {
    > uint32_t u = *((uint32_t *) &x);
    > u ^= 0x80000000u;
    > return *((float *) &u);
    > }
    >
    > In the absence of type-based aliasing, this will negate a float using
    > just a simple xor operation (ignore any issues with endianness, int
    > sizes, NaNs, etc., since this is just an example).
    >
    >
    > The pointer typecasting here will break strict aliasing rules, and is
    > therefore not valid C99. (I'm guessing that in this case, most compilers
    > will generate code that works as desired - but I'm looking for strictly
    > conforming methods.)


    Point of terminology: the appropriate term is "effective type"
    rules, ie, this terminology is what the C Standard uses. The
    term "strict aliasing" is a gcc-ism; the rules for "strict
    aliasing" are similar to, but not exactly the same as, effective
    type rules as the C Standard defines them.


    > It is possible to re-implement it using type-punning unions:
    >
    > float negUnion(float x) {
    > union { float f; uint32_t u; } uf;
    > uf.f = x;
    > uf.u ^= 0x80000000;
    > return uf.f;
    > }
    >
    > This doesn't use pointer typecasting, but I believe type-punning unions
    > are undefined in C but implemented "properly" in most compilers.


    Defined, not undefined. The specific behavior depends on the
    representations of the types in question, but presumably these
    representations are suitable on the systems you want to run on.


    > It is also possible to use pointers to unions in casts:
    >
    > float negUnionPCast(float x) {
    > typedef union { float f; uint32_t u; } UF;
    > uint32_t u = ((UF*) &x)->u;
    > u ^= 0x80000000u;
    > return ((UF*) &u)->f;
    > }
    >
    > I /think/ pointer casts like this are not subject to strict aliasing
    > rules, but I don't know if the union usage is valid.


    This approach gives undefined behavior, for several different
    reasons, as others have explained. There's a good chance it will
    work, but certainly that's not guaranteed under the Standard.


    > Does anyone know of other ways that are strictly valid and defined in
    > C99, and that also are efficient in use (I'd like to avoid things like
    > casting back and forth between char or char pointers, or volatile
    > accesses, etc.)?


    The union method is well-defined (modulo the proviso about type
    representations) and should work just fine. Since you are using
    C99, this approach can be coded more directly using compound
    literals, viz., (I am using 'unsigned' rather than 'uint32_t'
    but they are the same on my system):

    float
    negate_float( float x ){
    typedef union { float f; unsigned u; } UF;
    return (UF){ .u = (UF){ x }.u ^ 0x80000000 }.f;
    }

    Compiling this function with gcc (using -O2 or -O3), the
    generated code looks pretty good, about what I'd expect and also
    as good as I think you would hope for. If an 'inline' qualifier
    is added to the function definition, then generated code for a
    call is just three instructions (this is on an x86), ie, load,
    xor, store, and no floating point instructions.
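    A self-contained sketch of the compound-literal version above,
    assuming (as the thread's proviso requires) that float is a 32-bit
    IEEE 754 type whose sign is the top bit and that it has the same
    size as uint32_t. It uses uint32_t from <stdint.h> instead of
    'unsigned'. The negative-array-size typedef is a common C99
    compile-time check added here for illustration; it is not part of
    the original post.

```c
#include <stdint.h>

/* C99 has no _Static_assert: this array type has size -1, and so fails
   to compile, unless float and uint32_t have the same size. */
typedef char float_and_uint32_same_size[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Negate by flipping the sign bit, punning through union compound literals.
   Assumes IEEE 754 binary32, where bit 31 is the sign bit. */
static inline float negate_float(float x)
{
    typedef union { float f; uint32_t u; } UF;
    /* Inner literal: store x as float, read its bits via .u.
       Outer literal: store the modified bits via .u, read back via .f. */
    return (UF){ .u = (UF){ .f = x }.u ^ UINT32_C(0x80000000) }.f;
}
```

    Calling negate_float(1.0f) should yield -1.0f, and applying it
    twice should return the original value.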
     
    Tim Rentsch, May 31, 2012
    #1

  2. Tim Rentsch

    Tim Rentsch Guest

    David Brown <> writes:

    > On Thu, 31 May 2012 13:57:58 -0700, Tim Rentsch wrote:
    >
    >> David Brown <> writes:
    >> [snip]

    >
    > OK - this is the main point I've learned with this thread (and the reason
    > I asked in the first place). I know how to use such unions, I knew they
    > worked in practice - now I also know they work in theory (assuming, as
    > you say, the underlying representations are known).


    Good deal.


    >>> It is also possible to use pointers to unions in casts:
    >>>
    >>> float negUnionPCast(float x) {
    >>> typedef union { float f; uint32_t u; } UF;
    >>> uint32_t u = ((UF*) &x)->u;
    >>> u ^= 0x80000000u;
    >>> return ((UF*) &u)->f;
    >>> }
    >>>
    >>> I /think/ pointer casts like this are not subject to strict aliasing
    >>> rules, but I don't know if the union usage is valid.

    >>
    >> This approach gives undefined behavior, for several different reasons,
    >> as others have explained. There's a good chance it will work, but
    >> certainly that's not guaranteed under the Standard.
    >>

    >
    > OK. I'm not entirely confident about /why/ this is not correct according
    > to the standard, and it seems to be in conflict with other things I've
    > read. But either way, it leads to the same conclusion - it can't be
    > relied on to work, and so should not be used.


    For the same reason (among others) you were concerned in the
    first place, ie, aliasing rules. You have something that is a
    float, and in particular a float not in a union, and you access
    it through a pointer to a union type! It's possible -- and I'm
    not sure about this -- that the Standard does indeed allow this
    under effective type rules. However, it's a grey area, and
    because of that compilers may take liberties with what kinds of
    optimizations they allow in such cases. In almost all cases
    accessing something through a pointer converted to a type
    not the same as that of the target is best avoided. There
    also are alignment and padding byte issues, as others have
    mentioned; no reason to start down that path when there
    is another one that is easier and safer.


    >>> Does anyone know of other ways that are strictly valid and defined in
    >>> C99, and that also are efficient in use (I'd like to avoid things like
    >>> casting back and forth between char or char pointers, or volatile
    >>> accesses, etc.)?

    >>
    >> The union method is well-defined (modulo the proviso about type
    >> representations) and should work just fine. Since you are using C99,
    >> this approach can be coded more directly using compound literals, viz.,
    >> (I am using 'unsigned' rather than 'uint32_t' but they are the same on
    >> my system):
    >>
    >> float
    >> negate_float( float x ){
    >> typedef union { float f; unsigned u; } UF;
    >> return (UF){ .u = (UF){ x }.u ^ 0x80000000 }.f;
    >> }
    >>
    >> Compiling this function with gcc (using -O2 or -O3), the generated code
    >> looks pretty good, about what I'd expect and also as good as I think you
    >> would hope for. If an 'inline' qualifier is added to the function
    >> definition, then generated code for a call is just three instructions
    >> (this is on an x86), ie, load, xor, store, and no floating point
    >> instructions.


    [Note that the quoted function body had line-wrapping issues not
    present in the original, which I have repaired above.]

    > I like the idea, as it is an elegant and efficient solution. However,
    > I'm a big fan of clear and explicit code, and my code has to be easily
    > understood by others (even those not yet well versed in C99) - I think
    > this looks a bit convoluted for common use. But in some circumstances,
    > it could be the best solution.


    I am also a fan of clear code. I suspect the issue here is
    not lack of clarity but lack of familiarity; compound literals
    were introduced in C99 and few people use them. So there is
    something of a chicken and egg problem. However, if you
    aren't comfortable using compound literals, we can still
    write a simple function using a direct, functional style
    (disclaimer: not compiled):

    float
    negate_float( float x ){
    typedef union { unsigned u; float f; } UF;
    const UF f = { .f = x }, u = { .u = f.u ^ 0x80000000 };
    return u.f;
    }

    The type punning change from float to unsigned happens at 'f.u',
    and from unsigned to float happens at 'u.f'. Taking a functional
    approach allows the two union variables to be 'const'. Personally
    I think this functional style is easier to understand than an
    imperative one where a single union object is serving two different
    purposes.
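    Since the function above carries a "not compiled" disclaimer, here
    is a compilable sketch of the same functional style, again assuming
    a 32-bit IEEE 754 float with the same size as uint32_t (used here
    in place of 'unsigned'). The name negate_float_const is
    hypothetical, chosen only to distinguish it from the post's
    version. Each union object is initialized exactly once and never
    modified afterwards.

```c
#include <stdint.h>

/* Functional-style punning: two const unions, each written exactly once.
   Assumes float is IEEE 754 binary32 and sizeof(float) == sizeof(uint32_t). */
float negate_float_const(float x)
{
    typedef union { uint32_t u; float f; } UF;
    const UF f = { .f = x };                          /* float -> bits, read as f.u */
    const UF u = { .u = f.u ^ UINT32_C(0x80000000) }; /* bits -> float, read as u.f */
    return u.f;
}
```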


    > With my brief testing (with gcc on amd64), all versions of the functions
    > gave the same code, including the movement of data onto the stack
    > mentioned elsewhere in the thread. But that's not a concern for me, as
    > that is not one of my targets.


    Another reason for writing this function using a functional style
    rather than an updating assignment is that it's often easier for a
    compiler to optimize such code, mapping as it does very
    straightforwardly onto a single-assignment canonical form. Gcc
    is pretty clever at optimizing, but for another compiler of
    unknown abilities I think there is a better chance of it
    optimizing nicely if this kind of functional approach is taken.
     
    Tim Rentsch, Jun 1, 2012
    #2

  3. Tim Rentsch

    Tim Rentsch Guest

    David Brown <> writes:

    > On 01/06/2012 03:15, Tim Rentsch wrote:
    >> David Brown<> writes:
    >>
    >>> On Thu, 31 May 2012 13:57:58 -0700, Tim Rentsch wrote:
    >>>
    >>>> David Brown<> writes:

    [several snips done in the following, for compactness]

    >>> [on the matter of using a casted pointer]

    >>
    >> For the same reason (among others) you were concerned in the
    >> first place, ie, aliasing rules. You have something that is a
    >> float, and in particular a float not in a union, and you access
    >> it through a pointer to a union type!


    I should have been more specific about my reaction. The wording
    of the effective type rules is rather clumsy. I think it's hard
    to make a convincing argument either way, based just on that
    wording and nothing else. Despite that, I think it's reasonable
    to make an educated guess as to the intention behind what was
    actually written, and that would go like this: on the one hand
    we have a simple variable, ie, not in a union (or struct), and on
    the other hand we have an access to a member of a union (struct
    member access would be equivalent); we know that the standalone
    variable cannot possibly be in a struct or union, whereas member
    access _must_ refer to an actual struct or union object, and
    therefore the two objects in question must be distinct, ie, no
    aliasing can definedly occur between them.

    To repeat myself, I wouldn't call this an ironclad argument,
    reasoning as it does based somewhat on speculating about the
    underlying intention. However, I think the basic reasoning
    is convincing enough so that some implementations might take
    the same view, and that's why I think it's a grey area, and
    consequently best avoided.

    > Casts to unions are discussed in the gcc documentation as a gcc
    > extension, and are safe to use (carefully) with gcc - but while I
    > often use various gcc ports, I also use other compilers, so gcc
    > extensions are not a solution.


    Even if casts to unions were universally available, casting
    a pointer to a non-compound type (ie, not a struct or union)
    to a pointer to a compound type is a totally different beast,
    because casting to a union (or struct) operates on values,
    whereas pointer casting implicitly operates on object
    representations and may have resulting aliasing issues.
    Casting that works on values can never, in and of itself,
    have even potential aliasing issues; casting that works
    on pointers always can.
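    To make the difference between operating on a value and operating
    on an object representation concrete, here is a small sketch
    (assuming a 32-bit IEEE 754 float). An ordinary cast converts the
    value 2.0f to the integer 2, while reading the same float through a
    union member reinterprets its bits, giving 0x40000000. The helper
    names are illustrative only.

```c
#include <stdint.h>

/* A cast operates on the value: 2.0f converts to the integer 2. */
uint32_t float_value_as_uint(float x)
{
    return (uint32_t)x;
}

/* Union punning operates on the representation: the bit pattern of x.
   Assumes IEEE 754 binary32 and sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits_as_uint(float x)
{
    union { float f; uint32_t u; } pun = { .f = x };
    return pun.u;
}
```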


    >> However, if you
    >> aren't comfortable using compound literals, we can still
    >> write a simple function using a direct, functional style
    >> (disclaimer: not compiled):
    >>
    >> float
    >> negate_float( float x ){
    >> typedef union { unsigned u; float f; } UF;
    >> const UF f = { .f = x }, u = { .u = f.u ^ 0x80000000 };
    >> return u.f;
    >> }
    >>

    >
    > I'd split the line in two:
    > const UF f = (UF) { .f = x };
    > const UF u = (UF) { .u = f.u ^ 0x80000000 };


    I'm fine with either one line or two. Any developer worth his
    salt should be able to read either form with no difficulty, and
    I think it's wrong to be overly dogmatic about a "one declaration
    per line" rule. That said, the specific case here is (for me at
    least) below the threshold of arguing one way or another.

    Incidentally, it's usually a good rule in newsgroup postings
    to use multiple spaces rather than tabs for indentation (and
    for that matter all other uses too).


    > Then I think it is clear even to people unfamiliar with
    > compound literals.


    Note that the function body I wrote did not use compound
    literals, but just regular initialization. Your two lines
    above could (and perhaps should) have been written thusly:

    const UF f = { .f = x };
    const UF u = { .u = f.u ^ 0x80000000 };

    I expect most developers would prefer this writing to the
    earlier alternative.

    If compound literals are okay for your audience, it seems
    natural to avoid the temporary variable 'f', which is used
    in only one place; that would allow one declaration instead
    of two:

    const UF u = { .u = 0x80000000 ^ (UF){ .f = x }.u };

    But then I think you know where this is going. :)


    > Thanks for your suggestions and comments.


    You're welcome, it's good to know they have been helpful.
     
    Tim Rentsch, Jun 1, 2012
    #3
