Aliasing/Torek's strtod() experience

Discussion in 'C Programming' started by Adam Warner, Jun 29, 2005.

  1. Adam Warner

    Adam Warner Guest

    Hi all,

    Message ID <> is one of many informative
    articles by Chris Torek about C. The particular message discusses aliasing
    and concludes with this paragraph:

    Under these strict type-aliasing rules, casting from (e.g.) "int *" to
    "short *" is not only quite suspicious, it is also likely to cause
    puzzling behavior, at least if you expect your "short *" to access or
    modify your "int". Even the time-honored, albeit dubious, practise of
    breaking a 64-bit IEEE "double" into two 32-bit integers (int or long
    depending on the CPU involved) via a union need not work, and sometimes
    does not. (We had a problem with strtod() not working right because of
    code just like this. It worked in older gcc compilers, and eventually
    failed when gcc began doing type-specific alias analysis and
    optimizations.)

    The code I've written below breaks an 8 byte double into two 4 byte
    unsigned integers via a union. How should this code be modified so it
    conforms to C's aliasing rules?

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>

    union u {
        double f64;
        uint32_t u32[2];
    };

    int main() {
        assert(sizeof(double)==8);
        double val=strtod("1.23", NULL);
        printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
        return 0;
    }

    Many thanks,
    Adam
    Adam Warner, Jun 29, 2005
    #1

  2. Grumble

    Grumble Guest

    Adam Warner wrote:

    > Message ID <> is one of many informative
    > articles by Chris Torek about C. The particular message discusses aliasing
    > and concludes with this paragraph:
    >
    > Under these strict type-aliasing rules, casting from (e.g.) "int *" to
    > "short *" is not only quite suspicious, it is also likely to cause
    > puzzling behavior, at least if you expect your "short *" to access or
    > modify your "int". Even the time-honored, albeit dubious, practise of
    > breaking a 64-bit IEEE "double" into two 32-bit integers (int or long
    > depending on the CPU involved) via a union need not work, and sometimes
    > does not. (We had a problem with strtod() not working right because of
    > code just like this. It worked in older gcc compilers, and eventually
    > failed when gcc began doing type-specific alias analysis and
    > optimizations.)
    >
    > The code I've written below breaks an 8 byte double into two 4 byte
    > unsigned integers via a union. How should this code be modified so it
    > conforms to C's aliasing rules?
    >
    > #include <assert.h>
    > #include <stdint.h>
    > #include <stdlib.h>
    > #include <stdio.h>
    >
    > union u {
    > double f64;
    > uint32_t u32[2];
    > };
    >
    > int main() {
    > assert(sizeof(double)==8);
    > double val=strtod("1.23", NULL);
    > printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
    > return 0;
    > }


    I am not sure it is safe to cast 'double' to 'union u'.

    In C89, writing to member f64, then reading from member u32 has
    implementation-defined behavior - 6.5.2.3 #5.
    Grumble, Jun 29, 2005
    #2

  3. Adam Warner

    Adam Warner Guest

    On Wed, 29 Jun 2005 10:22:13 +0200, Grumble wrote:

    >> The code I've written below breaks an 8 byte double into two 4 byte
    >> unsigned integers via a union. How should this code be modified so it
    >> conforms to C's aliasing rules?
    >>
    >> #include <assert.h>
    >> #include <stdint.h>
    >> #include <stdlib.h>
    >> #include <stdio.h>
    >>
    >> union u {
    >> double f64;
    >> uint32_t u32[2];
    >> };
    >>
    >> int main() {
    >> assert(sizeof(double)==8);
    >> double val=strtod("1.23", NULL);
    >> printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
    >> return 0;
    >> }

    >
    > I am not sure it is safe to cast 'double' to 'union u'.
    >
    > In C89, writing to member f64, then reading from member u32 has
    > implementation-defined behavior - 6.5.2.3 #5.


    I suspect aliasing rules are better specified in C99 (6.5 #7):

    An object shall have its stored value accessed only by an lvalue
    expression that has one of the following types:
    -- a type compatible with the effective type of the object,
    -- a qualified version of a type compatible with the effective type of
    the object,
    -- a type that is the signed or unsigned type corresponding to the
    effective type of the object,
    -- a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object,
    -- an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union), or
    -- a character type.

    Doesn't the second to last point mean that writing to member f64 then
    reading from member u32 is well specified in C99?

    If so is this approach conforming:

    double val=strtod("1.23", NULL);
    union u tmp;
    tmp.f64=val;
    printf("%i %i\n", tmp.u32[0], tmp.u32[1]);

    (This eliminates the dubious casts, which is always a good sign!)

    Regards,
    Adam
    Adam Warner, Jun 29, 2005
    #3
  4. On Wed, 29 Jun 2005 13:29:59 +1200, Adam Warner wrote:

    > Hi all,
    >
    > Message ID <> is one of many informative
    > articles by Chris Torek about C. The particular message discusses aliasing
    > and concludes with this paragraph:
    >
    > Under these strict type-aliasing rules, casting from (e.g.) "int *" to
    > "short *" is not only quite suspicious, it is also likely to cause
    > puzzling behavior, at least if you expect your "short *" to access or
    > modify your "int". Even the time-honored, albeit dubious, practise of
    > breaking a 64-bit IEEE "double" into two 32-bit integers (int or long
    > depending on the CPU involved) via a union need not work, and sometimes
    > does not. (We had a problem with strtod() not working right because of
    > code just like this. It worked in older gcc compilers, and eventually
    > failed when gcc began doing type-specific alias analysis and
    > optimizations.)
    >
    > The code I've written below breaks an 8 byte double into two 4 byte
    > unsigned integers via a union. How should this code be modified so it
    > conforms to C's aliasing rules?


    What is it you want to achieve by doing this? It is inherently
    non-portable even without the aliasing rules. The simple answer would be
    don't do it at all.

    > #include <assert.h>
    > #include <stdint.h>
    > #include <stdlib.h>
    > #include <stdio.h>
    >
    > union u {
    > double f64;
    > uint32_t u32[2];
    > };
    >
    > int main() {
    > assert(sizeof(double)==8);
    > double val=strtod("1.23", NULL);
    > printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
    > return 0;
    > }


    One way to get around the aliasing rules is to memcpy() from a double
    object to a separate array of uint32_t. Or maybe you don't need to use
    uint32_t: you can access any object as an array of unsigned char, which
    allows you to access the representation of that object. So

    double f64;
    unsigned char *p = (unsigned char *)&f64;

    and you can access p[0] to p[(sizeof f64)-1]. That's essentially what
    memcpy() is doing.

    Lawrence
    Lawrence Kirby, Jun 29, 2005
    #4
  5. Adam Warner

    Adam Warner Guest

    On Wed, 29 Jun 2005 12:32:56 +0100, Lawrence Kirby wrote:
    > On Wed, 29 Jun 2005 13:29:59 +1200, Adam Warner wrote:
    >
    >> Hi all,
    >>
    >> Message ID <> is one of many informative
    >> articles by Chris Torek about C. The particular message discusses aliasing
    >> and concludes with this paragraph:
    >>
    >> Under these strict type-aliasing rules, casting from (e.g.) "int *" to
    >> "short *" is not only quite suspicious, it is also likely to cause
    >> puzzling behavior, at least if you expect your "short *" to access or
    >> modify your "int". Even the time-honored, albeit dubious, practise of
    >> breaking a 64-bit IEEE "double" into two 32-bit integers (int or long
    >> depending on the CPU involved) via a union need not work, and sometimes
    >> does not. (We had a problem with strtod() not working right because of
    >> code just like this. It worked in older gcc compilers, and eventually
    >> failed when gcc began doing type-specific alias analysis and
    >> optimizations.)
    >>
    >> The code I've written below breaks an 8 byte double into two 4 byte
    >> unsigned integers via a union. How should this code be modified so it
    >> conforms to C's aliasing rules?

    >
    > What is it you want to achieve by doing this?


    Knowledge of how the issue described above might have been worked around.

    >> #include <assert.h>
    >> #include <stdint.h>
    >> #include <stdlib.h>
    >> #include <stdio.h>
    >>
    >> union u {
    >> double f64;
    >> uint32_t u32[2];
    >> };
    >>
    >> int main() {
    >> assert(sizeof(double)==8);
    >> double val=strtod("1.23", NULL);
    >> printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
    >> return 0;
    >> }

    >
    > One way to get around the aliasing rules is to memcpy() from a double
    > object to a separate array of uint32_t. Or maybe you don't need to use
    > uint32_t: you can access any object as an array of unsigned char, which
    > allows you to access the representation of that object. So
    >
    > double f64;
    > unsigned char *p = (unsigned char *)&f64;
    >
    > and you can access p[0] to p[(sizeof f64)-1]. That's essentially what
    > memcpy() is doing.


    That's two ways I hadn't thought of, thanks.

    Can you please confirm that my followup suggestion to assign the double to
    the union and then access it as integers is also a conforming (to C99)
    approach:

    double val=strtod("1.23", NULL);
    union u tmp;
    tmp.f64=val;
    printf("%i %i\n", tmp.u32[0], tmp.u32[1]);

    Regards,
    Adam
    Adam Warner, Jun 29, 2005
    #5
  6. Michael Mair

    Michael Mair Guest

    Adam Warner wrote:
    > On Wed, 29 Jun 2005 10:22:13 +0200, Grumble wrote:
    >
    >
    >>>The code I've written below breaks an 8 byte double into two 4 byte
    >>>unsigned integers via a union. How should this code be modified so it
    >>>conforms to C's aliasing rules?
    >>>
    >>>#include <assert.h>
    >>>#include <stdint.h>
    >>>#include <stdlib.h>
    >>>#include <stdio.h>
    >>>
    >>>union u {
    >>> double f64;
    >>> uint32_t u32[2];
    >>>};
    >>>
    >>>int main() {
    >>> assert(sizeof(double)==8);
    >>> double val=strtod("1.23", NULL);
    >>> printf("%i %i\n", ((union u) val).u32[0], ((union u) val).u32[1]);
    >>> return 0;
    >>>}

    >>
    >>I am not sure it is safe to cast 'double' to 'union u'.
    >>
    >>In C89, writing to member f64, then reading from member u32 has
    >>implementation-defined behavior - 6.5.2.3 #5.

    >
    >
    > I suspect aliasing rules are better specified in C99 (6.5 #7):
    >
    > An object shall have its stored value accessed only by an lvalue
    > expression that has one of the following types:
    > -- a type compatible with the effective type of the object,
    > -- a qualified version of a type compatible with the effective type of
    > the object,
    > -- a type that is the signed or unsigned type corresponding to the
    > effective type of the object,
    > -- a type that is the signed or unsigned type corresponding to a
    > qualified version of the effective type of the object,
    > -- an aggregate or union type that includes one of the aforementioned
    > types among its members (including, recursively, a member of a
    > subaggregate or contained union), or
    > -- a character type.
    >
    > Doesn't the second to last point mean that writing to member f64 then
    > reading from member u32 is well specified in C99?
    >
    > If so is this approach conforming:
    >
    > double val=strtod("1.23", NULL);
    > union u tmp;
    > tmp.f64=val;
    > printf("%i %i\n", tmp.u32[0], tmp.u32[1]);
    >
    > (This eliminates the dubious casts, which is always a good sign!)


    I do not have a standard handy right now, so I cannot prove the
    following by chapter and verse; AFAIR there is nothing explicitly
    stating that you can only access a union member you previously stored
    to, apart from something in the infamous Annex J.
    For members of the same size the only convincing argument I know
    (and saw once upon a time in c.l.c) is that an implementation could
    store different members in different places, e.g. the compiler stores
    a 64bit floating point variable in a register and "leaves" the yet
    unused array where it is in memory. As there is nothing explicitly
    forbidding this, you could see a nasty surprise.
    One can come up with volatile, though.

    Cheers
    Michael
    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
    Michael Mair, Jun 29, 2005
    #6
  7. In article <>,
    Adam Warner <> wrote:
    >I suspect aliasing rules are better specified in C99 (6.5 #7):


    > An object shall have its stored value accessed only by an lvalue
    > expression that has one of the following types:


    > -- an aggregate or union type that includes one of the aforementioned
    > types among its members (including, recursively, a member of a
    > subaggregate or contained union), or


    >Doesn't the second to last point mean that writing to member f64 then
    >reading from member u32 is well specified in C99?


    I don't have the C99 standard available, but such a thing would be
    a notable departure from C89.

    In C89, it is clear that the only time you can read a union with
    "a different type" than you last stored into it, is in the case
    where the two union elements have a common prefix, so at the lowest
    level you are reading the same type, even if the aggregate type name
    is different.

    One cannot, though, expect this to work if the common prefix is not
    exactly compatible at each element, as there could be differences in
    padding. For example, one might know that sizeof(float) == sizeof(int)
    but if the prefix in one version was a float, and the prefix in the
    other version was an int, then the behaviour of reading the next value
    afterwards is not defined, since the padding for float could be
    different than the padding for int.

    --
    'ignorandus (Latin): "deserving not to be known"'
    -- Journal of Self-Referentialism
    Walter Roberson, Jun 29, 2005
    #7
  8. Adam Warner

    Adam Warner Guest

    On Wed, 29 Jun 2005 22:39:39 +0200, Michael Mair wrote:

    >>>I am not sure it is safe to cast 'double' to 'union u'.
    >>>
    >>>In C89, writing to member f64, then reading from member u32 has
    >>>implementation-defined behavior - 6.5.2.3 #5.

    >>
    >> I suspect aliasing rules are better specified in C99 (6.5 #7):
    >>
    >> An object shall have its stored value accessed only by an lvalue
    >> expression that has one of the following types:
    >> -- a type compatible with the effective type of the object,
    >> -- a qualified version of a type compatible with the effective type of
    >> the object,
    >> -- a type that is the signed or unsigned type corresponding to the
    >> effective type of the object,
    >> -- a type that is the signed or unsigned type corresponding to a
    >> qualified version of the effective type of the object,
    >> -- an aggregate or union type that includes one of the aforementioned
    >> types among its members (including, recursively, a member of a
    >> subaggregate or contained union), or
    >> -- a character type.
    >>
    >> Doesn't the second to last point mean that writing to member f64 then
    >> reading from member u32 is well specified in C99?
    >>
    >> If so is this approach conforming:
    >>
    >> double val=strtod("1.23", NULL);
    >> union u tmp;
    >> tmp.f64=val;
    >> printf("%i %i\n", tmp.u32[0], tmp.u32[1]);
    >>
    >> (This eliminates the dubious casts, which is always a good sign!)

    >
    > I do not have a standard handy right now, so I cannot prove the
    > following by chapter and verse; AFAIR there is nothing explicitly
    > stating that you can only access a union member you previously stored
    > to but for something in the infamous Annex J.


    "The following are unspecified: ... The value of a union member other
    than the last one stored into (6.2.6.1)."

    There appear to be two instances where this is unspecified in 6.2.6.1:

    When a value is stored in an object of structure or union type,
    including in a member object, the bytes of the object representation
    that correspond to any padding bytes take unspecified values. The
    values of padding bytes shall not affect whether the value of such an
    object is a trap representation. Those bits of a structure or union
    object that are in the same byte as a bit-field member, but are not
    part of that member, shall similarly not affect whether the value of
    such an object is a trap representation.

    When a value is stored in a member of an object of union type, the
    bytes of the object representation that do not correspond to that
    member but do correspond to other members take unspecified values, but
    the value of the union object shall not thereby become a trap
    representation.

    Unspecified instance 1 is not applicable when there are no corresponding
    padding bytes to take unspecified values. I mapped two 4 byte integers
    onto an 8 byte double (I checked the double was 8 bytes with an assertion).
    As all members of a union are aligned to the same starting address the two
    member objects overlap perfectly.

    Unspecified instance 2 is also not applicable because the bytes of both
    object representations overlap perfectly.

    > For members of the same size the only convincing argument I know
    > (and saw once upon a time in c.l.c) is that an implementation could
    > store different members in different places, e.g. the compiler stores
    > a 64bit floating point variable in a register and "leaves" the yet
    > unused array where it is in memory. As there is nothing explicitly
    > forbidding this, you could see a nasty surprise.
    > One can come up with volatile, though.


    "A union type describes an overlapping nonempty set of member objects
    ...." If member objects behave as if they are stored in different places
    then they don't semantically overlap. I don't think this argument you
    read is at all convincing. Regardless of how member objects within a union
    are implemented they should behave _as if they overlap_.

    Regards,
    Adam
    Adam Warner, Jun 30, 2005
    #8
  9. Tim Rentsch

    Tim Rentsch Guest

    Michael Mair <> writes:

    > Adam Warner wrote:
    >
    > [... storing into one member of a union, accessing another ...]
    >
    > I do not have a standard handy right now, so I cannot prove the
    > following by chapter and verse; AFAIR there is nothing explicitly
    > stating that you can only access a union member you previously stored
    > to but for something in the infamous Annex J.
    > For members of the same size the only convincing argument I know
    > (and saw once upon a time in c.l.c) is that an implementation could
    > store different members in different places, e.g. the compiler stores
    > a 64bit floating point variable in a register and "leaves" the yet
    > unused array where it is in memory. As there is nothing explicitly
    > forbidding this, you could see a nasty surprise.
    > One can come up with volatile, though.


    My understanding is that the idea of storing one member of a union
    in different memory than another member was the result of unclear
    language in the standard, and that the unclear language is
    expected to be addressed through a TC. See:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm
    Tim Rentsch, Jul 13, 2005
    #9
  10. Michael Mair

    Michael Mair Guest

    Tim Rentsch wrote:
    > Michael Mair <> writes:
    >
    >
    >>Adam Warner wrote:
    >>
    >>[... storing into one member of a union, accessing another ...]
    >>
    >>I do not have a standard handy right now, so I cannot prove the
    >>following by chapter and verse; AFAIR there is nothing explicitly
    >>stating that you can only access a union member you previously stored
    >>to but for something in the infamous Annex J.
    >>For members of the same size the only convincing argument I know
    >>(and saw once upon a time in c.l.c) is that an implementation could
    >>store different members in different places, e.g. the compiler stores
    >>a 64bit floating point variable in a register and "leaves" the yet
    >>unused array where it is in memory. As there is nothing explicitly
    >>forbidding this, you could see a nasty surprise.
    >>One can come up with volatile, though.

    >
    >
    > My understanding is that the idea of storing one member of a union
    > in different memory than another member was the result of unclear
    > language in the standard, and that the unclear language is
    > expected to be addressed through a TC. See:
    >
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm


    Thank you very much!
    So, this really is not outlawed but only to be used with care
    (and, usually, in an implementation defined way).


    Cheers :)
    Michael
    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
    Michael Mair, Jul 13, 2005
    #10