dereferencing type-punned pointer

Discussion in 'C Programming' started by Billy Mays, Jun 21, 2010.

  1. Billy Mays

    Billy Mays Guest

    I looked in the GCC documentation but didn't get a satisfactory answer.
    I keep getting this warning in GCC with the code segment below.

    file.c:256 "warning: dereferencing type-punned pointer will break
    strict-aliasing rules"

    /*****************************************/
    char buffer[SIZE];
    int length, type;

    /* Code to fill buffer */

    length = ntohs( *((unsigned short *)buffer) );
    type = ntohs( *((unsigned short *)buffer + 1) );

    /*****************************************/


    I'm trying to take the first two bytes (in network byte order) off the
    buffer and put them in the length, followed by the next two bytes into
    the type. GCC complains about the length conversion, but not the type.
    What does this warning mean, and should I bother with it?

    Bill
     
    Billy Mays, Jun 21, 2010
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 6/21/2010 5:05 PM, Billy Mays wrote:
    > I looked in the GCC documentation but didn't get a satisfactory answer.
    > I keep getting this warning in GCC with the code segment below.
    >
    > file.c:256 "warning: dereferencing type-punned pointer will break
    > strict-aliasing rules"
    >
    > /*****************************************/
    > char buffer[SIZE];
    > int length, type;
    >
    > /* Code to fill buffer */
    >
    > length = ntohs( *((unsigned short *)buffer) );
    > type = ntohs( *((unsigned short *)buffer + 1) );
    >
    > /*****************************************/
    >
    >
    > I'm trying to take the first two bytes (in network byte order) off the
    > buffer and put them in the length, followed by the next two bytes into
    > the type. GCC complains about the length conversion, but not the type.
    > What does this warning mean, and should I bother with it?


    Strict aliasing means that the compiler assumes an object of
    type T is only pointed to by a T* (or by a char*, to get at the
    underlying array of bytes). Turn this around, and it means the
    compiler can assume that storing through a T* doesn't modify any
    non-T object. Also, storing through a (non-char*) S* won't affect
    anything that's already been fetched via a T*. Pointer-punning
    breaks these assumptions. I believe you can tell gcc not to make
    them, but this may make the code re-fetch values "just in case" an
    unexpected and apparently unrelated pointer trampled on them. In
    other words, it hamstrings the optimizer.
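
    A minimal sketch of the assumption described above. The function names
    are hypothetical, and what an optimizer really does depends on compiler
    and flags (with GCC, -fno-strict-aliasing turns the assumption off). The
    first function stores through two unrelated pointer types, so the compiler
    may return the constant 1 without re-reading *i. The second stores through
    an unsigned char*, which may alias anything, so *i has to be re-read.

```c
#include <assert.h>
#include <stddef.h>

/* Stores through an int* and then a short*. Under strict aliasing the
 * compiler may assume the two objects do not overlap, so it is free to
 * return 1 without loading *i again. */
int store_int_then_short(int *i, short *s)
{
    assert(i != NULL && s != NULL);
    *i = 1;
    *s = 2;
    return *i;
}

/* Character types are the exception: a store through unsigned char* may
 * touch the bytes of any object, so *i must be loaded again afterwards. */
int store_int_then_byte(int *i, unsigned char *c)
{
    assert(i != NULL && c != NULL);
    *i = 1;
    *c = 2;
    return *i;
}
```

    Calling the second function with c pointing at the first byte of *i is
    well defined and never returns 1. Calling the first one with a short*
    that points into *i is exactly the type-punning the warning is about.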

    As to why you get only one complaint rather than two, I can only
    guess that gcc is able to figure out that the unadorned `buffer' in
    the first expression will produce a fairly safe pointer. In the
    second, "an expression involving `buffer' and some other things"
    probably leaves gcc unsure of where the resulting short* might be
    pointing -- maybe into the middle of a double or something, with
    unpleasant results.

    Finally, you've got a potential alignment problem. Since `buffer'
    is a char[], the compiler sees no reason to give it any special
    alignment: A char can reside anywhere. But on many platforms a short
    can only reside at an address that's divisible by two -- and if the
    compiler happens to start buffer on an odd address, you'll be out of
    luck. In theory anything might happen; in practice, you're likely to
    get a program that crashes, or runs correctly but very slowly, or
    runs quickly but incorrectly (I've seen all three of these behaviors).
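
    One way to see the alignment issue, as a rough sketch only: the helper
    name is made up, _Alignof needs a C11 compiler, and converting a pointer
    to uintptr_t to test its low bits is implementation-defined (though it
    behaves as expected on common platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p is suitably aligned to be read as a short.
 * Assumes the integer value of a pointer reflects its address. */
int aligned_for_short(const void *p)
{
    assert(p != NULL);
    return (uintptr_t)p % _Alignof(short) == 0;
}
```

    A real short object always passes this check. An arbitrary position
    inside a char array carries no such guarantee.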

    There are at least two approaches to fixing both these problems.
    First, you can access the individual bytes yourself and combine them
    into the values you need. It's probably best to use `unsigned char'
    for this method:

    unsigned char buffer[SIZE];
    ...
    length = buffer[0] * 256 + buffer[1];
    type = buffer[2] * 256 + buffer[3];

    Another is to put the buffer and its "overlay" into a union, which
    is a sanctioned way of telling C you intend to access it via multiple
    type aliases. In this case, it's probably better to make the two-
    byte items `unsigned short' (unless you like negative lengths ...):

    union {
    unsigned char buffer[SIZE];
    unsigned short words[2];
    } both;
    ...
    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);

    There are probably other approaches, too, but these have the
    advantage of being simple. I slightly prefer the first, as it avoids
    assuming that sizeof(short)==2, but either would pass most musters.
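
    One of the "other approaches" is memcpy(), sketched below under a few
    assumptions: read_net16 is a hypothetical helper, uint16_t is available,
    and ntohs() comes from the POSIX header <arpa/inet.h>. Copying the bytes
    into a real uint16_t object sidesteps both the aliasing and the alignment
    problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs(); POSIX, not ISO C */

/* Reads the two bytes at buf[offset] and buf[offset + 1] as a 16-bit value
 * in network byte order. size is the number of valid bytes in buf. */
uint16_t read_net16(const unsigned char *buf, size_t offset, size_t size)
{
    uint16_t raw;

    assert(buf != NULL && size >= 2 && offset <= size - 2);
    memcpy(&raw, buf + offset, sizeof raw);  /* no pointer cast needed */
    return ntohs(raw);                       /* network to host order */
}
```

    With this helper the original two lines would become
    length = read_net16(buffer, 0, SIZE) and type = read_net16(buffer, 2, SIZE),
    with buffer declared as unsigned char.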

    --
    Eric Sosman
     
    Eric Sosman, Jun 21, 2010
    #2

  3. Billy Mays

    Billy Mays Guest

    On 6/21/2010 7:12 PM, William Ahern wrote:
    > Billy Mays<> wrote:
    >> I looked in the GCC documentation but didn't get a satisfactory answer.
    >> I keep getting this warning in GCC with the code segment below.

    >
    >> file.c:256 "warning: dereferencing type-punned pointer will break
    >> strict-aliasing rules"

    >
    >> /*****************************************/
    >> char buffer[SIZE];
    >> int length, type;

    >
    >> /* Code to fill buffer */

    >
    >> length = ntohs( *((unsigned short *)buffer) );
    >> type = ntohs( *((unsigned short *)buffer + 1) );

    >
    >> /*****************************************/

    >
    >
    >> I'm trying to take the first two bytes (in network byte order) off the
    >> buffer and put them in the length, followed by the next two bytes into
    >> the type. GCC complains about the length conversion, but not the type.
    >> What does this warning mean, and should I bother with it?

    >
    > What's wrong with:
    >
    > unsigned char buffer[SIZE];
    > int length, type;
    >
    > ...
    >
    > length = (buffer[0]<< 8 | buffer[1]);
    > type = (buffer[2]<< 8 | buffer[3]);
    >
    > There are several issues with your original code, as mentioned elsethread.
    >
    > Casting buffers to pointers to integers is a very poor habit, IMHO. It's not
    > uncommon, though. I did it when I first began network programming, until I
    > learned better.
    >
    > Also a poor habit, IMO, is storing network data--especially data serialized
    > into "network" byte order--in char buffers. For one thing, signed data is
    > relatively rare in network protocols, and where it exists signedness is
    > typically stored in special formats. More importantly, unexpected things can
    > happen when you convert from signed to unsigned types; and when operating on
    > untrusted network data (which should be _assumed_ untrusted) that kind of
    > trouble can easily lead to nasty bugs and exploits.
    >
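
    A small sketch of the signedness trouble mentioned in the quoted text.
    The function names are hypothetical, and the expected values assume
    8-bit bytes. A byte with its top bit set is negative when held in a
    signed char, and that sign is carried into the combined value unless
    each byte is converted to unsigned char first.

```c
#include <assert.h>
#include <stddef.h>

/* Combines two bytes exactly as stored: a negative high byte is
 * sign-extended on promotion, so the result goes negative. */
long combine_plain(const signed char *buf)
{
    assert(buf != NULL);
    return (long)buf[0] * 256 + buf[1];
}

/* Converts each byte to unsigned char first, so every byte contributes
 * a value in the range 0..255. */
long combine_as_unsigned(const signed char *buf)
{
    assert(buf != NULL);
    return (long)(unsigned char)buf[0] * 256 + (unsigned char)buf[1];
}
```

    For the two bytes -1 and 16 the first function yields -240, while the
    second yields 65296 (0xFF10), which is what a protocol length field
    would be expected to mean.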




    I will change the buffer type to unsigned, it just slipped my mind when
    I wrote it. I was unable to find a "Best Practices" guide for network
    programming. You mentioned "...until I learned better.", what would you
    suggest in place of the code block?

    Bill
     
    Billy Mays, Jun 22, 2010
    #3
  4. Eric Sosman <> writes:

    [...]

    > There are at least two approaches to fixing both these problems.
    > First, you can access the individual bytes yourself and combine them
    > into the values you need. It's probably best to use `unsigned char'
    > for this method:
    >
    > unsigned char buffer[SIZE];
    > ...
    > length = buffer[0] * 256 + buffer[1];
    > type = buffer[2] * 256 + buffer[3];
    >
    > Another is to put the buffer and its "overlay" into a union, which
    > is a sanctioned way of telling C you intend to access it via multiple
    > type aliases. In this case, it's probably better to make the two-
    > byte items `unsigned short' (unless you like negative lengths ...):
    >
    > union {
    > unsigned char buffer[SIZE];
    > unsigned short words[2];
    > } both;
    > ...
    > length = ntohs(both.words[0]);
    > type = ntohs(both.words[1]);
    >
    > There are probably other approaches, too, but these have the
    > advantage of being simple. I slightly prefer the first, as it avoids
    > assuming that sizeof(short)==2, but either would pass most musters.


    The second approach may generate faster code though...

    --
    Francis
     
    Francis Moreau, Jun 22, 2010
    #4
  5. Ian Collins

    Ian Collins Guest

    On 06/23/10 02:43 AM, Billy Mays wrote:
    > On 6/21/2010 7:12 PM, William Ahern wrote:
    >>
    >> What's wrong with:
    >>
    >> unsigned char buffer[SIZE];
    >> int length, type;
    >>
    >> ...
    >>
    >> length = (buffer[0]<< 8 | buffer[1]);
    >> type = (buffer[2]<< 8 | buffer[3]);
    >>
    >> There are several issues with your original code, as mentioned
    >> elsethread.
    >>
    >> Casting buffers to pointers to integers is a very poor habit, IMHO.
    >> It's not
    >> uncommon, though. I did it when I first began network programming,
    >> until I
    >> learned better.
    >>
    >> Also a poor habit, IMO, is storing network data--especially data
    >> serialized
    >> into "network" byte order--in char buffers. For one thing, signed data is
    >> relatively rare in network protocols, and where it exists signedness is
    >> typically stored in special formats. More importantly, unexpected
    >> things can
    >> happen when you convert from signed to unsigned types; and when
    >> operating on
    >> untrusted network data (which should be _assumed_ untrusted) that kind of
    >> trouble can easily lead to nasty bugs and exploits.

    >
    > I will change the buffer type to unsigned, it just slipped my mind when
    > I wrote it. I was unable to find a "Best Practices" guide for network
    > programming. You mentioned "...until I learned better.", what would you
    > suggest in place of the code block?


    William's suggestion is in the message you quoted!

    --
    Ian Collins
     
    Ian Collins, Jun 22, 2010
    #5
  6. William Ahern <will...@wilbur.25thandClement.com> wrote:
    > > > William Ahern wrote:
    > > > > length = (buffer[0]<< 8 | buffer[1]);
    > > > > type = (buffer[2]<< 8 | buffer[3]);

    >
    > For the record, I misplaced a parenthesis. It still works fine
    > --and doesn't even need parentheses--but a better example might
    > be:
    >
    > length = (0xffU & (buffer[0] << 8U))
    > | (0xffU & (buffer[1] << 0U));


    There's not much point in the U suffix on 8U and 0U. The operands
    of << are not subject to the usual arithmetic conversions; each is
    promoted separately, and the result has the promoted left operand's type.
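
    A quick way to check this point, as a sketch with a hypothetical helper:
    whatever the type of the shift count (unsigned long here), the result of
    the shift has the promoted type of the left operand.

```c
#include <assert.h>

/* Shifts a byte left by count bits. The shift count's type does not
 * influence the result type: b is promoted to int and the result is int. */
unsigned int shift_byte(unsigned char b, unsigned long count)
{
    assert(count <= 7);  /* keeps the result in range even for a 16-bit int */
    assert(sizeof (b << count) == sizeof (int));
    return (unsigned int)(b << count);
}
```

    Had the operator been + instead of <<, the usual arithmetic conversions
    would have applied and the result type would have been unsigned long.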

    --
    Peter
     
    Peter Nilsson, Jun 23, 2010
    #6
  7. On Jun 22, 10:52 pm, William Ahern <will...@wilbur.25thandClement.com>
    wrote:
    > Ian Collins <> wrote:
    > > On 06/23/10 02:43 AM, Billy Mays wrote:
    > > > On 6/21/2010 7:12 PM, William Ahern wrote:
    > > >> length = (buffer[0]<< 8 | buffer[1]);
    > > >> type = (buffer[2]<< 8 | buffer[3]);

    >
    > <snip>
    > > William's suggestion is in the message you quoted!

    >
    > For the record, I misplaced a parenthesis. It still works fine--and doesn't
    > even need parentheses--but a better example might be:
    >
    >         length = (0xffU & (buffer[0] << 8U))
    >                | (0xffU & (buffer[1] << 0U));
    >


    I don't see how this can work.
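
    The doubt looks justified: masking with 0xffU after shifting left by 8
    always clears the high byte. Below is a sketch of what was presumably
    intended, with each byte masked before it is shifted. Both function
    names are hypothetical, and as_posted assumes int is wider than 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* The expression as posted: the mask is applied after the shift, so the
 * contribution of p[0] is always zero and only p[1] survives. */
unsigned int as_posted(const unsigned char *p)
{
    assert(p != NULL);
    return (0xffU & (p[0] << 8U)) | (0xffU & (p[1] << 0U));
}

/* Mask first, then shift: combines two bytes in network (big-endian)
 * order into a value in the range 0..65535. */
unsigned int get_be16(const unsigned char *p)
{
    assert(p != NULL);
    return ((p[0] & 0xffU) << 8) | (p[1] & 0xffU);
}
```

    With an unsigned char buffer the masks in get_be16 are redundant on
    8-bit-byte machines, which leaves the plain (buffer[0] << 8 | buffer[1])
    from earlier in the thread.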
     
    Francis Moreau, Jun 23, 2010
    #7
  8. Billy Mays

    Guest

    In article <hvom73$l6v$-september.org>,
    Eric Sosman <> wrote:
    > On 6/21/2010 5:05 PM, Billy Mays wrote:
    > > I looked in the GCC documentation but didn't get a satisfactory answer.
    > > I keep getting this warning in GCC with the code segment below.


    [ snip ]

    > Another is to put the buffer and its "overlay" into a union, which
    > is a sanctioned way of telling C you intend to access it via multiple
    > type aliases.


    It is? Usually you seem to know what you're talking about,
    but I was sure that not long ago I had been somewhat surprised to
    discover that what the standard says about unions is almost exactly
    the opposite -- that you can use the same memory to store, say,
    either an integer or four characters, but if you store a value
    into it as integer, you're supposed to read it as an integer too,
    and an attempt to instead read it as four characters might or
    might not work. Am I not understanding .... ?

    [ snip ]

    --
    B. L. Massingill
    ObDisclaimer: I don't speak for my employers; they return the favor.
     
    , Jun 23, 2010
    #8
  9. Eric Sosman

    Eric Sosman Guest

    On 6/23/2010 4:23 PM, wrote:
    > In article<hvom73$l6v$-september.org>,
    > Eric Sosman<> wrote:
    >> On 6/21/2010 5:05 PM, Billy Mays wrote:
    >>> I looked in the GCC documentation but didn't get a satisfactory answer.
    >>> I keep getting this warning in GCC with the code segment below.

    >
    > [ snip ]
    >
    >> Another is to put the buffer and its "overlay" into a union, which
    >> is a sanctioned way of telling C you intend to access it via multiple
    >> type aliases.

    >
    > It is? Usually you seem to know what you're talking about,


    Keep reading; I'll disabuse you ...

    > but I was sure that not long ago I had been somewhat surprised to
    > discover that what the standard says about unions is almost exactly
    > the opposite -- that you can use the same memory to store, say,
    > either an integer or four characters, but if you store a value
    > into it as integer, you're supposed to read it as an integer too,
    > and an attempt to instead read it as four characters might or
    > might not work. Am I not understanding .... ?


    Maybe you're understanding that I thought and wrote sloppily.
    For most pairs of types, you're right:

    union { int i; double d; } u = { 42 }; // u.i == 42
    int before = u.i; // gets 42
    u.d = 0.0;
    int after = u.i; // could still get 42

    The argument runs somewhat like "If the second fetch is valid, it
    must get the same value as the first one (because it fetches an int
    and we haven't modified any ints), so we can use the same 42 we've
    already got in a CPU register. If the fetch is not valid, we can
    *still* use the same register-resident 42 because the behavior is
    undefined and any result is as good as any other."

    But things get fuzzier when one of the types is some flavor of
    char, because it's always permitted to peek and poke the individual
    bytes of a (non-register, non-bit-field) object. If you store to a
    char and the compiler can't prove that the char isn't part of some
    bigger object, the compiler has to assume that the bigger object's
    value may have changed:

    union { int i; unsigned char c; } u = { 42 };
    int before = u.i; // gets 42
    u.c ^= -1;
    int after = u.i; // must re-fetch (I think)

    Since the code stores to a char that the compiler cannot prove is
    disjoint from u.i (because it's not), I think it must regard any
    cached value as potentially stale. (Note that *all* the bytes of
    u.i are potentially stale; storing to the single byte of u.c may
    disturb the other bytes in u.)

    So, "sanctioned" may be too strong. It depends, I guess, on
    how you read the list of allowable aliases in 6.5p7: Are the accesses
    being made to u.i and u.c "independently," or are they being made
    to u itself? Surely, storing to u.c changes the value of u and hence
    u is being accessed -- but it's certainly open to debate.

    In summary, I'm sorta glad I stated a preference for the other
    approach ...

    --
    Eric Sosman
     
    Eric Sosman, Jun 23, 2010
    #9
  10. Tim Rentsch

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 6/23/2010 4:23 PM, wrote:
    >> In article<hvom73$l6v$-september.org>,
    >> Eric Sosman<> wrote:
    >>> On 6/21/2010 5:05 PM, Billy Mays wrote:
    >>>> I looked in the GCC documentation but didn't get a satisfactory answer.
    >>>> I keep getting this warning in GCC with the code segment below.

    >>
    >> [ snip ]
    >>
    >>> Another is to put the buffer and its "overlay" into a union, which
    >>> is a sanctioned way of telling C you intend to access it via multiple
    >>> type aliases.

    >>
    >> It is? Usually you seem to know what you're talking about,

    >
    > Keep reading; I'll disabuse you ...
    >
    >> but I was sure that not long ago I had been somewhat surprised to
    >> discover that what the standard says about unions is almost exactly
    >> the opposite -- that you can use the same memory to store, say,
    >> either an integer or four characters, but if you store a value
    >> into it as integer, you're supposed to read it as an integer too,
    >> and an attempt to instead read it as four characters might or
    >> might not work. Am I not understanding .... ?

    >
    > Maybe you're understanding that I thought and wrote sloppily.
    > For most pairs of types, you're right:
    >
    > union { int i; double d; } u = { 42 }; // u.i == 42
    > int before = u.i; // gets 42
    > u.d = 0.0;
    > int after = u.i; // could still get 42
    >
    > The argument runs somewhat like "If the second fetch is valid, it
    > must get the same value as the first one (because it fetches an int
    > and we haven't modified any ints), so we can use the same 42 we've
    > already got in a CPU register. If the fetch is not valid, we can
    > *still* use the same register-resident 42 because the behavior is
    > undefined and any result is as good as any other."


    The behavior is not undefined. It depends on implementation-defined
    encodings, which can produce undefined behavior if (double) has trap
    representations. Or it can produce a (partially) unspecified value if
    sizeof (double) > sizeof (int). But accessing 'u.d' after storing into
    'u.i' is not undefined behavior ipso facto. If (double) has no trap
    representations, the value of 'u.d' is implementation-defined, with
    unspecified values for any bytes of the object representation of
    (double) that lie beyond sizeof (int). Reading one member of a union
    after storing into another member is required to reinterpret the bytes
    of the stored member as the object representation of the read member
    (including unspecified values for any bytes beyond that of the stored
    member).


    > [snip stuff about character types]
     
    Tim Rentsch, Jun 23, 2010
    #10
  11. Eric Sosman

    Eric Sosman Guest

    On 6/23/2010 6:17 PM, Tim Rentsch wrote:
    > Eric Sosman<> writes:
    >
    >> On 6/23/2010 4:23 PM, wrote:
    >>> In article<hvom73$l6v$-september.org>,
    >>> Eric Sosman<> wrote:
    >>>> On 6/21/2010 5:05 PM, Billy Mays wrote:
    >>>>> I looked in the GCC documentation but didn't get a satisfactory answer.
    >>>>> I keep getting this warning in GCC with the code segment below.
    >>>
    >>> [ snip ]
    >>>
    >>>> Another is to put the buffer and its "overlay" into a union, which
    >>>> is a sanctioned way of telling C you intend to access it via multiple
    >>>> type aliases.
    >>>
    >>> It is? Usually you seem to know what you're talking about,

    >>
    >> Keep reading; I'll disabuse you ...
    >>
    >>> but I was sure that not long ago I had been somewhat surprised to
    >>> discover that what the standard says about unions is almost exactly
    >>> the opposite -- that you can use the same memory to store, say,
    >>> either an integer or four characters, but if you store a value
    >>> into it as integer, you're supposed to read it as an integer too,
    >>> and an attempt to instead read it as four characters might or
    >>> might not work. Am I not understanding .... ?

    >>
    >> Maybe you're understanding that I thought and wrote sloppily.
    >> For most pairs of types, you're right:
    >>
    >> union { int i; double d; } u = { 42 }; // u.i == 42
    >> int before = u.i; // gets 42
    >> u.d = 0.0;
    >> int after = u.i; // could still get 42
    >>
    >> The argument runs somewhat like "If the second fetch is valid, it
    >> must get the same value as the first one (because it fetches an int
    >> and we haven't modified any ints), so we can use the same 42 we've
    >> already got in a CPU register. If the fetch is not valid, we can
    >> *still* use the same register-resident 42 because the behavior is
    >> undefined and any result is as good as any other."

    >
    > The behavior is not undefined. It depends on implementation-defined
    > encodings, which can produce undefined behavior if (double) has trap
    > representations. Or it can produce a (partially) unspecified value if
    > sizeof (double)> sizeof (int). But accessing 'u.d' after storing into
    > 'u.i' is not undefined behavior ipso facto. If (double) has no trap
    > representations, the value of 'u.d' is implementation-defined, with
    > unspecified values for any bytes of the object representation of
    > (double) that lie beyond sizeof (int).


    Okay: There's something wrong here, possibly just a momentary
    dyxlesia. After the code fragment shown, the value of u.d is zero,
    not implementation-defined and not a trap representation. It's u.i
    that's questionable -- or have I overlooked some additional point
    you're making?

    > Reading one member of a union
    > after storing into another member is required to reinterpret the bytes
    > of the stored member as the object representation of the read member
    > (including unspecified values for any bytes beyond that of the stored
    > member).


    Can you cite chapter and verse for the requirement? Such a
    requirement would certainly simplify things, but does the Standard
    actually have such a thing?

    --
    Eric Sosman
     
    Eric Sosman, Jun 24, 2010
    #11
  12. Billy Mays

    Guest

    <> wrote:
    >
    > but I was sure that not long ago I had been somewhat surprised to
    > discover that what the standard says about unions is almost exactly
    > the opposite -- that you can use the same memory to store, say,
    > either an integer or four characters, but if you store a value
    > into it as integer, you're supposed to read it as an integer too,
    > and an attempt to instead read it as four characters might or
    > might not work. Am I not understanding .... ?


    It depends to some extent on which standard you're looking at: C99
    provides significantly more guarantees than C90 did.
    --
    Larry Jones

    Hmm... That might not be politic. -- Calvin
     
    , Jun 24, 2010
    #12
  13. Tim Rentsch

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 6/23/2010 6:17 PM, Tim Rentsch wrote:
    >> Eric Sosman<> writes:
    >>
    >>> On 6/23/2010 4:23 PM, wrote:
    >>>> In article<hvom73$l6v$-september.org>,
    >>>> Eric Sosman<> wrote:
    >>>>> On 6/21/2010 5:05 PM, Billy Mays wrote:
    >>>>>> I looked in the GCC documentation but didn't get a satisfactory answer.
    >>>>>> I keep getting this warning in GCC with the code segment below.
    >>>>
    >>>> [ snip ]
    >>>>
    >>>>> Another is to put the buffer and its "overlay" into a union, which
    >>>>> is a sanctioned way of telling C you intend to access it via multiple
    >>>>> type aliases.
    >>>>
    >>>> It is? Usually you seem to know what you're talking about,
    >>>
    >>> Keep reading; I'll disabuse you ...
    >>>
    >>>> but I was sure that not long ago I had been somewhat surprised to
    >>>> discover that what the standard says about unions is almost exactly
    >>>> the opposite -- that you can use the same memory to store, say,
    >>>> either an integer or four characters, but if you store a value
    >>>> into it as integer, you're supposed to read it as an integer too,
    >>>> and an attempt to instead read it as four characters might or
    >>>> might not work. Am I not understanding .... ?
    >>>
    >>> Maybe you're understanding that I thought and wrote sloppily.
    >>> For most pairs of types, you're right:
    >>>
    >>> union { int i; double d; } u = { 42 }; // u.i == 42
    >>> int before = u.i; // gets 42
    >>> u.d = 0.0;
    >>> int after = u.i; // could still get 42
    >>>
    >>> The argument runs somewhat like "If the second fetch is valid, it
    >>> must get the same value as the first one (because it fetches an int
    >>> and we haven't modified any ints), so we can use the same 42 we've
    >>> already got in a CPU register. If the fetch is not valid, we can
    >>> *still* use the same register-resident 42 because the behavior is
    >>> undefined and any result is as good as any other."

    >>
    >> The behavior is not undefined. It depends on implementation-defined
    >> encodings, which can produce undefined behavior if (double) has trap
    >> representations. Or it can produce a (partially) unspecified value if
    >> sizeof (double)> sizeof (int). But accessing 'u.d' after storing into
    >> 'u.i' is not undefined behavior ipso facto. If (double) has no trap
    >> representations, the value of 'u.d' is implementation-defined, with
    >> unspecified values for any bytes of the object representation of
    >> (double) that lie beyond sizeof (int).

    >
    > Okay: There's something wrong here, possibly just a momentary
    > dyxlesia. After the code fragment shown, the value of u.d is zero,
    > not implementation-defined and not a trap representation. It's u.i
    > that's questionable -- or have I overlooked some additional point
    > you're making?


    Sorry, yes, it's 'u.d' that has the stored value and definitely == 0,
    and it's 'u.i' that is being read and has the uncertain value. I
    mixed them up in my head. (note to self: do not, do not, do /not/
    post when rushed...) Presumably everyone can see the necessary
    exchanges to make so that the bass-ackwards comments from Tim
    Rentsch make sense here.


    >> Reading one member of a union
    >> after storing into another member is required to reinterpret the bytes
    >> of the stored member as the object representation of the read member
    >> (including unspecified values for any bytes beyond that of the stored
    >> member).

    >
    > Can you cite chapter and verse for the requirement? Such a
    > requirement would certainly simplify things, but does the Standard
    > actually have such a thing?


    It's defined by the rules for accessing stored values and by the
    definition of union types. Union types are defined in 6.2.5p20
    (the /'s indicate italic type, meaning a defined term):

    A /union type/ describes an overlapping nonempty set of member
    objects, each of which has an optionally specified name and
    possibly distinct type.

    The rules for reading or writing a member of a union are the same as
    for any other object (because there are no statements of different
    behavior), except there is 6.2.6.1p7:

    When a value is stored in a member of an object of union type,
    the bytes of the object representation that do not correspond to
    that member but do correspond to other members take unspecified
    values.

    There isn't any statement that's specific to objects corresponding
    to union members because it's the same as accessing other objects.
    (There also is a provision that storing into a union member allows
    padding bytes to take unspecified values, but obviously that doesn't
    change things beyond what 6.2.6.1p7 allows.)

    Because union members overlap in a well-defined way, and because
    accessing through an lvalue is defined to read or write the
    bytes that make up the object being accessed, accessing one
    union member after storing another is defined. This conclusion
    is noted in a footnote (not normative, but it gives support to
    the idea that the conclusion is correct) to 6.5.2.3p3 (footnote
    82):

    If the member used to access the contents of a union object is
    not the same as the member last used to store a value in the
    object, the appropriate part of the object representation of the
    value is reinterpreted as an object representation in the new
    type as described in 6.2.6 (a process sometimes called "type
    punning"). This might be a trap representation.

    It is possible in some cases (involving pointers) that storing into one
    member and reading from another doesn't work because the rules for
    effective type are transgressed. (And which cases might fall into
    that category is a whole other discussion, because the effective type
    rules are not always as clear cut as they might be in such cases.)
    However, in this example, both 'u.i' and 'u.d' have declared types
    that are compatible (obviously) with the types being used to access
    the (overlapping) objects in question, so the effective type rules
    are necessarily satisfied in the example code and other cases like
    it.

    The one remaining piece is, what defines how values of one type
    appear when interpreted as a different type? I believe this result
    is a consequence of 6.2.6.1p2:

    Except for bit-fields, objects are composed of contiguous
    sequences of one or more bytes, the number, order, and
    encoding of which are either explicitly specified or
    implementation-defined.

    I take this statement as requiring the object representations for
    each type be implementation-defined (within the limits of what other
    specifications in the Standard require). That fills in the last piece.

    These excerpts, plus other parts of the Standard saying how accesses
    work generally, provide the key parts of my argument (or defense?)
    for the claim that reading from a not-the-last-stored member of
    a union must reinterpret the bytes of the last-stored member.
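
    A small sketch of the reinterpretation argued for above, with a
    hypothetical function name and 8-bit bytes assumed. A value is stored
    through one union member and its first byte is read through another.
    Reading through unsigned char cannot hit a trap representation, so the
    only thing left implementation-defined is which byte comes first.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a 16-bit value in one union member and inspects its object
 * representation through another. Returns 1 if the low-order byte is
 * stored first (little-endian), 0 otherwise. */
int low_byte_first(void)
{
    union {
        uint16_t      word;
        unsigned char bytes[sizeof (uint16_t)];
    } u;

    u.word = 0x0102;
    assert(u.bytes[0] == 0x01 || u.bytes[0] == 0x02);
    return u.bytes[0] == 0x02;
}
```

    The result differs between machines, which is the
    "implementation-defined encoding" part of the argument. The read itself
    is defined on all of them.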


    (P.S. I was confused at one point while writing/editing this reply;
    I wanted to jump to a particular point in the text, and I was
    trying to find it by searching for 'dyslexia'.)
     
    Tim Rentsch, Jun 24, 2010
    #13
  14. On Thu, 24 Jun 2010, wrote:

    > <> wrote:
    >>
    >> but I was sure that not long ago I had been somewhat surprised to
    >> discover that what the standard says about unions is almost exactly
    >> the opposite -- that you can use the same memory to store, say,
    >> either an integer or four characters, but if you store a value
    >> into it as integer, you're supposed to read it as an integer too,
    >> and an attempt to instead read it as four characters might or
    >> might not work. Am I not understanding .... ?

    >
    > It depends to some extent on which standard you're looking at: C99
    > provides significantly more guarantees than C90 did.


    I'll try to restore the context here, because when I read what Eric wrote,
    I had to stop and think a bit too -- I had C89 in mind:


    On Mon, 21 Jun 2010, Eric Sosman wrote:

    > Another is to put the buffer and its "overlay" into a union, which is a
    > sanctioned way of telling C you intend to access it via multiple type
    > aliases. In this case, it's probably better to make the two-byte items
    > `unsigned short' (unless you like negative lengths ...):
    >
    > union {
    > unsigned char buffer[SIZE];
    > unsigned short words[2];
    > } both;
    > ...
    > length = ntohs(both.words[0]);
    > type = ntohs(both.words[1]);


    The idea presumably being, one fills in "both.buffer" first, in network
    byte order, then accesses those parts of storage through "both.words".

    I didn't have doubts about "buffer" and "words" starting at the same
    correctly aligned address, but I did doubt whether the technique would be
    allowed "everywhere" (especially wrt. SUSv[12], which are based on C89).

    IMVHO Eric is right, and I thank him for the idea (I used memcpy() before
    for such purposes); C89 6.3.2.3 "Structure and union members" says:

    "[...] With one exception, if a member of a union object is accessed after
    a value has been stored in a different member of the object, the behavior is
    implementation-defined. ^{41} One special guarantee is made in order to
    simplify the use of unions: If a union contains several structures that
    share a common initial sequence (see below), and if the union object
    currently contains one of these structures, it is permitted to inspect the
    common initial part of any of them. [...]"

    The relevant part here is exactly *not* the special exception of the
    common initial sequence, but the *implementation-defined* nature of the
    access pattern in question. That is, if one can ensure that on a given
    implementation the result of the access cannot later turn out to be a
    trap representation or lead to some other undefined behavior,
    everything's OK.
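
    For comparison, the memcpy() approach mentioned above sidesteps the
    union question entirely, because the bytes are copied into a real
    16-bit object before it is read. A minimal sketch, assuming a POSIX
    system whose <arpa/inet.h> provides uint16_t; get_u16_net() is a
    hypothetical helper name, not something from this thread:

    ```c
    #include <string.h>      /* memcpy(), size_t */
    #include <arpa/inet.h>   /* ntohs(), uint16_t */

    /* Hypothetical helper: copy two bytes from the buffer into a genuine
     * uint16_t object, then convert from network to host byte order.
     * No pointer cast is involved, so strict aliasing is not an issue,
     * and the source bytes need not be aligned. */
    static unsigned get_u16_net(const unsigned char *buf, size_t offset)
    {
        uint16_t tmp;

        memcpy(&tmp, buf + offset, sizeof tmp);
        return ntohs(tmp);
    }
    ```

    With the original poster's layout, the length would be
    get_u16_net(buffer, 0) and the type get_u16_net(buffer, 2).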

    Footnote 41: "The `byte orders' for scalar types are invisible to isolated
    programs that do not indulge in type punning (for example, by assigning to
    one member of a union and inspecting the storage by accessing another
    member that is an appropriately sized array of character type), but must
    be accounted for when conforming to externally imposed storage layouts".

    The parts that are important to me now are: "The `byte orders' for scalar
    types [...] must be accounted for when conforming to externally imposed
    storage layouts".

    Thus the normative part calls the access implementation-defined, and the
    informative part actually endorses caring about the byte order when doing
    network IO.
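
    One way to account for the byte order without any type punning at all
    is to assemble the value from its individual bytes. This is only a
    sketch and not a technique proposed in this thread; read_u16_be() is a
    hypothetical name, and it assumes each wire octet is stored in one
    array element:

    ```c
    /* Hypothetical helper: decode a 16-bit big-endian (network order)
     * value from two consecutive bytes. Only unsigned char objects are
     * read, so neither aliasing nor alignment matters, and the result
     * does not depend on the host byte order. */
    static unsigned read_u16_be(const unsigned char *p)
    {
        return ((unsigned)p[0] << 8) | (unsigned)p[1];
    }
    ```

    Since the shifts operate on values rather than on storage, no ntohs()
    call is needed afterwards.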

    [going OT] If only we had some C89-based standard, offering a way to make
    the further result of that implementation-defined access safe for some
    bigger types, on all implementations conforming to that standard!

    Oh wait, we do. I'll modify the code a bit:

    For SUSv1:

    #define _XOPEN_SOURCE /* SUSv1 */
    #define _XOPEN_SOURCE_EXTENDED 1 /* X/Open UNIX Extension */

    #include <limits.h> /* CHAR_BIT */
    #include <arpa/inet.h> /* in_port_t, ntohs() */

    #if CHAR_BIT != 8
    # error "CHAR_BIT != 8, sorry"
    #endif

    union {
    in_port_t words[2];
    char unsigned buffer[4];
    } both;

    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);

    "in_port_t" is defined in <netinet/in.h>, and also made visible by
    <arpa/inet.h>:

    ----v----
    The <netinet/in.h> header defines the following types through typedef:

    in_port_t An unsigned integral type of exactly 16 bits.
    ----^----

    And

    ----v----
    htonl, htons, ntohl, ntohs -- convert values between host and network byte
    order

    [...]

    in_port_t ntohs(in_port_t netshort);

    [...]

    These functions convert 16-bit and 32-bit quantities between network byte
    order and host byte order.

    [...]
    ----^----

    That is, "in_port_t" ensures that no trap representation is possible, and
    ntohs() covers the implementation-defined part.


    For SUSv2:

    #define _XOPEN_SOURCE 500 /* SUSv2 */

    #include <arpa/inet.h> /* uint16_t, ntohs() */

    union {
    uint16_t words[2];
    char unsigned buffer[4];
    /* might as well use uint8_t */
    } both;

    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);

    I removed the CHAR_BIT check, because CHAR_BIT must be at least 8
    (CHAR_BIT >= 8), and uint8_t is a required type (which can't be smaller
    than a char, 8 >= CHAR_BIT); together these force CHAR_BIT == 8.
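
    To make the SUSv2 fragment above compilable, it can be wrapped into a
    function along these lines. This is a sketch under assumptions: the
    names packet_header and parse_header() are hypothetical, and the
    memcpy() into both.buffer merely stands in for whatever really fills
    the buffer (a recv() call, for instance):

    ```c
    #include <string.h>      /* memcpy() */
    #include <arpa/inet.h>   /* uint16_t, ntohs() */

    /* Hypothetical name for the union from the fragment above. */
    union packet_header {
        uint16_t words[2];
        unsigned char buffer[4];
    };

    /* Fill the byte view with four bytes in network order, then read the
     * same storage back through the 16-bit view and convert each word to
     * host byte order. */
    static void parse_header(const unsigned char *raw, int *length, int *type)
    {
        union packet_header both;

        memcpy(both.buffer, raw, sizeof both.buffer);
        *length = ntohs(both.words[0]);
        *type = ntohs(both.words[1]);
    }
    ```

    Because the union as a whole is aligned for uint16_t, the misalignment
    risk of casting an arbitrary char pointer does not arise here.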

    Sorry for this jumble again :(

    Cheers,
    lacos
     
    Ersek, Laszlo, Jun 24, 2010
    #14
  15. Billy Mays

    Guest

    In article <hvtu9v$c4i$-september.org>,
    Eric Sosman <> wrote:
    > On 6/23/2010 4:23 PM, wrote:
    > > In article<hvom73$l6v$-september.org>,
    > > Eric Sosman<> wrote:
    > >> On 6/21/2010 5:05 PM, Billy Mays wrote:
    > >>> I looked in the GCC documentation but didn't get a satisfactory answer.
    > >>> I keep getting this warning in GCC with the code segment below.

    > >
    > > [ snip ]
    > >
    > >> Another is to put the buffer and its "overlay" into a union, which
    > >> is a sanctioned way of telling C you intend to access it via multiple
    > >> type aliases.

    > >
    > > It is? Usually you seem to know what you're talking about,

    >
    > Keep reading; I'll disabuse you ...


    :)

    Replying here to thank everyone who has responded so far -- it may
    take me a while to read through all the replies carefully, but it
    sounds like I was either confused by C90-versus-C99 differences or
    just plain wrong, and in any case I'm glad to hear about details.

    [ snip ]

    --
    B. L. Massingill
    ObDisclaimer: I don't speak for my employers; they return the favor.
     
    , Jun 24, 2010
    #15
  16. Billy Mays

    Tim Rentsch Guest

    writes:

    > <> wrote:
    >>
    >> but I was sure that not long ago I had been somewhat surprised to
    >> discover that what the standard says about unions is almost exactly
    >> the opposite -- that you can use the same memory to store, say,
    >> either an integer or four characters, but if you store a value
    >> into it as integer, you're supposed to read it as an integer too,
    >> and an attempt to instead read it as four characters might or
    >> might not work. Am I not understanding .... ?

    >
    > It depends to some extent on which standard you're looking at: C99
    > provides significantly more guarantees than C90 did.


    Although this is true in principle, does it make any
    significant difference in practice? Are there any C90
    implementations whose implementation-defined behavior
    here is different from what C99 requires -- reinterpret
    the bytes as the type of the member read, with undefined
    behavior if the object representation doesn't constitute
    a valid value of that type (with the understanding that
    C90 doesn't have all the terminology that C99 does but the
    ideas can be back projected in an obvious way)?
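
    The "store as an integer, read as four characters" case from the
    quoted question can be sketched as follows. This only illustrates the
    reinterpretation described above and is not code from this thread;
    u32_bytes and u32_to_wire() are hypothetical names, and uint32_t and
    htonl() are assumed to come from a POSIX <arpa/inet.h>:

    ```c
    #include <string.h>      /* memcpy() */
    #include <arpa/inet.h>   /* uint32_t, htonl() */

    /* Hypothetical union: one 32-bit integer overlaid with its bytes. */
    union u32_bytes {
        uint32_t value;
        unsigned char bytes[4];
    };

    /* Store through the integer member, then read the same storage back
     * through the unsigned char member. htonl() fixes the byte order
     * first, so out[] holds the most significant byte first on any host. */
    static void u32_to_wire(uint32_t v, unsigned char out[4])
    {
        union u32_bytes u;

        u.value = htonl(v);
        memcpy(out, u.bytes, sizeof u.bytes);
    }
    ```

    Reading through unsigned char can never hit an invalid representation;
    the opposite direction is the one where the caveat about the object
    representation not being a valid value applies.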
     
    Tim Rentsch, Jun 26, 2010
    #16
  17. Billy Mays

    Guest

    Tim Rentsch <> wrote:
    > writes:
    > > It depends to some extent on which standard you're looking at: C99
    > > provides significantly more guarantees than C90 did.

    >
    > Although this is true in principle, does it make any
    > significant difference in practice?


    No. C99 just clarified what everyone knows really happens.
    --
    Larry Jones

    I always have to help Dad establish the proper context. -- Calvin
     
    , Jun 26, 2010
    #17