Portability issues (union, bitfields)

Discussion in 'C Programming' started by Noob, Nov 4, 2009.

  1. Noob

    Noob Guest

    Hello,

    I'm dealing with a library whose author seems to have relied implicitly
    on several non-portable features. I'm trying to expose every assumption.
    (In the context of C89).

    typedef struct {
        unsigned int aa : 1;
        unsigned int bb : 1;
        unsigned int cc : 1;
        unsigned int dd : 1;
        unsigned int ee : 1;
        unsigned int ff : 1;
        unsigned int gg : 1;
        unsigned int hh : 1;
        unsigned int reserved : 24;
    } bitmap_t;

    typedef union {
        unsigned char uc;
        bitmap_t map;
    } foo_t;

    unsigned frob(unsigned u)
    {
        foo_t foo;
        foo.map.aa = (u >> 0) & 1;
        foo.map.bb = (u >> 1) & 1;
        foo.map.cc = (u >> 2) & 1;
        foo.map.dd = (u >> 3) & 1;
        foo.map.ee = (u >> 4) & 1;
        foo.map.ff = (u >> 5) & 1;
        foo.map.gg = (u >> 6) & 1;
        foo.map.hh = (u >> 7) & 1;
        /* garbage in foo.map.reserved */
        return foo.uc;
    }

    int main(void)
    {
        unsigned res;
        /* my tests */
        res = frob(1);    /* 0000 0001 */
        res = frob(0x40); /* 0100 0000 */
        res = frob(0xAC); /* 1010 1100 */
        res = frob(0x35); /* 0011 0101 */
        return 0;
    }

    1. The definition of bitmap_t seems to imply that the author thinks
    (unsigned int) is at least (or exactly) 32 bits wide.
    I think we get UB on platforms where (unsigned int) is only 16 bits
    wide? Even if the "reserved" field is never accessed?

    2. (I'm not sure about this one.) foo is written to using the map field,
    then read from using a different field. This specific instance might be
    OK, because uc's type is unsigned char?

    3. The code seems to assume field "aa" maps to the least-significant bit
    of the parameter (bit 0), "bb" maps to bit 1, etc.
    Consider frob(1);
    foo.map.aa <- 1
    all other fields <- 0
    reserved is left undefined

    The assumption seems to be:
    For any 0 <= u <= 255, frob(u) == u

    Moreover, "uc" will typically be only 8 bits wide, while "map" is
    32 bits wide. Is there any guarantee as to which bits of "uc" and
    "map" overlap? (May depend on endianness?)

    Did I miss any more assumptions?

    Regards.
     
    Noob, Nov 4, 2009
    #1

  2. Noob

    Eric Sosman Guest

    Noob wrote:
    > Hello,
    >
    > I'm dealing with a library whose author seems to have relied implicitly
    > on several non-portable features. I'm trying to expose every assumption.
    > (In the context of C89).
    >
    > typedef struct {
    > unsigned int aa : 1;
    > unsigned int bb : 1;
    > unsigned int cc : 1;
    > unsigned int dd : 1;
    > unsigned int ee : 1;
    > unsigned int ff : 1;
    > unsigned int gg : 1;
    > unsigned int hh : 1;
    > unsigned int reserved : 24;
    > } bitmap_t;
    >
    > typedef union {
    > unsigned char uc;
    > bitmap_t map;
    > } foo_t;
    >
    > unsigned frob(unsigned u)
    > {
    > foo_t foo;
    > foo.map.aa = (u >> 0) & 1;
    > foo.map.bb = (u >> 1) & 1;
    > foo.map.cc = (u >> 2) & 1;
    > foo.map.dd = (u >> 3) & 1;
    > foo.map.ee = (u >> 4) & 1;
    > foo.map.ff = (u >> 5) & 1;
    > foo.map.gg = (u >> 6) & 1;
    > foo.map.hh = (u >> 7) & 1;
    > /* garbage in foo.map.reserved */
    > return foo.uc;
    > }
    >
    > int main(void)
    > {
    > unsigned res;
    > /* my tests */
    > res = frob(1); /* 0000 0001 */
    > res = frob(0x40); /* 0100 0000 */
    > res = frob(0xAC); /* 1010 1100 */
    > res = frob(0x35); /* 0011 0101 */
    > return 0;
    > }
    >
    > 1. The definition of bitmap_t seems to imply that the author thinks
    > (unsigned int) is at least (or exactly) 32-bits wide.


    I don't believe so. He assumes that an int is at least 24
    bits wide, but I don't see an assumption of any specific >=24-bit
    width.

    > I think we get UB on platforms where (unsigned int) is only 16-bits
    > wide? Even if the "reserved" field is never accessed?


    Since a bit-field cannot (portably) be wider than an int,
    and since an int can be as narrow as sixteen bits, it's possible
    that the attempt to declare a 24-bit field may fail.

    > 2. (I'm not sure about this one.) foo is written to using the map field,
    > then read from using a different field. This specific instance might be
    > OK, because uc's type is unsigned char?


    The value of foo.uc is unspecified. Since unsigned char has no
    trap representations you won't get UB by fetching it, but there's
    no telling what value you'll get.

    > 3. The code seems to assume field "aa" maps to the least-significant bit
    > of the parameter (bit 0), "bb" maps to bit 1, etc.


    The assumption is unwarranted, but I'm not sure the code
    really makes the assumption. We'd have to know something about
    the expected/intended inputs and outputs to know what the
    assumptions are. Maybe this code is used to discover something
    about the way a particular compiler lays out bit-fields?

    > Consider frob(1);
    > foo.map.aa <- 1
    > all other fields <- 0
    > reserved is left undefined
    >
    > The assumption seems to be:
    > For any 0 <= u <= 255, frob(u) == u


    As above, I can't tell what's being assumed. If that's the
    desired transformation, there are portable (and easier!) ways
    to achieve it.

    > Moreover, "uc" will typically be only 8-bits wide, while "map" is
    > 32-bits wide. Is there any guarantee as to which bits of "uc" and
    > "map" overlap? (May depend on endianness?)


    No guarantees, or at any rate very few. All we know is

    1) The bit-fields are packed into "addressable storage
    units" of a size that the compiler chooses (but is not
    required to document, as far as I know).

    2) Since an ASU is at least one byte long and aa...hh will
    all fit in one byte, they will occupy the same ASU.

    3) If the ASU has at least 32 bits (and if int has at least
    24), reserved will occupy the same ASU as aa...hh.

    3a) Otherwise, reserved may occupy an ASU of its own, or
    may "straddle" multiple adjacent ASU's, possibly using
    part of the ASU containing aa...hh.

    4) We know that the ASU(s) containing reserved will not
    precede the ASU containing aa...hh.

    We don't know the size of the ASU (I think the ASU's for
    different bit-fields may even have different sizes), we
    don't know which of an ASU's bits are used for which fields,
    we don't know whether the fields are "tightly" or "loosely"
    packed, and we don't know what values any "slack" bits in
    the ASU's might take.

    > Did I miss any more assumptions?


    Hard to tell. It would be helpful to know what the code
    is trying to do, or thinks it's trying to do.

    --
    Eric Sosman
     
    Eric Sosman, Nov 4, 2009
    #2

  3. Noob

    Noob Guest

    Eric Sosman wrote:

    > Noob wrote:
    >
    >> I'm dealing with a library whose author seems to have relied implicitly
    >> on several non-portable features. I'm trying to expose every assumption.
    >> (In the context of C89).
    >>
    >> typedef struct {
    >> unsigned int aa : 1;
    >> unsigned int bb : 1;
    >> unsigned int cc : 1;
    >> unsigned int dd : 1;
    >> unsigned int ee : 1;
    >> unsigned int ff : 1;
    >> unsigned int gg : 1;
    >> unsigned int hh : 1;
    >> unsigned int reserved : 24;
    >> } bitmap_t;
    >>
    >> typedef union {
    >> unsigned char uc;
    >> bitmap_t map;
    >> } foo_t;
    >>
    >> unsigned frob(unsigned u)
    >> {
    >> foo_t foo;
    >> foo.map.aa = (u >> 0) & 1;
    >> foo.map.bb = (u >> 1) & 1;
    >> foo.map.cc = (u >> 2) & 1;
    >> foo.map.dd = (u >> 3) & 1;
    >> foo.map.ee = (u >> 4) & 1;
    >> foo.map.ff = (u >> 5) & 1;
    >> foo.map.gg = (u >> 6) & 1;
    >> foo.map.hh = (u >> 7) & 1;
    >> /* garbage in foo.map.reserved */
    >> return foo.uc;
    >> }
    >>
    >> int main(void)
    >> {
    >> unsigned res;
    >> /* my tests */
    >> res = frob(1); /* 0000 0001 */
    >> res = frob(0x40); /* 0100 0000 */
    >> res = frob(0xAC); /* 1010 1100 */
    >> res = frob(0x35); /* 0011 0101 */
    >> return 0;
    >> }
    >>
    >> 1. The definition of bitmap_t seems to imply that the author thinks
    >> (unsigned int) is at least (or exactly) 32-bits wide.

    >
    > I don't believe so. He assumes that an int is at least 24
    > bits wide, but I don't see an assumption of any specific >=24-bit
    > width.


    You are correct when you say the code only assumes width >= 24, but IMO,
    it is clear that, in the author's mind, width == 32 and he is padding
    the struct "by hand" to fill the 32 bits.

    I don't think it has ever crossed the author's mind that an int could be
    24 bits wide. (IMHO, not many people who call themselves "C programmers"
    are aware that an int could be 24 bits wide.)

    >> I think we get UB on platforms where (unsigned int) is only 16-bits
    >> wide? Even if the "reserved" field is never accessed?

    >
    > Since a bit-field cannot (portably) be wider than an int,
    > and since an int can be as narrow as sixteen bits, it's possible
    > that the attempt to declare a 24-bit field may fail.


    What does it mean for a declaration to fail?
    Compiler warning then UB?

    >> 2. (I'm not sure about this one.) foo is written to using the map field,
    >> then read from using a different field. This specific instance might be
    >> OK, because uc's type is unsigned char?

    >
    > The value of foo.uc is unspecified. Since unsigned char has no
    > trap representations you won't get UB by fetching it, but there's
    > no telling what value you'll get.


    Do you agree that, on some platforms, the bits in the union will map to
    the bits of foo.uc? Do you say its value is unspecified because this
    might not be the case (point 3) or for some other reason?

    >> 3. The code seems to assume field "aa" maps to the least-significant bit
    >> of the parameter (bit 0), "bb" maps to bit 1, etc.

    >
    > The assumption is unwarranted, but I'm not sure the code
    > really makes the assumption. We'd have to know something about
    > the expected/intended inputs and outputs to know what the
    > assumptions are. Maybe this code is used to discover something
    > about the way a particular compiler lays out bit-fields?


    You're right, I did leave out a critical piece of information (a
    comment), which stated:

    /* Bit 0 : aa is used for X
    Bit 1 : bb is used for Y
    ... */

    The bit mask is used by the library user to request specific features.

    For example, if the user wants features aa and cc, then he calls

    frob(1<<0 | 1<<2);

    Then frob does the little dance with the bit field, but needs to pass a
    bit mask down the chain. The obvious answer would be to never introduce
    any bit fields, and to work with the masks all the way (as was done
    before), but (IMO) the author is convinced that the new code is easier
    to maintain because the meaning of each bit is spelled out in the
    field's name. This happens to work because GCC packs the bits
    least-significant-bit-first, but it will break with a vengeance if we ever
    move to a different compiler, or if GCC suddenly changes the bit order
    (though that seems rather unlikely).

    >> Consider frob(1);
    >> foo.map.aa <- 1
    >> all other fields <- 0
    >> reserved is left undefined
    >>
    >> The assumption seems to be:
    >> For any 0 <= u <= 255, frob(u) == u

    >
    > As above, I can't tell what's being assumed. If that's the
    > desired transformation, there are portable (and easier!) ways
    > to achieve it.


    Yes, wrapper macros seem to nicely solve the problem of portability and
    maintainability. I was told that bit fields are "nicer" to debug. (They
    may have a point, but nicer at the cost of hell breaking loose when we
    change compilers seems like a hefty price to pay.)
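
    A minimal sketch of what such wrapper macros might look like. The
    FEATURE_* names are hypothetical (the library's real names are not
    shown in this thread); the only thing taken from the code above is
    that aa is bit 0, bb is bit 1, and so on up to hh at bit 7.

```c
/* Hypothetical names: one mask per feature bit, aa = bit 0 ... hh = bit 7. */
#define FEATURE_AA  (1u << 0)
#define FEATURE_BB  (1u << 1)
#define FEATURE_CC  (1u << 2)
#define FEATURE_DD  (1u << 3)
#define FEATURE_EE  (1u << 4)
#define FEATURE_FF  (1u << 5)
#define FEATURE_GG  (1u << 6)
#define FEATURE_HH  (1u << 7)
#define FEATURE_ALL 0xFFu

/* Readable accessors, so call sites name the feature instead of a bit number. */
#define FEATURE_SET(mask, f)   ((mask) | (f))
#define FEATURE_CLEAR(mask, f) ((mask) & ~(f))
#define FEATURE_TEST(mask, f)  (((mask) & (f)) != 0)

/* frob() reduced to plain masking: no bit-fields, no union, no layout
   assumptions. It keeps the low eight feature bits and drops the rest. */
static unsigned frob(unsigned u)
{
    return u & FEATURE_ALL;
}
```

    With these, the call from the example above would read
    frob(FEATURE_AA | FEATURE_CC), and the result depends only on the
    arithmetic value of the mask, not on how any compiler orders
    bit-fields.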

    >> Moreover, "uc" will typically be only 8-bits wide, while "map" is
    >> 32-bits wide. Is there any guarantee as to which bits of "uc" and
    >> "map" overlap? (May depend on endianness?)

    >
    > No guarantees, or at any rate very few. All we know is
    >
    > 1) The bit-fields are packed into "addressable storage
    > units" of a size that the compiler chooses (but is not
    > required to document, as far as I know).
    >
    > 2) Since an ASU is at least one byte long and aa...hh will
    > all fit in one byte, they will occupy the same ASU.
    >
    > 3) If the ASU has at least 32 bits (and if int has at least
    > 24), reserved will occupy the same ASU as aa...hh.
    >
    > 3a) Otherwise, reserved may occupy an ASU of its own, or
    > may "straddle" multiple adjacent ASU's, possibly using
    > part of the ASU containing aa...hh.
    >
    > 4) We know that the ASU(s) containing reserved will not
    > precede the ASU containing aa...hh.
    >
    > We don't know the size of the ASU (I think the ASU's for
    > different bit-fields may even have different sizes), and we
    > don't know which of an ASU's bits are used for which fields,
    > we don't know whether the fields are "tightly" or "loosely"
    > packed, and we don't know what values any "slack" bits in
    > the ASU's might take.
    >
    >> Did I miss any more assumptions?

    >
    > Hard to tell. It would be helpful to know what the code
    > is trying to do, or thinks it's trying to do.
     
    Noob, Nov 5, 2009
    #3
  4. Noob

    Eric Sosman Guest

    Noob wrote:
    > Eric Sosman wrote:
    >
    >> Noob wrote:
    >>> [... bit-fields in a struct, union-punned with unsigned char ...]
    >>> 1. The definition of bitmap_t seems to imply that the author thinks
    >>> (unsigned int) is at least (or exactly) 32-bits wide.

    >> I don't believe so. He assumes that an int is at least 24
    >> bits wide, but I don't see an assumption of any specific >=24-bit
    >> width.

    >
    > You are correct when you say the code only assumes width >= 24, but IMO,
    > it is clear that, in the author's mind, width == 32 and he is padding
    > the struct "by hand" to fill the 32 bits.


    Yes, the author almost certainly had "thirty-two" somewhere
    in the back of his brain. But since the code makes no use of the
    24-bit padding field, I'm still not sure that the assumption of
    a 32-bit int is really relevant.

    > I don't think it has ever crossed the author's mind that an int could be
    > 24-bits wide. (IMHO, not many people who call themselves "C programmers"
    > are aware that an int could be 24-bits wide.)


    It has been many years since I used a machine with 24-bit
    words (four six-bit characters per word). I doubt we'll see
    such things again in general-purpose machines. (Special-purpose
    hardware may be a different story.)

    >> Since a bit-field cannot (portably) be wider than an int,
    >> and since an int can be as narrow as sixteen bits, it's possible
    >> that the attempt to declare a 24-bit field may fail.

    >
    > What does it mean for a declaration to fail?
    > Compiler warning then UB?


    With a 16-bit int, say, the `unsigned int reserved : 24;'
    struct member would be an invalid declaration. It would "fail"
    in the same way that `int array[-42];' would "fail."

    The exact requirement is that the specified width shall not
    exceed the width of the bit-field's base type, and it's in a
    Constraints section (6.7.2.1p3) so a diagnostic is required for
    violations. 6.7.2.1p4 goes on to list the allowable base types:
    _Bool (C99 only), the two flavors of int, and "some other
    implementation-defined type." No diagnostic is required for
    the use of base types beyond the required three -- but the
    implementation is not obliged to accept them, either.

    As far as I can see, there is no 100% portable way to
    specify a 24-bit bit-field. The widest 100% portable base type
    is int, and int could be as narrow as 16 bits, and you're stuck.
    You could specify `unsigned int reserved : 24;' and hope int is
    wide enough, or you could write `unsigned long reserved : 24;'
    and hope the implementation accepts long (necessarily >=32 bits),
    but your hopes might be dashed either way.
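
    One way to turn the "hope int is wide enough" case into an explicit
    build-time check is sketched below. It is only an illustration: the
    error text and the helper uint_width() are made up for this example,
    and it does nothing for the `unsigned long' variant, whose acceptance
    is up to the implementation.

```c
#include <limits.h>

/* Stop the build with a clear message where unsigned int cannot hold a
   24-bit field, instead of relying on whatever diagnostic the compiler
   emits for the invalid bit-field width. */
#if UINT_MAX < 0xFFFFFFUL
#error "unsigned int is narrower than 24 bits; the reserved field is invalid here"
#endif

typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
    unsigned int reserved : 24;
} bitmap_t;

/* Hypothetical helper: counts the value bits of unsigned int at run time. */
static int uint_width(void)
{
    unsigned int v = UINT_MAX;
    int n = 0;

    while (v != 0) {
        v >>= 1;
        n++;
    }
    return n;
}
```

    The #if only guards the width of unsigned int. It says nothing about
    how the fields are laid out once the declaration is accepted.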

    >>> 2. (I'm not sure about this one.) foo is written to using the map field,
    >>> then read from using a different field. This specific instance might be
    >>> OK, because uc's type is unsigned char?

    >> The value of foo.uc is unspecified. Since unsigned char has no
    >> trap representations you won't get UB by fetching it, but there's
    >> no telling what value you'll get.

    >
    > Do you agree that, on some platforms, the bits in the union will map to
    > the bits of foo.uc? Do you say its value is unspecified because this
    > might not be the case (point 3) or for some other reason?


    To the first, yes. To the second, I'm relying on 6.2.6.1p7:

    When a value is stored in a member of an object of union
    type, the bytes of the object representation that do not
    correspond to that member but do correspond to other
    members take unspecified values, [...]

    In the case at hand, values are stored in the struct member of
    a union, and then an unsigned char member is fetched. We know
    that the struct's eight 1-bit bit-fields occupy the first
    "addressable storage unit" in the struct, and that the ASU is
    the first thing in that struct and hence the first thing in the
    union. We also know that the unsigned char is the first thing
    in the union -- but we don't know how big the ASU is, nor which
    of its bits hold the bit-fields. The union's first byte -- the
    unsigned char fetched at the end -- might be in an unused part
    of the ASU, and the values of the bits that correspond to no
    member of the struct are unspecified.

    That's how I understand it, anyhow.
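
    To see what a particular implementation actually does, one could
    probe the object representation along these lines. This is a sketch
    only: locate_aa() is a name invented here, and the result describes
    the compiler at hand, not anything the Standard promises (strictly,
    even the padding bits may change when a member is stored).

```c
#include <limits.h>
#include <string.h>

typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
    unsigned int reserved : 24;
} bitmap_t;

/* Reports which byte, and which bit within that byte, holds field aa.
   Returns 1 if exactly one bit of the representation ended up set,
   0 otherwise (in which case the outputs are not meaningful). */
static int locate_aa(size_t *byte_index, int *bit_index)
{
    bitmap_t map;
    unsigned char bytes[sizeof map];
    size_t i;
    int b;
    int found = 0;

    memset(&map, 0, sizeof map);      /* start from all-zero bytes */
    map.aa = 1;                       /* set only the field of interest */
    memcpy(bytes, &map, sizeof map);  /* inspect it as unsigned char */

    for (i = 0; i < sizeof map; i++) {
        for (b = 0; b < CHAR_BIT; b++) {
            if ((bytes[i] >> b) & 1) {
                *byte_index = i;
                *bit_index = b;
                found++;
            }
        }
    }
    return found == 1;
}
```

    If the reported byte index is not 0, then foo.uc in the original
    union does not overlap aa at all on that implementation.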

    >>> 3. The code seems to assume field "aa" maps to the least-significant bit
    >>> of the parameter (bit 0), "bb" maps to bit 1, etc.

    >> The assumption is unwarranted, but I'm not sure the code
    >> really makes the assumption. We'd have to know something about
    >> the expected/intended inputs and outputs to know what the
    >> assumptions are. Maybe this code is used to discover something
    >> about the way a particular compiler lays out bit-fields?

    >
    > You're right, I did leave out some critical piece of information (a
    > comment) which stated :
    >
    > /* Bit 0 : aa is used for X
    > Bit 1 : bb is used for Y
    > ... */
    >
    > The bit mask is used by the library user to request specific features.
    >
    > For example, if the user wants features aa and cc, then he calls
    >
    > frob(1<<0 | 1<<2);
    >
    > Then frob does the little dance with the bit field, but needs to pass a
    > bit mask down the chain. The obvious answer would be to never introduce
    > any bit fields, and to work with the masks all the way (as was done
    > before), but (IMO) the author is convinced that the new code is easier
    > to maintain because the meaning of each bit is spelled out in the
    > field's name. This happens to work because GCC packs the bits least
    > significant-bit-first, but it will break with a vengeance if we ever
    > move to a different compiler, or if GCC suddenly changes the bit order
    > (though that seems rather unlikely).


    He could keep using bit-fields (aside from the problematic
    24-bit field, which he doesn't seem to need anyhow). It's the
    type-punning that makes the trouble: He goes to all this trouble
    to set and clear the bit-fields without assuming anything about
    their order, and then he messes it up with a different assumption.
    Why not just pass the struct and its bit-fields around? (And why
    not jettison that 24-bit element, if it's not used?)
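
    A sketch of that approach, assuming the lower layers still want a
    plain mask at some point: keep the named bit-fields, drop the
    reserved member, and convert with shifts rather than through a union.
    The function names map_from_mask() and mask_from_map() are
    hypothetical.

```c
typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
} bitmap_t;

/* Mask to struct: same assignments as the original frob(). */
static bitmap_t map_from_mask(unsigned u)
{
    bitmap_t m;

    m.aa = (u >> 0) & 1;
    m.bb = (u >> 1) & 1;
    m.cc = (u >> 2) & 1;
    m.dd = (u >> 3) & 1;
    m.ee = (u >> 4) & 1;
    m.ff = (u >> 5) & 1;
    m.gg = (u >> 6) & 1;
    m.hh = (u >> 7) & 1;
    return m;
}

/* Struct to mask: built from the field values, so the result does not
   depend on where the compiler put each field. */
static unsigned mask_from_map(const bitmap_t *m)
{
    return ((unsigned)m->aa << 0) | ((unsigned)m->bb << 1)
         | ((unsigned)m->cc << 2) | ((unsigned)m->dd << 3)
         | ((unsigned)m->ee << 4) | ((unsigned)m->ff << 5)
         | ((unsigned)m->gg << 6) | ((unsigned)m->hh << 7);
}
```

    For every u from 0 to 255, mask_from_map() applied to the result of
    map_from_mask(u) gives back u, which is the property the union trick
    was relying on the compiler to provide.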

    --
    Eric Sosman
     
    Eric Sosman, Nov 5, 2009
    #4
  5. Noob

    John Temples Guest

    On 2009-11-05, Eric Sosman wrote:
    >> I don't think it has ever crossed the author's mind that an int could be
    >> 24-bits wide. (IMHO, not many people who call themselves "C programmers"
    >> are aware that an int could be 24-bits wide.)

    >
    > It has been many years since I used a machine with 24-bit
    > words (four six-bit characters per word). I doubt we'll see
    > such things again in general-purpose machines. (Special-purpose
    > hardware may be a different story.)


    Some compilers for 8-bit platforms support a 24-bit integer type via
    an extension such as "short long". But I'm not aware of any that
    allow "int" to become 24 bits with a compile-time option.

    --
    John W. Temples, III
     
    John Temples, Nov 5, 2009
    #5
  6. Noob

    Seebs Guest

    On 2009-11-05, John Temples wrote:
    > Some compilers for 8-bit platforms support a 24-bit integer type via
    > an extension such as "short long". But I'm not aware of any that
    > allow "int" to become 24 bits with a compile-time option.


    I'd doubt you'd see them outside custom DSP work, but I've almost
    certainly got a machine in my basement containing a 24-bit DSP.

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Nov 5, 2009
    #6
  7. Noob

    Walter Banks Guest

    Seebs wrote:

    > On 2009-11-05, John Temples wrote:
    > > Some compilers for 8-bit platforms support a 24-bit integer type via
    > > an extension such as "short long". But I'm not aware of any that
    > > allow "int" to become 24 bits with a compile-time option.

    >
    > I'd doubt you'd see them outside custom DSP work, but I've almost
    > certainly got a machine in my basement containing a 24-bit DSP.


    There are quite a few 24-bit processors with an int of 24 bits.

    We have 24-bit int data type support on some 8-bit embedded
    system compilers. The type is defined as a size-specific int rather
    than some weird combination of long and short. 24-bit ints
    fit the needs of many applications and significantly reduce cycle
    counts and RAM requirements on 8-bit processors.

    Walter..
     
    Walter Banks, Nov 5, 2009
    #7
  8. Noob

    John Temples Guest

    On 2009-11-05, Seebs wrote:
    > On 2009-11-05, John Temples wrote:
    >> Some compilers for 8-bit platforms support a 24-bit integer type via
    >> an extension such as "short long". But I'm not aware of any that
    >> allow "int" to become 24 bits with a compile-time option.

    >
    > I'd doubt you'd see them outside custom DSP work,


    No, just conventional 8-bit processors. On an 8-bit CPU, working with
    24 bits generates less code for math, argument passing, etc., than
    working with 32 bits.

    --
    John W. Temples, III
     
    John Temples, Nov 6, 2009
    #8