bitfield confusion

Discussion in 'C Programming' started by mathog, Jul 11, 2013.

  1. mathog

    mathog Guest

    I am having one of those days - what am I doing wrong here?

    1. The Microsoft EMF+ specification, section 2.2.2.19

    http://msdn.microsoft.com/en-us/library/cc231004.aspx

    says that GraphicsVersion objects are 32 bits as:

    0-19 Metafile Signature
    20-31 GraphicsVersion enumeration

    2. So I defined what I thought would be the corresponding struct, at
    least for Intel platforms. This assumes bitfields are listed in the
    struct from least to most significant bits, perhaps they are the other
    way around? Or is this one of those areas where the compiler can do
    anything it wants?

    typedef struct {
    unsigned int Signature : 20;
    unsigned int GrfVersion : 12;
    } U_PMF_GRAPHICSVERSION;

    3. Opened a file with EMF+ records and found the corresponding 32 bits.
    This is on an Intel architecture machine and the file was made by
    Powerpoint on this machine. Examine the value various ways with this code:

    U_PMF_GRAPHICSVERSION Version;
    printf("DEBUG at offset:%8.8X\n",
    *(uint32_t *)contents);
    printf("DEBUG at offset by byte:%2.2X %2.2X %2.2X %2.2X\n",
    *(uint8_t *)(contents + 0),
    *(uint8_t *)(contents + 1),
    *(uint8_t *)(contents + 2),
    *(uint8_t *)(contents + 3)
    );
    memcpy(&Version, contents, sizeof(U_PMF_GRAPHICSVERSION));
    printf("DEBUG Sig:%X GrfV:%X\n",
    Version.Signature, Version.GrfVersion);

    The output is

    DEBUG at offset:DBC01002
    DEBUG at offset by byte:02 10 C0 DB
    DEBUG Sig:1002 GrfV:DBC

    For an EMF+ file, the signature must be 0xDBC01, and the version can be 2.

    I must be screwing up somewhere, but where? The first two DEBUG lines
    are consistent with this being a little endian system. "DB" is clearly
    at the most significant end of the 32 bits, but the EMF+ specification
    appears to say that it should be somewhere in the middle. For a little
    endian machine, doesn't (1) say that for sig == DBC01 and
    version == 002 the bytes in the file should be: 01 BC 2D 00 ?? Swapping
    the order of the bit fields in the struct above does put DBC01 in Sig
    and 2 in GrfV, but it does not seem to be consistent with the
    documentation.

    Thanks,

    David Mathog
     
    mathog, Jul 11, 2013
    #1

  2. mathog

    James Kuyper Guest

    Almost. There are only a few restrictions imposed by the C standard on
    the allocation of bit-fields. The relevant requirements are in terms of
    addressable storage units, about which the standard says very little -
    it does not say how big they are, and does not require the
    implementation to document how big they are, nor does it require that
    the size be the same in all contexts.

    "... If enough space remains, a bit-field that immediately follows
    another bit-field in a structure shall be packed into adjacent bits of
    the same unit. If insufficient space remains, whether a bit-field that
    does not fit is put into the next unit or overlaps adjacent units is
    implementation-defined. The order of allocation of bit-fields within a
    unit (high-order to low-order or low-order to high-order) is
    implementation-defined. The alignment of the addressable storage unit is
    unspecified." (6.7.2.1p6)
    "... As a special case, a bit-field structure member with a width of 0
    indicates that no further bit-field is to be packed into the unit in
    which the previous bitfield, if any, was placed." (6.7.2.1p7)

    What the standard fails to guarantee about bit-field layouts renders
    them useless for such purposes, at least in code that needs to be
    portable. That's a bit of a shame, because such purposes would otherwise
    be overwhelmingly the most popular reasons for using them.
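
    Because the allocation order is implementation-defined, one way to find
    out what a given compiler does is to probe it. The following is only a
    sketch, not something from the standard or from this thread: struct probe
    and first_field_is_low_order are hypothetical names, and the check
    assumes both bit-fields share one 32-bit storage unit (it returns -1
    when that assumption does not hold).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical probe structure: same widths as the struct in the
 * original question. */
struct probe {
    unsigned int first : 20;
    unsigned int second : 12;
};

/* Returns 1 if the first-declared bit-field lands in the low-order bits
 * of the underlying 32-bit value, 0 if it lands in the high-order bits,
 * and -1 if the layout is something else (for example a different
 * storage unit size). */
static int first_field_is_low_order(void)
{
    struct probe p;
    uint32_t raw = 0;

    if (sizeof p != sizeof raw)
        return -1;                  /* not a single 32-bit unit */

    memset(&p, 0, sizeof p);
    p.first = 1;
    p.second = 0;
    memcpy(&raw, &p, sizeof raw);   /* view the unit as a native integer */

    if (raw == 1u)
        return 1;                   /* first occupies bits 0..19 */
    if (raw == (1u << 12))
        return 0;                   /* first occupies bits 12..31 */
    return -1;
}
```

    The answer is only valid for the compiler and target that built the
    probe, which is exactly the portability problem described above.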
     
    James Kuyper, Jul 11, 2013
    #2

  3. mathog

    Lew Pitcher Guest

    The C standard doesn't define how bitfields are ordered within the
    underlying storage, leaving that up to the implementation. For Microsoft C
    compilers (eg Visual Studio), bitfields are allocated from lo-order to
    hi-order against the underlying integer used as storage for bitfields.
    (See http://msdn.microsoft.com/en-us/library/yszfawxh(v=vs.80).aspx)

    Your structure, when mapped by a MS C compiler, would result in
    Signature being mapped to the 2^19 through 2^0 bits, and
    GrfVersion being mapped to the 2^31 through 2^20 bits
    of the underlying unsigned integer that the bit fields are packed into.

     3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
    |<----GrfVersion------->|<-------------Signature--------------->|

    This is not a bit-level mapping of storage; it is a logical mapping against
    the interpreted value in storage; in other words, for Microsoft C products,
    the bitfields are interpreted in the same little-endian order that the
    underlying integer type is interpreted.

    When you memcpy()ed the 4 byte value into your structure, you, in effect,
    set that underlying integer value to 0xDBC01002, not 0x0210C0DB.
    Consequently, GrfVersion mapped to the 0xDBC portion of the underlying
    integer value, and Signature mapped to the 0x01002 portion.
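
    The two readings of the same four bytes can be sketched as follows,
    assuming 8-bit bytes. The names load_le32 and load_be32 are illustrative
    only and do not come from any library mentioned in this thread. Read
    least significant byte first, 02 10 C0 DB gives 0xDBC01002; read most
    significant byte first, it gives 0x0210C0DB.

```c
#include <stdint.h>

/* Build a 32-bit value from 4 bytes stored least significant byte first
 * (little-endian), regardless of the host's own byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Build a 32-bit value from 4 bytes stored most significant byte first
 * (big-endian), regardless of the host's own byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

    Assembling the value byte by byte gives the same result on any host,
    whereas memcpy() into an integer gives whatever the host's own byte
    order produces.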

    To fix this, change your structure to match the compiler's implicit bitmap
    mapping:

    typedef struct {
    unsigned int GrfVersion : 12;
    unsigned int Signature : 20;
    } U_PMF_GRAPHICSVERSION;

    This will map GrfVersion to the 12 lo-order bits of the underlying integer
    used as bitmap storage (in your case, the 0x002), and Signature to the 20
    hi-order bits of the underlying integer (in your case, the 0xDBC01).

    HTH
     
    Lew Pitcher, Jul 12, 2013
    #3
  4. mathog

    JohnF Guest

    [...] just forget those bitfields entirely, and do it the hard way,
    for example,
    /* ---
    * bitfield macros (byte_bits=76543210, with lsb=bit#0 and 128=bit#7set)
    * --------------------------------------------------------------------- */
    #define getbit(x,bit) ( ((x) >> (bit)) & 1 ) /* get bit-th bit of x */
    #define setbit(x,bit) ( (x) |= (1<<(bit)) ) /* set bit-th bit of x */
    #define clearbit(x,bit) ( (x) &= ~(1<<(bit)) ) /* clear bit-th bit of x */
    #define putbit(x,bit,val) \
    if(((int)(val))==0) clearbit((x),(bit)); else setbit((x),(bit))
    #define bitmask(nbits) ((1<<(nbits))-1) /* a mask of nbits 1's */
    #define getbitfield(x,bit1,nbits) (((x)>>(bit1)) & (bitmask(nbits)))
    #define putbitfield(x,bit1,nbits,val) /* x:bit1...bit1+nbits-1 = val */ \
    if ( (nbits)>0 && (bit1)>=0 ) { /* check input */ \
    (x) &= (~((bitmask((nbits))) << (bit1))); /*set field=0's*/ \
    (x) |= (((val)&(bitmask((nbits)))) << (bit1)); /*set field=val*/ \
    } else /* let user supply final ; */
    this will be a little more portable.
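
    For comparison, the same get/put idea can be written as small functions
    instead of macros. This is a sketch under stated assumptions, not a
    replacement for the macros above: the names field_mask, get_field and
    put_field are made up here, values are fixed at 32 bits, and lsb must be
    below 32.

```c
#include <stdint.h>

/* Mask with the low nbits bits set; intended for nbits in 1..32. */
static uint32_t field_mask(unsigned nbits)
{
    return (nbits >= 32) ? 0xFFFFFFFFu : ((UINT32_C(1) << nbits) - 1u);
}

/* Read the nbits-wide field whose least significant bit is bit lsb. */
static uint32_t get_field(uint32_t x, unsigned lsb, unsigned nbits)
{
    return (x >> lsb) & field_mask(nbits);
}

/* Return x with that field replaced by val; bits of val that do not fit
 * in the field are discarded. */
static uint32_t put_field(uint32_t x, unsigned lsb, unsigned nbits,
                          uint32_t val)
{
    uint32_t m = field_mask(nbits) << lsb;
    return (x & ~m) | ((val << lsb) & m);
}
```

    Functions evaluate each argument once and have no dangling-else hazard,
    at the cost of fixing the operand type.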
     
    JohnF, Jul 12, 2013
    #4
  5. It's safer (more portable and reliable) to define bitfields manually - i.e.
    with masks and shifts. There is some simple C code showing how to access and
    manipulate such fields at

    http://codewiki.wikispaces.com/bitfield_operations.c

    James
     
    James Harris (es), Jul 12, 2013
    #5
  6. mathog

    Rosario1903 Guest

    I don't see the definition of contents...
    I suppose it is "uint8_t *contents;"

    If contents is a pointer to uint8_t [or int8_t, or char or unsigned
    char where char is 8 bits], this prints the first 4 contiguous bytes.

    If contents is a pointer to u32 or to int [or long or float], this
    prints the first byte of each of the first 4 elements of the array
    that "contents" points to.
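
    The difference comes from pointer arithmetic being scaled by the
    pointed-to type before the cast is applied. A small sketch, with
    hypothetical helper names byte_via_u8 and byte_via_u32, and assuming
    uint32_t occupies 4 bytes:

```c
#include <stdint.h>

/* What *(uint8_t *)(contents + i) reads when contents is a pointer to
 * uint8_t: the pointer advances by i bytes. */
static uint8_t byte_via_u8(const uint8_t *contents, unsigned i)
{
    return *(contents + i);
}

/* The same expression when contents is a pointer to uint32_t: the
 * pointer advances by i * sizeof(uint32_t) bytes, and only then is it
 * converted to a byte pointer. */
static uint8_t byte_via_u32(const uint32_t *contents, unsigned i)
{
    return *(const uint8_t *)(contents + i);
}
```

    With a buffer holding the byte values 0, 1, 2, ... the first helper at
    index 1 reads byte 1, while the second reads byte 4.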
     
    Rosario1903, Jul 12, 2013
    #6
  7. mathog

    Ian Collins Guest

    I really don't understand why people get so hung up about bit fields, or
    why they'd want to muck about with shifts and masks. That low level
    stuff is the compiler's job. It isn't rocket science to determine the
    order of bit fields (my day to day platform has preprocessor macros for
    this) and to use them correctly and portably.
     
    Ian Collins, Jul 12, 2013
    #7
  8. Controlling the layout and contents of a byte/char/int/word/etc. on a
    bit by bit basis is indispensable in embedded work, where it is
    commonplace to have to read and write hardware registers containing
    several fields defined by bit position and width in bits.

    Not being able to use bitfields in a portable way (because of the
    implementation-dependent aspects) leaves no choice but to muck about
    with shifts and masks ...
     
    Roberto Waltman, Jul 12, 2013
    #8
  9. mathog

    Les Cargill Guest


    I've principally used them for FPGA register maps
    and certain binary comms protocols. In neither case must they
    be fully portable - a different architecture would likely
    mean a different FPGA anyway.

    And for network protocols, it's not hard to have a header file per
    "endianness" permutation.

    Bit fields are way better than bit shifts and macros.
     
    Les Cargill, Jul 12, 2013
    #9
  10. mathog

    Les Cargill Guest


    In general, if the target boards have changed enough to where bit field
    orientation matters, you'll have other portability fun as well.
     
    Les Cargill, Jul 12, 2013
    #10
  11. mathog

    James Kuyper Guest

    Well, I don't like using shifts and masks, but I use them because I
    don't know how to use bit-fields to correctly and portably parse an
    externally defined data structure. Would you care to demonstrate how it
    is done? To make things concrete, let's consider the following case:

    The raw data has a record size of 32 bits. The fields in that record
    have lengths of 10, 10, and 12 bits, respectively. I would use the
    following shift and mask instructions to extract them from an unsigned
    char buffer of length 4:

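    #include <limits.h> /* for CHAR_BIT */
    #if CHAR_BIT != 8
    #error this code requires CHAR_BIT == 8
    #endif
    /* field1: all 8 bits of buffer[0], then the top 2 bits of buffer[1] */
    unsigned field1 = buffer[0] << 2 | buffer[1] >> 6;
    /* field2: low 6 bits of buffer[1], then the top 4 bits of buffer[2] */
    unsigned field2 = (buffer[1] & 0x3f) << 4 | buffer[2] >> 4;
    /* field3: low 4 bits of buffer[2], then all 8 bits of buffer[3] */
    unsigned field3 = (buffer[2] & 0x0F) << 8 | buffer[3];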
    If I were doing a lot of this, I'd define appropriate macros to simplify
    the extraction of those bit-fields, but this is what those macros would
    expand to.

    What would your code using bit-fields and preprocessor macros to extract
    these fields look like, given that it must be portable to both of the
    following fully-conforming implementations of C (among others)?

    Implementation A uses addressable storage units with a size of 16 bits,
    interprets those units as little-endian 16-bit integers. It forces
    consecutive bit-fields to share a storage unit, even if that means that
    they will have to cross a storage unit boundary. It assigns bit-fields
    to the bits of those 16-bit integers in order from high to low.

    Implementation B uses addressable storage units with a size of 8 bits.
    Bit-fields that are too big to fit in the remaining space of one storage
    unit start in the next storage unit. Within each storage unit, bits are
    assigned to bit-fields in order from low to high.
     
    James Kuyper, Jul 13, 2013
    #11
  12. mathog

    Ian Collins Guest

    No it doesn't. I've been happily using bit fields for register mappings
    for the past three decades :) Nine times out of ten an embedded project
    uses a single compiler, so even if portability was an issue (which it
    seldom is) it is irrelevant. For driver development on bigger systems,
    the compilers for that system use the same mapping, and the platform
    provides appropriate architecture specific macros. An example from a
    Solaris header:

    struct ip {
    #ifdef _BIT_FIELDS_LTOH
    uchar_t ip_hl:4, /* header length */
    ip_v:4; /* version */
    #else
    uchar_t ip_v:4, /* version */
    ip_hl:4; /* header length */
    #endif
     
    Ian Collins, Jul 13, 2013
    #12
  13. mathog

    mathog Guest

    I found the problem, finally. Section 1.3.2 of the EMF+ documentation says:

    Data in the EMF+ metafile records are stored in little-endian format.

    Some computer architectures number bytes in a binary word from left
    to right, which is referred to as big-endian. The byte numbering used
    for bitfields in this specification is big-endian. Other
    architectures number the bytes in a binary word from right to left,
    which is referred to as little-endian. The byte numbering used for
    enumerations, objects, and records in this specification is little-
    endian.

    Why in the world would they use big-endian for bitfields and
    little-endian for everything else???????
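
    Read that way, the numbers in the specification translate into shifts
    as sketched below. This assumes, as the quoted section suggests, that
    bit 0 in the specification's layout is the most significant bit of the
    32-bit value once it has been read as a little-endian integer. The
    function name field_msb0 is made up for this example.

```c
#include <stdint.h>

/* Extract the field covering layout bits first..last (inclusive), where
 * bit 0 is the MOST significant bit of a 32-bit value and bit 31 the
 * least significant. Requires first <= last <= 31. */
static uint32_t field_msb0(uint32_t value, unsigned first, unsigned last)
{
    unsigned width = last - first + 1u;   /* number of bits in the field */
    unsigned shift = 31u - last;          /* distance of the field from bit 31 */
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu
                                   : ((UINT32_C(1) << width) - 1u);
    return (value >> shift) & mask;
}
```

    For the value 0xDBC01002 from the file, bits 0-19 then give 0xDBC01 and
    bits 20-31 give 0x002, which matches the expected signature and version.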

    Thanks,

    David Mathog
     
    mathog, Jul 13, 2013
    #13
  14. mathog

    Ian Collins Guest

    Before we get to that, I'd just like to make it clear that I'm not
    claiming bit-fields are 100% applicable, more like 90+%. In all of the
    real-world code I've worked with they have been an appropriate solution.
    I have yet to encounter two compilers for the same platform that use
    different bit-field ordering. I know this isn't guaranteed, but in the
    real world this is one area where common sense prevails.
    In the absence of bit-fields, you should always use functions (or if you
    have a strong stomach, macros) for bit manipulation, otherwise you are
    vulnerable to changes in the layout.
    Given that set of requirements, I would parse the data once into a
    naturally aligned struct and work with that. I would consider this more
    of a data serialisation task.
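
    A rough sketch of that approach for the 10/10/12-bit record discussed
    earlier in the thread. The struct and function names (record,
    record_parse, record_pack) are hypothetical, and the byte layout assumes
    the first field starts at the most significant bit of the first byte.

```c
#include <limits.h>

#if CHAR_BIT != 8
#error this sketch assumes CHAR_BIT == 8
#endif

/* In-memory form of the 32-bit record: ordinary members, so the
 * compiler's bit-field layout never enters the picture. */
struct record {
    unsigned field1;   /* 10 bits in the external format */
    unsigned field2;   /* 10 bits */
    unsigned field3;   /* 12 bits */
};

/* Decode 4 bytes; field1 starts at the most significant bit of buf[0]. */
static struct record record_parse(const unsigned char buf[4])
{
    struct record r;
    r.field1 = (unsigned)buf[0] << 2 | buf[1] >> 6;
    r.field2 = (unsigned)(buf[1] & 0x3F) << 4 | buf[2] >> 4;
    r.field3 = (unsigned)(buf[2] & 0x0F) << 8 | buf[3];
    return r;
}

/* Encode the struct back into 4 bytes; bits that do not fit in a field
 * are discarded. */
static void record_pack(const struct record *r, unsigned char buf[4])
{
    buf[0] = (unsigned char)((r->field1 >> 2) & 0xFF);
    buf[1] = (unsigned char)(((r->field1 & 0x03) << 6)
                             | ((r->field2 >> 4) & 0x3F));
    buf[2] = (unsigned char)(((r->field2 & 0x0F) << 4)
                             | ((r->field3 >> 8) & 0x0F));
    buf[3] = (unsigned char)(r->field3 & 0xFF);
}
```

    The rest of the program only touches struct record, so a change in the
    external layout is confined to these two functions.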
     
    Ian Collins, Jul 13, 2013
    #14
  15. mathog

    mathog Guest

    (sorry, posted this in the wrong place a minute ago, so this is a duplicate)

     
    mathog, Jul 13, 2013
    #15
  16. mathog

    Eric Sosman Guest

    Bit-fields are useless as a means of mapping an externally-
    defined format portably.

    This is just a special case of "structs are useless as a
    means of mapping an externally-defined format portably," except
    that when the struct has bit-fields it's even worse.

    For a specified compiler and target you may be privy to
    extra information that allows you to define a struct (with or
    without bit-fields) that matches a particular externally-defined
    format. But don't kid yourself by imagining that the recipe
    for one compiler/target pair will work with the next.

    As for macros -- Well, let's just start and end with the
    observation that the nature and size of the "addressable storage
    unit" that holds bit-fields is entirely the implementation's
    prerogative, and since the implementation is not even obliged
    to document it (it is "unspecified," not "implementation-defined")
    the only way you can define your macros is by hoping the compiler
    tells you more than is required, or by resorting to guesswork
    and hope.

    Structs (with or without bit-fields) tempt you, they seduce
    you, they lead you on and go nudge-nudge-wink-wink to entice
    you into using them to map external formats. But you'll hate
    yourself in the morning, even if you don't find yourself in the
    gutter minus your wallet and watch and plus a venereal disease.
     
    Eric Sosman, Jul 13, 2013
    #16
  17. mathog

    Eric Sosman Guest

    For portability.

    ;-)
     
    Eric Sosman, Jul 13, 2013
    #17
  18. mathog

    Ian Collins Guest

    Tell that to the writers of most platforms' IP headers.
    "portability" has many meanings, ranging from "theoretically portable"
    to "works on windows"...
    Some do (like IP related headers) and some don't. How you tackle
    marshaling them depends on the detail. Would you use masks to extract
    data from and insert data into an IP header, or would you use the
    structs provided by the system?
    As I said in another response, in all of the real-world situations I have
    seen, common sense prevails here.
    :)
     
    Ian Collins, Jul 13, 2013
    #18
  19. mathog

    Eric Sosman Guest

    I hope for your sake it's not penicillin-resistant.
     
    Eric Sosman, Jul 13, 2013
    #19
  20. mathog

    Joe Pfeiffer Guest

    Exactly. I don't remember whether it was moving code from 68K to
    Sparc, or from Sun Sparc Solaris to i386 Linux that did it... but
    virtually all my bitfields were broken. I don't think I've used one
    since that particular burning.
     
    Joe Pfeiffer, Jul 13, 2013
    #20