Storing 4 characters using 3 bytes

Discussion in 'C Programming' started by becte, Oct 15, 2003.

  1. becte

    becte Guest

    I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    like this
    11111122|22223333|33444444
    Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    and the output is int o1,o2,o3, range 0 .. 2^8-1
    How to do this in a clever way?

    (The 6 bits integers represent characters in range A-Z and 0-9)
    becte, Oct 15, 2003
    #1
    1. Advertising

  2. Greetings.

    In article <>, becte wrote:
    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?


    Sounds suspiciously like a homework problem, so you're unlikely to get
    complete solutions here. Have you read up on C's bit-shifting and bitwise
    comparison operators? If not, do so and ask again if you still need help.

    Regards,
    Tristan

    --
    _
    _V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
    / |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
    (7_\\ http://www.nothingisreal.com/ >< To finish what you
    Tristan Miller, Oct 15, 2003
    #2
    1. Advertising

  3. (becte) wrote in
    news::

    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?
    >
    > (The 6 bits integers represent characters in range A-Z and 0-9)


    How about the untested:

    #define PACKED_INT_RANGE 0x3F /* 6-bits */

    /* pPacked had better be 3 bytes or more
    */
    int packInts(int first, int second, int third, int fourth, char *pPacked)
    {
    int failures = !0;

    if ((first & PACKED_INT_RANGE) == first
    && (second & PACKED_INT_RANGE) == second
    && (third & PACKED_INT_RANGE) == third
    && (fourth & PACKED_INT_RANGE) == fourth)
    {
    pPacked[0] = (first & PACKED_INT_RANGE) << 2;
    pPacked[0] |= (second & PACKED_INT_RANGE) >> 4;
    pPacked[1] = (second & PACKED_INT_RANGE) << 4;
    pPacked[1] |= (third & PACKED_INT_RANGE) >> 2;
    pPacked[2] = (third & PACKED_INT_RANGE) << 6;
    pPacked[2] |= (fourth & PACKED_INT_RANGE) >> 0;

    failures = 0;
    }

    return failures;
    }


    --
    - Mark ->
    --
    Mark A. Odell, Oct 15, 2003
    #3
  4. "becte" <> wrote in message
    news:...
    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?
    >
    > (The 6 bits integers represent characters in range A-Z and 0-9)


    You could store the 4 6-bit values contiguously into a 32-bit
    unsigned long, then split these 24-bit value back into 3 bytes.

    I think you should give it a try, and post some code if you
    need additional help. All you need is the bit-shift and
    bit-or/bit-and operators ( << >> | & ).


    Cheers,
    Ivan
    --
    http://ivan.vecerina.com
    Ivan Vecerina, Oct 15, 2003
    #4
  5. becte

    Eric Sosman Guest

    becte wrote:
    >
    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?


    "Straightforward" suffices:

    o1 = (c1 << 2) | (c2 >> 4);
    o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
    03 = ((c3 & 0x3) << 2) | c4;

    For "clever" you might try something like this (requires
    an `int' of 25 or more bits):

    o1 = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
    o3 = o1 & 0xFF;
    o1 >>= 8;
    o2 = o1 & 0xFF;
    o1 >>= 8;

    --
    Eric Sosman, Oct 15, 2003
    #5
  6. Mark A. Odell <> spoke thus:

    > int packInts(int first, int second, int third, int fourth, char *pPacked)
    > int failures = !0;


    Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
    is 8, right? Does it do anything else non-portable?

    --
    Christopher Benson-Manica | I *should* know what I'm talking about - if I
    ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
    Christopher Benson-Manica, Oct 15, 2003
    #6
  7. Christopher Benson-Manica <> scribbled the following:
    > Mark A. Odell <> spoke thus:
    >> int packInts(int first, int second, int third, int fourth, char *pPacked)
    >> int failures = !0;


    > Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
    > is 8, right? Does it do anything else non-portable?


    Umm... what do you think !0 equals?

    --
    /-- Joona Palaste () ------------- Finland --------\
    \-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
    "When a man talks dirty to a woman, that's sexual harassment. When a woman talks
    dirty to a man, that's 14.99 per minute + local telephone charges!"
    - Ruben Stiller
    Joona I Palaste, Oct 15, 2003
    #7
  8. Joona I Palaste <> spoke thus:

    > Umm... what do you think !0 equals?


    Um... nevermind? :(

    --
    Christopher Benson-Manica | I *should* know what I'm talking about - if I
    ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
    Christopher Benson-Manica, Oct 15, 2003
    #8
  9. becte

    Nudge Guest

    becte wrote:
    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?


    How about bit fields?
    Nudge, Oct 15, 2003
    #9
  10. becte

    cody Guest

    cody, Oct 15, 2003
    #10
  11. becte

    Chris Torek Guest

    In article <bmkdmb$n8b4a$-berlin.de>
    cody <> writes:
    >typedef union
    >{
    > char chars[3];
    > struct
    > {
    > int _1 : 6;
    > int _2 : 6;
    > int _3 : 6;
    > int _4 : 6;
    > } BITS
    >
    >} PACKED


    Even once the syntax errors are fixed, the code simply does not
    work at all on the SPARC. Try it and see.

    (You will find that c[0] is never changed. This is a hint as to
    what is going wrong. :) )
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://67.40.109.61/torek/index.html (for the moment)
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Oct 15, 2003
    #11
  12. "cody" <> wrote:

    >typedef union
    >{
    > char chars[3];
    > struct
    > {
    > int _1 : 6;
    > int _2 : 6;
    > int _3 : 6;
    > int _4 : 6;
    > } BITS
    >
    >} PACKED


    If you'd followed the thread "How to extract bytes from a long?"
    you'd already know this is calling for nasal demons.

    Regards
    --
    Irrwahn
    ()
    Irrwahn Grausewitz, Oct 15, 2003
    #12
  13. becte

    cody Guest

    > >typedef union
    > >{
    > > char chars[3];
    > > struct
    > > {
    > > int _1 : 6;
    > > int _2 : 6;
    > > int _3 : 6;
    > > int _4 : 6;
    > > } BITS
    > >
    > >} PACKED

    >
    > Even once the syntax errors are fixed, the code simply does not
    > work at all on the SPARC. Try it and see.


    Missing semicolons? Sorry.

    > (You will find that c[0] is never changed. This is a hint as to
    > what is going wrong. :) )


    Why? Structure padding? I don't think the struct BITS will be padding bits
    inserted. added at the end maybe but that is not of interest here.

    --
    cody

    [Freeware, Games and Humor]
    www.deutronium.de.vu || www.deutronium.tk
    cody, Oct 15, 2003
    #13
  14. becte

    CBFalconer Guest

    becte wrote:
    >
    > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > like this
    > 11111122|22223333|33444444
    > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > How to do this in a clever way?
    >
    > (The 6 bits integers represent characters in range A-Z and 0-9)


    You can do better. You can store three of those chars into 16
    bits. You should include at least a blank, for 37 chars each
    represented by a value in 0 through 39

    You store it by: i = 40 * (40 * c3 + c2) + c1. You extract by
    dividing by 40 and taking the remainder.

    This is a very old mechanism, and is at the root of 6 char
    filenames in bygone years. Look up Rad50 (the 50 is in octal).

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Oct 16, 2003
    #14
  15. becte

    Chris Torek Guest

    [regarding a union of "char" array and bitfields, I wrote:]
    >> Even once the syntax errors are fixed, the code simply does not
    >> work at all on the SPARC. Try it and see.


    Actually, I picked the wrong machine; I now think it *would* work
    (depending on desired output) on the SPARC. Ones on which it would
    likely fail badly are certain Motorola 680x0 implementations.

    In article <bmkjf3$i4f7p$-berlin.de>
    cody <> writes:
    >Missing semicolons? Sorry.


    That, and the names "_1" through "_4" are dubious. (I would have
    to consult the C standards to see whether those names are reserved
    to users in the structure-member namespace. Instead of using names
    whose form requires careful scrutiny of a C standard, why not use
    names that are clearly off-limits to the implementor?)

    >Why? Structure padding? I don't think the struct BITS will be padding bits
    >inserted. added at the end maybe but that is not of interest here.


    For discussion purposes, here is the proposed code, turned into
    compilable C:

    void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
    union {
    unsigned char c[3];
    struct {
    unsigned int one:6, two:6, three:6, four:6;
    } bits;
    } u;
    u.bits.one = in1;
    u.bits.two = in2;
    u.bits.three = in3;
    u.bits.four = in4;

    out[0] = u.c[0];
    out[1] = u.c[1];
    out[2] = u.c[2];
    }

    Now, suppose we have a fairly typical (in today's terms) machine
    with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
    If you print out the size of the union, you will find that it
    requires four bytes, not three.

    These four bytes are laid out in memory in the following way,
    as required by the C standards:

    a) when viewed as array-of-char:
    at offset 0: u.c[0] (one byte)
    at offset 1: u.c[1] (one byte)
    at offset 2: u.c[2] (one byte)
    at offset 3: <unnamed> (one byte of padding)
    (remember that arrays *must* be contiguous)

    b) when viewed as unnamed structure with bitfields:
    storage unit at offset 0: <four bytes wide, unspecified layout>

    Part (b) is where the problem occurs. The layout of the bitfields
    within the storage unit is unspecified. Suppose the compiler
    chooses to start at "bit 0" of a 32-bit big-endian word. In this
    case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
    word, which live in the unnamed padding byte at offset 3 in the
    array in u.c. u.bits.two occupies bits 6 through 11, which uses
    two more bits in the unnamed byte and then four bits in the array
    element u.c[3]. (At least one Motorola 68020 C compiler handles
    -- or handled -- bitfields this way; I used it in the early Sun
    days.)

    As it happens, SPARC compilers in big-endian mode (which is to say
    most of them) are "encouraged" to allocate bitfields starting at
    bit 31 and working down towards bit 0. This means that u.bits.one
    occupies bits 31 through 26 inclusive, which live in the array
    element u.c[0].

    On Intel x86 compilers, the situation is reversed: the usual
    convention is to allocate bitfields starting at bit 0 and working
    up towards bit 31. As it happens, Intel x86 CPUs are little-endian
    -- so this means that u.bits.one occupies bits 0 through 5 inclusive,
    which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
    bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
    two low-order bits of u.bits.two) and then bits 0 through 3 of
    u.c[1].

    Of course, there is nothing but convention to prevent an x86 C
    compiler from allocating bits from 31 down to 0, in which case
    u.c[0] would not map to any of the bits in u.bits. In any case,
    the output on the x86 seems a bit peculiar -- the bits are spread
    over the bytes in an interesting manner:

    #include <stdio.h>

    int main(void) {
    unsigned char buf[3];
    pack(buf, 0x00, 0x3f, 0x00, 0x00);
    printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
    buf[0], buf[1], buf[2]);
    return 0;
    }

    produces:

    result: buf[0..3] = 0xc00f00

    Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
    we see that, indeed, buf[0] contains as its two *uppermost* bits
    the two *lowermost* bits of the 2nd value.

    (The result on the big-endian SPARC is probably closer to what was
    expected.)
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://67.40.109.61/torek/index.html (for the moment)
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Oct 16, 2003
    #15
  16. becte

    Dan Pop Guest

    In <bmkjf3$i4f7p$-berlin.de> "cody" <> writes:

    >> >typedef union
    >> >{
    >> > char chars[3];
    >> > struct
    >> > {
    >> > int _1 : 6;
    >> > int _2 : 6;
    >> > int _3 : 6;
    >> > int _4 : 6;
    >> > } BITS
    >> >
    >> >} PACKED

    >>
    >> Even once the syntax errors are fixed, the code simply does not
    >> work at all on the SPARC. Try it and see.

    >
    >Missing semicolons? Sorry.
    >
    >> (You will find that c[0] is never changed. This is a hint as to
    >> what is going wrong. :) )

    >
    >Why? Structure padding? I don't think the struct BITS will be padding bits
    >inserted. added at the end maybe but that is not of interest here.


    Clue: in what order do bit fields get allocated inside the storage unit?
    What if this storage unit happens to be a 32-bit entity?

    10 An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains,
    a bit-field that immediately follows another bit-field in
    a structure shall be packed into adjacent bits of the same
    unit. If insufficient space remains, whether a bit-field that
    does not fit is put into the next unit or overlaps adjacent
    units is implementation-defined. The order of allocation of
    bit-fields within a unit (high-order to low-order or low-order
    to high-order) is implementation-defined. The alignment of the
    addressable storage unit is unspecified.

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
    Dan Pop, Oct 16, 2003
    #16
  17. becte

    CBFalconer Guest

    Chris Torek wrote:
    >
    > [regarding a union of "char" array and bitfields, I wrote:]
    > >> Even once the syntax errors are fixed, the code simply does
    > >> not work at all on the SPARC. Try it and see.

    >

    .... snip ...
    >
    > For discussion purposes, here is the proposed code, turned into
    > compilable C:
    >
    > void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
    > union {
    > unsigned char c[3];
    > struct {
    > unsigned int one:6, two:6, three:6, four:6;
    > } bits;
    > } u;
    > u.bits.one = in1;
    > u.bits.two = in2;
    > u.bits.three = in3;
    > u.bits.four = in4;
    >
    > out[0] = u.c[0];
    > out[1] = u.c[1];
    > out[2] = u.c[2];
    > }

    ... snip about problems etc. ...

    However, for this problem, simply work with values using
    multiplication and division and all the problems and
    non-portabilities go away. See my post elsethread.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Oct 16, 2003
    #17
  18. becte

    becte Guest

    Eric Sosman <> wrote in message news:<>...
    > becte wrote:
    > >
    > > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
    > > like this
    > > 11111122|22223333|33444444
    > > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
    > > and the output is int o1,o2,o3, range 0 .. 2^8-1
    > > How to do this in a clever way?

    >
    > "Straightforward" suffices:
    >
    > o1 = (c1 << 2) | (c2 >> 4);
    > o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
    > 03 = ((c3 & 0x3) << 2) | c4;
    >


    I go for the straightforward solution.
    BTW, there is an error in the last line, should be
    o3 = ((c3 & 0x3) << 6) | c4; // shift 6, not 2
    becte, Oct 16, 2003
    #18
  19. becte

    cody Guest

    > Actually, I picked the wrong machine; I now think it *would* work
    > (depending on desired output) on the SPARC. Ones on which it would
    > likely fail badly are certain Motorola 680x0 implementations.


    Why? byte order doesn't matter here.You put 3 bytes in and get 4 bytes out.

    > That, and the names "_1" through "_4" are dubious. (I would have
    > to consult the C standards to see whether those names are reserved
    > to users in the structure-member namespace. Instead of using names
    > whose form requires careful scrutiny of a C standard, why not use
    > names that are clearly off-limits to the implementor?)


    OK, I admit that _1 is a stupid identifier.

    > >Why? Structure padding? I don't think the struct BITS will be padding

    bits
    > >inserted. added at the end maybe but that is not of interest here.

    >
    > For discussion purposes, here is the proposed code, turned into
    > compilable C:
    >
    > void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
    > union {
    > unsigned char c[3];
    > struct {
    > unsigned int one:6, two:6, three:6, four:6;
    > } bits;
    > } u;
    > u.bits.one = in1;
    > u.bits.two = in2;
    > u.bits.three = in3;
    > u.bits.four = in4;
    >
    > out[0] = u.c[0];
    > out[1] = u.c[1];
    > out[2] = u.c[2];
    > }
    >
    > Now, suppose we have a fairly typical (in today's terms) machine
    > with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
    > If you print out the size of the union, you will find that it
    > requires four bytes, not three.
    >
    > These four bytes are laid out in memory in the following way,
    > as required by the C standards:
    >
    > a) when viewed as array-of-char:
    > at offset 0: u.c[0] (one byte)
    > at offset 1: u.c[1] (one byte)
    > at offset 2: u.c[2] (one byte)
    > at offset 3: <unnamed> (one byte of padding)
    > (remember that arrays *must* be contiguous)
    >
    > b) when viewed as unnamed structure with bitfields:
    > storage unit at offset 0: <four bytes wide, unspecified layout>
    >
    > Part (b) is where the problem occurs. The layout of the bitfields
    > within the storage unit is unspecified. Suppose the compiler
    > chooses to start at "bit 0" of a 32-bit big-endian word. In this
    > case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
    > word, which live in the unnamed padding byte at offset 3 in the
    > array in u.c. u.bits.two occupies bits 6 through 11, which uses
    > two more bits in the unnamed byte and then four bits in the array
    > element u.c[3]. (At least one Motorola 68020 C compiler handles
    > -- or handled -- bitfields this way; I used it in the early Sun
    > days.)
    >
    > As it happens, SPARC compilers in big-endian mode (which is to say
    > most of them) are "encouraged" to allocate bitfields starting at
    > bit 31 and working down towards bit 0. This means that u.bits.one
    > occupies bits 31 through 26 inclusive, which live in the array
    > element u.c[0].
    >
    > On Intel x86 compilers, the situation is reversed: the usual
    > convention is to allocate bitfields starting at bit 0 and working
    > up towards bit 31. As it happens, Intel x86 CPUs are little-endian
    > -- so this means that u.bits.one occupies bits 0 through 5 inclusive,
    > which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
    > bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
    > two low-order bits of u.bits.two) and then bits 0 through 3 of
    > u.c[1].
    >
    > Of course, there is nothing but convention to prevent an x86 C
    > compiler from allocating bits from 31 down to 0, in which case
    > u.c[0] would not map to any of the bits in u.bits. In any case,
    > the output on the x86 seems a bit peculiar -- the bits are spread
    > over the bytes in an interesting manner:
    >
    > #include <stdio.h>
    >
    > int main(void) {
    > unsigned char buf[3];
    > pack(buf, 0x00, 0x3f, 0x00, 0x00);
    > printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
    > buf[0], buf[1], buf[2]);
    > return 0;
    > }
    >
    > produces:
    >
    > result: buf[0..3] = 0xc00f00
    >
    > Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
    > we see that, indeed, buf[0] contains as its two *uppermost* bits
    > the two *lowermost* bits of the 2nd value.
    >
    > (The result on the big-endian SPARC is probably closer to what was
    > expected.)


    Who says that the size of the struct will be 32bit? The standard just says:
    [#9] An implementation may allocate any addressable storage
    unit large enough to hold a bit-field. If enough space
    remains, a bit-field that immediately follows another bit-
    field in a structure shall be packed into adjacent bits of
    the same unit. If insufficient space remains, whether a
    bit-field that does not fit is put into the next unit or
    overlaps adjacent units is implementation-defined. The
    order of allocation of bit-fields within a unit (high-order
    to low-order or low-order to high-order) is implementation-
    defined. The alignment of the addressable storage unit is
    unspecified.Additionally you can adjust the structure padding with
    compiler options.

    union {
    unsigned char c[3*4];
    struct {
    unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
    unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
    unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
    unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
    } bits;
    } u;

    This should solve the problem because the union is now a multiple of 32 bits
    in size so there will be no internal padding and byte order won't matter.

    --
    cody

    [Freeware, Games and Humor]
    www.deutronium.de.vu || www.deutronium.tk
    cody, Oct 16, 2003
    #19
  20. becte

    Chris Torek Guest

    [I wrote, in part:]
    >> Now, suppose we have a fairly typical (in today's terms) machine
    >> with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".


    In article <bmn2oj$orobt$-berlin.de>
    cody <> wrote:
    > Who says that the size of the struct will be 32bit?


    I did, right in the text you quoted (and I retained) above. And
    it is -- on today's typical implementations. It does not *have*
    to be, but it *is*, right now. (On a few implementations the
    storage-units are 16 bits, and on a few more they are 64 bits, and
    there are no doubt some oddballs with 24 bits and the like out
    there today. But 32 is pretty common at the moment.)

    >The standard just says:
    > [#9] An implementation may allocate any addressable storage
    > unit large enough to hold a bit-field. If enough space
    > remains, a bit-field that immediately follows another bit-
    > field in a structure shall be packed into adjacent bits of
    > the same unit. If insufficient space remains, whether a
    > bit-field that does not fit is put into the next unit or
    > overlaps adjacent units is implementation-defined. The
    > order of allocation of bit-fields within a unit (high-order
    > to low-order or low-order to high-order) is implementation-
    > defined. The alignment of the addressable storage unit is
    > unspecified.


    Yes, this is what it says.

    >Additionally you can adjust the structure padding with compiler options.


    (If they exist.)

    >
    > union {
    > unsigned char c[3*4];
    > struct {
    > unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
    > unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
    > unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
    > unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
    > } bits;
    > } u;
    >
    >This should solve the problem because the union is now a multiple of 32 bits
    >in size so there will be no internal padding and byte order won't matter.


    Will it? (Assume we have fixed the typos above -- bit2:9 and
    bit3:10 should be bit9:6 and bit10:6. I let a compiler find this.)

    "If enough space remains, a bit-field that immediately follows
    another bit-field in a structure shall be packed into adjacent
    bits of the same [storage] unit."

    Suppose storage-units are 32 bits wide. Then 6+6+6+6+6 = 30, so
    that the fields name "bit0" through "bit4" inclusive are packed
    into the first 32-bit storage unit. The field named "bit5" now
    needs another 6 bits, which is greater than the remaining space
    (2 bits), so the Standard's wording continues:

    "If insufficient space remains," -- and this is indeed the
    case -- "whether a bit-field that does not fit is put into
    the next unit or overlaps adjacent units is implementation-
    defined."

    So the implementation must document whether "bit5" is split across
    two storage units, or the two leftover bits are skipped and then
    "bit5" is placed in the next storage-unit.

    Suppose the implementation's documentation says that bitfields are
    *not* so split. In this case, "bit5" through "bit9" inclusive go
    in the 2nd 32-bit storage-unit, "bit10" through "bit14" in the
    third, and "bit15" occupies some implementation-defined portion
    of the last 32-bit storage unit.

    Thus, the "struct ... bits" member will need four 32-bit storage
    units, or 16 bytes (remember that we are still assuming a common
    8-bit-byte 32-bit-int machine). This is indeed a multiple of 32
    bits -- but hardly the desired multiple, as illustrated by the
    following program and its output:

    #include <stdio.h>

    union u {
    unsigned char c[3*4];
    struct s {
    unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
    unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
    unsigned int bit8:6, bit9:6, bit10:6, bit11:6;
    unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
    } bits;
    };

    int main(void) {
    printf("sizeof union u = %lu\n",
    (unsigned long)sizeof(union u));
    printf("sizeof struct s = %lu\n",
    (unsigned long)sizeof(struct s));
    printf("sizeof unsigned char [3*4] = %lu\n",
    (unsigned long)sizeof(unsigned char [3*4]));
    return 0;
    }

    When compiled with "gcc -ansi -pedantic" and run, this prints:

    sizeof union u = 16
    sizeof struct s = 16
    sizeof c[3*4] = 12

    which means that the various bitfields were in fact padded out in
    just the way I suggested above. The array, of course, has no
    padding.

    Ultimately, you *can* do what the original poster asked using unions
    and bitfields -- but the result is inherently implementation
    dependent, because of all those "implementation-defined" items
    above; and it is surprisingly hard to get the right answers.
    (Dennis Ritchie once called C's bitfields "a botch and a blemish",
    and I think this helps illustrate why.) As a general rule, if you
    want to do bitwise computations in C, it all works a lot better if
    you use the bitwise operators instead of bitfield members of
    structures. The results are far easier to predict -- thus easier
    to get right -- and you have much more hope of producing portable,
    or even just "mostly portable", code.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://67.40.109.61/torek/index.html (for the moment)
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Oct 17, 2003
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jason Collins
    Replies:
    3
    Views:
    5,997
    Jason Collins
    Feb 18, 2004
  2. mrby

    4-bytes or 8-bytes alignment?

    mrby, Nov 2, 2004, in forum: C Programming
    Replies:
    8
    Views:
    408
    Mark McIntyre
    Nov 2, 2004
  3. Replies:
    5
    Views:
    520
    Flash Gordon
    Apr 9, 2006
  4. Yandos
    Replies:
    12
    Views:
    5,101
    Pete Becker
    Sep 15, 2005
  5. Replies:
    8
    Views:
    486
    Bob Hairgrove
    Apr 10, 2006
Loading...

Share This Page