byte alignment in structures and unions

Discussion in 'C Programming' started by anon.asdf@gmail.com, Aug 9, 2007.

  1. Guest

    Hi!

    I want to assign the number 129 (binary 10000001) to the MSB (most
    significant byte) of a 4-byte long and leave the other lower bytes
    intact!

    -working on normal pentium (...endian)

    I want to do it with code that does NOT use shifts (<<) , bit-
    operations (| &) !!
    So the compiler will have to do the work and I'll introduce
    appropriate structs and unions.


    #include <stdio.h>

    struct each_of_four {
    unsigned char byte0;
    unsigned char byte1;
    unsigned char byte2;
    unsigned char byte3;
    }
    /*__attribute__ ((packed))*/
    ;

    union align_long_and_each_of_four {
    long dummy; /* 4 bytes */
    struct each_of_four four;
    }
    /*__attribute__ ((packed))*/
    ;


    int main(void)
    {
    long val; // 4 bytes

    /****************** TEST A: COMPILER ERROR - WHY?
    *********************/
    ((union align_long_and_each_of_four) val).four.byte3 = (unsigned
    char) 129;






    #define FUNNY_NUMBER ((union align_long_and_each_of_four) \
    (const long) ((129<<24) | (val & 16777215))).four.byte3
    // 16777215 = 2^24-1

    printf("test FUNNY_NUMBER: %d\n", FUNNY_NUMBER);

    /****************** TEST B: COMPILER ERROR - WHY?
    *********************/
    ((union align_long_and_each_of_four) val).four.byte3 = FUNNY_NUMBER;

    return 0;
    }


    Compiler error report--->
    test_align.c:25: error: invalid lvalue in assignment
    test_align.c:39: error: invalid lvalue in assignment



    How can this be fixed??

    Thanks
    anon.asdf
     
    , Aug 9, 2007
    #1

  2. Guest

    > /****************** TEST A: COMPILER ERROR - WHY?
    > *********************/
    > ((union align_long_and_each_of_four) val).four.byte3 = (unsigned
    > char) 129;



    The really interesting thing here is that the following code DOES work!

    {
    union align_long_and_each_of_four tmp;

    tmp.four.byte3 = (unsigned char) 129;
    }

    But still - how can the compiler error in TEST A be fixed??
     
    , Aug 9, 2007
    #2

  3. Eric Sosman Guest

    wrote On 08/09/07 13:38,:
    > Hi!
    >
    > I want to assign the number 129 (binary 10000001) to the MSB (most
    > significant byte) of a 4-byte long and leave the other lower bytes
    > intact!
    >
    > -working on normal pentium (...endian)
    >
    > I want to do it with code that does NOT use shifts (<<) , bit-
    > operations (| &) !!
    > So the compiler will have to do the work and I'll introduce
    > appropriate structs and unions.
    >
    >
    > #include <stdio.h>
    >
    > struct each_of_four {
    > unsigned char byte0;
    > unsigned char byte1;
    > unsigned char byte2;
    > unsigned char byte3;
    > }
    > /*__attribute__ ((packed))*/
    > ;
    >
    > union align_long_and_each_of_four {
    > long dummy; /* 4 bytes */
    > struct each_of_four four;
    > }
    > /*__attribute__ ((packed))*/
    > ;
    >
    >
    > int main(void)
    > {
    > long val; // 4 bytes
    >
    > /****************** TEST A: COMPILER ERROR - WHY?
    > *********************/
    > ((union align_long_and_each_of_four) val).four.byte3 = (unsigned
    > char) 129;
    >
    >
    >
    >
    >
    >
    > #define FUNNY_NUMBER ((union align_long_and_each_of_four) \
    > (const long) ((129<<24) | (val & 16777215))).four.byte3
    > // 16777215 = 2^24-1
    >
    > printf("test FUNNY_NUMBER: %d\n", FUNNY_NUMBER);
    >
    > /****************** TEST B: COMPILER ERROR - WHY?
    > *********************/
    > ((union align_long_and_each_of_four) val).four.byte3 = FUNNY_NUMBER;


    Because you cannot cast to or from a union (or struct)
    type: They are not "scalar types" (6.5.4p2). Keep in mind
    that a cast is an operator that converts a value, not a
    magical "let's pretend" construct. And in any case, the
    value produced by a cast operator has the same status as
    a value produced by (for example) a unary minus operator:
    You cannot write `-x = 42', either.
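A minimal sketch of what TEST A was reaching for, assuming 8-bit bytes, an exact-width uint32_t standing in for the 4-byte long, and a plain big- or little-endian layout: the value is copied into a named union object (which, unlike a cast result, is an lvalue), one byte is changed, and the word is read back. The union and function names are hypothetical. C11 (and C99 as corrected by TC3) describe reading a member other than the one last stored as reinterpreting the stored bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical union: a 32-bit word overlaid with its bytes. */
union word32_bytes {
    uint32_t word;
    unsigned char bytes[sizeof(uint32_t)];
};

/* Return v with its most significant byte replaced by b. A named union
   object is an lvalue, so its members may be assigned; the result of a
   cast is not, which is why TEST A is rejected. */
uint32_t set_msb_via_union(uint32_t v, unsigned char b)
{
    union word32_bytes probe = { 1u };   /* initializes probe.word */
    union word32_bytes u;
    size_t last = sizeof v - 1;

    /* Accept only layouts where the low-order byte sits at one end. */
    assert(probe.bytes[0] == 1 || probe.bytes[last] == 1);

    u.word = v;                                  /* copy the value in  */
    u.bytes[probe.bytes[0] == 1 ? last : 0] = b; /* overwrite the MSB  */
    return u.word;                               /* read the word back */
}
```

For example, set_msb_via_union(0x12345678u, 129) yields 0x81345678 on either byte order.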

    > return 0;
    > }
    >
    >
    > Compiler error report--->
    > test_align.c:25: error: invalid lvalue in assignment
    > test_align.c:39: error: invalid lvalue in assignment
    >
    >
    >
    > How can this be fixed??


    One way is

    ((unsigned char*)&val)[3] = 129;

    Of course, this fails miserably if `val' is not four bytes
    long with the MSB in the fourth position. A better way is

    val = (val & 0xffffffUL) | (129UL << 24);

    (Yes, I know you said you didn't want to use shifts or
    bitwise operators. Tough: It's a better way anyhow.)

    A final thought: *Every* solution has the problem that
    it makes non-portable assumptions about what happens to
    the value of `val' when you reach in and hammer one of its
    bytes. When you do so, you have left the guarantees of the
    C language behind, and will need to make your way in
    uncharted territory without their protection. Things would
    be somewhat better with `unsigned long', but ...
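A sketch of the same value-based idea for an unsigned long of any width, assuming unsigned long has no padding bits, so that its "top byte" is its highest CHAR_BIT value bits. Nothing here depends on byte order. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Replace the most significant byte of v with b. Assumes unsigned long
   has CHAR_BIT * sizeof(unsigned long) value bits (no padding bits). */
unsigned long set_top_byte_ul(unsigned long v, unsigned b)
{
    const unsigned shift = CHAR_BIT * (sizeof v - 1);
    const unsigned long mask = (unsigned long)UCHAR_MAX << shift;

    assert(b <= UCHAR_MAX);          /* b must fit in one byte */
    return (v & ~mask) | ((unsigned long)b << shift);
}
```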

    --
     
    Eric Sosman, Aug 9, 2007
    #3
  4. Army1987 Guest

    On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:

    > Hi!
    >
    > I want to assign the number 129 (binary 10000001) to the MSB (most
    > significant byte) of a 4-byte long and leave the other lower bytes
    > intact!
    >
    > -working on normal pentium (...endian)
    >
    > I want to do it with code that does NOT use shifts (<<) , bit-
    > operations (| &) !!

    l %= 0x01000000;
    l += 129 * 0x01000000;
    This works regardless of endianness.
    > So the compiler will have to do the work and I'll introduce
    > appropriate structs and unions.
    > #include <stdio.h>
    >
    > struct each_of_four {
    > unsigned char byte0;
    > unsigned char byte1;
    > unsigned char byte2;
    > unsigned char byte3;
    > }
    > /*__attribute__ ((packed))*/

    What was wrong with unsigned char bytes[4], which causes the same
    thing that the stuff you commented out would do somewhere, but in
    standard C?
    > ;


    > union align_long_and_each_of_four {
    > long dummy; /* 4 bytes */
    > struct each_of_four four;
    > }
    > /*__attribute__ ((packed))*/
    > ;

    What was wrong with { long dummy; unsigned char four[4]; }?

    > int main(void)
    > {
    > long val; // 4 bytes
    >
    > /****************** TEST A: COMPILER ERROR - WHY?
    > *********************/
    > ((union align_long_and_each_of_four) val).four.byte3 = (unsigned
    > char) 129;

    Because the result of a cast isn't an lvalue.
    Try
    ((unsigned char *)&val)[3] = 129;
    No unions and no struct needed.
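A sketch of the unsigned char pointer approach that locates the most significant byte at run time instead of hard-coding index 3. It assumes uint32_t exists and a plain big- or little-endian layout (the assert rejects anything else); the function name is hypothetical. Inspecting or modifying any object through an lvalue of character type is permitted by the aliasing rules (C99 6.5p7), which is why no union is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write b into the most significant byte of *p, in place. */
void store_msb_bytes(uint32_t *p, unsigned char b)
{
    const uint32_t one = 1;
    const unsigned char *q = (const unsigned char *)&one;
    unsigned char *bytes = (unsigned char *)p;
    size_t last = sizeof *p - 1;

    /* Reject layouts where the low-order byte is at neither end. */
    assert(q[0] == 1 || q[last] == 1);

    /* Low byte first (little-endian): MSB is last; otherwise it is first. */
    bytes[q[0] == 1 ? last : 0] = b;
}
```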

    > #define FUNNY_NUMBER ((union align_long_and_each_of_four) \
    > (const long) ((129<<24) | (val & 16777215))).four.byte3

    Didn't you say you didn't want to use bitwise operations?
    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 9, 2007
    #4
  5. Army1987 Guest

    On Thu, 09 Aug 2007 17:49:01 +0000, anon.asdf wrote:

    >> /****************** TEST A: COMPILER ERROR - WHY?
    >> *********************/
    >> ((union align_long_and_each_of_four) val).four.byte3 = (unsigned
    >> char) 129;

    [snip]
    > But still - how can the compiler error in TEST A be fixed??

    If you *really* want to do that, try
    (union align_long_and_each_of_four *)val->four.byte3 = 129;
    But there are better ways to do that, see my other reply.
    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 9, 2007
    #5
  6. Guest

    On Aug 9, 8:40 pm, Eric Sosman <> wrote:
    > One way is
    >
    > ((unsigned char*)&val)[3] = 129;


    Thank you for the insights!

    ((unsigned char*)&val)[3] = 129;
    is elegant.

    I wonder if the compiler resolves it (above) to the same shifts as
    val = (val & 0xffffffUL) | (129UL << 24);
    or utilizes some tighter optimization, if the architecture allows it.
    ??


    > Things would
    > be somewhat better with `unsigned long', but ...


    How does `unsigned long' change the situation?

    -anon.asdf
     
    , Aug 9, 2007
    #6
  7. Guest

    On Aug 9, 9:07 pm, Army1987 <> wrote:

    > l %= 0x01000000;
    > l += 129 * 0x01000000;
    > This works regardless of endianness.



    Unfortunately this does not work! Try

    {
    long val = (129<<24) + 1;
    val %= 0x01000000;
    val += 129 * 0x01000000;
    printf("%ld\n", val); // get -2147483647, but should be -2130706431
    }


    > What was wrong with { long dummy; unsigned char four[4]; }?


    Nothing. I could use:
    ((unsigned char *)&dummy)[3] = four[3];



    > Try
    > ((unsigned char *)&val)[3] = 129;
    > No unions and no struct needed.


    Yes - that's perfect!

    > > #define FUNNY_NUMBER ((union align_long_and_each_of_four) \
    > > (const long) ((129<<24) | (val & 16777215))).four.byte3

    >
    > Didn't you say you didn't want to use bitwise operations?


    True.
    But I'm hoping the compiler will resolve it to a constant, so the
    shifts are only in the c-code, but not in the machine code.

    Thanks for your comments!

    -anon.asdf
     
    , Aug 9, 2007
    #7
  8. Guest

    On Aug 9, 9:10 pm, Army1987 <> wrote:
    > If you *really* want to do that, try
    > (union align_long_and_each_of_four *)val->four.byte3 = 129;


    Thanks! That's good - what I was looking for!
    But you forgot the & and the parentheses:

    ((union align_long_and_each_of_four *)&val)->four.byte3 = 129;
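If the pointer cast feels uncomfortable, here is a sketch of a memcpy-based alternative: the bytes of the value are copied into a separate array, edited, and copied back, so the object is never accessed through a pointer to an unrelated type. Which index holds the MSB is still up to the platform (index 3 on a little-endian machine with a 4-byte uint32_t). The function name and the use of uint32_t are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set byte number idx (counted from the lowest address) of v to b. */
uint32_t set_byte_memcpy(uint32_t v, size_t idx, unsigned char b)
{
    unsigned char bytes[sizeof v];

    assert(idx < sizeof v);
    memcpy(bytes, &v, sizeof v);   /* copy out the object representation */
    bytes[idx] = b;                /* edit the copy                      */
    memcpy(&v, bytes, sizeof v);   /* copy the edited bytes back         */
    return v;
}
```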

    Regards,
    anon.asdf
     
    , Aug 9, 2007
    #8
  9. Guest

    > > > #define FUNNY_NUMBER ((union align_long_and_each_of_four) \
    > > > (const long) ((129<<24) | (val & 16777215))).four.byte3

    >
    > > Didn't you say you didn't want to use bitwise operations?

    >
    > True.
    > But I'm hoping the compiler will resolve it to a constant, so the
    > shifts are only in the c-code, but not in the machine code.
    >
    > Thanks for your comments!
    >
    > -anon.asdf


    My comment here is incorrect! It can never be a constant, since it
    includes the variable val .
    -anon.asdf
     
    , Aug 9, 2007
    #9
  10. Chris Torek Guest

    In article <>
    <> wrote:
    >... ((unsigned char*)&val)[3] = 129; is elegant.


    Elegant, but not terribly portable, and on some machines, a lot
    slower than the shift-and-mask method:

    >I wonder if the compiler resolves it (above) to the same shifts as
    >val = (val & 0xffffffUL) | (129UL << 24);
    >or utilizes some tighter optimization, if the architecture allows it.


    This depends on the architecture *and* the optimizer.

    Taking the address of variables defeats some optimizers entirely.
    In such cases, the compiler may "throw up its hands in defeat" as
    it were, and compile code like:

    store reg, mem | put "val" into RAM so it can be modified piece-wise
    movi #129, t0 | tempreg = constant
    store t0, mem+3 | set byte mem[3] from tempreg
    load reg, mem | pull "val" back out of RAM

    which, on register-oriented machines where RAM is slow compared to
    the CPU, may take a dozen or more clock cycles. (Clever caches may
    manage to shrink this to just 3 clock cycles in the best case: one
    for the first store, one for the second store done "in parallel" with
    the move-immediate, and one for the load.)

    The shift-and-mask version might instead compile to:

    movih #0xff00, t0 | tempreg = 0xff00 << 16
    andn reg, t0, reg | val &= ~tempreg
    movih #0x8100, t0 | tempreg = 0x8100 << 16 (ie 129UL << 24)
    or reg, t0, reg | val |= tempreg

    which, although it is still four instructions, executes in two
    clock cycles (two instructions per clock), regardless of cache
    activity and RAM and so on.

    Other optimizers are a bit (or even a lot) more clever, and can
    indeed turn the one sequence into the other.

    The main disadvantage to the "access individual bytes of variable"
    method is that it not only depends on the size of bytes -- which
    tends to be exactly 8 bits across a wide variety of machines today,
    so that you are relatively safe there -- but also on the "endian-ness"
    of the CPU, which tends to vary. The shift-and-mask version,
    although it is more verbose in source form, is a lot easier for
    most optimizers.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Aug 9, 2007
    #10
  11. Eric Sosman Guest

    wrote On 08/09/07 15:49,:
    > On Aug 9, 8:40 pm, Eric Sosman <> wrote:
    >
    >> One way is
    >>
    >> ((unsigned char*)&val)[3] = 129;

    >
    >
    > Thank you for the insights!
    >
    > ((unsigned char*)&val)[3] = 129;
    > is elegant.
    >
    > I wonder if the compiler resolves it (above) to the same shifts as
    > val = (val & 0xffffffUL) | (129UL << 24);
    > or utilizes some tighter optimization, if the architecture allows it.
    > ??


    Which of the very many C compilers is "the" compiler
    you have in mind?

    (No, don't answer: It's a rhetorical question, intended
    to make you think.)


    >>Things would
    >>be somewhat better with `unsigned long', but ...

    >
    >
    > How does `unsigned long' change the situation?


    It doesn't -- for people who already "know" that a
    long has four eight-bit bytes arranged in Little-Endian
    order and using two's complement representation with no
    padding bits and no traps. Such people already "know"
    a good deal more than the C language guarantees.

    It's probably fairly safe to ignore the possibility
    of padding bits, trap representations, and formats other
    than two's complement; such things are definitely out of
    fashion these days and you're unlikely to encounter them.
    (But it *is* a fashion-driven industry; things that were
    once chic may become so again ...) Even so, there are
    plenty of machines whose longs use eight eight-bit bytes,
    plenty of machines that arrange their longs (of whatever
    length) in Big-Endian order, and even some machines that
    use 32-bit bytes. C can run on all of these -- but your
    program will not run on them if you use too much of what
    you "know."

    Sometimes it is necessary to make use of system-specific
    knowledge in order to do something that portable C cannot
    do or cannot do well. But those occasions are far rarer
    than many people seem to suppose; there is usually a way
    to get it done (for many, many values of "it") without
    resorting to trickery. The only reason you have given for
    using trickery is "I want to do it" this way -- I don't
    find that a compelling reason.
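When system-specific knowledge is used anyway, one option is to state the assumptions in the code so that an unsuitable target is rejected rather than silently modifying the wrong byte. A sketch, assuming a C11 compiler for _Static_assert; byte order cannot be tested in a constant expression, so it is checked at run time. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Compile-time check; using uint32_t at all also requires that an
   exact 32-bit type exists. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Nonzero if the low-order byte is stored at the lowest address. */
static int is_little_endian(void)
{
    const uint32_t one = 1;
    return *(const unsigned char *)&one == 1;
}

/* The ((unsigned char *)&val)[3] = 129 idiom with its remaining
   assumption, little-endian byte order, checked before use. */
void store_msb_le(uint32_t *p, unsigned char b)
{
    assert(is_little_endian());
    ((unsigned char *)p)[3] = b;
}
```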

    --
     
    Eric Sosman, Aug 9, 2007
    #11
  12. pete Guest

    Army1987 wrote:
    >
    > On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:
    >
    > > Hi!
    > >
    > > I want to assign the number 129 (binary 10000001)
    > > to the MSB (most significant byte) of a 4-byte long
    > > and leave the other lower bytes intact!


    > Try
    > ((unsigned char *)&val)[3] = 129;
    > No unions and no struct needed.


    You can assign the value of 129
    to the highest addressed byte of any object,
    this way:

    ((unsigned char *)&val)[sizeof val - 1] = 129;

    --
    pete
     
    pete, Aug 10, 2007
    #12
  13. Army1987 Guest

    On Thu, 09 Aug 2007 21:07:47 +0200, Army1987 wrote:

    > On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:
    >
    >> Hi!
    >>
    >> I want to assign the number 129 (binary 10000001) to the MSB (most
    >> significant byte) of a 4-byte long and leave the other lower bytes
    >> intact!

    That'd be the representation of a negative integer.

    > l %= 0x01000000;
    > l += 129 * 0x01000000;
    > This works regardless of endianness.

    It would work if l were unsigned long, and either 129 * 0x01000000
    fitted in a signed long, or I wrote l += 129U * 0x01000000; (or
    l += 0x81000000; of course).
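A sketch of the corrected shift-free version in unsigned arithmetic, assuming a uint32_t operand in place of long. With an unsigned operand the remainder can never be negative, and (uint32_t)b * 0x01000000u cannot exceed 0xFF000000, so nothing overflows; the failing test value from earlier in the thread, 0x81000001, comes back unchanged. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* v % 2^24 keeps the low three bytes; adding b * 2^24 puts b on top. */
uint32_t set_msb_arith(uint32_t v, unsigned b)
{
    const uint32_t base = 0x01000000u;   /* 2^24 */

    assert(b <= 0xFFu);                  /* b must fit in one byte */
    return v % base + (uint32_t)b * base;
}
```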
    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 10, 2007
    #13
  14. Army1987 Guest

    On Thu, 09 Aug 2007 13:06:50 -0700, anon.asdf wrote:

    > On Aug 9, 9:10 pm, Army1987 <> wrote:
    >> If you *really* want to do that, try
    >> (union align_long_and_each_of_four *)val->four.byte3 = 129;

    >
    > Thanks! thats good - what I was looking for!
    > - but you forgot the & and parenthesis:
    >
    > ((union align_long_and_each_of_four *)&val)->four.byte3 = 129;

    Yeah...
    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 10, 2007
    #14
  15. In article <>,
    pete <> wrote:
    >Army1987 wrote:


    >> On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:


    >> > I want to assign the number 129 (binary 10000001)
    >> > to the MSB (most significant byte) of a 4-byte long
    >> > and leave the other lower bytes intact!


    >> Try
    >> ((unsigned char *)&val)[3] = 129;
    >> No unions and no struct needed.


    >You can assign the value of 129
    >to the highest addressed byte of any object,
    >this way:


    > ((unsigned char *)&val)[sizeof val - 1] = 129;


    Yes, that should indeed assign into the highest addressed byte.
    Unfortunately the highest addressed byte might not be the MSB
    (most significant byte). On big-endian machines, it would
    often be the lowest addressed byte that is the MSB.
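Since the highest-addressed byte is not necessarily the MSB, here is a sketch that finds the MSB's offset at run time: it stores a value whose only set bit lies in the top byte and looks for the byte where that bit lands. It assumes unsigned long has no padding bits; the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Offset, from the lowest address, of the most significant byte of an
   unsigned long. */
size_t ulong_msb_offset(void)
{
    const unsigned long probe = 1UL << (CHAR_BIT * (sizeof probe - 1));
    const unsigned char *bytes = (const unsigned char *)&probe;
    size_t i;

    for (i = 0; i < sizeof probe; i++)
        if (bytes[i] != 0)
            return i;
    assert(!"probe value has no nonzero byte");
    return 0;
}
```

On a little-endian machine this returns sizeof(unsigned long) - 1, which coincides with pete's expression; on a big-endian machine it returns 0.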
    --
    There are some ideas so wrong that only a very intelligent person
    could believe in them. -- George Orwell
     
    Walter Roberson, Aug 10, 2007
    #15
  16. pete <> writes:
    > Army1987 wrote:
    >> On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:
    >> > I want to assign the number 129 (binary 10000001)
    >> > to the MSB (most significant byte) of a 4-byte long
    >> > and leave the other lower bytes intact!

    >
    >> Try
    >> ((unsigned char *)&val)[3] = 129;
    >> No unions and no struct needed.

    >
    > You can assign the value of 129
    > to the highest addressed byte of any object,
    > this way:
    >
    > ((unsigned char *)&val)[sizeof val - 1] = 129;


    But the question was how to assign a value to the most significant
    byte, not the highest addressed byte.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 10, 2007
    #16
  17. CBFalconer Guest

    pete wrote:
    > Army1987 wrote:
    >> anon.asdf wrote:
    >>>
    >>> I want to assign the number 129 (binary 10000001)
    >>> to the MSB (most significant byte) of a 4-byte long
    >>> and leave the other lower bytes intact!

    >
    >> Try
    >> ((unsigned char *)&val)[3] = 129;
    >> No unions and no struct needed.

    >
    > You can assign the value of 129
    > to the highest addressed byte of any object,
    > this way:
    >
    > ((unsigned char *)&val)[sizeof val - 1] = 129;


    Seems to give funny results on my machine. (Which happens to place
    the MSByte of an integer in the lowest order address). The world
    is not defined by an X86.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Aug 10, 2007
    #17
  18. In article <>,
    Army1987 <> wrote:

    >> On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:


    >>> I want to assign the number 129 (binary 10000001) to the MSB (most
    >>> significant byte) of a 4-byte long and leave the other lower bytes
    >>> intact!


    >That'd be the representation of a negative integer.


    Not necessarily.

    A) We don't know how big a byte is on the target machine. It
    might not be overflow.
    B) If you are working in signed mode on an 8 bit byte,
    then it is overflow and so not defined;
    C) If you are working unsigned, it is not overflow, but if the
    machine is a separated-sign machine, the correspondence between
    sign bit and arithmetic values is unspecified (but other
    representation constraints pretty much imply the separated sign
    would have to be the most significant bit).
    --
    Programming is what happens while you're busy making other plans.
     
    Walter Roberson, Aug 10, 2007
    #18
  19. Guest

    On Aug 10, 2:01 am, Army1987 <> wrote:
    > On Thu, 09 Aug 2007 21:07:47 +0200, Army1987 wrote:
    > > On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:

    >
    > >> Hi!

    >
    > >> I want to assign the number 129 (binary 10000001) to the MSB (most
    > >> significant byte) of a 4-byte long and leave the other lower bytes
    > >> intact!

    >
    > That'd be the representation of a negative integer.
    >



    {
    long val = (129<<24) + 1;
    val %= 0x01000000;
    val += 129 * 0x01000000;
    printf("%ld\n", val); // get -2147483647, but should be -2130706431

    }

    referring to the above
    > It would work if val were unsigned long, and either 129 * 0x01000000
    > fitted in a signed long, or I wrote val += 129U * 0x01000000; (or
    > val += 0x81000000; of course).


    It can also work if val is signed long - as follows:
    {
    long val = 0; /* any starting value; reading an uninitialized val is undefined */
    val %= 0x01000000U;
    val += 129U * 0x01000000;
    printf("%ld\n", val);
    }

    -anon.asdf
     
    , Aug 10, 2007
    #19
  20. Army1987 Guest

    On Fri, 10 Aug 2007 02:13:09 +0000, Walter Roberson wrote:

    > In article <>,
    > Army1987 <> wrote:
    >
    >>> On Thu, 09 Aug 2007 17:38:22 +0000, anon.asdf wrote:

    >
    >>>> I want to assign the number 129 (binary 10000001) to the MSB (most
    >>>> significant byte) of a 4-byte long and leave the other lower bytes
    >>>> intact!

    >
    >>That'd be the representation of a negative integer.

    >
    > Not necessarily.
    >
    > A) We don't know how big a byte is on the target machine. It
    > might not be overflow.

    Speak for yourself. I do know how big a byte is on the OP's
    machine.
    > B) If you are working in signed mode on an 8 bit byte,
    > then it is overflow and so not defined;

    Well, do you think anybody will speak of single bytes in a larger
    object in terms of a signed char?

    > C) If you are working unsigned, it is not overflow, but if the
    > machine is a separated-sign machine, the correspondence between
    > sign bit and arithmetic values is unspecified (but other
    > representation constraints pretty much imply the separated sign
    > would have to be the most significant bit).

    When the sign bit is set, the value is negative (provided it isn't
    a trap), period. This is true in any of the three allowed
    representations. And I happen to know that the OP has two's
    complement and no trap representation.
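A sketch of how the expected result, -2130706431 for the bit pattern 0x81000001, can be obtained without relying on the implementation-defined conversion of an out-of-range unsigned value to a signed type (C99 6.3.1.3p3). It uses int32_t, which C99 (7.18.1.1) requires to have no padding bits and a two's complement representation whenever it is provided; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's complement value: patterns above
   INT32_MAX map to u - 2^32, computed without any signed overflow. */
int32_t u32_to_i32(uint32_t u)
{
    int32_t r;

    if (u <= (uint32_t)INT32_MAX)
        r = (int32_t)u;                      /* in range: value unchanged */
    else
        r = -(int32_t)(UINT32_MAX - u) - 1;  /* equals u - 2^32           */
    assert((uint32_t)r == u);   /* signed-to-unsigned is well defined */
    return r;
}
```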
    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 10, 2007
    #20