Little to big endian conversion

Discussion in 'C Programming' started by Perception, Dec 14, 2003.

  1. Perception

    Perception Guest

    Hello all,

    If I have a C-like data structure such that

    struct Data {
    int a; //16-bit value
    char b[3]; //3 ASCII characters
    int c; //32-bit value
    int d; //24-bit value
    };

    then, assuming I were to store this in 32-bit-wide, byte-addressable memory,
    with a = 0A 0B, b = 43 44 45, c = 80 00 00 44 and d = 123 (all in hex),
    would I be correct in saying that on a big-endian architecture it would be
    stored as follows:

    Address 0: 0A 0B 43 44
    Address 1: 45 80 00 00
    Address 2: 44 00 01 23

    and in a little endian:

    Address 0: 0B 0A 43 44
    Address 1: 45 44 00 00
    Address 2: 80 23 01 00

    ??

    I assume this is correct but would appreciate a check nonetheless.

    Finally, if I were to COPY the contents of the little-endian memory into a
    big-endian memory, am I correct in thinking that it would look no different
    from the little-endian memory, for BOTH a byte-by-byte and a word-by-word
    transfer? Since we send and receive in order of ascending addresses, the
    bytes or words are transferred from the lowest to the highest address, so
    we would merely end up with a duplicate of the little-endian ordering in
    the big-endian memory, i.e.:

    Address 0: 0B 0A 43 44
    Address 1: 45 44 00 00
    Address 2: 80 23 01 00
    (again)

    Or is this not the case at all? If it is the case, then surely we have to
    swap bytes to resolve the problem?

    Thanks in anticipation!
     
    Perception, Dec 14, 2003
    #1

  2. Sidney Cadot

    Sidney Cadot Guest

    Perception wrote:

    > If I have a C-like data structure such that
    >
    > struct Data {
    > int a; //16-bit value
    > char b[3]; //3 ASCII characters
    > int c; //32-bit value
    > int d; //24-bit value
    > };


    Please be advised that an "int" cannot be expected to represent more
    than 16 bit values, portably.
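    If exact widths matter, C99's <stdint.h> offers exact-width types such as
    uint16_t and uint32_t (optional in the standard, but present on practically
    every hosted platform). A minimal sketch of the example struct rewritten
    with them; the name DataFixed and the helper to_u24 are hypothetical, and
    since C has no 24-bit type, d is assumed to carry its value in the low 24
    bits of a uint32_t:

```c
#include <stdint.h>

/* Hypothetical fixed-width version of the example struct. */
struct DataFixed {
    uint16_t a;      /* 16-bit value */
    char     b[3];   /* 3 ASCII characters */
    uint32_t c;      /* 32-bit value */
    uint32_t d;      /* 24-bit value, upper 8 bits kept zero */
};

/* Mask a value down to the 24 bits the d field is meant to hold. */
uint32_t to_u24(uint32_t v)
{
    return v & UINT32_C(0xFFFFFF);
}
```

    Note that this only pins down the widths; the byte order and any padding
    inside the struct are still up to the compiler and CPU.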

    > then assuming I were to store this on a 32 bit wide byte addressable memory,
    > then, say, if a= 0A 0B, b=43 44 45, c= 80 00 00 44 and d = 123 (all in
    > hex),


    In the following, I am assuming that you have specified these sample
    numbers in "big endian" order (as is customary among humanoids on this
    particular planet). The value of a, given as a plain decimal value,
    would be 2571. Please verify that this assumption is correct.
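    Reading the two bytes of a most significant byte first gives the decimal
    value quoted above; reading the same bytes the other way round gives a
    different number:

```latex
\begin{aligned}
\texttt{0A 0B} &\;\to\; 10 \cdot 256 + 11 = 2571 \quad\text{(most significant byte first)}\\
\texttt{0A 0B} &\;\to\; 11 \cdot 256 + 10 = 2826 \quad\text{(least significant byte first)}
\end{aligned}
```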

    > then would I be correct in saying that in a big endian architecture it
    > would be stored like the following:
    >
    > Address 0: 0A 0B 43 44
    > Address 1: 45 80 00 00
    > Address 2: 44 00 01 23
    >
    > and in a little endian:
    >
    > Address 0: 0B 0A 43 44
    > Address 1: 45 44 00 00
    > Address 2: 80 23 01 00
    >
    > ??


    Yes. Big endian means: the most significant byte takes the lowest
    address ('comes first'); little endian is the other way round. This
    applies to multi-byte values only; the array of chars needs no such
    treatment.
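    The two tables can be reproduced with a short C sketch. It assumes ASCII
    characters and, like the question, that the fields are laid out back to
    back with no padding, so it writes the bytes into a plain buffer instead
    of relying on the compiler's struct layout. The function names are
    hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Write the low `nbytes` bytes of v, most significant byte first. */
static void put_be(unsigned char *p, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (unsigned char)(v >> (8 * (nbytes - 1 - i)));
}

/* Write the low `nbytes` bytes of v, least significant byte first. */
static void put_le(unsigned char *p, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Lay out a (16 bits), b (3 chars), c (32 bits) and d (24 bits)
   back to back, with no padding, in a 12-byte buffer. */
void layout_example(unsigned char out[12], int big_endian)
{
    void (*put)(unsigned char *, uint32_t, int) = big_endian ? put_be : put_le;
    put(out + 0, 0x0A0B, 2);        /* a */
    memcpy(out + 2, "CDE", 3);      /* b: 43 44 45, never swapped */
    put(out + 5, 0x80000044u, 4);   /* c */
    put(out + 9, 0x000123, 3);      /* d */
}
```

    With big_endian set, the buffer holds 0A 0B 43 44 | 45 80 00 00 |
    44 00 01 23; with it cleared, 0B 0A 43 44 | 45 44 00 00 | 80 23 01 00,
    matching the two tables. Only the multi-byte fields change; the chars
    stay put.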

    > I assume this is correct but would appreciate a check nonetheless.
    >
    > Finally, if I were to COPY the contents of the little endian memory onto a
    > big endian memory am I correct in thinking that it would look no different
    > from the little endian memory in BOTH the byte-by-byte transfer or the
    > word-by-word transfer since we are sending and receiving in order of
    > ascending addresses, and therefore bytes or words will be sent and received
    > from the lowest to the highest address and we will merely end up with a
    > duplicate of the little endian ordering in the big endian memory? i.e.
    >
    > Address 0: 0B 0A 43 44
    > Address 1: 45 44 00 00
    > Address 2: 80 23 01 00
    > (again)
    >
    > or is this not the case at all? If it is, then surely we have to swap bytes
    > to resolve the problem?


    It is not entirely clear what you mean here, to me at least. What
    mechanism do you use to send/receive? If it's an "endianness-aware"
    mechanism (i.e., a library that promises to handle this) you can just
    send the items individually and they will be properly unpacked at the
    other side.

    In the (more probable) scenario that you're copying a bunch of bytes,
    you will have to do endianness-swapping on the relevant items by yourself.
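    When the bytes are copied verbatim, the big-endian receiver sees the
    little-endian byte order and has to reverse each multi-byte item itself.
    A minimal sketch of such swap routines (the names swap16 and swap32 are
    hypothetical; GCC and Clang also offer __builtin_bswap16 and
    __builtin_bswap32):

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >>  8) & 0x0000FF00u)
         | ((v <<  8) & 0x00FF0000u)
         |  (v << 24);
}
```

    For example, a arrives as the bytes 0B 0A, which a big-endian CPU reads
    as 0x0B0A; swap16() restores 0x0A0B. Likewise c arrives as 44 00 00 80,
    read as 0x44000080, and swap32() restores 0x80000044. The char array is
    left as is.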

    Furthermore, be careful if you send the "struct" as a single entity (and
    plan to do the byte-swapping at the receiving end, for example).
    Compilers are free to insert "padding" between struct fields (and at the
    end) to make access to the fields more suited to the underlying hardware,
    and they will do so, unless you can instruct the compilers at both ends
    to treat the struct as a "packed" struct. The latter is not portable, but
    it is possible with most compilers.

    If you can give some more information on your problem that gives rise to
    this, I'm sure that I (or others) could offer some more help.

    Best regards,

    Sidney
     
    Sidney Cadot, Dec 14, 2003
    #2

  3. Perception

    Perception Guest

    "Sidney Cadot" wrote in message
    news:brgpfd$ikl$...
    > Perception wrote:
    >
    > > If I have a C-like data structure such that
    > >
    > > struct Data {
    > > int a; //16-bit value
    > > char b[3]; //3 ASCII characters
    > > int c; //32-bit value
    > > int d; //24-bit value
    > > };

    >
    > Please be advised that an "int" cannot be expected to represent more
    > than 16 bit values, portably.


    You are quite right. The data type int is merely for illustration's sake.

    > > then assuming I were to store this on a 32 bit wide byte addressable memory,
    > > then, say, if a= 0A 0B, b=43 44 45, c= 80 00 00 44 and d = 123 (all in
    > > hex),

    >
    > In the following, I am assuming that you have specified these sample
    > numbers in "big endian" order (as is customary among humanoids on this
    > particular planet). The value of a, given as a plain decimal value,
    > would be 2571. Please verify that this assumption is correct.


    Yes.

    > It is not entirely clear what you mean here, to me at least. What
    > mechanism do you use to send/receive? If it's an "endianness-aware"
    > mechanism (i.e., a library that promises to handle this) you can just
    > send the items individually and they will be properly unpacked at the
    > other side.
    >
    > In the (more probable) scenario that you're copying a bunch of bytes,
    > you will have to do endianness-swapping on the relevant items by yourself.


    I apologise if I was unclear. What I wanted to know was: if I were to copy
    byte by byte (in the more probable scenario you have described) from the
    little-endian architecture to the big-endian architecture, would I end up
    with the same ordering in the big-endian memory, and therefore need to swap
    bytes? In other words, if my little-endian architecture stores data in the
    following form:

    Address 0: 0B 0A 43 44
    Address 1: 45 44 00 00
    Address 2: 80 23 01 00

    and I were to copy this byte by byte to a big endian architecture is this
    what I would end up with:

    Address 0: 0B 0A 43 44
    Address 1: 45 44 00 00
    Address 2: 80 23 01 00

    ??

    or would it look completely different?

    What if, instead of copying byte by byte, I were to copy word by word (i.e.
    one entire word at a time)? Would I still end up with the same result?

    Thanks again.
     
    Perception, Dec 14, 2003
    #3
  4. EventHelix.com, Dec 14, 2003
    #4
  5. Richard Heathfield

    EventHelix.com wrote:

    > The following article should address your concerns about
    > little to big endian conversion:
    >
    > http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm


    I'm rather concerned about the accuracy of the information on that page. For
    example, at one point it says:

    "Thus it is a good practice to insert pad bytes explicitly in all
    C-structures that are shared in a interface between machines differing in
    either the compiler and/or microprocessor."

    I strongly disagree. This is very /bad/ practice, since it obfuscates your
    code, and yet gains you nothing whatsoever (as far as I can make out).

    --
    Richard Heathfield :
    "Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    K&R answers, C books, etc: http://users.powernet.co.uk/eton
     
    Richard Heathfield, Dec 15, 2003
    #5
  6. Perception

    Perception Guest

    "EventHelix.com" wrote in message
    news:...
    > The following article should address your concerns about
    > little to big endian conversion:
    >
    > http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm
    >
    > Sandeep
    > --
    > http://www.EventHelix.com/EventStudio
    > EventStudio 2.0 - Go beyond UML Sequence Diagrams and Use Case Diagrams


    Still doesn't tell me what I would get if I were to copy a little endian
    structure to a big endian memory in the a) byte by byte and b) word by word
    case WITHOUT byte swapping or any fancy tricks like that. i.e. I'd like to
    know what exactly the problem is that REQUIRES this byte swapping technique
    when copying/converting over between the two

    Am I still too vague?

    Say we had the following multi-byte items stored on a 32 bit wide little
    endian memory (where AB CD is one 2 byte item i.e. AB and CD are one byte
    each... same story with EF GH).

    Address 1: AB CD EF GH

    Now if I were to COPY this over to a big endian memory byte by byte or word
    by word what would I get? Note I KNOW if I were to store this in a big
    endian memory I would get the opposite order assuming GH is the most
    significant byte... but what I really want to know is what I would get if I
    were to COPY this over to a big-endian (not store it directly) which IN TURN
    requires this byte swapping that everyone keeps referring to in order to fix
    what I would get! And would it be different copying word by word to copying
    byte by byte?

    Somebody please explain!
     
    Perception, Dec 15, 2003
    #6
  7. Thomas Matthews

    Perception wrote:

    > Still doesn't tell me what I would get if I were to copy a little endian
    > structure to a big endian memory in the a) byte by byte and b) word by word
    > case WITHOUT byte swapping or any fancy tricks like that. i.e. I'd like to
    > know what exactly the problem is that REQUIRES this byte swapping technique
    > when copying/converting over between the two
    >
    > Am I still too vague?
    >
    > Say we had the following multi-byte items stored on a 32 bit wide little
    > endian memory (where AB CD is one 2 byte item i.e. AB and CD are one byte
    > each... same story with EF GH).
    >
    > Address 1: AB CD EF GH
    >
    > Now if I were to COPY this over to a big endian memory byte by byte or word
    > by word what would I get? Note I KNOW if I were to store this in a big
    > endian memory I would get the opposite order assuming GH is the most
    > significant byte... but what I really want to know is what I would get if I
    > were to COPY this over to a big-endian (not store it directly) which IN TURN
    > requires this byte swapping that everyone keeps referring to in order to fix
    > what I would get! And would it be different copying word by word to copying
    > byte by byte?
    >
    > Somebody please explain!

    I'll give it my best shot, since we commonly have Endian wars (bugs
    based on Endian differences).

    Take the value 0x1000, held in a 32-bit quantity.
    Big Endian would order it as (2 hex digits per byte):
    00 00 10 00
    Little Endian:
    00 10 00 00

    If the Big Endian machine were to interpret the Little Endian bytes,
    it would get 0x100000, which is not the original number. The effect
    is even more extreme with smaller values, e.g. 0x36:
    Big Endian     Little Endian
    00 00 00 36    36 00 00 00
    (read as Big Endian, the Little Endian bytes give 0x36000000)
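    The misreading can be reproduced with two explicit decoders, one per
    byte order. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Assemble 4 bytes as a big-endian value: first byte is most significant. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

/* Assemble 4 bytes as a little-endian value: first byte is least significant. */
uint32_t load_le32(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[1] <<  8) |  (uint32_t)b[0];
}
```

    Decoding the Little Endian bytes 00 10 00 00 with load_le32() yields
    0x1000, whereas load_be32() yields 0x100000; for 36 00 00 00 the results
    are 0x36 and 0x36000000.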

    Memory is memory as memory is memory. Memory is neither Big Endian nor
    Little Endian. The processor controls how multi-byte data is stored
    into memory. Most processors just store to and fetch from memory.
    No direct translations.

    In many systems, ordered data is placed directly into memory by
    either the main processor or an auxiliary processor (like DMA or
    a UART). If the data is ordered in Big Endian, but the processor
    is Little Endian, then the processor will interpret the data
    incorrectly when it performs multi-byte fetches. Take one of
    the cases above, load the data in Little Endian order, and then
    add the values to each other as Big Endian quantities: the sums
    will not be the intended ones.

    "In the industry", the tactic is to convert the multibyte items
    after they are input. The processor manipulates the data
    according to its native Endianness. Before the data is output,
    it is converted to the appropriate Endianness.
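    The convert-on-input, convert-on-output tactic might look like this
    sketch. It assumes, purely for illustration, that the input carries two
    32-bit little-endian integers and that the output must be big endian;
    all names are hypothetical:

```c
#include <stdint.h>

/* Input side: decode a 32-bit little-endian field (assumed wire format). */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Output side: encode a 32-bit value in big-endian order (assumed
   format expected by the receiver). */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Convert on input, compute with native values, convert on output. */
void add_fields(const unsigned char in[8], unsigned char out[4])
{
    uint32_t x = get_le32(in);       /* first operand */
    uint32_t y = get_le32(in + 4);   /* second operand */
    put_be32(out, x + y);            /* native arithmetic, then encode */
}
```

    Given the input bytes 00 10 00 00 36 00 00 00 (0x1000 and 0x36), the
    output is 00 00 10 36, the big-endian encoding of 0x1036. The addition
    itself only ever sees native values, whatever the host's byte order.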

    There could exist integrated circuits that perform Endianness
    conversion, but I haven't seen any. Most of the time the
    conversion responsibility lies with the software.

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Dec 15, 2003
    #7
  8. Chris Torek

    Chris Torek Guest

    In article <m0mDb.41050$>
    Thomas Matthews writes:
    >"In the industry", the tactic is to convert the multibyte items
    >after they are input. The processor manipulates the data
    >according to its native Endianness. Before the data is output,
    >it is converted to the appropriate Endianness.
    >
    >There could exist integrated circuits that perform Endianness
    >conversion, but I haven't seen any. ...


    Some processors sold as embedded-system CPUs have "endianness
    controls" built in.

    First, a reminder: endianness is a result of the process of breaking
    up or assembling data. Suppose you have a wooden plank you bought
    at the local hardware store or lumber yard, that is two units thick,
    four units wide, and 24 units long. (This is not actually a "two
    by four" unless the units are odd -- "two by four"s are not 2 inches
    by four inches; 2x4 inches are the sizes it had before it was cut
    by an ancient kind of saw that no longer is used but the sizes are
    now standardized based on it. Yes, lumber-milling has ANSI/ISO
    standards that vendors must obey. There are standards for *everything*
    -- one of the great examples I heard was the standards for bridges
    and mast-heights on ships ["mast stepping" really]. These standards
    have to agree, or the ships may never get under the bridges.)

    Anyway, given this single piece of wood that is two by four by 24
    units, suppose you were to carve and/or paint a (long, skinny)
    picture on it (perhaps a fancy Celtic knot). No matter how you
    pick up and put down the plank, it continues to have a single,
    cohesive image on it ... until you get out the saw. (This is a
    very special saw with a zero-unit kerf; perhaps it is made of
    monomolecular wire. :) )

    You have decided to move the plank from your house to someone
    else's, and your shipping department (or post office) refuses to
    send a whole 24-unit-long plank, but they will send shorter ones.
    Using your saw, you cut up the plank into four pieces: 2 x 4 x 6.
    Your picture spans all four pieces, but now the order you pick
    them up and set them back down when you move the plank *matters*.
    If you re-assemble the pieces in the wrong order, the image you
    put on them, back when it was a single block of wood, will be
    wrecked. The wood now has an "endian-ness", based on the order
    you use when you send the separate pieces.

    Note that there WAS NO ENDIAN-NESS before you broke up the item,
    and if you glue the four pieces back in the correct order once they
    arrive at their destination, so that they are no longer break-up-able,
    there will be no endianness after that either. This "endian"
    property arises *because* you broke up a whole into a bunch of
    parts. The one "breaking up the whole" picks some order, and to
    re-assemble the thing properly, the one re-assembling had better
    use the same order.
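    The plank analogy carries over directly into code: cutting a 32-bit
    value into bytes picks an order, and gluing the bytes back in a
    different order wrecks the value. A sketch with hypothetical names:

```c
#include <stdint.h>

/* "Saw" a 32-bit value into four byte-sized pieces, lowest piece first. */
void split_lsb_first(uint32_t v, unsigned char piece[4])
{
    for (int i = 0; i < 4; i++)
        piece[i] = (unsigned char)(v >> (8 * i));
}

/* Glue the pieces back together, treating piece[0] as the lowest. */
uint32_t glue_lsb_first(const unsigned char piece[4])
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | piece[i];
    return v;
}

/* Glue the pieces back together, treating piece[0] as the highest. */
uint32_t glue_msb_first(const unsigned char piece[4])
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | piece[i];
    return v;
}
```

    Splitting 0x11223344 gives the pieces 44 33 22 11; glue_lsb_first()
    restores 0x11223344, while glue_msb_first() produces 0x44332211.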

    The same holds for CPUs. If you have byte-at-a-time memory, and
    the CPU has instructions that work on four-byte-at-a-time units
    (such as 32-bit integers or 32-bit floating-point numbers), the
    CPU is going to have to break up and assemble things. When it does
    so, it will use *its* order, whatever that is, to do this. When
    people talk about doing endianness conversion in moving data from
    one system to another, telling you that you are going to have to
    convert the data when you read it, what they really mean is:

    "I have chosen to be a slave to the way my system breaks things
    up, and therefore I am going to make YOU be a slave to the way
    my system breaks things up too. If YOUR system happens to use
    some other order, i.e., to assemble the broken-up things in
    some other way, YOU are going to have to do a whole lot of work
    to get around this."

    Now, their choice -- to be a slave to their system -- is not
    necessarily a *bad* one. In particular, it makes *their* job a
    lot easier, and it makes *their* code run faster. But it is awfully
    selfish, and if it turns out that they depend on you as much as
    you depend on them, it could turn out to be a poor decision after
    all. But if they have made that choice and you are stuck with it,
    why then, you are stuck.

    Today, however, some computers actually offer the programmer a
    choice of endian-ness, using one or more "endian control bits".
    In other words, the system has not made an irrevocable decision up
    front for you-the-programmer. But the system *does* break up and
    assemble things, so there *is* an order, and someone has to decide
    it -- perhaps *you*, now. It turns out to be pretty easy to offer
    "reversible endian-ness" in CPUs, simply by re-wiring the low-order
    address lines. To reverse the endianness, we can just invert them.
    (The actual on-chip implementation may be more complicated than
    this, but the principle works.) Some computer-makers have taken
    this a step beyond a simple "global" control bit (or pin) in (on)
    the CPU, and have MMU-level and/or instruction-level control bits
    for reversing. The UltraSPARC in particular has a control bit in
    the CPU, another control bit in every MMU page-table entry --
    including the I/O MMUs in the U2S and U2P adapters -- and a third
    control bit in instructions (via the endian-inverting Address Space
    Identifiers). These three bits are simply xor'ed together; the
    result controls the low-order-pin inversion. This seems convenient,
    but can be a big mess in practice, because the I/O can be "streaming"
    or "non", and the size of the chunks whose low-order addresses are
    to be inverted rapidly becomes confusing. (The microSPARC had
    similar features, but no address-inverting ASIs and no "streaming
    mode" I/O through U2whatever adapters, not having a UPA bus in the
    first place. [They had M-to-S and M-to-P, or ran native PCI in
    the first place.])
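    A simplified software model of the address-line trick, assuming 4-byte
    words: inverting the two low-order address bits of a byte access (XOR
    with 3) makes a word stored least significant byte first read back most
    significant byte first. As noted above, real hardware is more involved;
    the names here are hypothetical:

```c
#include <stdint.h>

/* Simplified model of "reversible endianness": when invert is set, the
   two low-order address bits of a byte access are inverted (XOR 3), which
   swaps the byte lanes within each 4-byte word. */
unsigned char read_byte(const unsigned char *mem, uint32_t addr, int invert)
{
    return mem[invert ? (addr ^ 3u) : addr];
}

/* The CPU, page-table and instruction (ASI) control bits are XORed
   together to decide whether to invert, as described above. */
int effective_invert(int cpu_bit, int page_bit, int asi_bit)
{
    return (cpu_bit ^ page_bit ^ asi_bit) & 1;
}
```

    With the word 0x11223344 stored as 44 33 22 11, byte address 0 reads
    0x44 normally but 0x11 with inversion enabled. When the CPU, page and
    ASI bits are 1, 1 and 0 they cancel out and no inversion happens.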

    Anyway, to summarize all this: endianness is the result of breaking-up
    of large, cohesive wholes into pieces. It is the one who does the
    breaking-up who determines the endianness. If that one is you --
    the programmer -- then you are *not* at the mercy of your system;
    but if you choose to have the system do it (perhaps for speed
    reasons, as the system probably does it a lot faster), just keep
    in mind the control you are giving up. Make sure you are "getting
    your money's worth".
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Dec 15, 2003
    #8
  9. Old Wolf

    Old Wolf Guest

    > Still doesn't tell me what I would get if I were to copy a little endian
    > structure to a big endian memory in the a) byte by byte and b) word by word


    There is no such thing as:
    - big endian memory
    - big endian structure
    - big endian storage of any sort

    However there are:
    - memory
    - structures
    - storage
    - big endian CPU

    If you write the 4 bytes "12 34 56 78" to any device (disk, memory, ...)
    you will still have "12 34 56 78" wherever you read them. The only issue
    arises if you try to read them as an "int" (e.g. by memcpy()ing or
    read()ing from a device into a structure, which is not a recommended
    practice).
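    To illustrate: reading the same four bytes with memcpy() gives a
    CPU-dependent integer, while assembling them with shifts under a stated
    byte order gives the same value everywhere. The helper names are
    hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Host-dependent: copy the bytes straight into an integer, as memcpy()
   or read() into a struct would. The result depends on the CPU. */
uint32_t read_raw(const unsigned char b[4])
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Host-independent: the byte order is stated explicitly (first byte is
   most significant), so every CPU computes the same value. */
uint32_t read_be(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

    For the bytes 12 34 56 78, read_be() always returns 0x12345678;
    read_raw() returns 0x12345678 on a big-endian CPU and 0x78563412 on a
    little-endian one.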

    The C standard lays down no rules about a system's endianness or
    structure padding; both are left to the implementation. If you rely on
    either of these in your code then you are being very non-portable. In
    fact, little and big endian are not the only possibilities either: some
    CPUs would represent 0x12345678 as "56 78 12 34", and so on.
    Even with: typedef struct { int a; char b; int c; } S;
    I know of systems where sizeof(S) could be 6, 9, 10, or 12.
    If you really do want to be non-portable, you should make some
    structures like you have suggested, write them directly to a file
    on your system, and see what comes out.
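    In the same experimental spirit, a quick, deliberately non-portable way
    to see how your own CPU lays out an integer. The names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Copy out the bytes this CPU uses to store 0x12345678: "12 34 56 78" on
   big-endian, "78 56 34 12" on little-endian, possibly something else
   (e.g. "56 78 12 34") on mixed-endian CPUs. */
void host_layout(unsigned char out[4])
{
    uint32_t v = 0x12345678u;
    memcpy(out, &v, sizeof v);
}

/* Classify the host: 1 = big-endian, 0 = little-endian, -1 = other. */
int host_order(void)
{
    unsigned char b[4];
    host_layout(b);
    if (b[0] == 0x12 && b[1] == 0x34 && b[2] == 0x56 && b[3] == 0x78)
        return 1;
    if (b[0] == 0x78 && b[1] == 0x56 && b[2] == 0x34 && b[3] == 0x12)
        return 0;
    return -1;
}
```

    Such a check is useful for diagnostics; portable code is better off not
    depending on its answer at all.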

    A better option would be to provide functions to convert your
    structure to a fixed external representation, and convert it back.
    This is called "serialization". For example,
    int s_serialize(const S *s, char *buf, int buf_len);
    bool s_unserialize(S *s, const char *buf);
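    A minimal sketch of the two functions, assuming a made-up 9-byte
    external format (a and c as 32-bit big-endian two's complement, b as a
    single byte) and an int whose values fit in 32 bits. The return
    conventions (bytes written or -1; true on success) are also assumptions,
    since only the prototypes are given:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int a; char b; int c; } S;

#define S_WIRE_SIZE 9  /* assumed format: a (4 bytes BE), b (1 byte), c (4 bytes BE) */

static void put32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;   /* two's complement bit pattern, well defined */
    p[0] = (unsigned char)(u >> 24);
    p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);
    p[3] = (unsigned char)u;
}

static int32_t get32(const unsigned char *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    /* Map back to a signed value without implementation-defined conversion. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

int s_serialize(const S *s, char *buf, int buf_len)
{
    unsigned char *p = (unsigned char *)buf;
    if (buf_len < S_WIRE_SIZE)
        return -1;                      /* buffer too small */
    put32(p, s->a);
    p[4] = (unsigned char)s->b;         /* assumes b holds a plain character */
    put32(p + 5, s->c);
    return S_WIRE_SIZE;                 /* bytes written */
}

bool s_unserialize(S *s, const char *buf)
{
    const unsigned char *p = (const unsigned char *)buf;
    s->a = get32(p);
    s->b = (char)p[4];
    s->c = get32(p + 5);
    return true;   /* no length parameter, so there is nothing to validate */
}
```

    Because the external format is fixed byte by byte, neither the host's
    endianness nor its struct padding ever reaches the buffer.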

    Some systems provide the following functions which may help you:
    #include <arpa/inet.h>   /* POSIX; <netinet/in.h> also declares them on many systems */
    uint32_t htonl(uint32_t hostlong);
    uint16_t htons(uint16_t hostshort);
    uint32_t ntohl(uint32_t netlong);
    uint16_t ntohs(uint16_t netshort);

    htonl() and htons() convert a value from host byte order to network
    byte order (big endian) and return it; ntohl() and ntohs() convert
    back. (On LE systems they swap bytes, and on BE systems they do
    nothing, so the same code will work on either system.)
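    A short usage sketch with the POSIX functions; put_net32 and get_net32
    are hypothetical helpers built on them:

```c
#include <arpa/inet.h>   /* POSIX: htonl, htons, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into a buffer in network (big-endian) order. */
void put_net32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);   /* swapped only on little-endian hosts */
    memcpy(out, &net, sizeof net);
}

/* Read a 32-bit network-order value back into host order. */
uint32_t get_net32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

    Whatever the host's byte order, put_net32(b, 0x12345678) leaves the
    bytes 12 34 56 78 in b, and get_net32(b) gives 0x12345678 back.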
     
    Old Wolf, Dec 15, 2003
    #9
