How does C handle issues arising out of Endianness?

Discussion in 'C Programming' started by Indian.croesus@gmail.com, Dec 17, 2006.

  1. Guest

    Hi,
    If I am right, endianness is CPU related. I do not know whether the
    question is even well-formed, but if it is, how does C handle issues
    arising out of endianness?

    I understand that if we pass structures using sockets across platforms,
    we need to take care of Endianness issues at the application level. But
    for example, for the code using bitwise AND to figure out if a number
    is odd or even, how does C know the LSB position?

    Thanks,
    IC
    , Dec 17, 2006
    #1

  2. Tim Prince Guest

    wrote:
    > Hi,
    > If I am right Endianness is CPU related. I do not know if the
    > question is right in itself but if it is then how does C handle issues
    > arising out of Endianness.
    >
    > I understand that if we pass structures using sockets across platforms,
    > we need to take care of Endianness issues at the application level. But
    > for example, for the code using bitwise AND to figure out if a number
    > is odd or even, how does C know the LSB position?
    >

    C relies on the implementor to define each operator for each native data
    type for each platform. For an example, you could look up the
    gcc/config/*/*.md (machine description) files.
    Standard C has rules against type punning; it is only by breaking them
    that your odd/even code could be affected by a change of endianness.
    C can't necessarily prevent you from breaking those rules.
    Tim Prince, Dec 17, 2006
    #2

  3. Guest

    Thanks. I will check it out.

    > C relies on the implementor to define each operator for each native data
    > type for each platform.


    So why does it not do the same with structs? Why should the programmer
    take care of it while passing it across platforms? Is it more of a
    "rationale" related question?

    Thanks,
    IC
    , Dec 17, 2006
    #3
  4. Eric Sosman Guest

    wrote:
    > Hi,
    > If I am right Endianness is CPU related. I do not know if the
    > question is right in itself but if it is then how does C handle issues
    > arising out of Endianness.


    By ignoring them.

    > I understand that if we pass structures using sockets across platforms,
    > we need to take care of Endianness issues at the application level. But
    > for example, for the code using bitwise AND to figure out if a number
    > is odd or even, how does C know the LSB position?


    On any particular implementation, the LSB of the unknown
    value being tested is in the same position as the LSB of the
    constant 1 you are ANDing with it. Problem solved.
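
    A minimal sketch of that point, assuming nothing beyond standard C.
    The names is_odd and check_is_odd are made up for illustration. The
    test works on the value of n, never on its bytes, so it gives the
    same answer whatever byte order the machine uses.

```c
#include <assert.h>

/* Returns 1 if n is odd, 0 if it is even. The & operator combines the
   values of n and 1u, so the byte order used to store n in memory
   never enters the picture. */
int is_odd(unsigned int n)
{
    return (int)(n & 1u);
}

/* Hypothetical self-check: holds on big- and little-endian hosts alike. */
void check_is_odd(void)
{
    assert(is_odd(7u) == 1);
    assert(is_odd(10u) == 0);
}
```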

    Problems can occur when you exchange data between dissimilar
    implementations, because they may disagree about endianness. They
    may disagree about other matters of representation, too: one
    platform might represent an int with sixteen bits while the other
    uses thirty-two, one might use IEEE floating-point while the other
    uses the S/360 format, the two might insert padding in structures
    differently, and so on. Endianness is just one of a number of
    representational issues you must consider when communicating
    between different systems.

    One approach that has proven widely useful is to invent a
    "wire format" for the data to be exchanged, a format that does
    not depend on the peculiarities of the machines. Each machine
    then needs two routines: One to read "wire format" and convert
    it to native representation, and one to convert the native form
    to "wire format." For obvious reasons, many extremely popular
    "wire formats" use textual representations: If you want to send
    the value forty-two, you transmit the two characters '4' and '2',
    possibly followed by a delimiter like '\n' or ';' or some such.
    This doesn't solve every possible problem (because the encoding
    of characters can also vary from machine to machine), but it solves
    a great many of them and usually leaves a fairly tractable remnant
    to deal with.
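
    As a rough sketch of such a textual wire format (the function names
    encode_long and decode_long are hypothetical, and a '\n' delimiter is
    assumed), each side needs only one routine per direction:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes value as decimal digits followed by '\n' into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int encode_long(long value, char *buf, size_t size)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%ld\n", value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parses a decimal number terminated by '\n'.
   Returns 0 and stores the result in *out on success,
   -1 if the text is malformed or out of range. */
int decode_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\n' || errno == ERANGE)
        return -1;
    *out = v;
    return 0;
}
```

    Sending forty-two this way puts the characters '4', '2' and '\n' on
    the wire, and the receiver rebuilds the value in its own native
    representation. The character-encoding caveat above still applies.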

    --
    Eric Sosman
    Eric Sosman, Dec 17, 2006
    #4
  5. Guest


    > > I understand that if we pass structures using sockets across platforms,
    > > we need to take care of Endianness issues at the application level. But
    > > for example, for the code using bitwise AND to figure out if a number
    > > is odd or even, how does C know the LSB position?

    >
    > On any particular implementation, the LSB of the unknown
    > value being tested is in the same position as the LSB of the
    > constant 1 you are ANDing with it. Problem solved.


    Thanks. Now that you have explained it that was pretty stupid of me.

    Are shift operators better examples of the question I have?

    As in the following snippet (please do let me know if I need to follow
    any norms while adding code snippets.)
    -------
    int x = 10;
    int y;

    y = x << 2;
    -------

    Thanks,
    IC
    , Dec 17, 2006
    #5
  6. Default User Guest

    wrote:

    > Hi,
    > If I am right Endianness is CPU related. I do not know if the
    > question is right in itself but if it is then how does C handle issues
    > arising out of Endianness.
    >
    > I understand that if we pass structures using sockets across
    > platforms, we need to take care of Endianness issues at the
    > application level. But for example, for the code using bitwise AND to
    > figure out if a number is odd or even, how does C know the LSB
    > position?


    C doesn't, but the implementation creator did.




    Brian
    Default User, Dec 17, 2006
    #6
  7. Malcolm Guest

    <> wrote in message
    news:...
    >
    >> > I understand that if we pass structures using sockets across platforms,
    >> > we need to take care of Endianness issues at the application level. But
    >> > for example, for the code using bitwise AND to figure out if a number
    >> > is odd or even, how does C know the LSB position?

    >>
    >> On any particular implementation, the LSB of the unknown
    >> value being tested is in the same position as the LSB of the
    >> constant 1 you are ANDing with it. Problem solved.

    >
    > Thanks. Now that you have explained it that was pretty stupid of me.
    >
    > Are shift operators better examples of the question I have?
    >
    > As in the following snippet (please do let me know if I need to follow
    > any norms while adding code snippets.)
    > -------
    > int x = 10;
    > int y;
    >
    > y = x << 2;
    >

    The shift operator assumes that the bits are arrayed from left to right,
    with the most significant at the left.
    This may or may not have anything to do with the physical location of the
    bits in memory. *(unsigned char *)&x; will read the top byte of x, which is
    probably either 10 or zero, but could be anything.
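
    A small sketch of the difference, assuming <stdint.h> is available
    (byte_of is a made-up helper name). Picking a byte out with shifts
    works on the value, so the answer is the same on every machine;
    reading through an unsigned char pointer looks at memory, so the
    answer depends on the byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Returns byte number `index` of v, counting from the least significant
   byte (index 0). Shifts act on the value, so the result does not depend
   on how the host lays v out in memory. */
uint8_t byte_of(uint32_t v, unsigned index)
{
    assert(index < 4);
    return (uint8_t)((v >> (8u * index)) & 0xFFu);
}
```

    With this, byte_of(0x12345678, 0) is 0x78 everywhere, and x << 2 with
    x equal to 10 is 40 everywhere.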
    --
    www.personal.leeds.ac.uk/~bgy1mm
    freeware games to download.
    Malcolm, Dec 17, 2006
    #7
  8. pete Guest

    Malcolm wrote:

    > *(unsigned char *)&x; will read the top byte of x


    .... if "top" means "lowest addressed"

    --
    pete
    pete, Dec 17, 2006
    #8
  9. Malcolm Guest

    "pete" <> wrote in message
    > Malcolm wrote:
    >
    >> *(unsigned char *)&x; will read the top byte of x

    >
    > ... if "top" means "lowest addressed"
    >

    If Microsoft take over the world they might make us all store our bytes at
    the little end.
    --
    www.personal.leeds.ac.uk/~bgy1mm
    freeware games to download.
    Malcolm, Dec 17, 2006
    #9
  10. Guest

    Just to add, as to how to determine the nature of Endianness,

    #define LITTLE_ENDIAN 0
    #define BIG_ENDIAN 1
    int machineEndianness(void)
    {
        int i = 1;
        unsigned char *p = (unsigned char *) &i;
        /* Lowest address holds the least significant byte: little-endian */
        if (p[0] == 1)
            return LITTLE_ENDIAN;
        else
            return BIG_ENDIAN;
    }
    , Dec 17, 2006
    #10
  11. Gordon Burditt Guest

    > If I am right Endianness is CPU related. I do not know if the
    >question is right in itself but if it is then how does C handle issues
    >arising out of Endianness.


    It's very simple: If you do anything that depends on Endianness,
    the result is undefined (or perhaps implementation-defined). The
    problem is thrown into the programmer's court NOT to do that. Write
    your code so it doesn't depend on endianness.

    >I understand that if we pass structures using sockets across platforms,
    >we need to take care of Endianness issues at the application level. But
    >for example, for the code using bitwise AND to figure out if a number
    >is odd or even, how does C know the LSB position?


    If you view a value as a value, and not a bunch of bytes, there is
    no problem. C knows which end of an int has the least significant
    bit, and machine registers might not even be addressable as bytes.
    The problem comes when you take a value (potentially multi-byte)
    and try to convert it to or from a bunch of bytes. THEN you have
    to worry about the problem that there are 24 byte-orders for 4-byte
    integers, and 40320 byte-orders for 8-byte integers.
    Gordon Burditt, Dec 17, 2006
    #11
  12. Gordon Burditt Guest

    >Just to add, as to how to determine the nature of Endianness,
    >
    > #define LITTLE_ENDIAN 0
    > #define BIG_ENDIAN 1


    There are 24 possible byte-orders for a 4-byte integer.
    Where are the other 22 defines?

    At the very least, you should have a NON_ENDIAN define for
    neither little-endian nor big-endian. PDP-11s are real.

    > int machineEndianness(void)
    > {
    >     int i = 1;
    >     unsigned char *p = (unsigned char *) &i;
    >     /* Lowest address holds the least significant byte: little-endian */
    >     if (p[0] == 1)
    >         return LITTLE_ENDIAN;
    >     else
    >         return BIG_ENDIAN;
    > }
    Gordon Burditt, Dec 17, 2006
    #12
  13. Richard Guest

    (Gordon Burditt) writes:

    >>Just to add, as to how to determine the nature of Endianness,
    >>
    >> #define LITTLE_ENDIAN 0
    >> #define BIG_ENDIAN 1

    >
    > There are 24 possible byte-orders for a 4-byte integer.
    > Where are the other 22 defines?


    Helpful.

    >
    > At the very least, you should have a NON_ENDIAN define for
    > neither little-endian nor big-endian. PDP-11s are real.
    >
    >> int machineEndianness(void)
    >> {
    >>     int i = 1;
    >>     unsigned char *p = (unsigned char *) &i;
    >>     /* Lowest address holds the least significant byte: little-endian */
    >>     if (p[0] == 1)
    >>         return LITTLE_ENDIAN;
    >>     else
    >>         return BIG_ENDIAN;
    >> }


    There seems to be an "it doesn't matter in C" answer appearing here, which
    is as incorrect as it is misleading. Eric seems to have been the only
    one to give an answer.

    Many systems communicate using C and it isn't too uncommon for endian
    issues to crop up.

    C does not "take care of it" if bytes or streams of bytes are thrown
    down a wire.

    The programmer does have to reassemble data accordingly - especially
    with user defined structures, packing etc.
    Richard, Dec 18, 2006
    #13
  14. Mattan Guest

    Richard wrote:
    > [..] C does not "take care of it" if bytes or streams of bytes are thrown
    > down a wire.
    >
    > The programmer does have to reassemble data accordingly - especially
    > with user defined structures, packing etc.


    ....and an implementation of htonl (host to network long) and friends
    may be useful, if available on the current system.
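
    A sketch of how these are typically used, assuming a POSIX system
    that provides <arpa/inet.h> (htonl and ntohl are not part of ISO C).
    The wrapper names to_network_bytes and from_network_bytes are made up
    for the example.

```c
#include <arpa/inet.h>  /* htonl, ntohl: POSIX, not ISO C */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies v into out[0..3] in network byte order (big-endian). */
void to_network_bytes(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t net = htonl(v);        /* host order -> network order */
    memcpy(out, &net, sizeof net);  /* memory image is now big-endian */
}

/* Reads a 32-bit value stored in network byte order from in[0..3]. */
uint32_t from_network_bytes(const unsigned char in[4])
{
    assert(in != NULL);
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);              /* network order -> host order */
}
```

    On a big-endian host htonl() and ntohl() do nothing; on a
    little-endian host they swap the bytes. Either way the bytes that go
    down the wire are the same.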

    /Mattan
    Mattan, Dec 18, 2006
    #14
  15. Keith Thompson Guest

    (Gordon Burditt) writes:
    >>Just to add, as to how to determine the nature of Endianness,
    >>
    >> #define LITTLE_ENDIAN 0
    >> #define BIG_ENDIAN 1

    >
    > There are 24 possible byte-orders for a 4-byte integer.
    > Where are the other 22 defines?
    >
    > At the very least, you should have a NON_ENDIAN define for
    > neither little-endian nor big-endian. PDP-11s are real.


    That's true in principle. In real life, though, there are only two or
    three possible endiannesses: big-endian, little-endian, and
    PDP-11-endian -- and you're not likely to run into the latter.

    And you also have to allow for the possibility that you don't *have*
    4-byte integers. On some DSPs, for example, an int is one byte (and a
    byte is at least 16 bits); on such a system, int has no endianness.

    It's a good idea to check explicitly for both big-endian and
    little-endian, but it's probably not necessary to handle any other
    case except by bailing out. For example:

    #include <limits.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
    #if CHAR_BIT != 8
    #error "CHAR_BIT != 8, I'm not prepared to cope with that."
    #endif
        unsigned char arr[4] = { 0x12, 0x34, 0x56, 0x78 };
        uint32_t n;

        /* memcpy avoids the alignment and aliasing problems of
           casting arr to uint32_t* and dereferencing it. */
        memcpy(&n, arr, sizeof n);
        if (n == 0x12345678) {
            printf("big-endian\n");
        }
        else if (n == 0x78563412) {
            printf("little-endian\n");
        }
        else {
            fprintf(stderr, "Unable to determine endianness, n == 0x%lx\n",
                    (unsigned long)n);
            exit(EXIT_FAILURE);
        }

        return 0;
    }

    Adjust as needed if your system doesn't support <stdint.h>.

    (The first time I tried this, I had forgotten to #include <limits.h>.
    CHAR_BIT quietly expanded to 0, giving me a very unexpected result.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Dec 18, 2006
    #15
  16. Stephen Sprunk Guest

    "Mattan" <> wrote in message
    news:pPlhh.26747$...
    > Richard wrote:
    >> [..] C does not "take care of it" if bytes or streams of bytes are
    >> thrown
    >> down a wire.
    >>
    >> The programmer does have to reassemble data accordingly - especially
    >> with user defined structures, packing etc.

    >
    > ...and an implementation of htonl (host to network long) and friends
    > may be useful, if available on the current system.


    Any system which has sockets available should have htonl() et al as
    well.

    To forestall any complaints that sockets are OT here, note that the same
    exact issue exists when you try to write any object to a file in binary
    mode. You have to define file/wire formats when working with binary
    data, and that includes the number of bits and endianness for each
    field. In the sockets world, the unit of transport is the octet (always
    8 bits), not the byte (which varies in size), and "network byte order"
    is defined as big-endian. File formats have no such conventions. Using
    the same convention as sockets makes your life easier if you're on a
    system that has sockets available (which is nearly all, these days)
    since you get ntohl() et al for free, but a huge number of file formats
    (and non-IETF network protocols) from the DOS/Windows world use
    little-endian storage.

    Text, of course, is the safest format for interchange, provided you know
    what encoding is used for the characters. Unfortunately, one still has
    to deal with EBCDIC vs ASCII and all the various multibyte encodings for
    Unicode, so figuring out how to read a text file with the right encoding
    has become as much a challenge as dealing with binary data -- and slower
    to boot. The only remaining advantage is that it's easier for humans to
    debug (or, in the case of files, modify) the messages.

    S

    --
    Stephen Sprunk "God does not play dice." --Albert Einstein
    CCIE #3723 "God is an inveterate gambler, and He throws the
    K5SSS dice at every possible opportunity." --Stephen Hawking


    Stephen Sprunk, Dec 18, 2006
    #16
  17. Chris Torek Guest

    In article <>
    <> wrote:
    > If I am right Endianness is CPU related.


    Others have already discussed most of the practical issues. I would
    like to point out that endianness is not really "CPU related" at all
    though.

    Suppose you are getting ready to move from one apartment to another.
    Your friend has offered you *free* use of his small pickup truck,
    so that you need not rent a huge van.

    There is one problem: your bed will not fit, fully assembled, into
    the pickup.

    Fortunately, your bed comes apart, into three pieces: headboard,
    middle section, and footboard. Each of those pieces will, by
    itself, fit in the truck. So you take the bed apart:

       ||||
       ||||                          |||
       ||||      =============       |||
       ||||      =============       |||
     headboard   middle section   footboard

    At the other end, your friend will reassemble the bed while you
    drive back to get more stuff. You bring him the headboard, then
    the footboard, then the middle, because that was the easiest way
    to take them out:

    ||||
    |||| |||
    |||| ||| =============
    |||| ||| =============

    Then you drive back to your old place to get more stuff.

    Your friend, for some reason, believes that you delivered the
    footboard first, then the middle, then the headboard. So he connects
    the pieces in that order. But you delivered the headboard first,
    so he put that where the footboard goes, then you delivered the
    footboard, which he put in the middle, and last, you delivered the
    middle, which he put at the head:

                        ||||
                     |||||||
        =============|||||||
        =============|||||||

    Your bed is no longer use-able, until you take it apart and
    re-reassemble it in the correct order. The problem is that you
    and your friend failed to agree on "endianness". (Well, that, and
    your friend is about as smart as a typical computer: he only does
    what you tell him, instead of what you meant.[%]) But there is no
    CPU in sight. So where did the "endianness" come from?

    It came from disagreement between various entities -- in this case,
    you and your friend -- that dis-assembled something (here, your
    bed), then re-assembled it, but did not connect the same pieces in
    the same way. To avoid the problem, you must make sure that all
    entities involved in disassembly and reassembly agree as to which
    sub-parts go where.

    If you (and of course your friends too) never break a whole object
    up into parts, the problem never occurs. (Transport the bed as a
    single unit, it arrives as a single unit, still in "bed" shape.)
    The problem occurs only when you *do* break something into parts.
    Even then, it occurs only if you put it back together in some other
    way. If you and your friends can all agree on some basic,
    un-break-able sub-unit -- such as, say, the 8-bit byte -- and you
    make sure never to give out anything "too big" so that your friends
    have to break them up, *you* can control the order of breaking-up
    and re-assembling, and therefore guarantee that the re-assembly
    always follows the same sequencing rules as the breaking-up.
    -----
    [%] The student programmer's lament:

    I really hate this darn* machine
    I wish that they would sell it
    It never does quite what I want
    But only what I tell it.

    * or other suitable one-syllable word
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Dec 18, 2006
    #17
  18. Nick Keighley Guest

    wrote:

    > Thanks. I will check it out.


    Check what out? Please leave relevant context.

    > > C relies on the implementor to define each operator for each native data
    > > type for each platform.

    >
    > So why does it not do the same with structs? Why should the programmer
    > take care of it while passing it across platforms? Is it more of a
    > "rationale" related question?


    C takes care of structs *on the same platform*. The C standard does not
    address cross-platform issues so it's the programmer's problem.

    Note you not only have endian problems, but also fundamental types'
    sizes, floating point representations, character sets and struct
    padding may all vary. Pointers are a complete no-no.

    That's just what I thought of off the top of my head; there will be
    other stuff.

    Take a look at XDR, ASN.1 and XML for portable data formats.

    --
    Nick Keighley

    Unicode is an international standard character set that can be used
    to write documents in almost any language you're likely to speak,
    learn or encounter in your lifetime, barring alien abduction.
    (XML in a Nutshell)
    Nick Keighley, Dec 18, 2006
    #18
  19. Dave Thompson Guest

    On 18 Dec 2006 08:28:19 GMT, Chris Torek <> wrote:

    > In article <>
    > <> wrote:
    > > If I am right Endianness is CPU related.

    >
    > Others have already discussed most of the practical issues. I would
    > like to point out that endianness is not really "CPU related" at all
    > though.
    >
    > Suppose you are getting ready to move from one apartment to another. <snip>
    > Fortunately, your bed comes apart, into three pieces: <snip>
    > At the other end, your friend will reassemble the bed while you
    > drive back to get more stuff. <snip>
    > Your friend, for some reason, believes that you delivered the
    > [pieces in a different order and reassembles obviously wrongly]
    > The problem is that you
    > and your friend failed to agree on "endianness". (Well, that, and
    > your friend is about as smart as a typical computer: he only does
    > what you tell him, instead of what you meant.[%]) <snip>


    It's a good thing you're the one driving; I'd hate to see what this
    Turing-machine-brained friend does when faced with say a bent or
    obscured traffic control sign. (Aside: I lived near Boston back about
    1980 when the originally Californian law, allowing by default right
    turn after stop at a red light if no traffic, was adopted -- or at
    least its adoption 'encouraged' -- Federally as a gasoline saving
    measure. So the city went around putting up 'no turn on red' signs
    pretty much everywhere. One intersection near me was already signed
    'no left turn' AND 'no right turn' and they added 'no turn on red'!)

    > If you (and of course your friends too) never break a whole object
    > up into parts, the problem never occurs. (Transport the bed as a
    > single unit, it arrives as a single unit, still in "bed" shape.)


    If it and everything else in the same load is tied down adequately;
    otherwise it may arrive in an arbitrary but substantial number of
    pieces, none bed-shaped, and probably not reassemble-able at all. FWIW
    _this_ problem rarely happens with userlevel computer data; although
    it can and does occur in hardware, devices and systems mostly are
    designed with error detection and correction features (parity, CRC,
    LRC, VRC, EDC, ECC, etc.) which lead the program to see either (1)
    correct data as sent/stored/whatever or (2) no data at all, sometimes
    but not always with a more-or-less specific error indicator.

    <snip rest>

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Jan 3, 2007
    #19