Endianness macros

Discussion in 'C Programming' started by James Harris, Aug 23, 2013.

  1. (snip)

    Little endian is slightly easier for addition. If you do multiply,
    all the advantage goes away. Past the 6502, it should have gone away.
    The VMS DUMP program figured it out. ASCII is printed left to right,
    and hex right to left, with the address in the middle. Text is
    readable, numbers are readable.

    But it is so much easier big-endian, and avoids the whole problem.

    -- glen
    glen herrmannsfeldt, Aug 26, 2013

  2. James Harris

    James Harris Guest

    Agreed. They both have their places. An unsigned big endian which is longer
    than a register may be easier for sorting as the least significant byte is
    at a later address, as is the case with text. In fact I remember IBM doing
    something to signed numbers so that they too appeared in a directly-sortable
    order - possibly flip the top bit. This may have been in DB2. I cannot
    remember. In any case the point was to allow a single sort routine to sort
    integers in exactly the same way it sorted characters.
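
A minimal sketch of the "flip the top bit" idea, assuming 32-bit two's complement values stored as big-endian bytes. Whether DB2 did exactly this is not confirmed in the thread; the names sortable_key and compare_as_bytes are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Encode a signed 32-bit value as 4 big-endian bytes with the sign bit
   flipped, so that memcmp() on the keys orders them numerically.
   (Hypothetical helper, not taken from any IBM product.) */
static void sortable_key(int32_t value, unsigned char key[4])
{
    uint32_t u = (uint32_t)value ^ UINT32_C(0x80000000); /* flip top bit */
    key[0] = (unsigned char)(u >> 24);  /* most significant byte first */
    key[1] = (unsigned char)(u >> 16);
    key[2] = (unsigned char)(u >> 8);
    key[3] = (unsigned char)u;
}

/* Compare two signed values using only a bytewise comparison of their
   keys, the same way a text sort would compare characters. */
static int compare_as_bytes(int32_t a, int32_t b)
{
    unsigned char ka[4], kb[4];
    sortable_key(a, ka);
    sortable_key(b, kb);
    return memcmp(ka, kb, 4);
}
```

With the top bit flipped, negative values get keys starting 0x00..0x7F and non-negative values get keys starting 0x80..0xFF, so a plain byte comparison puts -1 before 0.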

    Little endian also has its place. I have found it easier to process in code
    because the units are always in the same position regardless of the size of
    the integer. Additionally, it makes a lot of sense to me that bits are
    numbered right to left. Then, the value of a bit is equal to 2 to the power
    of the bit position. IBM called the bits 0 to 31 from left to right which
    bears little relation to anything.

    So it's horses for courses. A bit like driving on the left or the right. The
    main thing is to know what's expected and work with it.

    Les, saying that one version or the other is a "mistake" is putting it too
    strongly IMHO. You could say it's like deciding which end of an egg one
    should crack open. ;-)

    James Harris, Aug 26, 2013

  3. (snip)
    But in little-endian hex, you have to count:

    0, 8, 4, c, 2, a, 6, e, 1, 9, 5, d, 3, b, 7, f,

    That is, in binary:

    0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001,
    0101, 1101, 0011, 1011, 0111, 1111.

    Much easier big-endian.
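
The counting sequence above can be reproduced with a short sketch, assuming "little-endian hex" here means each nibble is written least significant bit first. The function names are illustrative only.

```c
/* Reverse the four bits of a nibble: bit 0 <-> bit 3, bit 1 <-> bit 2. */
static unsigned reverse_nibble(unsigned n)
{
    return ((n & 1u) << 3) | ((n & 2u) << 1)
         | ((n & 4u) >> 1) | ((n & 8u) >> 3);
}

/* Fill out with the hex digits produced by counting 0..15 and writing
   each value with its least significant bit first. out needs 17 chars. */
static void count_bit_reversed(char out[17])
{
    static const char digits[] = "0123456789abcdef";
    for (unsigned i = 0; i < 16; i++)
        out[i] = digits[reverse_nibble(i)];
    out[16] = '\0';
}
```

Calling count_bit_reversed yields the string "084c2a6e195d3b7f", the same order as the list in the post.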

    -- glen
    glen herrmannsfeldt, Aug 26, 2013
  4. (snip)
    Yes, you can do that. Even more, you can arrange it so floating
    point sorts, too.
    So you can ignore the high bits, loading just the low bits, and with
    no offset. But it also means that your program will seem to work until
    the values get bigger.
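
A small sketch of that last point, using an explicit byte buffer so it does not depend on the host's byte order. store_le and load_le are hypothetical helpers, and size is assumed to be at most 8.

```c
#include <stdint.h>
#include <stddef.h>

/* Store value into buf as 'size' little-endian bytes (size <= 8). */
static void store_le(unsigned char *buf, size_t size, uint64_t value)
{
    for (size_t i = 0; i < size; i++)
        buf[i] = (unsigned char)(value >> (8 * i));
}

/* Read the first 'size' bytes of buf as a little-endian integer.
   The units byte is always buf[0], whatever width was stored. */
static uint64_t load_le(const unsigned char *buf, size_t size)
{
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)buf[i] << (8 * i);
    return value;
}
```

If 200 is stored as 8 bytes, loading 1, 2 or 4 bytes from the same address all give 200. Store 70000 instead and a 2-byte load quietly returns 4464 (70000 mod 65536), which is the "seems to work until the values get bigger" trap.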

    But yes, it does get interesting. With z/Architecture the register
    bits are now 0 to 63, so the 32 bit registers are 32 to 63.
    -- glen
    glen herrmannsfeldt, Aug 26, 2013
  5. Just to add to the frivolity, the decimal numbering system we use
    (where one hundred and twenty three is written as "123") is called
    "Arabic numerals" or, more precisely, "Hindu-Arabic numerals".
    Fibonacci promoted their use in Europe.

    Since Arabic is written right-to-left, a number written as "123" is
    actually little-endian when written in Arabic. European languages
    are written left-to-right, but Europeans kept the high-order digit
    on the left, making the same number "123" big-endian (perhaps also
    influenced by Roman numerals being big-endian: "CCCXXI").
    Keith Thompson, Aug 26, 2013
  6. James Harris

    Jorgen Grahn Guest

    Another nasty one is arithmetic on alien integers.

    c = a + b;

    works nicely even on little-endian, until a carry bit moves past a
    byte border. (Or something -- I prefer not to think about the detailed

    Jorgen Grahn, Aug 27, 2013
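
A sketch of the c = a + b problem, assuming "alien" means 32-bit values held in the opposite byte order from the host's. The function names are invented for the example.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Wrong: add two foreign-order values as if they were native. */
static uint32_t add_foreign_naive(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Right: convert to native order, add, convert the sum back. */
static uint32_t add_foreign(uint32_t a, uint32_t b)
{
    return swap32(swap32(a) + swap32(b));
}
```

For 1 + 2 both functions agree, because no carry leaves the low byte. For 0xFF + 1 the naive version carries in the wrong direction and returns 0 instead of the byte-swapped form of 0x100.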
  7. James Harris

    Jorgen Grahn Guest

    But it teaches people that it's ok to lift foreign data into the
    program logic; that's a bad thing IMO.

    And it doesn't really teach you much about endianness issues as you
    normally see them -- you're presented with integers which are already
    foreign, not with an unstructured octet buffer which you have to
    interpret in terms of C.
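
For contrast, here is a minimal sketch of interpreting an octet buffer directly, which never creates a "foreign" integer in the first place. get_be32 and get_le32 are illustrative names, not from any particular library.

```c
#include <stdint.h>

/* Read a 32-bit big-endian (network order) integer from an octet buffer. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit little-endian integer from an octet buffer. */
static uint32_t get_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Because the shifts operate on values rather than on memory layout, the same source gives the same result on any host, with no byte-swapping macro needed.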

    Jorgen Grahn, Aug 27, 2013
    Keith Thompson, Aug 27, 2013
  9. James Harris

    Alan Curry Guest

    Have you ever looked at the talkd protocol? You listen to a TCP port, then
    leave an invitation for your friend to connect to it. The invitation
    contains the address of your listening socket. Literally. Not a dotted
    quad and a %d port number... just a copy of your struct sockaddr_in, sent
    on the wire without modification.

    I don't know if that kind of thing happens in any other protocols designed
    in the early BSD era, but it might be a hint to their thinking. The socket
    address is an object you can use to make requests to your local kernel,
    and also a portable representation of an address that programs on other
    machines can use in requests to *their* local kernels, without any parsing
    or byte-swapping.
    Alan Curry, Aug 28, 2013
  10. James Harris

    Joe Pfeiffer Guest

    I haven't -- but I think (in the absence of explicit comments in the
    code, or even something in the 4.2 networking documents) it's at least
    as likely to be my guess as yours.
    Joe Pfeiffer, Aug 28, 2013
  11. The Internet isn't and certainly wasn't the whole world, but most
    other networks and interchange standards were indeed big-endian or not
    endian at all (i.e. parallel).

    But one very important exception: sending ASCII on a serial line
    (RS-232 and later RS-4xx) was low bit first, first de facto by
    Teletype Corp and then de jure by ANSI/X3 -- I don't remember the
    number but I saw it once in a library and it was a bit amusing: the
    same covers, copyright, preface about the standards process, etc., as
    other standards, and then a page for the body of standard containing
    exactly one clause and one sentence something close to "The order of
    transmission of the bits of ASCII (X3.4-whatever) shall be from least
    significant to most significant."
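
As a rough illustration of that one-sentence rule, the sketch below lists the data bits of a character in transmission order. It ignores start, stop and parity bits, and bits_lsb_first is a made-up name.

```c
/* Write the 'nbits' low bits of 'ch' into out as '0'/'1' characters in
   transmission order, least significant bit first.
   out must have room for nbits + 1 chars. */
static void bits_lsb_first(unsigned ch, unsigned nbits, char *out)
{
    for (unsigned i = 0; i < nbits; i++)
        out[i] = ((ch >> i) & 1u) ? '1' : '0';
    out[nbits] = '\0';
}
```

For the 7-bit ASCII code of 'C' (0x43, conventionally written 1000011) this produces "1100001".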

    I *think* there was also a FIPS adoption of this, but I could be
    misremembering that, since there were many FIPS adoptions of X3.
    David Thompson, Aug 29, 2013
  12. James Harris

    Les Cargill Guest

    No doubt with several "this page intentionally left blank". :)

    This is amusing. This being said, if you look at this data stream with
    an o-scope, the MSB is still on the left :)

    And just to be inconsistent, it would bother me less if bytes were
    reversed within themselves than having to byte-swap would.
    Les Cargill, Aug 29, 2013
  13. (snip, someone wrote)
    10 megabit ethernet is also sent LSB first. For the most part, though
    that doesn't matter much. There is one place where a multiple byte
    field is used, the length field for 802.3 frame format. The MAC
    address and ethertype are treated like numbers, but are mostly
    bit strings. (Not counting the ordering for the data inside
    the frame.)

    If you want to compute the ethernet CRC value, you also need to know
    the bit order to get it right.
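
A bitwise sketch of the CRC-32 used by Ethernet, written in the common "reflected" form that consumes each byte least significant bit first, matching the transmission order described above. It covers only the checksum arithmetic, not how the FCS is placed on the wire, and crc32_lsb_first is an illustrative name.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-32 with the reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF. Bits of each byte are processed LSB first. */
static uint32_t crc32_lsb_first(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The usual check value applies: the nine ASCII characters "123456789" give 0xCBF43926. Feeding the bits in the other order with the same constants gives a different result, which is why the bit order has to be known.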

    -- glen
    glen herrmannsfeldt, Aug 29, 2013
  14. Change both systems from big-endian to little-endian, i.e. reverse the

    CCCXXI = 321
    IXXCCC = 123

    PDP-endian jokes are left as an exercise for the reader.

    Stephen Sprunk, Aug 29, 2013
  15. James Harris

    James Kuyper Guest

    Actually, that's an example of a subtractive Roman Numeral. The Romans
    themselves didn't use them - it was invented in the 13th century CE.
    When a roman numeral with a lower value was written to the left of one
    with a higher value, it was subtracted from the higher one, rather than
    added to it. For instance, IV = 4, IX = 9. I gather that there's a lot
    of inconsistency and disagreement about the handling of subtractive
    Roman numerals, and I didn't find any examples involving multiple
    subtractions. My personal opinion is that IXXCCC = 300 - 20 - 1 = 279.
    James Kuyper, Aug 29, 2013
  16. (snip)
    A little closer to computers, note that IBM used a binary coding
    similar to roman numerals for sizing of computer memory, and sometimes
    for software to fit that memory.

    A=2K, B=4K, C=8K, D=16K, E=32K, F=64K, G=128K, H=256K, and so on.

    Like roman numerals, smaller values to the right of a larger one are
    added, and to the left subtracted, so:

    FE=96K, (and usually not EG), but DG=112K.

    Fortran G was designed to run on 128K machines, and PL/I (F) could
    run on 64K machines, though very slowly.
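
The letter scheme can be sketched as a Roman-numeral-style decoder. This assumes the doubling continues through Z and that the letters are contiguous in the execution character set (true for ASCII, not for EBCDIC); letter_k and decode_size_k are hypothetical names.

```c
/* Size in K of a single letter: A=2K, B=4K, ... each letter doubling.
   Returns 0 for anything that is not an uppercase letter.
   Assumes 'A'..'Z' are contiguous, as in ASCII. */
static long letter_k(char c)
{
    if (c < 'A' || c > 'Z')
        return 0;
    return 2L << (c - 'A');
}

/* Decode a size code: a smaller letter to the left of a larger one is
   subtracted, otherwise letters are added. Result is in K. */
static long decode_size_k(const char *code)
{
    long total = 0;
    for (; *code != '\0'; code++) {
        long v = letter_k(code[0]);
        long next = letter_k(code[1]);  /* 0 at the end of the string */
        if (v < next)
            total -= v;
        else
            total += v;
    }
    return total;
}
```

Under these assumptions "FE" decodes to 96 and "DG" to 112, matching the examples in the post.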

    -- glen
    glen herrmannsfeldt, Aug 29, 2013
  17. You missed the joke too, apparently.
    What I was taught in grade school was that you could only subtract one of the
    next-lower numeral, with an exception for IX. That's consistent with
    movie copyright notices (why are they in Roman numerals anyway?), e.g.
    1999 was written MCMLXLIX rather than MIM, MCMIC, MCMXCIX, etc.
    If it were in big-endian form, sure. In little-endian form, though,
    it's 123 (LE) or 321 (BE).

    Stephen Sprunk, Aug 30, 2013
  18. I can see the joke now:

    "67,108,864K ought to be enough for anybody."

    Stephen Sprunk, Aug 30, 2013
  19. Sorry, plain wrong. Take, for example, the dedication to Augustus over
    the entrance to the theatre at Lepcis Magna. It reports his holding of
    tribunician power for the 24th time thus:

    tr(ibunicia) (pot)estate XXIV ...

    See http://irt.kcl.ac.uk/irt2009/IRT321.html for the full transcript,
    photos all over the web (I think there are three versions over various

    (Subtraction /was/ a later addition to the system, but so was using the
    Latin letters. All number systems evolve.)

    Ben Bacarisse, Aug 30, 2013
  20. James Harris

    Geoff Guest

    MMI, A Space Odyssey
    Geoff, Aug 30, 2013
