Endianness macros


glen herrmannsfeldt

(snip)

Little endian is slightly easier for addition. If you do multiply,
all the advantage goes away. Past the 6502, it should have gone away.
So you're saying it doesn't matter?

The VMS DUMP program figured it out. ASCII is printed left to right,
and hex right to left, with the address in the middle. Text is
readable, numbers are readable.
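A rough sketch in C of that idea (my own illustration, not DUMP's exact
output format): print the hex bytes right to left and the ASCII left to
right, with the address in between, so a little-endian word reads
naturally in both columns.

#include <ctype.h>
#include <stdio.h>

/* Illustrative only: hex right to left, ASCII left to right,
   address in the middle, in the spirit of VMS DUMP. */
static void dump_line(const unsigned char *p, unsigned long addr, int n)
{
    int i;
    for (i = n - 1; i >= 0; i--)   /* hex column, right to left */
        printf("%02X ", p[i]);
    printf(" %08lX  ", addr);      /* address in the middle */
    for (i = 0; i < n; i++)        /* ASCII column, left to right */
        putchar(isprint(p[i]) ? p[i] : '.');
    putchar('\n');
}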

But it is so much easier to go big-endian and avoid the whole problem.

-- glen
 

James Harris

Joe Pfeiffer said:
....


There's no significant difference between them. Big-endian is
infinitesimally easier for people to read; little-endian can be
preferred for the equally irrelevant improvement in internal
consistency.

Agreed. They both have their places. An unsigned big-endian integer which is
longer than a register may be easier to sort, as the least significant byte
is at a later address, as is the case with text. In fact I remember IBM doing
something to signed numbers so that they too appeared in a directly-sortable
order - possibly flipping the top bit. This may have been in DB2. I cannot
remember. In any case the point was to allow a single sort routine to sort
integers in exactly the same way it sorted characters.
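For illustration, a sketch of that trick as I understand it (the helper
name and exact transform are my guesses, not necessarily what IBM did):
store the value big-endian with the sign bit flipped, and a plain
memcmp()-based sort then orders signed integers the same way it orders
text.

#include <stdint.h>

/* Build a 4-byte key from a signed 32-bit value such that plain
   memcmp() on the keys matches signed integer order: write the
   value big-endian and flip the top (sign) bit. */
static void make_sort_key(int32_t v, unsigned char key[4])
{
    uint32_t u = (uint32_t)v ^ 0x80000000u;  /* flip the sign bit */
    key[0] = (unsigned char)(u >> 24);       /* most significant first */
    key[1] = (unsigned char)(u >> 16);
    key[2] = (unsigned char)(u >> 8);
    key[3] = (unsigned char)u;
}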

Little endian also has its place. I have found it easier to process in code
because the units are always in the same position regardless of the size of
the integer. Additionally, it makes a lot of sense to me that bits are
numbered right to left. Then the value of a bit is equal to 2 to the power
of the bit position. IBM called the bits 0 to 31 from left to right, which
bears little relation to anything.
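A tiny sketch of the difference (the helper names are mine): with
right-to-left numbering the value of bit n is 2 to the power n, so
extracting it is a single shift, while IBM's left-to-right numbering
also has to know the word width.

#include <stdint.h>

/* Right-to-left numbering: bit n is worth 2 to the power n. */
static unsigned bit(uint32_t x, unsigned n)      /* LSB is bit 0 */
{
    return (x >> n) & 1u;
}

/* IBM-style numbering: bit 0 is the MSB of the 32-bit word. */
static unsigned ibm_bit(uint32_t x, unsigned n)
{
    return (x >> (31 - n)) & 1u;
}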

So it's horses for courses. A bit like driving on the left or the right. The
main thing is to know what's expected and work with it.

Les saying that one version or the other is a "mistake" is putting it too
strongly IMHO. You could say it's like deciding which end of an egg one
should crack open. ;-)

James
 

glen herrmannsfeldt

(snip)
it is not a mistake
if I understand well, the mistake is "big endian"
(snip)

I think of it in reverse...
in the sense that, for me, it is better
so you're saying it doesn't matter?
too
I like the x86 endian; if I remember, it would be little endian...
if b == the binary form of the number, with digits written least
significant first:
b1 is 1
b11 is 3
b111 is 7
b011 is 6
etc.
As you can all see, the number gets more digits on the right side,
so new memory for further binary digits is allocated at the end.
This works the same for digits of base 0xFF too, and 0xFFFF, etc.

But in little-endian hex, you have to count:

0, 8, 4, c, 2, a, 6, e, 1, 9, 5, d, 3, b, 7, f,

That is, in binary:

0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001,
0101, 1101, 0011, 1011, 0111, 1111.

Much easier big-endian.
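For anyone who wants to check that ordering, a throwaway loop (my own
sketch) that prints 0 through 15 with each nibble's bits reversed,
i.e. written least significant bit first:

#include <stdio.h>

int main(void)
{
    int v;
    for (v = 0; v < 16; v++) {
        /* reverse the four bits of v */
        int r = ((v & 1) << 3) | ((v & 2) << 1)
              | ((v & 4) >> 1) | ((v & 8) >> 3);
        printf("%x%c", r, v == 15 ? '\n' : ' ');
    }
    return 0;   /* prints: 0 8 4 c 2 a 6 e 1 9 5 d 3 b 7 f */
}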

-- glen
 

glen herrmannsfeldt

(snip)
Agreed. They both have their places. An unsigned big-endian integer which is
longer than a register may be easier to sort, as the least significant byte
is at a later address, as is the case with text. In fact I remember IBM doing
something to signed numbers so that they too appeared in a directly-sortable
order - possibly flipping the top bit. This may have been in DB2. I cannot
remember. In any case the point was to allow a single sort routine to sort
integers in exactly the same way it sorted characters.

Yes, you can do that. Even more, you can arrange it so floating
point sorts, too.

Little endian also has its place. I have found it easier to process in code
because the units are always in the same position regardless of the size of
the integer. Additionally, it makes a lot of sense to me that bits are
numbered right to left. Then the value of a bit is equal to 2 to the power
of the bit position. IBM called the bits 0 to 31 from left to right, which
bears little relation to anything.

So you can ignore the high bits, loading just the low bits, and with
no offset. But it also means that your program will seem to work until
the values get bigger.
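A minimal sketch of both halves of that point, assuming a little-endian
machine:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t big = 5;
    uint32_t low;

    /* Little endian: the low 32 bits sit at the same address,
       so the narrow load needs no offset. */
    memcpy(&low, &big, sizeof low);
    printf("%u\n", (unsigned)low);   /* 5 -- seems to work */

    big = 0x100000005ULL;            /* the value got bigger... */
    memcpy(&low, &big, sizeof low);
    printf("%u\n", (unsigned)low);   /* still 5 -- silent truncation */
    return 0;
}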

But yes, it does get interesting. With z/Architecture the register
bits are now 0 to 63, so the 32 bit registers are 32 to 63.

So it's horses for courses. A bit like driving on the left or
the right. The main thing is to know what's expected and work
with it.

-- glen
 

Keith Thompson

James Harris said:
Agreed. They both have their places. An unsigned big-endian integer which is
longer than a register may be easier to sort, as the least significant byte
is at a later address, as is the case with text. In fact I remember IBM doing
something to signed numbers so that they too appeared in a directly-sortable
order - possibly flipping the top bit. This may have been in DB2. I cannot
remember. In any case the point was to allow a single sort routine to sort
integers in exactly the same way it sorted characters.

Little endian also has its place. I have found it easier to process in code
because the units are always in the same position regardless of the size of
the integer. Additionally, it makes a lot of sense to me that bits are
numbered right to left. Then the value of a bit is equal to 2 to the power
of the bit position. IBM called the bits 0 to 31 from left to right, which
bears little relation to anything.

So it's horses for courses. A bit like driving on the left or the right. The
main thing is to know what's expected and work with it.

Les saying that one version or the other is a "mistake" is putting it too
strongly IMHO. You could say it's like deciding which end of an egg one
should crack open. ;-)

Just to add to the frivolity, the decimal numbering system we use
(where one hundred and twenty three is written as "123") is called
"Arabic numerals" or, more precisely, "Hindu-Arabic numerals".
Fibonacci promoted their use in Europe.

Since Arabic is written right-to-left, a number written as "123" is
actually little-endian when written in Arabic. European languages
are written left-to-right, but Europeans kept the high-order digit
on the left, making the same number "123" big-endian (perhaps also
influenced by Roman numerals being big-endian: "CCCXXI").
 

Jorgen Grahn

One of the sneakiest I personally ran across involved code that
carefully hton'ed a value before stuffing it into a buffer. What's
wrong with that? Well, the caller had *already* hton'ed the data!
And since the hton*() calls were no-ops on the big-endian development
system, testing didn't reveal any problem ...
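A minimal sketch of that double-conversion bug in C (the helper names
are illustrative):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* The helper converts to network order before storing... */
static void put_u32(unsigned char *buf, uint32_t v)
{
    uint32_t net = htonl(v);          /* second conversion */
    memcpy(buf, &net, sizeof net);
}

/* ...but the caller already converted.  On a big-endian machine
   both htonl() calls are no-ops, so testing there finds nothing;
   on a little-endian machine the two swaps cancel out and
   host-order bytes go onto the wire. */
void send_value(unsigned char *buf, uint32_t host_value)
{
    put_u32(buf, htonl(host_value));  /* first conversion: the bug */
}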

Another nasty one is arithmetic on alien integers.

c = a + b;

works nicely even on little-endian, until a carry bit moves past a
byte border. (Or something -- I prefer not to think about the detailed
behavior.)
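A sketch of that failure, assuming big-endian data misread as native
integers on a little-endian machine:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char a[4] = { 0, 0, 0, 1 };    /* big-endian 1   */
    unsigned char b[4] = { 0, 0, 0, 255 };  /* big-endian 255 */
    unsigned char out[4];
    uint32_t x, y, c;

    memcpy(&x, a, 4);   /* misread as native little-endian */
    memcpy(&y, b, 4);
    c = x + y;          /* 1 + 255: the carry should produce
                           big-endian 00 00 01 00 ... */
    memcpy(out, &c, 4);
    printf("%02x %02x %02x %02x\n",
           out[0], out[1], out[2], out[3]);  /* 00 00 00 00 */
    return 0;
}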

/Jorgen
 

Jorgen Grahn

OTOH, it forces _everyone_ who uses the sockets library to learn about
endianness issues, which is not necessarily a bad thing; many will never
have seen or thought about such issues before, and would otherwise go on
to write code that doesn't account for them properly.

But it teaches people that it's ok to lift foreign data into the
program logic; that's a bad thing IMO.

And it doesn't really teach you much about endianness issues as you
normally see them -- you're presented with integers which are already
foreign, not with an unstructured octet buffer which you have to
interpret in terms of C.

/Jorgen
 

Alan Curry

Given that the first widely distributed Unix was on a PDP-11, and
networking was added in 4.2bsd, which to the best of my knowledge was
only available on a VAX at that time, that's not a likely explanation.

That the same people were implementing both the network stack and the
first applications using it, and didn't put a lot of thought into the
details of the interface, strikes me as a much likelier one.

Have you ever looked at the talkd protocol? You listen to a TCP port, then
leave an invitation for your friend to connect to it. The invitation
contains the address of your listening socket. Literally. Not a dotted
quad and a %d port number... just a copy of your struct sockaddr_in, sent
on the wire without modification.

I don't know if that kind of thing happens in any other protocols designed
in the early BSD era, but it might be a hint to their thinking. The socket
address is an object you can use to make requests to your local kernel,
and also a portable representation of an address that programs on other
machines can use in requests to *their* local kernels, without any parsing
or byte-swapping.
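For flavor, a sketch in that style (the message layout here is
illustrative, not the actual talkd protocol):

#include <netinet/in.h>
#include <string.h>

/* An invitation that embeds the listening socket's address
   structure verbatim, in the spirit described above. */
struct invite_msg {
    char name[12];             /* illustrative field */
    struct sockaddr_in addr;   /* raw, exactly as bound locally */
};

void fill_invite(struct invite_msg *m, const struct sockaddr_in *listener)
{
    memset(m, 0, sizeof *m);
    m->addr = *listener;   /* no htons()/htonl(), no formatting:
                              the struct itself is the wire format */
}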
 

Joe Pfeiffer

Alan Curry said:
Have you ever looked at the talkd protocol? You listen to a TCP port, then
leave an invitation for your friend to connect to it. The invitation
contains the address of your listening socket. Literally. Not a dotted
quad and a %d port number... just a copy of your struct sockaddr_in, sent
on the wire without modification.

I don't know if that kind of thing happens in any other protocols designed
in the early BSD era, but it might be a hint to their thinking. The socket
address is an object you can use to make requests to your local kernel,
and also a portable representation of an address that programs on other
machines can use in requests to *their* local kernels, without any parsing
or byte-swapping.

I haven't -- but I think (in the absence of explicit comments in the
code, or even something in the 4.2 networking documents) my guess is at
least as likely as yours.
 

David Thompson

The Internet isn't and certainly wasn't the whole world, but most
other networks and interchange standards were indeed big-endian or not
endian at all (i.e. parallel).

But one very important exception: sending ASCII on a serial line
(RS-232 and later RS-4xx) was low bit first, first de facto by
Teletype Corp and then de jure by ANSI/X3 -- I don't remember the
number but I saw it once in a library and it was a bit amusing: the
same covers, copyright, preface about the standards process, etc., as
other standards, and then a page for the body of standard containing
exactly one clause and one sentence something close to "The order of
transmission of the bits of ASCII (X3.4-whatever) shall be from least
significant to most significant."

I *think* there was also a FIPS adoption of this, but I could be
misremembering that, since there were many FIPS adoptions of X3.
 

Les Cargill

David said:
The Internet isn't and certainly wasn't the whole world, but most
other networks and interchange standards were indeed big-endian or not
endian at all (i.e. parallel).

But one very important exception: sending ASCII on a serial line
(RS-232 and later RS-4xx) was low bit first, first de facto by
Teletype Corp and then de jure by ANSI/X3 -- I don't remember the
number but I saw it once in a library and it was a bit amusing: the
same covers, copyright, preface about the standards process, etc., as
other standards, and then a page for the body of standard containing
exactly one clause and one sentence something close to "The order of
transmission of the bits of ASCII (X3.4-whatever) shall be from least
significant to most significant."

No doubt with several "this page intentionally left blank". :)

This is amusing. This being said, if you look at this data stream with
an o-scope, the MSB is still on the left :)

And just to be inconsistent: having the bits reversed within each byte
would bother me less than having to byte-swap does.
 

glen herrmannsfeldt

(snip, someone wrote)
The Internet isn't and certainly wasn't the whole world, but most
other networks and interchange standards were indeed big-endian or not
endian at all (i.e. parallel).
But one very important exception: sending ASCII on a serial line
(RS-232 and later RS-4xx) was low bit first,

10 megabit ethernet is also sent LSB first. For the most part, though,
that doesn't matter much. There is one place where a multiple-byte
field is used as a number: the length field in the 802.3 frame format.
The MAC address and ethertype are treated like numbers, but are mostly
bit strings. (Not counting the ordering for the data inside
the frame.)

If you want to compute the ethernet CRC value, you also need to know
the bit order to get it right.
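For reference, a sketch of the usual bit-reflected CRC-32 (polynomial
0xEDB88320) used by Ethernet; consuming each byte least significant bit
first is exactly where the wire bit order shows up:

#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC-32 as used by Ethernet: each byte is
   consumed LSB first, matching the wire bit order. */
uint32_t crc32_le(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        int i;
        crc ^= *p++;
        for (i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}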

-- glen
 

Stephen Sprunk

Apparently I'm being dense today, and am not getting the joke.

Change both systems from big-endian to little-endian, i.e. reverse the
numerals:

CCCXXI = 321
IXXCCC = 123

PDP-endian jokes are left as an exercise for the reader.

S
 

James Kuyper

[...]
Europeans kept the high-order digit on the left, making the
same number "123" big-endian (perhaps also influenced by Roman
numerals being big-endian: "CCCXXI").

CXXIII?

IXXCCC

Apparently I'm being dense today, and am not getting the joke.

Change both systems from big-endian to little-endian, i.e. reverse the
numerals:

CCCXXI = 321
IXXCCC = 123

Actually, that's an example of a subtractive Roman numeral. The Romans
themselves didn't use them - subtractive notation was invented in the 13th
century CE. When a Roman numeral with a lower value was written to the left
of one with a higher value, it was subtracted from the higher one, rather
than added to it. For instance, IV = 4, IX = 9. I gather that there's a lot
of inconsistency and disagreement about the handling of subtractive
Roman numerals, and I didn't find any examples involving multiple
subtractions. My personal opinion is that IXXCCC = 300 - 20 - 1 = 279.
 

glen herrmannsfeldt

(snip)
Actually, that's an example of a subtractive Roman numeral. The Romans
themselves didn't use them - subtractive notation was invented in the 13th century CE.

A little closer to computers, note that IBM used a binary coding
similar to roman numerals for sizing of computer memory, and sometimes
for software to fit that memory.

A=2K, B=4K, C=8K, D=16K, E=32K, F=64K, G=128K, H=256K, and so on.

Like roman numerals, smaller values to the right of a larger one are
added, and to the left subtracted, so:

FE=96K, (and usually not EG), but DG=112K.

Fortran G was designed to run on 128K machines, and PL/I (F) could
run on 64K machines, though very slowly.
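A toy decoder for that scheme (my own sketch, not anything IBM shipped):

#include <stdio.h>

/* A=2K and each later letter doubles; a smaller letter before a
   larger one is subtracted, after it added, roman-numeral style. */
static long size_in_k(const char *s)
{
    long total = 0;
    for (; *s; s++) {
        long v = 2L << (*s - 'A');   /* A=2, B=4, C=8, ... (in K) */
        if (s[1] && s[1] > *s)
            total -= v;              /* smaller before larger */
        else
            total += v;
    }
    return total;
}

int main(void)
{
    printf("FE=%ldK DG=%ldK\n",
           size_in_k("FE"), size_in_k("DG"));   /* 96K and 112K */
    return 0;
}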

-- glen
 

Stephen Sprunk

Actually, that's an example of a subtractive Roman numeral.

You missed the joke too, apparently.
The Romans themselves didn't use them - subtractive notation was invented
in the 13th century CE. When a Roman numeral with a lower value was written
to the left of one with a higher value, it was subtracted from the higher
one, rather than added to it. For instance, IV = 4, IX = 9. I gather that
there's a lot of inconsistency and disagreement about the handling of
subtractive Roman numerals, and I didn't find any examples involving
multiple subtractions.

What I was taught in grade school was that you could only subtract one of the
next-lower numeral, with an exception for IX. That's consistent with
movie copyright notices (why are they in Roman numerals anyway?), e.g.
1999 was written MCMLXLIX rather than MIM, MCMIC, MCMXCIX, etc.
My personal opinion is that IXXCCC = 300 - 20 - 1 = 279.

If it were in big-endian form, sure. In little-endian form, though,
it's 123 (LE) or 321 (BE).

S
 

Stephen Sprunk

A little closer to computers, note that IBM used a binary coding
similar to roman numerals for sizing of computer memory, and sometimes
for software to fit that memory.

A=2K, B=4K, C=8K, D=16K, E=32K, F=64K, G=128K, H=256K, and so on.

I can see the joke now:

"67,108,864K ought to be enough for anybody."

S
 

Ben Bacarisse

James Kuyper said:
[...]
Europeans kept the high-order digit on the left, making the
same number "123" big-endian (perhaps also influenced by Roman
numerals being big-endian: "CCCXXI").

CXXIII?

IXXCCC

Apparently I'm being dense today, and am not getting the joke.

Change both systems from big-endian to little-endian, i.e. reverse the
numerals:

CCCXXI = 321
IXXCCC = 123

Actually, that's an example of a subtractive Roman numeral. The Romans
themselves didn't use them - subtractive notation was invented in the 13th century CE.

Sorry, plain wrong. Take, for example, the dedication to Augustus over
the entrance to the theatre at Lepcis Magna. It reports his holding of
tribunician power for the 24th time thus:

tr(ibunicia) (pot)estate XXIV ...

See http://irt.kcl.ac.uk/irt2009/IRT321.html for the full transcript,
photos all over the web (I think there are three versions over various
entrances).

(Subtraction /was/ a later addition to the system, but so was using the
Latin letters. All number systems evolve.)

<snip>
 
