BIG or little endian

Chris Torek

Keith Thompson said:
Types have representations. Endianness is part of that
representation.

This is, well, wrong. I was hoping for an easier way to put that,
but it is just plain wrong ... except in some cases. :)

Consider a machine that has "endian control" in the CPU, in the
instruction set, and in the MMU. (SPARCV9 implementations do this.)
That is, there is an "invert endian-ness" bit in the CPU, so that
you can "run in either mode", *plus* an "invert endian-ness" bit
in some kind(s) of memory access instructions -- on V9 one uses
the "alternate address space" extensions for this -- so that one
can, for instance, refer to shared memory regions that are being
used by a processor running in the "other" endianness. If you
set the invert bit in both the CPU and the instruction, you get
big-endian ("native", as it were) byte order. However, if the
invert bit is set in the MMU entry for the page, you get the "other"
endian-ness yet again: if all three bits are set (CPU, instruction,
and MMU), you get "little-endian" byte order.

Yet all these accesses can be done as 16-bit "short", 32-bit "int",
or 64-bit "long" / "long long" (depending on compiler mode). (Well,
admittedly, the compiler does not generate lda/sta instructions on
its own, but the inversion bits in the CPU and/or MMU can still be
set.)

Clearly, on V9 SPARCs, endianness is not due to C-level types after
all. So what *is* it due to?

As I have said before, endianness arises from "disassembly and
reassembly". Any atomic entity -- anything that is never taken
apart -- has no need for the concept of "endianness". But when
you take a large item and chop it up into small pieces, then shuffle
the small parts from point A to point B, and finally reassemble
the small parts into a large item, *then* it matters: do you take
the small parts from left to right, or from top to bottom, or
outside in, or inside out, or what?

When you let CPU #1 take a "large" value, like a 32-bit integer,
apart, and then shuffle the pieces -- such as "four 8-bit bytes"
-- over to CPU #2 and ask it to reassemble the 32-bit integer, you
subject yourself to possible different orders. If CPU #1 takes
the value apart from outside in, so that the "first byte" is the
most significant and the second byte is the least significant, but
CPU #2 assembles them "right to left", the value CPU #2 delivers
is not the value CPU #1 took apart. When you let memory subsystem
Q take the value apart, and ask memory subsystem R to reassemble
it, you again subject yourself to possible different orders.
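
For instance, here is a minimal sketch (assuming 8-bit bytes and the
uint32_t type from <stdint.h>) that shows the order in which one
particular CPU-and-compiler combination takes a 32-bit value apart:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t value = 0x11223344;
        unsigned char *p = (unsigned char *)&value;
        size_t i;

        /* A big-endian system stores 11 22 33 44 at increasing
           addresses; a little-endian one stores 44 33 22 11. */
        for (i = 0; i < sizeof value; i++)
            printf("%02x ", p[i]);
        putchar('\n');
        return 0;
    }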

It is tempting to think (or assume) that, on any particular machine,
the C compiler's type is the sole determinant of how everything on
that machine will disassemble and/or reassemble "original values"
to/from "bytes". On some -- perhaps even many -- machines, this
is actually true. But it is not universal, as the SPARC example
illustrates.

The key to understanding "endian-ness" is to think about the slicing
and splicing of values. You must figure out:

- who is doing the slicing or splicing, and from that,
- what order they will use, and
- why.

On a few (admittedly common) machines, there is just the one entity
that does this -- the CPU -- at least as far as C programs are
concerned, and it has just the one order. Not all machines are
that simple. Any machine with "endian-ness control bits" is more
complicated, for instance.
 

Joe Wright

Keith said:
pete said:
Having representation is an insufficient condition for having
endianness. That's a reason why "Values don't have endianness".

You can tell from the representation of (-1), whether your C
implementation uses two's complement or something else, but you
can't tell the endianness from the representation of (-1).

I see that the word "representation" is ambiguous. I was referring to
the representation of a type (e.g., "32 bits, big-endian,
2's-complement, bit N means [...]") as opposed to the representation
of a particular value of a type (e.g., "11001001").
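
A minimal sketch of pete's point (assuming two's complement, 8-bit
bytes, and 32-bit ints): every byte of -1 looks the same, so the
bytes reveal the sign representation but not their order; only a
value whose bytes differ can expose the order:

    #include <stdio.h>

    int main(void)
    {
        int minus_one = -1;
        unsigned int distinct = 0x01020304;  /* arbitrary test value */
        unsigned char *p = (unsigned char *)&minus_one;
        size_t i;

        /* Under two's complement every byte of -1 is 0xff, so this
           prints the same thing in either byte order. */
        for (i = 0; i < sizeof minus_one; i++)
            printf("%02x ", p[i]);
        putchar('\n');

        /* A value whose bytes differ does expose the order. */
        p = (unsigned char *)&distinct;
        for (i = 0; i < sizeof distinct; i++)
            printf("%02x ", p[i]);
        putchar('\n');
        return 0;
    }
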
Endian-ness is a characteristic of a processor's architecture. Nowhere
does my capacious C library even mention it. Not K&R nor any other book
I have (I think).

For the sake of explanation, SPARC processors in Sun machines are Big Endian
and Intel x86 processors in PCs are Little Endian. A properly written C
program will compile and run identically on a Sun or a PC. There is no
need for a C program to know the Endian-ness of the native processor.
 

Keith Thompson

Joe Wright said:
Endian-ness is a characteristic of a processor's architecture. Nowhere
does my capacious C library even mention it. Not K&R nor any other
book I have (I think).

For the sake of explanation, SPARC processors in Sun machines are Big
Endian and Intel x86 processors in PCs are Little Endian. A properly
written C program will compile and run identically on a Sun or a
PC. There is no need for a C program to know the Endian-ness of the
native processor.

There is no need for *most* C programs to know the endianness of the
native processor, but *some* C programs can depend on it.

For example, suppose I want to write a 4-byte unsigned integer to a
file, and I want a particular endianness (say, because it's required
by an externally imposed file format). If I happen to know that I'm
running on a CPU with the desired endianness (plus a few other
system-specific assumptions), I can just write the 4-byte value
directly. Otherwise, I need to rearrange the bytes somehow, perhaps
by modifying the value in memory, perhaps by writing one byte at a
time to the file.
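
A sketch of the portable byte-at-a-time route (put_be32 is a made-up
name; it assumes 8-bit bytes and <stdint.h>):

    #include <stdio.h>
    #include <stdint.h>

    /* Write v to fp in big-endian order, whatever the host order is. */
    int put_be32(uint32_t v, FILE *fp)
    {
        unsigned char buf[4];

        buf[0] = (v >> 24) & 0xff;
        buf[1] = (v >> 16) & 0xff;
        buf[2] = (v >>  8) & 0xff;
        buf[3] =  v        & 0xff;
        return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
    }

On a host known to be big-endian (plus the other system-specific
assumptions mentioned above), fwrite(&v, 1, 4, fp) would produce the
same four bytes; that is the shortcut being described.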

Maximally portable C programs don't care about endianness (or about
two's-complement vs. ones'-complement vs. sign-and-magnitude), but
not all C programs can be maximally portable. The standard doesn't
discuss endianness, but it is visible to C programs.
 

Keith Thompson

Chris Torek said:
This is, well, wrong. I was hoping for an easier way to put that,
but it is just plain wrong ... except in some cases. :)

Consider a machine that has "endian control" in the CPU, in the
instruction set, and in the MMU. (SPARCV9 implementations do this.)
That is, there is an "invert endian-ness" bit in the CPU, so that
you can "run in either mode", *plus* an "invert endian-ness" bit
in some kind(s) of memory access instructions -- on V9 one uses
the "alternate address space" extensions for this -- so that one
can, for instance, refer to shared memory regions that are being
used by a processor running in the "other" endianness. If you
set the invert bit in both the CPU and the instruction, you get
big-endian ("native", as it were) byte order. However, if the
invert bit is set in the MMU entry for the page, you get the "other"
endian-ness yet again: if all three bits are set (CPU, instruction,
and MMU), you get "little-endian" byte order.

Shared memory accessed by multiple processes is beyond the scope of
the standard.

If two objects of the same type in the same program have different
endianness, then the implementation is non-conforming. For example,
memcpy()ing from one int object to another int object must yield the
same value as a simple assignment.
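
For instance, a conforming implementation must never trigger the
assertion in this small sketch:

    #include <assert.h>
    #include <string.h>

    int main(void)
    {
        int a = 42, b, c;

        b = a;                    /* simple assignment              */
        memcpy(&c, &a, sizeof c); /* copy the object representation */
        assert(b == c);           /* both must deliver the same value */
        return 0;
    }
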
Yet all these accesses can be done as 16-bit "short", 32-bit "int",
or 64-bit "long" / "long long" (depending on compiler mode). (Well,
admittedly, the compiler does not generate lda/sta instructions on
its own, but the inversion bits in the CPU and/or MMU can still be
set.)

Clearly, on V9 SPARCs, endianness is not due to C-level types after
all. So what *is* it due to?

It's due to something that's part of the implementation, defined in
C99 3.12 as:

particular set of software, running in a particular translation
environment under particular control options, that performs
translation of programs for, and supports execution of functions
in, a particular execution environment

My statement that "types have representations" was incomplete. A
given type has some specified representation *for a given
implementation*.

The representation of a type can of course vary across
implementations. For example, int might be 16 bits two's-complement
big-endian under one implementation, and 32 bits ones'-complement
little-endian under another.

If we have two running instances of the same source program, and an
integer type has different endianness in the two instances (this can
be detected by the program), then those two instances are running
under two different implementations (even if they happen to have been
compiled by the same compiler and executed on the same computer).

[...]
It is tempting to think (or assume) that, on any particular machine,
the C compiler's type is the sole determinant of how everything on
that machine will disassemble and/or reassemble "original values"
to/from "bytes". On some -- perhaps even many -- machines, this
is actually true. But it is not universal, as the SPARC example
illustrates.

Right, but the compiler is only one part of the implementation. And
though the standard doesn't explicitly mention endianness, it does
require it to be documented. C99 6.2.6.1p2 says:

Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of
which are either explicitly specified or implementation-defined.

[...]
 

Joe Wright

Keith said:
There is no need for *most* C programs to know the endianness of the
native processor, but *some* C program can depend on it.

For example, suppose I want to write a 4-byte unsigned integer to a
file, and I want a particular endianness (say, because it's required
by an externally imposed file format). If I happen to know that I'm
running on a CPU with the desired endianness (plus a few other
system-specific assumptions), I can just write the 4-byte value
directly. Otherwise, I need to rearrange the bytes somehow, perhaps
by modifying the value in memory, perhaps by writing one byte at a
time to the file.

Without regard to the Endian-ness of your CPU, but knowing that the
network or file requires Big-Endian, you create an array 4 of unsigned
char, load it up in the desired order, and write the array.

Maximally portable C programs don't care about endianness (or about
two's-complement vs. ones'-complement vs. sign-and-magnitude), but
not all C programs can be maximally portable. The standard doesn't
discuss endianness, but it is visible to C programs.

Maybe, but it need not be visible. Knowing that 'net' is Big Endian we
can write net-to-host and host-to-net routines without knowing the
Endian-ness of 'host'.
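
A minimal sketch of such routines (the names mimic the POSIX
ntohl/htonl interface but are stand-ins): they never ask what the
host order is, because the shifts operate on values, not
representations:

    #include <stdint.h>

    /* Assemble a 32-bit value from four bytes in network
       (big-endian) order; the host's own byte order never
       enters into it. */
    uint32_t be32_to_host(const unsigned char b[4])
    {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
             | ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
    }

    /* The inverse: take the value apart into big-endian bytes. */
    void host_to_be32(uint32_t v, unsigned char b[4])
    {
        b[0] = (v >> 24) & 0xff;
        b[1] = (v >> 16) & 0xff;
        b[2] = (v >>  8) & 0xff;
        b[3] =  v        & 0xff;
    }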
 

Keith Thompson

Joe Wright said:
Maybe, but it need not be visible. Knowing that 'net' is Big Endian we
can write net-to-host and host-to-net routines without knowing the
Endian-ness of 'host'.

Certainly we can, and arguably we should, but we don't have to.

If we happen to know that the endianness of the CPU we're running on
matches network order, it might be advantageous to write whole words
directly rather than writing a byte at a time.

Certainly most C code doesn't need to deal with endianness, and most
code that does need to deal with it can do so in a portable manner.
My point is simply that endianness is a property of the
implementation, and one that's visible to C code, just like other
aspects of representation.
 

Ian Collins

Keith said:
Certainly we can, and arguably we should, but we don't have to.

If we happen to know that the endianness of the CPU we're running on
matches network order, it might be advantageous to write whole words
directly rather than writing a byte at a time.
Which is the way every system I've seen implements the standard network
byte order conversion functions. On big-endian machines, they are no-ops.
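
For example, with GCC or Clang (which predefine __BYTE_ORDER__) such
a function might be sketched like this; my_htonl is a stand-in name,
and the no-op case compiles away entirely:

    #include <stdint.h>

    static inline uint32_t my_htonl(uint32_t v)
    {
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        return v;  /* host order already matches network order: a no-op */
    #else
        return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) <<  8)
             | ((v & 0x00ff0000u) >>  8) | ((v & 0xff000000u) >> 24);
    #endif
    }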
 
