How does C handle issues arising out of Endianness?

Indian.croesus

Hi,
If I am right, Endianness is CPU related. I do not know if the
question is right in itself, but if it is, then how does C handle issues
arising out of Endianness?

I understand that if we pass structures using sockets across platforms,
we need to take care of Endianness issues at the application level. But
for example, for the code using bitwise AND to figure out if a number
is odd or even, how does C know the LSB position?

Thanks,
IC
 
Tim Prince

> Hi,
> If I am right Endianness is CPU related. I do not know if the
> question is right in itself but if it is then how does C handle issues
> arising out of Endianness.
>
> I understand that if we pass structures using sockets across platforms,
> we need to take care of Endianness issues at the application level. But
> for example, for the code using bitwise AND to figure out if a number
> is odd or even, how does C know the LSB position?

C relies on the implementor to define each operator for each native data
type for each platform. For examples, you could look at the
gcc/config/*/*.md (machine description) files.
Standard C has rules against data type punning; odd/even code that broke
those rules would give different answers when the endianness changed.
C can't necessarily prevent you from breaking those rules.
 
Indian.croesus

Thanks. I will check it out.

> C relies on the implementor to define each operator for each native data
> type for each platform.

So why does it not do the same with structs? Why should the programmer
take care of it while passing it across platforms? Is it more of a
"rationale" related question?

Thanks,
IC
 
Eric Sosman

> Hi,
> If I am right Endianness is CPU related. I do not know if the
> question is right in itself but if it is then how does C handle issues
> arising out of Endianness.

By ignoring them.

> I understand that if we pass structures using sockets across platforms,
> we need to take care of Endianness issues at the application level. But
> for example, for the code using bitwise AND to figure out if a number
> is odd or even, how does C know the LSB position?

On any particular implementation, the LSB of the unknown
value being tested is in the same position as the LSB of the
constant 1 you are ANDing with it. Problem solved.

Problems can occur when you exchange data between dissimilar
implementations, because they may disagree about endianness. They
may disagree about other matters of representation, too: one
platform might represent an int with sixteen bits while the other
uses thirty-two, one might use IEEE floating-point while the other
uses the S/360 format, the two might insert padding in structures
differently, and so on. Endianness is just one of a number of
representational issues you must consider when communicating
between different systems.

One approach that has proven widely useful is to invent a
"wire format" for the data to be exchanged, a format that does
not depend on the peculiarities of the machines. Each machine
then needs two routines: One to read "wire format" and convert
it to native representation, and one to convert the native form
to "wire format." For obvious reasons, many extremely popular
"wire formats" use textual representations: If you want to send
the value forty-two, you transmit the two characters '4' and '2',
possibly followed by a delimiter like '\n' or ';' or some such.
This doesn't solve every possible problem (because the encoding
of characters can also vary from machine to machine), but it solves
a great many of them and usually leaves a fairly tractable remnant
to deal with.
 
Indian.croesus

> On any particular implementation, the LSB of the unknown
> value being tested is in the same position as the LSB of the
> constant 1 you are ANDing with it. Problem solved.

Thanks. Now that you have explained it, that was pretty stupid of me.

Are shift operators better examples of the question I have?

As in the following snippet (please do let me know if I need to follow
any norms while adding code snippets):
-------
int x = 10;
int y;

y = x << 2;
 
Default User

> Hi,
> If I am right Endianness is CPU related. I do not know if the
> question is right in itself but if it is then how does C handle issues
> arising out of Endianness.
>
> I understand that if we pass structures using sockets across
> platforms, we need to take care of Endianness issues at the
> application level. But for example, for the code using bitwise AND to
> figure out if a number is odd or even, how does C know the LSB
> position?

C doesn't, but the implementation creator did.




Brian
 
Malcolm

> Thanks. Now that you have explained it that was pretty stupid of me.
>
> Are shift operators better examples of the question I have?
>
> As in the following snippet (please do let me know if I need to follow
> any norms while adding code snippets.)
> -------
> int x = 10;
> int y;
>
> y = x << 2;

The shift operator assumes that the bits are arrayed from left to right,
with the most significant at the left.
This may or may not have anything to do with the physical location of the
bits in memory. *(unsigned char *)&x will read the byte of x at the lowest
address, which is probably either 10 (little-endian) or zero (big-endian),
but could be anything.
 
sam_cit

Just to add, here is one way to determine the endianness of the machine:

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int machineEndianness(void)
{
    int i = 1;
    unsigned char *p = (unsigned char *) &i;
    /* If the lowest address holds the least significant byte,
       the machine is little-endian. */
    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
 
Gordon Burditt

> If I am right Endianness is CPU related. I do not know if the
> question is right in itself but if it is then how does C handle issues
> arising out of Endianness.

It's very simple: If you do anything that depends on Endianness,
the result is undefined (or perhaps implementation-defined). The
problem is thrown into the programmer's court NOT to do that. Write
your code so it doesn't depend on endianness.

> I understand that if we pass structures using sockets across platforms,
> we need to take care of Endianness issues at the application level. But
> for example, for the code using bitwise AND to figure out if a number
> is odd or even, how does C know the LSB position?

If you view a value as a value, and not a bunch of bytes, there is
no problem. C knows which end of an int has the least significant
bit, and machine registers might not even be addressable as bytes.
The problem comes when you take a value (potentially multi-byte)
and try to convert it to or from a bunch of bytes. THEN you have
to worry about the problem that there are 24 byte-orders for 4-byte
integers, and 40320 byte-orders for 8-byte integers.
 
Gordon Burditt

> Just to add, as to how to determine the nature of Endianness,
> #define LITTLE_ENDIAN 0
> #define BIG_ENDIAN 1

There are 24 possible byte-orders for a 4-byte integer.
Where are the other 22 defines?

At the very least, you should have a NON_ENDIAN define for
neither little-endian nor big-endian. PDP-11s are real.
 
Richard

>> Just to add, as to how to determine the nature of Endianness,
>>
>> #define LITTLE_ENDIAN 0
>> #define BIG_ENDIAN 1

> There are 24 possible byte-orders for a 4-byte integer.
> Where are the other 22 defines?

Helpful.


> At the very least, you should have a NON_ENDIAN define for
> neither little-endian nor big-endian. PDP-11s are real.

There seems to be an "it doesn't matter in C" answer appearing here, which
is as incorrect as it is misleading. Eric seems to have been the only
one to give a proper answer.

Many systems communicate using C, and it isn't too uncommon for endian
issues to crop up.

C does not "take care of it" if bytes or streams of bytes are thrown
down a wire.

The programmer does have to reassemble data accordingly, especially
with user-defined structures, packing, etc.
 
Mattan

Richard said:
> [..] C does not "take care of it" if bytes or streams of bytes are thrown
> down a wire.
>
> The programmer does have to reassemble data accordingly - especially
> with user defined structures, packing etc.

...and an implementation of htonl (host to network long) and friends
may be useful, if available on the current system.

/Mattan
 
Keith Thompson

> There are 24 possible byte-orders for a 4-byte integer.
> Where are the other 22 defines?
>
> At the very least, you should have a NON_ENDIAN define for
> neither little-endian nor big-endian. PDP-11s are real.

That's true in principle. In real life, though, there are only two or
three possible endiannesses: big-endian, little-endian, and
PDP-11-endian -- and you're not likely to run into the latter.

And you also have to allow for the possibility that you don't *have*
4-byte integers. On some DSPs, for example, an int is one byte (and a
byte is at least 16 bits); on such a system, int has no endianness.

It's a good idea to check explicitly for both big-endian and
little-endian, but it's probably not necessary to handle other cases
except by bailing out. For example:

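#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
#if CHAR_BIT != 8
#error "CHAR_BIT != 8, I'm not prepared to cope with that."
#endif
    unsigned char arr[4] = { 0x12, 0x34, 0x56, 0x78 };
    uint32_t n;
    /* memcpy rather than *(uint32_t*)arr: arr need not be suitably
       aligned for uint32_t, and the cast breaks the aliasing rules. */
    memcpy(&n, arr, sizeof n);
    if (n == 0x12345678) {
        printf("big-endian\n");
    }
    else if (n == 0x78563412) {
        printf("little-endian\n");
    }
    else {
        /* PRIx32 is the correct printf conversion for uint32_t */
        fprintf(stderr, "Unable to determine endianness, n == 0x%" PRIx32 "\n", n);
        exit(EXIT_FAILURE);
    }

    return 0;
}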

Adjust as needed if your system doesn't support <stdint.h>.

(The first time I tried this, I had forgotten to #include <limits.h>.
CHAR_BIT quietly expanded to 0, giving me a very unexpected result.)
 
Stephen Sprunk

Mattan said:
> Richard said:
>> [..] C does not "take care of it" if bytes or streams of bytes are
>> thrown down a wire.
>>
>> The programmer does have to reassemble data accordingly - especially
>> with user defined structures, packing etc.
>
> ...and an implementation of htonl (host to network long) and friends
> may be useful, if available on the current system.

Any system which has sockets available should have htonl() et al as
well.

To forestall any complaints that sockets are OT here, note that the same
exact issue exists when you try to write any object to a file in binary
mode. You have to define file/wire formats when working with binary
data, and that includes the number of bits and endianness for each
field. In the sockets world, the unit of transport is the octet (always
8 bits), not the byte (which varies in size), and "network byte order"
is defined as big-endian. File formats have no such conventions. Using
the same convention as sockets makes your life easier if you're on a
system that has sockets available (which is nearly all, these days)
since you get ntohl() et al for free, but a huge number of file formats
(and non-IETF network protocols) from the DOS/Windows world use
little-endian storage.

Text, of course, is the safest format for interchange, provided you know
what encoding is used for the characters. Unfortunately, one still has
to deal with EBCDIC vs ASCII and all the various multibyte encodings for
Unicode, so figuring out how to read a text file with the right encoding
has become as much a challenge as dealing with binary data -- and slower
to boot. The only remaining advantage is that it's easier for humans to
debug (or, in the case of files, modify) the messages.

S
 
Chris Torek

> If I am right Endianness is CPU related.

Others have already discussed most of the practical issues. I would
like to point out that endianness is not really "CPU related" at all
though.

Suppose you are getting ready to move from one apartment to another.
Your friend has offered you *free* use of his small pickup truck,
so that you need not rent a huge van.

There is one problem: your bed will not fit, fully assembled, into
the pickup.

Fortunately, your bed comes apart, into three pieces: headboard,
middle section, and footboard. Each of those pieces will, by
itself, fit in the truck. So you take the bed apart:

||||
||||                         |||
||||        =============    |||
||||        =============    |||
headboard   middle section   footboard

At the other end, your friend will reassemble the bed while you
drive back to get more stuff. You bring him the headboard, then
the footboard, then the middle, because that was the easiest way
to take them out:

||||
||||  |||
||||  |||  =============
||||  |||  =============

Then you drive back to your old place to get more stuff.

Your friend, for some reason, believes that you delivered the
footboard first, then the middle, then the headboard. So he connects
the pieces in that order. But you delivered the headboard first,
so he put that where the footboard goes; then you delivered the
footboard, which he put in the middle; and last, you delivered the
middle, which he put at the head:

                ||||
             |||||||
=============|||||||
=============|||||||

Your bed is no longer use-able, until you take it apart and
re-reassemble it in the correct order. The problem is that you
and your friend failed to agree on "endianness". (Well, that, and
your friend is about as smart as a typical computer: he only does
what you tell him, instead of what you meant.[%]) But there is no
CPU in sight. So where did the "endianness" come from?

It came from disagreement between various entities -- in this case,
you and your friend -- that dis-assembled something (here, your
bed), then re-assembled it, but did not connect the same pieces in
the same way. To avoid the problem, you must make sure that all
entities involved in disassembly and reassembly agree as to which
sub-parts go where.

If you (and of course your friends too) never break a whole object
up into parts, the problem never occurs. (Transport the bed as a
single unit, it arrives as a single unit, still in "bed" shape.)
The problem occurs only when you *do* break something into parts.
Even then, it occurs only if you put it back together in some other
way. If you and your friends can all agree on some basic,
un-break-able sub-unit -- such as, say, the 8-bit byte -- and you
make sure never to give out anything "too big" so that your friends
have to break them up, *you* can control the order of breaking-up
and re-assembling, and therefore guarantee that the re-assembly
always follows the same sequencing rules as the breaking-up.
-----
[%] The student programmer's lament:

I really hate this darn* machine
I wish that they would sell it
It never does quite what I want
But only what I tell it.

* or other suitable one-syllable word
 
Nick Keighley

> Thanks. I will check it out.

check what out? Please leave relevant context.

> So why does it not do the same with structs? Why should the programmer
> take care of it while passing it across platforms? Is it more of a
> "rationale" related question?

C takes care of structs *on the same platform*. The C standard does not
address cross-platform issues, so it's the programmer's problem.

Note you not only have endian problems, but also fundamental types'
sizes, floating point representations, character sets and struct
padding may all vary. Pointers are a complete no-no.

That's just what I thought of off the top of my head; there will be
other stuff.

Take a look at XDR, ASN.1 and XML for portable data formats.

--
Nick Keighley

Unicode is an international standard character set that can be used
to write documents in almost any language you're likely to speak,
learn or encounter in your lifetime, barring alien abduction.
(XML in a Nutshell)
 
Dave Thompson

>> If I am right Endianness is CPU related.
>
> Others have already discussed most of the practical issues. I would
> like to point out that endianness is not really "CPU related" at all
> though.
>
> Suppose you are getting ready to move from one apartment to another. <snip>
> Fortunately, your bed comes apart, into three pieces: <snip>
> At the other end, your friend will reassemble the bed while you
> drive back to get more stuff. <snip>
> Your friend, for some reason, believes that you delivered the
> [pieces in a different order and reassembles obviously wrongly]
> The problem is that you
> and your friend failed to agree on "endianness". (Well, that, and
> your friend is about as smart as a typical computer: he only does
> what you tell him, instead of what you meant.[%]) <snip>

It's a good thing you're the one driving; I'd hate to see what this
Turing-machine-brained friend does when faced with, say, a bent or
obscured traffic control sign. (Aside: I lived near Boston back about
1980 when the originally Californian law, allowing by default right
turn after stop at a red light if no traffic, was adopted -- or at
least its adoption 'encouraged' -- Federally as a gasoline saving
measure. So the city went around putting up 'no turn on red' signs
pretty much everywhere. One intersection near me was already signed
'no left turn' AND 'no right turn' and they added 'no turn on red'!)

> If you (and of course your friends too) never break a whole object
> up into parts, the problem never occurs. (Transport the bed as a
> single unit, it arrives as a single unit, still in "bed" shape.)

If it and everything else in the same load is tied down adequately;
otherwise it may arrive in an arbitrary but substantial number of
pieces, none bed-shaped, and probably not reassemble-able at all. FWIW
_this_ problem rarely happens with user-level computer data, although
it can and does occur in hardware, devices and systems mostly are
designed with error detection and correction features (parity, CRC,
LRC, VRC, EDC, ECC, etc.) which lead the program to see either (1)
correct data as sent/stored/whatever or (2) no data at all, sometimes
but not always with a more-or-less specific error indicator.

<snip rest>

- David.Thompson1 at worldnet.att.net
 
