Endian Independence

Kelly B

#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}


int reverseInt (int i) {
unsigned char c1, c2, c3, c4;

if ( endian() == BIG_ENDIAN ) {
return i;
} else {
c1 = i & 255;
c2 = (i >> 8) & 255;
c3 = (i >> 16) & 255;
c4 = (i >> 24) & 255;

return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
}
}

int main(void)
{
if(endian())
puts("Big Endian Machine");
else
puts("Small Endian Machine");
printf("%d",reverseInt(5));
return 0;

}

I tested it on my PC (On Pentium 4) and this is the output:

Small Endian Machine
83886080.

I am baffled, as I was expecting 5 to be printed. Or am I
missing something completely?
Probably I have completely misunderstood the idea of endianness :(

Any help is appreciated.

Thank You
 
Antoninus Twink

int reverseInt (int i) {
unsigned char c1, c2, c3, c4;

if ( endian() == BIG_ENDIAN ) {
return i;
} else {
c1 = i & 255;
c2 = (i >> 8) & 255;
c3 = (i >> 16) & 255;
c4 = (i >> 24) & 255;

return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
}
}

int main(void)
{
if(endian())
puts("Big Endian Machine");
else
puts("Small Endian Machine");
printf("%d",reverseInt(5));
return 0;
}

I tested it on my PC (On Pentium 4) and this is the output:

Small Endian Machine
83886080.

I am baffled, as I was expecting 5 to be printed. Or am I
missing something completely?

Your (somewhat poorly named) reverseInt function takes an integer i and
returns the 32 bits that give i when interpreted as a big-endian integer.
Since your machine is little-endian, printf() interprets its arguments as
little-endian, so when you pass it reverseInt(5) it sees 0x05000000, which
is 83886080 in decimal.
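
To see where 83886080 comes from, here is a minimal sketch (not code from the thread) that reverses the four bytes of a 32-bit value using shifts. The name swap32 is made up for illustration, and it assumes uint32_t from <stdint.h> is available.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value.
   It works on the value, not on memory, so the result is the same
   on any host. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

swap32(5) is 0x05000000, which is 5 * 2^24 = 83886080, the number the program printed.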
 
Lew Pitcher

#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

In the above code, you assume that there are only two possible "endian"
values.
int reverseInt (int i) {
unsigned char c1, c2, c3, c4;

if ( endian() == BIG_ENDIAN ) {
return i;
} else {
c1 = i & 255;
c2 = (i >> 8) & 255;
c3 = (i >> 16) & 255;
c4 = (i >> 24) & 255;

return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
}
}

In the above code, you assume that an int has a sizeof 4 characters.
int main(void)
{
if(endian())
puts("Big Endian Machine");
else
puts("Small Endian Machine");
printf("%d",reverseInt(5));
return 0;

}

"Endianness" usually refers to how elements are ordered when storing
multi-element entities, and is usually referenced with respect to the order
that "bytes" are used to store "integers". For instance, if it takes
2 "bytes" (A and B) to store an "integer", there are two ways that these
bytes can be ordered in memory as they are stored:
A followed by B
and
B followed by A

If an integer takes four bytes (A, B, C, and D), then there are (4 * 3 * 2 *
1) or 24 ways to order these bytes in memory:
A followed by B followed by C followed by D
A followed by B followed by D followed by C
A followed by D followed by B followed by C
...
D followed by C followed by B followed by A

In the simple 2-element case, when the byte containing the most-significant
portion of the compound value is stored first, the order is called "Big
Endian". When the least-significant portion is stored first, the order is
called "Little Endian".

In the more complex n-element cases, "Little Endian" and "Big Endian" are
two of the many possible orders.
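
One way to see which order a given machine uses is to look at the bytes of a known value directly. The following is only a sketch: dump_bytes is a made-up helper name, and it assumes 8-bit bytes so that each byte prints as two hex digits.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the n bytes of an object, in memory order,
   as two-digit hex values separated by spaces.
   out must have room for 3 * n characters (at least 1). */
void dump_bytes(const void *obj, size_t n, char *out)
{
    const unsigned char *p = obj;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n; i++)
        out += sprintf(out, "%02X%s", (unsigned)p[i], i + 1 < n ? " " : "");
}
```

Called on an unsigned int holding 0x0A0B0C0D (with a 4-byte int), this gives "0D 0C 0B 0A" on a little-endian host and "0A 0B 0C 0D" on a big-endian one.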


Now, to your code....

In the endian() function, you return one of two values, based on whether the
least-significant portion of the value is stored first or not. This binary
return (LITTLE_ENDIAN or BIG_ENDIAN) can only be valid if an int takes the
same space as two (and only two) char elements. If an int took more space
(say, 4 char elements), then endian() would have to return one of many more
possible ordering names (one of 24 names, for a 4 char int, for instance).
Clearly, your endian() function assumes that sizeof(int) == 2.

But, your reverseInt() function clearly assumes that sizeof(int) == 4, at
least for LITTLE_ENDIAN values.

You should know that your compiler may actually use a different value for
sizeof(int).

In summary, your code is flawed.
I tested it on my PC (On Pentium 4) and this is the output:

Small Endian Machine
83886080.

I am baffled, as I was expecting 5 to be printed. Or am I
missing something completely?
Probably I have completely misunderstood the idea of endianness :(
Yes.

Any help is appreciated.

First off, as far as the C language is concerned, there is no need to
determine the "endianness" of stored values. Where it is important (i.e.
when transferring binary values through a file), your compiler's
documentation should tell you the exact format. Otherwise, you don't need
to worry about it.

Secondly, you should first determine how "wide" your compiler makes
integers. Look for the sizeof(int) value, as this will give you the number
of "bytes" that make up an integer.

Thirdly, if sizeof(int) is greater than 2, your compiler may choose some
order other than "Little Endian" /or/ "Big Endian" to store the values
in. "Little" and "Big" endian only name two of the possible orders.
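
A check along these lines could report a third outcome instead of assuming only two orders. This is a sketch under stated assumptions, not a definitive test: the names byte_order, classify and native_order are invented here, and it assumes unsigned int has no padding bits. The mixed order historically used on the PDP-11 is one example that would come out as ORDER_OTHER.

```c
#include <limits.h>
#include <stddef.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* p points at n bytes holding a value whose k-th least significant
   byte has the value k + 1. Decide which layout the bytes follow. */
enum byte_order classify(const unsigned char *p, size_t n)
{
    size_t i;
    int little = 1, big = 1;

    for (i = 0; i < n; i++) {
        if (p[i] != i + 1) little = 0;   /* least significant byte first? */
        if (p[i] != n - i) big = 0;      /* most significant byte first? */
    }
    return little ? ORDER_LITTLE : big ? ORDER_BIG : ORDER_OTHER;
}

/* Build the test pattern in an unsigned int and classify its bytes. */
enum byte_order native_order(void)
{
    unsigned int v = 0;
    size_t k;

    for (k = 0; k < sizeof v; k++)
        v |= (unsigned int)(k + 1) << (k * CHAR_BIT);
    return classify((const unsigned char *)&v, sizeof v);
}
```

Because the loop runs over sizeof v, the check does not hard-code a 2-byte or 4-byte int.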

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
Kelly B

Lew said:
#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

In the above code, you assume that there are only two possible "endian"
values.
int reverseInt (int i) {
unsigned char c1, c2, c3, c4;

if ( endian() == BIG_ENDIAN ) {
return i;
} else {
c1 = i & 255;
c2 = (i >> 8) & 255;
c3 = (i >> 16) & 255;
c4 = (i >> 24) & 255;

return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
}
}

....Long Snip...
First off, as far as the C language is concerned, there is no need to
determine the "endianness" of stored values. Where it is important (i.e.
when transferring binary values through a file), your compiler's
documentation should tell you the exact format. Otherwise, you don't need
to worry about it.

Secondly, you should first determine how "wide" your compiler makes
integers. Look for the sizeof(int) value, as this will give you the number
of "bytes" that make up an integer.

Thirdly, if sizeof(int) is greater than 2, your compiler may choose some
order other than "Little Endian" /or/ "Big Endian" to store the values
in. "Little" and "Big" endian only name two of the possible orders.

Thanks Antoninus and Lew!
This is what bugged me :(

http://www.ibm.com/developerworks/aix/library/au-endianc/

I thought the article was correct and wanted to quickly test it on my
PC. I guess I will have to write my own function(s).
 
Antoninus Twink

First off, as far as the C language is concerned, there is no need to
determine the "endianness" of stored values. Where it is important (i.e.
when transferring binary values through a file), your compiler's
documentation should tell you the exact format. Otherwise, you don't need
to worry about it.

Rubbish. Suppose he wants to read a binary file produced by someone else
on a different machine. Then knowing the endianness of both machines is
crucial.
Secondly, you should first determine how "wide" your compiler makes
integers. Look for the sizeof(int) value, as this will give you the number
of "bytes" that make up an integer.

Nonsense. There is nothing to suggest that the OP isn't perfectly well
aware that his compiler uses 32-bit ints.
Thirdly, if sizeof(int) is greater than 2, your compiler may choose some
order other than "Little Endian" /or/ "Big Endian" to store the values
in. "Little" and "Big" endian only name two of the possible orders.

Eyewash. The OP said explicitly that he's using a Pentium 4, which is a
little-endian architecture.
 
Ben Bacarisse

This is what bugged me :(

http://www.ibm.com/developerworks/aix/library/au-endianc/

I thought the article was correct and wanted to quickly test it on my
PC.

Well, it is not a good explanation, but it is not exactly wrong
either. The main part you missed is that you don't need to worry
unless your program "exports" multi-byte values. The vast majority of
C programs can be entirely portable without any need to worry about
the endianness of the hardware.

It is not surprising. That article has a section "When endianness
affects code" which has 6 paragraphs. 5 of these are about when it does
*not* affect the code! Only that last short paragraph starts to explain
when it does matter.
I guess I will have to write my own function(s).

If you are writing network code (the most common reason to export
multi-byte values) then you can use POSIX functions like htons and
htonl etc. Only write your own if you don't have these available or
you need to do something more outlandish.
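
As a rough illustration of the POSIX route, the sketch below stores a 32-bit value into a buffer in network (big-endian) order and reads it back. htonl and ntohl are the real POSIX functions from <arpa/inet.h>; store_net32 and load_net32 are names made up for this example.

```c
#include <arpa/inet.h>  /* POSIX: htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v into buf in network byte order. */
void store_net32(unsigned char buf[4], uint32_t v)
{
    uint32_t n = htonl(v);        /* host order -> network order */
    memcpy(buf, &n, sizeof n);
}

/* Hypothetical helper: read a network-order value back into host order. */
uint32_t load_net32(const unsigned char buf[4])
{
    uint32_t n;
    memcpy(&n, buf, sizeof n);
    return ntohl(n);              /* network order -> host order */
}
```

The calling code never has to ask which order the host uses. On a big-endian host the two functions simply return their argument unchanged.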
 
Richard Tobin

Antoninus Twink said:
Rubbish. Suppose he wants to read a binary file produced by someone else
on a different machine. Then knowing the endianness of both machines is
crucial.

While this is true, I recommend where possible standardising on a
fixed byte order for files that may be shared between architectures.
You also need to consider the size and padding of items. Writing a
program to deal with all the possibilities of format is more tedious
than writing it to produce a uniform format.

This has the added advantage that files which *should* be identical
*are* identical.
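
A fixed file byte order can be produced without ever detecting the host's order, by building the bytes from the value with shifts. This is a minimal sketch assuming a 32-bit field stored most significant byte first; put_be32 and get_be32 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: write v as four bytes, most significant first. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >>  8) & 0xFFu);
    out[3] = (unsigned char)( v        & 0xFFu);
}

/* Hypothetical helper: read four bytes, most significant first. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] <<  8) |
            (uint32_t)in[3];
}
```

Since the shifts act on values rather than on memory, the same source produces the same file bytes on every host.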

-- Richard
 
Kelly B

Kelly said:
#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

...snip..

Just one more thing. What is the right way to convert a *signed* int from
one endianness to another (more specifically, from big-endian to
little-endian or vice versa)? How do I preserve the *sign* bit?
Swapping the bytes cannot be an option, unless I somehow
preserve the sign and treat the number as an unsigned int. Or am I
way off again?
 
santosh

Kelly said:
Kelly said:
#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

..snip..

Just one more thing. What is the right way to convert a *signed* int
from one endianness to another (more specifically, from big-endian to
little-endian or vice versa)? How do I preserve the *sign* bit?
Swapping the bytes cannot be an option, unless I somehow
preserve the sign and treat the number as an unsigned int. Or am I
way off again?

You'll need to know the manner in which signed values are represented on
your machine, whether twos-complement, sign-and-magnitude or
ones-complement, the three formats that C recognises. Such code will
not be portable.

But most systems already provide their own routines for
endian conversion wherever it's likely to matter. Unix systems provide
htonl/htons and ntohl/ntohs.
 
Jean-Marc Bourguet

Antoninus Twink said:
Rubbish. Suppose he wants to read a binary file produced by someone else
on a different machine. Then knowing the endianness of both machines is
crucial.

Why?

1/ You can write binary files which have a defined binary format without
knowing the endianness of either the writer or the reader (obviously,
knowing that the file is in the same endianness as yours allows optimization).

2/ Even if your binary file is a memory dump which depends on the endianness
of the writer, knowing which one it is reduces the problem to case 1 for the
reader.

3/ With an adequate file format (i.e. a signature which is byte-order
dependent) you can detect at run time that the file is not in the same
order as your native one, without knowing which.
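
Point 3 could look roughly like the sketch below. The signature value, the enum and the function name are all made up for illustration, and it assumes the writer stores the signature in its own native order as the first four bytes of the file.

```c
#include <stdint.h>
#include <string.h>

#define FILE_MAGIC   0x01020304u  /* hypothetical signature value */
#define FILE_MAGIC_R 0x04030201u  /* the same signature, bytes reversed */

enum file_order { SAME_ORDER, SWAPPED_ORDER, UNKNOWN_ORDER };

/* Load the four header bytes as a native integer and compare.
   The reader never needs to know what its own byte order is called. */
enum file_order check_magic(const unsigned char hdr[4])
{
    uint32_t m;

    memcpy(&m, hdr, sizeof m);
    if (m == FILE_MAGIC)
        return SAME_ORDER;        /* read the rest as-is */
    if (m == FILE_MAGIC_R)
        return SWAPPED_ORDER;     /* byte-swap each field while reading */
    return UNKNOWN_ORDER;         /* not this format, or some other order */
}
```

The signature has to be asymmetric: a value such as 0x01010101 reads the same in both orders and would detect nothing.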

And naturally, when you are trying to do binary I/O on data wider than a
byte in the same format as it is represented in memory, you'd better ensure
that everything else which may apply (size and alignment are the most
common) is the same as well.
 
Ben Bacarisse

Kelly B said:
Kelly said:
#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

..snip..

Just one more thing. What is the right way to convert a *signed* int
from one endianness to another (more specifically, from big-endian to
little-endian or vice versa)? How do I preserve the *sign* bit?
Swapping the bytes cannot be an option, unless I somehow
preserve the sign and treat the number as an unsigned int. Or am I
way off again?

The most portable way is to treat the object as an array of unsigned
char and to re-order these bytes.
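
A generic version of that idea might look like the following sketch, which reverses the bytes of any object in place. The function name is made up, and nothing in it depends on whether the object is signed.

```c
#include <stddef.h>

/* Hypothetical helper: reverse the n bytes of an object in place. */
void reverse_bytes(void *obj, size_t n)
{
    unsigned char *p = obj;
    size_t i;

    for (i = 0; i < n / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}
```

Called as reverse_bytes(&x, sizeof x), it treats a negative int the same as a positive one, because the sign bit is just one of the bits being moved. Reversing twice restores the original value.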
 
Antoninus Twink

You'll need to know the manner in which signed values are represented on
your machine, whether twos-complement, sign-and-magnitude or
ones-complement, the three formats that C recognises.

Why?

Endianness is just a question of how the bytes are arranged in memory,
not related to how they might represent an integer once they've been put
into the right order.

The OP can just treat the bytes as unsigned chars and rearrange them.
 
Ben Bacarisse

santosh said:
Kelly said:
Kelly said:
#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}

..snip..

Just one more thing. What is the right way to convert a *signed* int
from one endianness to another (more specifically, from big-endian to
little-endian or vice versa)? How do I preserve the *sign* bit?
Swapping the bytes cannot be an option, unless I somehow
preserve the sign and treat the number as an unsigned int. Or am I
way off again?

You'll need to know the manner in which signed values are represented on
your machine, whether twos-complement, sign-and-magnitude or
ones-complement, the three formats that C recognises. Such code will
not be portable.

Are you sure?

#include <assert.h>

short int htohl(short int x)
{
    short int r;
    unsigned char *rp = (void *)&r, *xp = (void *)&x;

    assert(sizeof x == 2);  /* only valid for two-byte shorts */
    rp[0] = xp[1];          /* swap the two bytes */
    rp[1] = xp[0];
    return r;
}

Does this code not work on some systems? I can't see why not.
Obviously there are some systems on which the result is a trap
representation, but you are stuck in those cases anyway. This code
will even swap the bytes when x contains a trap representation (I
think).
 
Richard

Antoninus Twink said:
Rubbish. Suppose he wants to read a binary file produced by someone else
on a different machine. Then knowing the endianness of both machines is
crucial.

Heathfield said something similar a while back. I was shocked. Yes,
"system specific" etc., but to try and claim "portable C" deals with endian
issues is garbage from what I've seen attempted in the real world.
 
Kenny McCormack

Heathfield said something similar a while back. I was shocked. Yes,
"system specific" etc., but to try and claim "portable C" deals with endian
issues is garbage from what I've seen attempted in the real world.

Comment:
(I'm surprised no one had posted this yet - in this go around of this
classic thread)

Post:
Actually, the CLC dogma on this subject is that the only portable way
(and thus, the only one that can be considered) is to write it out as ASCII
(oops, I think that might be a dirty word as well), er, I mean, "text",
and then read it back as text.
 
Chris Torek

Endianness isn't a problem, -2 is 11111111 11111110 on any two's complement
machine (in practical terms, on any machine), if your integers are 16 bits.
A little endian machine just swaps the two so the first byte becomes last in
memory.

The problem comes when trying to convert a 16-bit signed number to a 32 bit
signed number. You need to sign extend. This is easy enough to do in
principle using C: simply check the sign bit. If it is set, set your initial
integer to minus one, or all bits set. If it is clear, set to zero. Then
shift your bits in.
In practise it can be quite tricky to make sure you are shifting legally and
portably.

It is not that hard at all.

We had this entire discussion back in January of 2003:

<http://groups.google.com/group/comp...3ff9894d850f?hl=en&lnk=st&q=#cb3c3ff9894d850f>
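
For illustration only, here is one way the 16-bit to wider conversion can be written without shifting a negative value. It is a different technique from the one in the quoted text: instead of starting from all bits set and shifting bits in, it builds the unsigned value first and then subtracts 65536 when the sign bit is set. The name decode_be16 is made up, and the sketch assumes the two bytes hold a two's-complement value, most significant byte first, with 8-bit bytes.

```c
/* Hypothetical helper: decode a 16-bit two's-complement value stored
   most significant byte first. long is at least 32 bits wide, so every
   intermediate value fits. */
long decode_be16(const unsigned char b[2])
{
    long v = ((long)b[0] << 8) | b[1];   /* 0 .. 65535 */

    if (v >= 0x8000L)                    /* sign bit set */
        v -= 0x10000L;                   /* maps to -32768 .. -1 */
    return v;
}
```

The bytes FF FE decode to -2 and 00 05 decode to 5. All the arithmetic is done on non-negative values until the final subtraction, so no shift ever touches a negative number.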
 
Kelly B

Ben said:
...snip..


The most portable way is to treat the object as an array of unsigned
char and to re-order these bytes.


#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}


int reverseInt (int x) {
int r;
unsigned char *rp = (void *)&r, *xp = (void *)&x;

if (!endian()) { /*I am using a little endian system*/
rp[0] = xp[0];
rp[1] = xp[1];
rp[2] = xp[2];
rp[3] = xp[3];
} else {
rp[0] = xp[3];
rp[1] = xp[2];
rp[2] = xp[1];
rp[3] = xp[0];
}

return r;
}
int main(void)
{
int a = 5;

if(endian())
puts("Big Endian Machine");
else
puts("Small Endian Machine");

printf("%d",a >= 0 ? reverseInt(a) : reverseInt(-a) );
return 0;

}

I know this is naive, but is it correct to handle a negative integer this
way? I could not think of a situation in which this might produce ugly
behavior! Inputs are welcome.

Thanks Again
 
Ben Bacarisse

Kelly B said:
Ben said:
..snip..


The most portable way is to treat the object as an array of unsigned
char and to re-order these bytes.


#include<stdio.h>

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
int i = 1;
char *p = (char *)&i;

if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}


int reverseInt (int x) {
int r;
unsigned char *rp = (void *)&r, *xp = (void *)&x;

if (!endian()) { /*I am using a little endian system*/
rp[0] = xp[0];
rp[1] = xp[1];
rp[2] = xp[2];
rp[3] = xp[3];

If you don't want to change anything, r = x; is perfectly OK!
} else {
rp[0] = xp[3];
rp[1] = xp[2];
rp[2] = xp[1];
rp[3] = xp[0];
}

return r;
}
int main(void)
{
int a = 5;

if(endian())
puts("Big Endian Machine");
else
puts("Small Endian Machine");

printf("%d",a >= 0 ? reverseInt(a) : reverseInt(-a) );
return 0;

}

I know this is naive, but is it correct to handle a negative integer
this way? I could not think of a situation in which this might produce ugly
behavior! Inputs are welcome.

No one can answer that question. I have no idea what you are trying
to do. Obviously if this program does what is needed, then it is
fine, but it looks entirely pointless to me. Did you understand that
the vast majority of programs should never care about endianness? Why
do you think you need to?

It might help if you backed up a bit and explained what you are doing
that needs to consider the endianness of the processor, and why you
are not handling it using the usual methods (htonl etc for networking
code).
 
