Endianness: why does this code not change the value on a BE machine?

Oliver Knoll

Ok,

I've searched this group for Big/Little endian issues, don't kill me,
I know endianness issues have been discussed a 1000 times. But my
question is a bit different:

I've seen the following function several times, it converts data stored
in Big Endian (BE) format into host native format (LE on LE machines,
BE on BE machines):

/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine? I've tried to understand
it with diagrams and everything, but my brain went just crazy! Can
anyone give me a simple explanation please? :) I'm really curious...

And how would a function look like which does the "opposite": swap
bytes on a BE machine, don't change values on a LE machine (that is,
"read and convert LE data")? :) Hope I can answer the 2nd question
myself when I understand the 1st question... :)


Here's the complete test program:

#include <stdlib.h>
#include <stdio.h>

/* This function is useful when reading and converting data which is
   stored in Big Endian Format.
   - this code swaps the bytes on a Little Endian Machine
   - this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
    return (short) ((data[0] << 8) | data[1]);
}

int main (int argc, char **argv)
{
    int a;
    char c;
    short data1;
    short data2;
    char *d;

    (void)argc;
    (void)argv;

    /* Look at the first byte in memory to detect the host byte order. */
    a = 0x01020304;
    c = ((char *)&a)[0];

    if (c == 1)
    {
        fprintf (stdout, "Integer a: %x - first byte: %x (MSB) -> Big "
                 "endian machine.\n", a, c);
    }
    else if (c == 4)
    {
        fprintf (stdout, "Integer a: %x - first byte: %x (LSB) -> Little "
                 "endian machine.\n", a, c);
    }
    else
    {
        fprintf (stdout, "Integer a: %x - first byte: %x -> A weirdo "
                 "machine.\n", a, c);
    }

    data1 = 0x0102;
    d = ((char *)&data1);

    fprintf (stdout, "Data[0]: %x - Data[1]: %x\n",
             *d, *(d + 1));

    data2 = getShortBE ((char *)&data1);

    d = ((char *)&data2);
    fprintf (stdout, "After GET: Data[0]: %x - Data[1]: %x\n",
             *d, *(d + 1));

    return 0; /* 0 signals success to the host environment */
}

Output:
-------

Integer a: 1020304 - first byte: 1 (MSB) -> Big endian machine.
Data[0]: 1 - Data[1]: 2
After GET: Data[0]: 1 - Data[1]: 2

On Linux (i386):

Integer a: 1020304 - first byte: 4 (LSB) -> Little endian machine.
Data[0]: 2 - Data[1]: 1
After GET: Data[0]: 1 - Data[1]: 2



Thanks, Oliver
 
Malcolm

Oliver Knoll said:
/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine? I've tried to understand
it with diagrams and everything, but my brain went just crazy! Can
anyone give me a simple explanation please? :) I'm really curious...

Firstly, arbitrary data should be unsigned char. Plain char is for actual
text.

Don't be fooled by the << operator. This suggests that data is being shifted
"leftwards" in memory, but in fact it always moves less significant bits to
the more significant position.
Your function therefore takes an arbitrary stream of bytes, and treats the
first one as the top eight bits and the second one as the lower eight bits
of a 16-bit number.
Incidentally, it will not work as expected if CHAR_BIT is not eight, which it
isn't always. short isn't necessarily sixteen bits, either.

And how would a function look like which does the "opposite": swap
bytes on a BE machine, don't change values on a LE machine (that is,
"read and convert LE data")? :) Hope I can answer the 2nd question
myself when I understand the 1st question... :)

So you can treat your arbitrary bit stream as little endian simply by
placing the first byte in the least-significant position, and shifting up
the second byte to the most significant position.
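
A minimal sketch of the two readers described above, side by side. The names get_u16_be and get_u16_le are made up for illustration, and the code assumes 8-bit bytes and a 16-bit value; it is one way to write it, not the only one.

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* Big-endian data: the first byte in the stream is the most
   significant one, so it is shifted up. */
unsigned int get_u16_be(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];
}

/* Little-endian data: the first byte in the stream is the least
   significant one, so the second byte is shifted up instead. */
unsigned int get_u16_le(const unsigned char *p)
{
    return ((unsigned int)p[1] << 8) | p[0];
}
```

For the byte stream 0x01, 0x02 the first function yields 0x0102 and the second 0x0201, on any host, because neither function ever looks at how the host stores a short in memory.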
 
Richard Tobin

Oliver Knoll said:
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine? I've tried to understand
it with diagrams and everything, but my brain went just crazy! Can
anyone give me a simple explanation please? :) I'm really curious...

Well, one way to see it is that this function doesn't depend on
endianness at all, so it will give the same result on a big- or
little-endian machine. But just accessing the data as a short will
give different results on the two.

-- Richard
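
The contrast drawn above can be sketched as two readers: one that builds the value arithmetically, and one that copies the bytes straight into a short-sized object. The names read_be16 and read_native16 are hypothetical, and the sketch assumes 8-bit bytes and a two-byte unsigned short.

```c
#include <string.h>

/* Endianness-independent: the result is defined purely by arithmetic
   on the two byte values. */
unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];
}

/* Host-dependent: the two bytes are copied into an unsigned short and
   interpreted in whatever byte order the host uses.
   Assumes sizeof(unsigned short) == 2. */
unsigned int read_native16(const unsigned char *p)
{
    unsigned short v = 0;
    memcpy(&v, p, 2);
    return v;
}
```

Given the bytes 0x01, 0x02, read_be16 returns 0x0102 everywhere, while read_native16 returns 0x0102 on a big-endian host and 0x0201 on a little-endian one. That difference is why the original function appears to "swap" on LE and to "do nothing" on BE.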
 
Old Wolf

/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short)
^^^^^^ useless cast
((data[0] << 8) | data[1]);

Implementation-defined behaviour, due to sign-extension. You should
use unsigned data types for bit manipulation, e.g.:

#include <limits.h> /* for CHAR_BIT */

short getShortBE(void *data)
{
unsigned char *p = data;
return ( p[0] << CHAR_BIT ) | p[1] ;
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine?

Explain in your own words why it works on an LE machine, and
then it should be obvious why it does nothing on BE.

And how would a function look like which does the "opposite": swap
bytes on a BE machine, don't change values on a LE machine (that is,
"read and convert LE data")?

Switch the '0' with the '1'.
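
To illustrate the sign-extension concern raised in this post, here is a small sketch. It models a platform where plain char is signed by using signed char explicitly, and it assumes 8-bit bytes and a two's complement int; the function names are invented for the example.

```c
/* Models the original code where plain char is signed: a low byte
   with its top bit set is promoted to a negative int, and its sign
   bits overwrite the high byte in the OR.
   (hi is kept non-negative here; shifting a negative value left has
   undefined behaviour.) */
int combine_signed_model(signed char hi, signed char lo)
{
    return (hi << 8) | lo;
}

/* With unsigned bytes the promotion never produces a negative value,
   so both bytes land where they are expected. */
unsigned int combine_unsigned(unsigned char hi, unsigned char lo)
{
    return ((unsigned int)hi << 8) | lo;
}
```

For the byte pair 0x01, 0x80, the signed model receives (1, -128) and returns -128, so the high byte is lost, while the unsigned version receives (0x01, 0x80) and returns 0x0180.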
 
RoSsIaCrIiLoIA

Ok,

I've searched this group for Big/Little endian issues, don't kill me,
I know endianness issues have been discussed a 1000 times. But my
question is a bit different:

I've seen the following function several times,

where?

it converts data stored
in Big Endian (BE) format into host native format (LE on LE machines,
BE on BE machines):

it is wrong because if a char "x" is 8 bit then x<<8 == 0
/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

#include <limits.h>

/* Assumes 2*sizeof(char) == sizeof(short) */
unsigned short GetShort(unsigned char *data)
{
unsigned short u = data[0];
return (u << CHAR_BIT) | data[1];
}
 
Arthur J. O'Dwyer

it is wrong because if a char "x" is 8 bit then x<<8 == 0

No, it's not. Usual arithmetic promotions apply. Stop spreading
misinformation; that's bad.

-Arthur
 
Oliver Knoll

Malcolm said:
Oliver Knoll said:
/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine? I've tried to understand
...
Firstly, arbitrary data should be unsigned char. Plain char is for actual
text.

Thanks :) Good point. (In the real world I'm using such ugly stuff
as Q_UINT16 and the like, though. I guess I've never seen a library
(Qt in this case) which doesn't define its own "datatypes" to ensure
correct byte sizes ;)

Don't be fooled by the << operator. This suggests that data is being shifted
"leftwards" in memory, but in fact it always moves less significant bits to
the more significant position.

Ahh, that's exactly the explanation I was looking for, my brain was
stuck on this "bits go to the left" idea. It makes perfect sense now,
thanks a lot!
...
Incidentally, it will not work as expected if CHAR_BIT is not eight, which it
isn't always. short isn't necessarily sixteen bits, either.

I've taken char and short and was naively assuming them to be 8 and 16
bit for illustration purposes.

Thanks, Oliver
 
RoSsIaCrIiLoIA

it is wrong because if a char "x" is 8 bit then x<<8 == 0

No, it's not. Usual arithmetic promotions apply. Stop spreading
misinformation; that's bad.
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

Where is the "arithmetic promotion"? I don't see how it can be OK:
data[0] is a char and << 8 is applied to it (that looks like an error to me),
then it is ORed with data[1], and only then comes the promotion to short.
[If it were return (short) (data[0]<<8) | data[1]; you would be right,]
but (short) ((data[0]<<8) | data[1])
seems to me, if *data is 8 bits, to equal (short) (data[1]).
 
Andrey Tarasevich

Oliver said:
...
/* - this code swaps the bytes on a Little Endian Machine
- this code returns 'data' unmodified on a Big Endian Machine.
*/
static short getShortBE (char *data)
{
return (short) ((data[0] << 8) | data[1]);
}

I can understand why this function swaps bytes on a LE machine, but
why doesn't it alter 'data' on a BE machine? I've tried to understand
it with diagrams and everything, but my brain went just crazy! Can
anyone give me a simple explanation please? :) I'm really curious...
...

I think it is rather obvious. Assuming that 'data' passed to this
function actually points to a two-byte integral value (say, 'short' in
your implementation, and CHAR_BIT is 8), on a LE machine 'data[0]' is a
low-order byte of that value and 'data[1]' is a high-order byte. Doing

(data[0] << 8) | data[1]

will indeed swap these bytes, i.e. move the former low-order byte into
the high-order position and vice versa.

On a BE machine 'data[0]' is a high-order byte and 'data[1]' is a
low-order byte of a two-byte integral value. Doing

(data[0] << 8) | data[1]

will simply re-construct the original value, i.e. high-order byte is
moved to high-order position and low-order byte is placed in low-order
position. The value remains unchanged.
 
Andrey Tarasevich

RoSsIaCrIiLoIA said:
...

it is wrong because if a char "x" is 8 bit then x<<8 == 0
...

No. In the C language, operands of the << operator are first subjected to
integer promotions. In this case a value of type 'char' will be promoted
to a value of type 'int'. The << operator will be applied to an 'int'
operand, not to a 'char' operand. That's why what you are saying is
incorrect.
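
The promotion rule described here can be checked with a short sketch. The function names are made up for the example, and it assumes 8-bit bytes and an int wider than 16 bits.

```c
/* x is promoted to int before the shift, so the bits move into the
   second byte of the int instead of falling off the end. */
int shift_promoted(unsigned char x)
{
    return x << 8;
}

/* What the mistaken mental model would give: the shifted value is
   forced back into 8 bits, which discards everything that was moved
   up. The cast is what loses the bits, not the shift itself. */
int shift_then_truncate(unsigned char x)
{
    return (unsigned char)(x << 8);
}
```

With x equal to 0x01, shift_promoted returns 0x0100 and shift_then_truncate returns 0. The expression (unsigned char)1 << 8 also has the size of an int, which is another way to see that the shift happens in int.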
 
RoSsIaCrIiLoIA

No. In the C language, operands of the << operator are first subjected to
integer promotions. In this case a value of type 'char' will be promoted
to a value of type 'int'. The << operator will be applied to an 'int'
operand, not to a 'char' operand. That's why what you are saying is
incorrect.

Ok

unsigned short getShortBE (unsigned char* data)
{
return (data[0] << (unsigned short) 8) | data[1];
}
 
