Byte ordering and array access

  • Thread starter Benjamin M. Stocks

Benjamin M. Stocks

Hello all,
I've heard differing opinions on this and would like a definitive
answer once and for all. If I have an array of four 1-byte values
where index 0 is the least significant byte of a 4-byte value, can I use
the arithmetic shift operators to hide the endianness of the
underlying processor when assembling a native 4-byte value, as
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678 no
matter the endian-ness of the processor?

Thanks in advance,

Ben
 

Vladimir S. Oka

Benjamin said:
Hello all,
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte values
where index 0 is the least significant byte of a 4-byte value. Can I use
the arithmetic shift operators to hide the endian-ness of the
underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678 no
matter the endian-ness of the processor?

Yes, you can do this. The Standard defines shift operations
in terms of the /value/ of the thing you're shifting, not its bit
representation. You can think of shifts in terms of multiplication and
division, if you will.
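
A minimal sketch of that point, for unsigned operands only. The function name is made up for illustration, and the shift count is limited to 15 because unsigned int is only guaranteed to be 16 bits wide:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when x << n gives the same value as
   multiplying x by 2 n times. Both results wrap modulo UINT_MAX + 1,
   so they agree whatever the byte order of the machine. */
int shift_equals_multiply(unsigned int x, unsigned int n)
{
    unsigned int product = x;
    unsigned int i;

    assert(n < 16u);        /* always a valid shift count for unsigned int */
    for (i = 0; i < n; i++)
        product *= 2u;      /* unsigned arithmetic wraps, it never overflows */
    return (x << n) == product;
}
```

The comparison only looks at values; no byte of either object is inspected, which is why the storage order cannot influence the result.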
 

Eric Sosman

Benjamin M. Stocks wrote On 02/08/06 10:39,:
Hello all,
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte values
where index 0 is the least significant byte of a 4-byte value. Can I use
the arithmetic shift operators to hide the endian-ness of the
underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678 no
matter the endian-ness of the processor?

Yes, this is fine. Well, almost: I can see two
potential portability problems:

- C guarantees that a char has at least eight bits,
but permits it to have more. Depending on just
what you mean by "1-byte values," you might want
to replace 8,16,24 by CHAR_BIT, 2*CHAR_BIT, and
3*CHAR_BIT (the CHAR_BIT macro is in <limits.h>).

- C guarantees that an int has at least sixteen bits,
but does not promise that it has as many as 32
(or 4*CHAR_BIT). That is, an int may be too narrow
to hold four bytes. This could mess things up in
two ways: first, you obviously can't use a two-pound sack
to hold four pounds of ... well, better unsaid.
Second, shifts are only guaranteed to work if the
shift distance is strictly less than the number of
bits in the value shifted, so if int is sixteen bits
wide both the 16- and 24-bit shifts are undefined.
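
One way to sidestep both problems, sketched under the assumption that the array carries 8-bit quantities (octets) even on a machine where CHAR_BIT is larger. The name load_le32 is hypothetical; unsigned long is used because it is guaranteed to be at least 32 bits wide:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assemble a 32-bit value from four octets,
   least significant octet first. */
unsigned long load_le32(const unsigned char *b)
{
    assert(b != NULL);
    /* Mask each element to 8 bits in case CHAR_BIT > 8, and widen to
       unsigned long before shifting so a 24-bit shift is always defined. */
    return  (unsigned long)(b[0] & 0xFFu)
         | ((unsigned long)(b[1] & 0xFFu) << 8)
         | ((unsigned long)(b[2] & 0xFFu) << 16)
         | ((unsigned long)(b[3] & 0xFFu) << 24);
}
```

If "1-byte values" really means full-width chars, the shift distances would be multiples of CHAR_BIT instead, and the result type would need at least 4*CHAR_BIT bits.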
 

Rod Pemberton

Benjamin M. Stocks said:
Hello all,

Ignoring your question here:
unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

You'll probably need or want to declare 'integerValue' as a more definitive
type: 'unsigned long' or 'uint32_t'.


Rod Pemberton
 

CBFalconer

Benjamin M. Stocks said:
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte values
where index 0 is the least significant byte of a 4-byte value. Can I use
the arithmetic shift operators to hide the endian-ness of the
underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678 no
matter the endian-ness of the processor?

That is basically the correct way. However you should allow for
the values of CHAR_BIT and sizeof(int). If we assume that your
input data is always in units of 8 bits (i.e. if CHAR_BIT is
greater than 8, you just don't use the extra bit(s)), you could
use:

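#define ASIZE sizeof(int)

unsigned int uival;
unsigned char bytes[ASIZE];
int i;
...
/* Accumulates bytes[0] first, so bytes[0] ends up most significant;
   run the index downward if bytes[0] holds the least significant byte. */
for (uival = 0, i = 0; i < ASIZE; i++) {
uival = 256 * uival + (bytes[i] & 0xff);
}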

The use of multiplication and unsigned values avoids any unwanted
overflow behaviour, and the 0xff mask ensures only 8 bits are used
per byte. Now the objective of endianness independence is achieved
safely.
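
A self-contained sketch of the same multiply-and-add idea, arranged for the original poster's layout (least significant byte at index 0). The function name and the limit of four octets are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine n octets, least significant first,
   into an unsigned long using multiplication instead of shifts. */
unsigned long combine_lsb_first(const unsigned char *bytes, size_t n)
{
    unsigned long value = 0;

    assert(bytes != NULL);
    assert(n <= 4);     /* unsigned long is only guaranteed to hold 32 bits */
    /* Walk from the most significant octet (highest index) down to
       index 0, so each earlier octet is pushed up by a factor of 256. */
    while (n-- > 0)
        value = 256UL * value + (bytes[n] & 0xFFu);
    return value;
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 and n equal to 4 this yields 0x12345678, matching the shift-based expression.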

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 

stathis gotsis

Benjamin M. Stocks said:
Hello all,
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte values
where index 0 is the least significant byte of a 4-byte value. Can I use
the arithmetic shift operators to hide the endian-ness of the
underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678 no
matter the endian-ness of the processor?

May I ask a question on this? Can the endian-ness of the processor affect
the "<<" shifting direction? From the replies I assume it does. I need an
example where this operator shifts to the right.
 

Vladimir S. Oka

stathis said:
Benjamin M. Stocks said:
Hello all,
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte
values where index 0 is the least significant byte of a 4-byte value.
Can I use the arithmetic shift operators to hide the endian-ness of
the underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index
0, guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue be 0x12345678
no matter the endian-ness of the processor?

May I ask a question on this? Can the endian-ness of the processor
affect the "<<" shifting direction? From the replies I assume it does.
I need an example where this operator shifts to the right.

No, it does not affect shift "direction". It may help if you think of
shifts as repeated integer divisions/multiplications by 2 (that's how
the Standard defines them -- they work on /values/ not representations).
Endianness only affects how values are stored in memory (their bit
representation, if you will). IOW, before performing the shift, a C
program reads the operand's representation, figures out the /value/,
performs the shift (i.e. division/multiplication), and if required
converts the value back to a representation and stores it.

I don't think you can construct a representation/endianness
combination that will "reverse" shifts. Or, I'm not at my creative best
at the moment (a distinct possibility -- it's Friday evening, I should
be in a pub).
 

pete

Benjamin said:
Hello all,
I've heard differing opinions on this and would like a definitive
answer on this once and for all. If I have an array of 4 1-byte values
where index 0 is the least significant byte of a 4-byte value.
Can I use
the arithmetic shift operators to hide the endian-ness of the
underlying processor when assembling a native 4-byte value like
follows:

unsigned int integerValue;
unsigned char byteArray[4];

/* byteArray is populated elsewhere, least significant byte in index 0,
guaranteed */

integerValue = (unsigned int)byteArray[0] |
((unsigned int)byteArray[1] << 8) |
((unsigned int)byteArray[2] << 16) |
((unsigned int)byteArray[3] << 24);

So if byteArray[0] was 0x78, byteArray[1] was 0x56, byteArray[2] was
0x34 and byteArray[3] was 0x12 then would integerValue
be 0x12345678 no
matter the endian-ness of the processor?

Yes, as long as UINT_MAX is large enough.
You could also do it this way:

integerValue = byteArray[0]
+ byteArray[1] * 0x100LU
+ byteArray[2] * 0x10000LU
+ byteArray[3] * 0x1000000LU;
 

stathis gotsis

Vladimir S. Oka said:
No, it does not affect shift "direction". It may help if you think of
shifts as repeated integer divisions/multiplications by 2 (that's how
the Standard defines them -- they work on /values/ not representations).
Endianness only affects how values are stored in memory (their bit
representation, if you will). IOW, before performing the shift, a C
program reads the operand's representation, figures out the /value/,
performs the shift (i.e. division/multiplication), and if required
converts the value back to a representation and stores it.

I don't think you can construct a representation/endianness
combination that will "reverse" shifts. Or, I'm not at my creative best
at the moment (a distinct possibility -- it's Friday evening, I should
be in a pub).

Well, yes there is a clear distinction between values and representations,
so my question was pointless anyway. But suppose we have 2-byte integers and
let int a=0xABCD. In one representation that could be: ABCD and in another:
DCBA. In terms of representations, one could say that in the first one "<<"
operator shifts left and in the second to the right. But that cannot happen
in the real world right?

Furthermore, let's take the OP. The program needs to evaluate the following
expression: ((unsigned int)byteArray[1] << 8). I assume that means that
byteArray[1] should move to the place of the next most significant byte in
the unsigned int word. So that shifting could be "left" or "right" depending
on endianness.
 

pete

stathis said:
But suppose we have 2-byte integers and
let int a=0xABCD. In one representation that could be: ABCD
and in another: DCBA.

Leftness and rightness have to do with significance,
as in: the least significant byte is the rightmost.
It has nothing to do with addresses of the bytes.

Assuming CHAR_BIT equals 8 and
assuming that by DCBA, you mean
the lower byte will have value 0xDC and
the higher byte will have value 0xBA:
that's wrong.

One byte will have value 0xCD
and the other will have 0xAB.
 

Jordan Abel

Leftness and rightness have to do with significance,
as in: the least significant byte is the rightmost.
It has nothing to do with addresses of the bytes.

Assuming CHAR_BIT equals 8 and
assuming that by DCBA, you mean
the lower byte will have value 0xDC and
the higher byte will have value 0xBA:
that's wrong.

One byte will have value 0xCD
and the other will have 0xAB.

I don't believe this is actually guaranteed. Though, processors which
play silly games with the endian-ness of units other than bytes are rare
indeed [I think I once owned a graphing calculator that did that].
 

pete

Jordan said:
I don't believe this is actually guaranteed.

It is.

Though, processors which
play silly games with the endian-ness of units
other than bytes are rare
indeed [I think I once owned a graphing calculator that did that].

Regardless of what the processor actually does,
in a C program,
it has to make objects look like bytes of bits.

In a C program you can examine any byte of any object,
except register class objects,
as an object of unsigned char.
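
A small sketch of what "examining the bytes" looks like in practice. The function name is invented for the example, and nothing here says which value lands at which index; that part is up to the implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of an unsigned int
   into out[], lowest address first. out must have room for
   sizeof(unsigned int) elements. */
void object_bytes(unsigned int value, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```

What can be relied on is the round trip: copying those bytes back into another unsigned int with memcpy reproduces the original value, whatever order they were stored in.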
 

Jordan Abel

It is.

Where?

Regardless of what the processor actually does, in a C program, it has
to make objects look like bytes of bits.

Yeah, but the values of the bytes are not defined, except that you can
copy them out and back into another object of the same type.

In a C program you can examine any byte of any object,
except register class objects,
as an object of unsigned char.

Sure, but it doesn't guarantee _what_ values there are. There's no
reason 0xABCD couldn't be three bytes: 0x5 0x2D 0x8D. Padding bits and
such. Nothing is guaranteed about the physical order in which bits are
interpreted in an integer vs in an unsigned char.
 

stathis gotsis

pete said:
Leftness and rightness have to do with significance,
as in: the least significant byte is the rightmost.
It has nothing to do with addresses of the bytes.

Assuming that more significant bytes are to the "left" of less significant
ones, I come to the conclusion that the "<<" operator always shifts to the
"left". I think it is a matter of convention anyway.

Assuming CHAR_BIT equals 8 and
assuming that by DCBA, you mean
the lower byte will have value 0xDC and
the higher byte will have value 0xBA:
that's wrong.

One byte will have value 0xCD
and the other will have 0xAB.

Well, yes that is the common case. I will take your word on the
non-existence of the opposite.
 

pete

Jordan said:
Nothing is guaranteed about the physical order in which bits are
interpreted in an integer vs in an unsigned char

You think a two byte int object with a value of 3,
might have one bit set in each byte?

Maybe, I don't know.
 

stathis gotsis

Vladimir S. Oka said:
Yes, you can do this. Standard states that shift operations are defined
in terms of the /value/ of the thing you're shifting, not it's bit
representation. You can think of shifts in terms of multiplication and
division, if you will.

Does that apply to negative (signed) operands as well? I think K&R2 says
that is implementation-specific: shifting negatives can be implemented
logically (sticking to the bit representation) or arithmetically (sticking
to the real value). Please comment on that.
 

Vladimir S. Oka

stathis said:
Does that apply to negative (signed) operands as well? I think K&R2
says that is implementation-specific: shifting negatives can be
implemented logically (sticking to the bit representation) or
arithmetically (sticking to the real value). Please comment on that.

In short, the Standard specifies that, if the right operand is negative (or
greater than or equal to the width of the promoted left operand), you get
Undefined Behaviour (6.5.7.3).
If the left operand is negative, for left shift (<<) you get U.B.
(6.5.7.4), but for right shift (>>) the result is implementation-defined
(6.5.7.5).

I think full C&V should answer all your dilemmas (note how the Standard talks
both of bits /and/ arithmetic operations):

6.5.7 Bitwise shift operators

6.5.7.2 Constraints
Each of the operands shall have integer type.

6.5.7.3 Semantics
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of the
right operand is negative or is greater than or equal to the width of
the promoted left operand, the behavior is undefined.

6.5.7.4
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the
result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2^E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is undefined.

6.5.7.5
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of E1
divided by the quantity, 2 raised to the power E2. If E1 has a signed
type and a negative value, the resulting value is
implementation-defined.
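
To make the quoted rules concrete, here is a sketch of a left shift that refuses the undefined case instead of executing it. Both function names are invented for the example; the width is counted from UINT_MAX so that any padding bits are not included:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of value bits in unsigned int, derived from UINT_MAX. */
static unsigned int uint_width(void)
{
    unsigned int max = UINT_MAX;
    unsigned int width = 0;

    while (max != 0) {
        max >>= 1;
        width++;
    }
    return width;
}

/* Hypothetical helper: stores x << n in *result and returns 1, or
   returns 0 without shifting when n is not less than the width
   (the case 6.5.7 paragraph 3 leaves undefined). */
int checked_shl(unsigned int x, unsigned int n, unsigned int *result)
{
    assert(result != NULL);
    if (n >= uint_width())
        return 0;
    *result = x << n;   /* unsigned: x * 2^n reduced modulo UINT_MAX + 1 */
    return 1;
}
```

Because both operands are unsigned, the negative-operand cases in paragraphs 4 and 5 cannot arise here.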

--
BR, Vladimir

If you've done six impossible things before breakfast, why not round it
off with dinner at Milliway's, the restaurant at the end of the
universe.
 

Chris Torek

Well, yes there is a clear distinction between values and representations,
so my [original] question was pointless anyway.

Indeed. :)
But suppose we have 2-byte integers [and standard 8-bit bytes]
and let int a=0xABCD. In one representation that could be: ABCD and
in another: DCBA.

This is getting close to the heart of the issue (with "endianness"
being "the issue" in question).

"Endianness" is an artifact that arises when some entity takes a
whole -- such as the value 0xABCD -- and splits it into parts.
Here, you have allowed someone(s) to split it into two parts, "AB"
and "CD", and then scatter those two parts about your room, where
the cat can subsequently gnaw on them.

The question you should ask yourself is: who is this entity that
is splitting up your whole, and why are you giving him, her, or it
permission to do so? What will he/she/it do with the pieces? Who
or what will re-assemble them later, and will all the various
entities doing this splitting-up and re-assembling cooperate?

If *you* do the splitting-up yourself:

unsigned char split[4];
unsigned long value;

split[0] = (value >> 24) & 0xff;
split[1] = (value >> 16) & 0xff;
split[2] = (value >> 8) & 0xff;
split[3] = value & 0xff;

and *you* do the re-assembling later:

value = (unsigned long)split[0] << 24;
value |= (unsigned long)split[1] << 16;
value |= (unsigned long)split[2] << 8;
value |= (unsigned long)split[3];

will you co-operate with yourself? Will that guarantee that you
get the proper value back?
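
The two fragments above can be wrapped into a pair of functions to check the round trip. This is only a sketch: the names split_be32 and join_be32 are made up, and the value is assumed to fit in 32 bits:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split a 32-bit value into four octets,
   most significant first, as in the fragment above. */
void split_be32(unsigned long value, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical helper: reassemble the value from those four octets. */
unsigned long join_be32(const unsigned char *in)
{
    unsigned long value;

    assert(in != NULL);
    value  = (unsigned long)in[0] << 24;
    value |= (unsigned long)in[1] << 16;
    value |= (unsigned long)in[2] << 8;
    value |= (unsigned long)in[3];
    return value;
}
```

Since the same code defines both the splitting and the joining order, join_be32 applied to the output of split_be32 gives back the original value on any conforming implementation.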
In terms of representations, one could say ...

In the Olden Daze, computer memory was stored in little magnetic
donuts called "cores" (see <http://en.wikipedia.org/wiki/Core_memory>).
You could actually point to the individual donuts holding each
individual bit in memory. Depending on the architecture (core
memory was often stored in "planes" for speed), it is quite reasonable
to expect that each bit of a single word would be stored in a
different circuit board in the computer. If you had an 18 or 36
bit word (those being common word sizes at the time), any given
value was stored in 18 or 36 different locations, none particularly
being "left" or "right" hand sided.

Even today, the actual bit layout on any given DRAM card "stick"
may be spread out, so that the chips holding your values may not
be particularly sort-able into "left" and "right" (they may be
mixed together, and/or "up" and "down"). You never notice because
you are unable -- at least without a logic probe -- to observe the
bits being split up and reassembled. A single entity (the memory
controller on the particular card) is responsible for the splitting-up
and re-assembling, and it always cooperates with itself.
 

stathis gotsis

Chris Torek said:
Well, yes there is a clear distinction between values and representations,
so my [original] question was pointless anyway.

Indeed. :)

Thank you for taking the time to clarify these issues.
But suppose we have 2-byte integers [and standard 8-bit bytes]
and let int a=0xABCD. In one representation that could be: ABCD and
in another: DCBA.

This is getting close to the heart of the issue (with "endianness"
being "the issue" in question).

"Endianness" is an artifact that arises when some entity takes a
whole -- such as the value 0xABCD -- and splits it into parts.
Here, you have allowed someone(s) to split it into two parts, "AB"
and "CD", and then scatter those two parts about your room, where
the cat can subsequently gnaw on them.

Well, in my previous example I allowed someone to split the 2-byte whole
into 4-bit entities. I was wondering if that can happen in real-life
systems. Is the byte the atom (the smallest entity that cannot be further
split) in the context of endianness? If it is, then my example resides in
the field of imagination.

The question you should ask yourself is: who is this entity that
is splitting up your whole, and why are you giving him, her, or it
permission to do so? What will he/she/it do with the pieces? Who
or what will re-assemble them later, and will all the various
entities doing this splitting-up and re-assembling cooperate?

If *you* do the splitting-up yourself:

unsigned char split[4];
unsigned long value;

split[0] = (value >> 24) & 0xff;
split[1] = (value >> 16) & 0xff;
split[2] = (value >> 8) & 0xff;
split[3] = value & 0xff;

and *you* do the re-assembling later:

value = (unsigned long)split[0] << 24;
value |= (unsigned long)split[1] << 16;
value |= (unsigned long)split[2] << 8;
value |= (unsigned long)split[3];

will you co-operate with yourself? Will that guarantee that you
get the proper value back?

I think I will get the proper value back. In this example, C will hide any
implementation-specific representation. I think with this expression:
(value >> n*8) & 0xff;
we get the (n+1)-th least significant byte, regardless of endianness. Is that
true? Can you show me an example where C reveals endianness?
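
On that last question, one common way the storage order becomes visible is to look at an object through its bytes rather than through its value. A rough sketch, with an invented function name and result strings:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical probe: report where the byte holding the value 1 sits
   inside an unsigned int whose value is 1. */
const char *byte_order_name(void)
{
    const unsigned int probe = 1u;
    unsigned char raw[sizeof probe];

    memcpy(raw, &probe, sizeof probe);  /* byte view: depends on the machine */
    assert((probe & 0xFFu) == 1u);      /* value view: the same everywhere */

    if (raw[0] == 1u)
        return "least significant byte first";
    if (raw[sizeof probe - 1] == 1u)
        return "most significant byte first";
    return "something else";
}
```

The expression (probe & 0xFFu) gives 1 on every implementation, while the contents of raw[] differ between machines; endianness only shows up in the second kind of access.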
In the Olden Daze, computer memory was stored in little magnetic
donuts called "cores" (see <http://en.wikipedia.org/wiki/Core_memory>).
You could actually point to the individual donuts holding each
individual bit in memory. Depending on the architecture (core
memory was often stored in "planes" for speed), it is quite reasonable
to expect that each bit of a single word would be stored in a
different circuit board in the computer. If you had an 18 or 36
bit word (those being common word sizes at the time), any given
value was stored in 18 or 36 different locations, none particularly
being "left" or "right" hand sided.
Even today, the actual bit layout on any given DRAM card "stick"
may be spread out, so that the chips holding your values may not
be particularly sort-able into "left" and "right" (they may be
mixed together, and/or "up" and "down"). You never notice because
you are unable -- at least without a logic probe -- to observe the
bits being split up and reassembled. A single entity (the memory
controller on the particular card) is responsible for the splitting-up
and re-assembling, and it always cooperates with itself.

That was food for thought, but I think you went too low-level. Yes, memory
hides all internal implementation details, collects the 8 bits of a byte,
which may be scattered on the chip, and gives the byte. I believe that the
real question is whether we can access pieces of data smaller than bytes in
a real memory. If we cannot, then all possible processor-specific
endiannesses are just the ways we can put two or more bytes in some piece
of memory.
 
