memcpy() and endianness

Kevin Bracey · May 11, 2004

Eric Sosman said:
Other formats are possible, of course, and permitted by the
C Standard. Also, the latest C99 Standard permits an `int' to
have "trap representations" somewhat like an IEEE signalling NaN:
some arrangements of bits may signify "erroneous data" rather than
encoding a numeric value. It's at least possible that storing
these four bytes in an integer could produce such a result.

For what it's worth, I've never encountered a machine that
used trap representations in integers or that used an "endian"
arrangement other than the three listed above. YMMV.

I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.

Dan Pop · May 11, 2004

Kevin Bracey said:
I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.

And what happens in your implementation when a trap representation is
evaluated?

Dan

James Kanze · May 11, 2004

|> > >>#include <string.h>

|> > >>int i; /* 4-byte == 4-char */
|> > >>char data[] = { 0x78, 0x56, 0x34, 0x12 };

|> > >>int main()
|> > >>{
|> > >> memcpy(&i, data, 4);

|> > >> /*
|> > >> * Thinking about endianness, what can be said about
|> > >> * the value of i according to the C-spec?
|> > >> */
|> > >>}

|> > >>/* Thanks for listening! Case */

|> > How many different values can i have given code above? With value
|> > I mean a number at C level, not implementation level.

|> In terms of existing implementations, probably about a dozen.
|> Usually numbers will be big- or little- endian and in two's
|> complement notation, so for practical purposes the answer is two.
|> However you could run into non-two's complement machines, machines
|> where there are 9 bits in a byte, and all sorts of other wonderful
|> variations.

Why be so exotic? I've used machines on which int was 16 bits, so the
memcpy becomes undefined behavior, and anything is possible.
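
A minimal sketch of that point, not taken from the original post: it assumes CHAR_BIT == 8 and an implementation that provides uint32_t from <stdint.h>, and both helper names are hypothetical. Copying sizeof v bytes rather than a hard-coded 4 means the copy can never overrun the destination, and building the value with shifts removes the dependence on host byte order altogether.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes as the original example does, but
 * into an exact-width type and with the size taken from the object.
 * The result still depends on the byte order of the host. */
uint32_t load_native(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* sizeof v, never a hard-coded 4 */
    return v;
}

/* Hypothetical helper: assemble the value arithmetically, treating p[0]
 * as the least significant octet. The result is the same on every host. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the bytes { 0x78, 0x56, 0x34, 0x12 }, load_le32 gives 0x12345678 on any host, while load_native gives 0x12345678 on a little-endian host and 0x78563412 on a big-endian one.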

Joe Wright · May 12, 2004

Sam said:
<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1.

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

So, if a byte is an octet, is a nybble a quartet?

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.

I'm sorry. I just couldn't stop myself.

Christian Bau · May 12, 2004

Case said:
Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

sizeof (char) is always equal to one. A char is a byte. However, a byte
is _not_ an octet; a byte could have more than eight bits. And there are
C compilers where a char is 32 bits, and sizeof (int) is one.

(Eight bits are called an octet. Whatever number of bits you need for a
char is called a byte. C requires that a byte has at least eight bits,
so a byte is at least as large as an octet, but never less).
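
As a rough illustration of the byte/octet distinction (the helper names here are hypothetical, not from the thread), the number of bits in a byte is available as CHAR_BIT from <limits.h>, so sizes in bits are better computed than assumed:

```c
#include <limits.h>
#include <stddef.h>

/* sizeof(char) is 1 by definition, whatever the width of a byte. */
size_t char_size(void)
{
    return sizeof(char);
}

/* Hypothetical helper: bits in the object representation of an int.
 * This is sizeof(int) bytes of CHAR_BIT bits each; it may include
 * padding bits, so it is an upper bound on the value bits. */
size_t int_object_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

On a typical desktop system int_object_bits() returns 32, but on an implementation with a 32-bit char and sizeof (int) == 1 it would also return 32, with char_size() still equal to 1.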

Kevin Bracey · May 12, 2004

Dan Pop said:
And what happens in your implementation when a trap representation is
evaluated?

Well, you'd get undefined behaviour, basically. Here's an example of what
could happen (off the top of my head):

_Bool b, b2, b3, b4;
unsigned char c;
int i;

struct { _Bool b:1; } s;

c = 2;
memcpy(&b, &c, 1);

i = b;
printf("i = %d\n", i);

s.b = b;
printf("s.b = %d\n", s.b);

b2 = b;
printf("b2 = %d\n", b2);

b3 = i;
printf("b3 = %d\n", b3);

b4 = !b;
printf("b4 = %d\n", b4);

This would output:

i = 2
s.b = 0
b2 = 2
b3 = 1
b4 = 3

The knowledge that a bool "cannot" have a value other than 0 or 1 is used to
eliminate the "!= 0" test that would otherwise be inserted, as illustrated
by b2 and b3 there.

Internally, this is handled by having a hidden "boolean" attribute of a type
that indicates that its value is known to be 0 or 1. For example, the
expressions "!x" and "x && y" have type "boolean int".
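
For contrast, here is a sketch of how a byte of unknown content can be turned into a valid _Bool without ever reading it through a _Bool lvalue. The helper name is hypothetical and this is only one way to do it: the byte is read as an unsigned char, which has no trap representations, and the explicit comparison supplies the "!= 0" test that the compiler is otherwise entitled to omit.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: produce a valid bool from an arbitrary byte.
 * The byte is only ever read as unsigned char, and the comparison
 * yields exactly 0 or 1, so the result is never a trap representation. */
bool bool_from_byte(const void *p)
{
    unsigned char c;
    memcpy(&c, p, 1);
    return c != 0;
}
```

Given c = 2 as in the example above, bool_from_byte(&c) is 1, and negating that result gives 0 rather than 3.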

Dan Pop · May 12, 2004

Kevin Bracey said:

Well, you'd get undefined behaviour, basically.

There is no such thing where a *concrete* implementation is concerned.
Even if not documented, the behaviour is defined by the code generated
by the compiler (and if the compiler generates random garbage, by the
algorithm used to generate it and by the way the processor handles it).

Thanks for the example: it's a reasonable optimisation.

Dan
