memcpy() and endianness

Kevin Bracey

Eric Sosman said:
Other formats are possible, of course, and permitted by the
C Standard. Also, the latest C99 Standard permits an `int' to
have "trap representations" somewhat like an IEEE signalling NaN:
some arrangements of bits may signify "erroneous data" rather than
encoding a numeric value. It's at least possible that storing
these four bytes in an integer could produce such a result.

For what it's worth, I've never encountered a machine that
used trap representations in integers or that used an "endian"
arrangement other than the three listed above. YMMV.

I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.
 
Dan Pop

Kevin Bracey said:
I think trap representations for C99 _Bools are likely, at least. I suspect
that may have been one of the motivations for adding them. My implementation
has _Bool having the same representation as an unsigned char, with any
contents other than 0x00 or 0x01 being a trap representation.

And what happens in your implementation when a trap representation is
evaluated?

Dan
 
James Kanze

|> > >>#include <string.h>

|> > >>int i; /* 4-byte == 4-char */
|> > >>char data[] = { 0x78, 0x56, 0x34, 0x12 };

|> > >>int main()
|> > >>{
|> > >> memcpy(&i, data, 4);

|> > >> /*
|> > >> * Thinking about endianness, what can be said about
|> > >> * the value of i according to the C-spec?
|> > >> */
|> > >>}

|> > >>/* Thanks for listening! Case */

|> > How many different values can i have given code above? With value
|> > I mean a number at C level, not implementation level.

|> In terms of existing implementations, probably about a dozen.
|> Usually numbers will be big- or little- endian and in two's
|> complement notation, so for practical purposes the answer is two.
|> However you could run into non-two's complement machines, machines
|> where there are 9 bits in a byte, and all sorts of other wonderful
|> variations.

Why be so exotic? I've used machines on which int was 16 bits, so the
memcpy becomes undefined behavior, and anything is possible.
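
To make that concrete, here is a minimal sketch in standard C99. The names load_le32 and copy_into_int are hypothetical, made up for illustration and not taken from anyone's post. The first builds the value arithmetically, so the answer does not depend on how the machine lays out an int; the second only performs the memcpy when an int really has room for four bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes as a little-endian 32-bit value.
 * Pure arithmetic, so the result is the same on every implementation.
 * Each byte is masked to 8 bits in case CHAR_BIT is larger than 8. */
uint_least32_t load_le32(const unsigned char *p)
{
    return  (uint_least32_t)(p[0] & 0xFFu)
         | ((uint_least32_t)(p[1] & 0xFFu) << 8)
         | ((uint_least32_t)(p[2] & 0xFFu) << 16)
         | ((uint_least32_t)(p[3] & 0xFFu) << 24);
}

/* Hypothetical helper: copy 4 bytes into *out only if an int can hold them.
 * Returns 1 if the copy was done, 0 if sizeof(int) < 4 (it would overrun). */
int copy_into_int(int *out, const unsigned char *src)
{
    if (sizeof *out < 4)
        return 0;
    *out = 0;               /* give any bytes beyond the first 4 a known value */
    memcpy(out, src, 4);    /* resulting value depends on the representation */
    return 1;
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 from the original post, load_le32 yields 0x12345678 everywhere. After copy_into_int on a 32-bit two's complement int, the value is 0x12345678 on a little-endian machine and 0x78563412 on a big-endian one; on other representations it may be something else again.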
 
Joe Wright

Sam said:
<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1.

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

So, as a byte is an octet, is a nybble a quartet?

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.

I'm sorry. I just couldn't stop myself. :)
 
Christian Bau

Case said:
Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

sizeof (char) is always equal to one. A char is a byte. However, a byte
is _not_ an octet; a byte could have more than eight bits. And there are
C compilers where a char is 32 bits, and sizeof (int) is one.

(Eight bits are called an octet. Whatever number of bits you need for a
char is called a byte. C requires that a byte has at least eight bits,
so a byte is at least as large as an octet, but never less).
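
A small sketch of how to ask the implementation rather than assume, using only CHAR_BIT from <limits.h>. The helper names bits_in and byte_is_octet are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width in bits of an object that is nbytes bytes long.
 * A C byte has CHAR_BIT bits: at least 8, possibly more. */
size_t bits_in(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}

/* Hypothetical helper: 1 if a byte on this implementation is exactly an octet. */
int byte_is_octet(void)
{
    return CHAR_BIT == 8;
}
```

bits_in(sizeof(char)) is always CHAR_BIT, because sizeof(char) is 1 by definition. bits_in(sizeof(int)) is the storage size of an int in bits, which is 32 on the 32-bit-char compilers mentioned above even though sizeof(int) is 1 there.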
 
Kevin Bracey

Dan Pop said:
And what happens in your implementation when a trap representation is
evaluated?

Well, you'd get undefined behaviour, basically. Here's an example of what
could happen (off the top of my head):

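#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b, b2, b3, b4;
    unsigned char c;
    int i;

    struct { _Bool b:1; } s;

    c = 2;
    memcpy(&b, &c, 1);   /* b now holds 0x02, not a valid _Bool here */

    i = b;
    printf("i = %d\n", i);

    s.b = b;
    printf("s.b = %d\n", s.b);

    b2 = b;              /* _Bool to _Bool: no "!= 0" test inserted */
    printf("b2 = %d\n", b2);

    b3 = i;              /* int to _Bool: "!= 0" test is inserted */
    printf("b3 = %d\n", b3);

    b4 = !b;
    printf("b4 = %d\n", b4);

    return 0;
}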


This would output:

i = 2
s.b = 0
b2 = 2
b3 = 1
b4 = 3

The knowledge that a bool "cannot" have a value other than 0 or 1 is used to
eliminate the "!= 0" test that would otherwise be inserted, as illustrated
by b2 and b3 there.

Internally, this is handled by having a hidden "boolean" attribute of a type
that indicates that its value is known to be 0 or 1. For example, the
expressions "!x" and "x && y" have type "boolean int".
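
For contrast, a minimal sketch of getting a byte into a _Bool without memcpy, so that no out-of-range representation is ever stored. The helper names byte_to_bool and is_valid_bool_byte are hypothetical; the second simply encodes the rule described for this implementation (only 0x00 and 0x01 are valid) and is an assumption for any other compiler.

```c
#include <stdbool.h>

/* Hypothetical helper: convert an arbitrary byte to a valid _Bool.
 * C99 defines conversion to _Bool as 0 if the value compares equal
 * to 0 and 1 otherwise, so the result is always 0 or 1. */
bool byte_to_bool(unsigned char c)
{
    return c;   /* same as c != 0 */
}

/* Hypothetical helper: 1 if the byte is one of the two representations
 * (0x00, 0x01) treated as valid by the implementation described above. */
int is_valid_bool_byte(unsigned char c)
{
    return c <= 1;
}
```

byte_to_bool(2) is 1, and negating that result gives 0, which is what the Standard guarantees for values that reach a _Bool by conversion rather than by copying raw bytes.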
 
Dan Pop

Kevin Bracey said:
Well, you'd get undefined behaviour, basically.

There is no such thing where a *concrete* implementation is concerned.
Even if not documented, the behaviour is defined by the code generated
by the compiler (and if the compiler generates random garbage, by the
algorithm used to generate it and by the way the processor handles it).

Thanks for the example: it's a reasonable optimisation.

Dan
 
