memcpy() and endianness

Case

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
    memcpy(&i, data, 4);

    /*
     * Thinking about endianness, what can be said about
     * the value of i according to the C-spec?
     */
}

/* Thanks for listening! Case */
 
Lew Pitcher

#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)
2) the standard doesn't specify how an integer is to map into a character array.
It doesn't specify a particular endianness for integers.

}

/* Thanks for listening! Case */


--
Lew Pitcher
IT Consultant, Enterprise Application Architecture,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
 
Martin Dickopp

Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.

For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.

Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

Martin
 
Case

Lew said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);


First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.

Yes, I know. That's why I said i is '4-byte == 4-char'.

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)

It's 4 as I said (see above). And, doesn't the C standard say that
'global' data (as i is) is initialized to 0?!
 
Case

Martin said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */


A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.

For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.

Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above? With value I
mean a number at C level, not implementation level.
 
Case

Lew said:
Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);


First off, sizeof(i) may not be equal to 4. So, this may or may not do what you
expect it to do.


/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/

Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known quantity. If sizeof(i)
is greater than 4, then you didn't set i's storage completely, and if sizeof(i)
is less than 4, then some of your initialization was not used to set i (and
overwrote something else instead)
2) the standard doesn't specify how an integer is to map into a character array.
It doesn't specify a particular endianness for integers.

In terms of implementation, what mappings are common?
 
CBFalconer

Case said:
Lew said:
Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);


First off, sizeof(i) may not be equal to 4. So, this may or may
not do what you expect it to do.

Yes, I know. That's why I said i is '4-byte == 4-char'.
.... snip ...
Nothing can be said about the value of i.
1) you may or may not have set the value of i to a known
quantity. If sizeof(i) is greater than 4, then you didn't set
i's storage completely, and if sizeof(i) is less than 4, then
some of your initialization was not used to set i (and
overwrote something else instead)

It's 4 as I said (see above). And, doesn't the C standard say
that 'global' data (as i is) is initialized to 0?!

The fact that 'you said' doesn't make it so. The initialization
doesn't matter, because you have no idea what bits or bytes belong
where. This newsgroup deals only with portable standard C, so
your particular platform is of no interest whatsoever.
 
Eric Sosman

Case said:
[code setting the bytes of a four-byte `int' to:]
char data[] = { 0x78, 0x56, 0x34, 0x12 };

In terms of implementation, what mappings are common?

"Big-Endian:" the value is 0x78563412

"Little-Endian:" the value is 0x12345678

"Middle-Endian:" the value is 0x56781234

Other formats are possible, of course, and permitted by the
C Standard. Also, the latest C99 Standard permits an `int' to
have "trap representations" somewhat like an IEEE signalling NaN:
some arrangements of bits may signify "erroneous data" rather than
encoding a numeric value. It's at least possible that storing
these four bytes in an integer could produce such a result.

For what it's worth, I've never encountered a machine that
used trap representations in integers or that used an "endian"
arrangement other than the three listed above. YMMV.
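The three values listed above can be reproduced with plain arithmetic, independent of the machine the code runs on. This is a minimal sketch assuming 8-bit bytes and a 32-bit quantity; the function names are hypothetical, and the "middle-endian" variant assumes the PDP-11 style layout (two 16-bit words, most significant word first, each word stored low byte first).

```c
/* Hypothetical helpers: combine four 8-bit byte values into a 32-bit
   quantity under each of the three byte orders named above. Only shifts
   on unsigned long are used, so the results do not depend on the host. */
unsigned long as_big_endian(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16) |
           ((unsigned long)b[2] << 8)  |  (unsigned long)b[3];
}

unsigned long as_little_endian(const unsigned char b[4])
{
    return ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16) |
           ((unsigned long)b[1] << 8)  |  (unsigned long)b[0];
}

/* Assumed PDP-11 style layout: most significant 16-bit word first,
   low byte first within each word. */
unsigned long as_middle_endian(const unsigned char b[4])
{
    return ((unsigned long)b[1] << 24) | ((unsigned long)b[0] << 16) |
           ((unsigned long)b[3] << 8)  |  (unsigned long)b[2];
}
```

Called with the bytes { 0x78, 0x56, 0x34, 0x12 }, these return 0x78563412, 0x12345678 and 0x56781234 respectively.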
 
Martin Dickopp

Case said:
Martin said:
Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */
A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above?

If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.

With value I mean a number at C level, not implementation level.

I don't know what you mean by "C level" or "implementation level".

Martin
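The two figures in this answer (13 one-bits, and "32 choose 13") can be checked mechanically. The following is a sketch only, assuming 8-bit bytes as the post does; count_one_bits and choose are hypothetical names.

```c
/* Count the one-bits in n bytes, assuming 8-bit bytes as in the post. */
unsigned count_one_bits(const unsigned char *bytes, unsigned n)
{
    unsigned count = 0, k;

    for (k = 0; k < n; k++) {
        unsigned v = bytes[k];
        while (v != 0) {
            count += v & 1u;
            v >>= 1;
        }
    }
    return count;
}

/* Binomial coefficient "n choose k". After step j the running result is
   itself a binomial coefficient, so every division is exact. */
unsigned long long choose(unsigned n, unsigned k)
{
    unsigned long long result = 1;
    unsigned j;

    if (k > n)
        return 0;
    for (j = 1; j <= k; j++)
        result = result * (n - k + j) / j;
    return result;
}
```

For the bytes { 0x78, 0x56, 0x34, 0x12 }, count_one_bits gives 13, and choose(32, 13) gives 347373600, matching the numbers quoted above.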
 
Christian Bau

Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

Nothing.
 
Malcolm

Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

How many different values can i have given code above? With
value I mean a number at C level, not implementation level.
In terms of existing implementations, probably about a dozen. Usually
numbers will be big- or little- endian and in two's complement notation, so
for practical purposes the answer is two. However you could run into
non-two's complement machines, machines where there are 9 bits in a byte,
and all sorts of other wonderful variations.
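For the practical cases, a program can find out at run time which of the common layouts it is on by doing the copy from the original question and comparing the result. This is a rough sketch, not a portable guarantee: it assumes 8-bit bytes, uses unsigned int rather than int to sidestep the sign bit, and reports "other" for anything it does not recognise. The enum and function names are hypothetical.

```c
#include <string.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE, ORDER_MIDDLE, ORDER_OTHER };

/* Hypothetical helper: copy Case's four bytes into an unsigned int and
   report which of the common byte orders the resulting value matches.
   Returns ORDER_OTHER when unsigned int is not exactly four bytes or the
   value matches none of the three. */
enum byte_order classify_host(void)
{
    static const unsigned char data[4] = { 0x78, 0x56, 0x34, 0x12 };
    unsigned int u = 0;

    if (sizeof u != sizeof data)
        return ORDER_OTHER;
    memcpy(&u, data, sizeof u);

    if (u == 0x78563412u) return ORDER_BIG;
    if (u == 0x12345678u) return ORDER_LITTLE;
    if (u == 0x56781234u) return ORDER_MIDDLE;
    return ORDER_OTHER;
}
```

On a typical x86 machine one would expect ORDER_LITTLE; the Standard itself promises none of these outcomes.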
 
Stephen L.

Christian said:
Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

Nothing.

I agree.

I believe what is missing in all of the
discussions is what endianness _is_.

In simple terms, it is the relationship between the CPU
and its memory. The above code example will, on any
architecture/platform it's run on, ALWAYS do the
following (assuming sizeof (int) == 4 for sake of argument):

*((char *)(&i) + 0) = data[ 0 ];
*((char *)(&i) + 1) = data[ 1 ];
*((char *)(&i) + 2) = data[ 2 ];
*((char *)(&i) + 3) = data[ 3 ];

However, how the CPU interprets the bits now contained
in the variable "i" is where the concept of its endianness
comes in. An Intel CPU will see the ordering of the
bits _differently_ than a SPARC CPU (or a 68040, etc.).

The code snippet will produce identical results _in
memory_ on all architectures where the sizeof (int) is four,
however, there is nothing to say that each architecture
will interpret the arrangement of the bits in the same way.

See man htonl(), etc. for more details.


HTH...

Stephen
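The "identical in memory" point can be checked without ever reading the int as a number, by looking at its bytes through an unsigned char pointer. A small sketch, with a hypothetical helper name; it refuses to copy more bytes than the int can hold.

```c
#include <string.h>

/* Hypothetical helper: copy n bytes into an int and verify, by reading the
   int back through an unsigned char pointer, that its first n bytes are the
   source bytes in the same order. Returns 1 on match, 0 otherwise
   (including when n is larger than the int). */
int copy_preserves_bytes(const unsigned char *src, size_t n)
{
    int i = 0;
    const unsigned char *p = (const unsigned char *)&i;
    size_t k;

    if (n > sizeof i)
        return 0;
    memcpy(&i, src, n);
    for (k = 0; k < n; k++)
        if (p[k] != src[k])
            return 0;
    return 1;
}
```

The byte-wise comparison succeeds regardless of byte order; only the numeric value obtained by reading i as an int differs between machines.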
 
Lew Pitcher


Case wrote:
| Lew Pitcher wrote:
|
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you
|> expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate that
sizeof(int) == 4

|>
|>> /*
|>> * Thinking about endianness, what can be said about
|>> * the value of i according to the C-spec?
|>> */
|>
|>
|> Nothing can be said about the value of i.
|> 1) you may or may not have set the value of i to a known quantity. If
|> sizeof(i)
|> is greater than 4, then you didn't set i's storage completely, and if
|> sizeof(i)
|> is less than 4, then some of your initialization was not used to set i
|> (and
|> overwrote something else instead)
|
|
| It's 4 as I said (see above).

See above. It's not 4 on your word.

| And, doesn't the C standard say that
| 'global' data (as i is) is initialized to 0?!

So? We're not talking about /before/ you memcpy(). We're talking about /after/
you memcpy()

Think of it this way. If, unlike you, your compiler believes that
sizeof(int) == 2, then your memcpy() of 4 bytes over a 2-byte int just wiped
out two additional bytes somewhere. Your int only holds the first two bytes of
the 4 byte array that you used to init with, and that value might be
interpreted /either/ in big-endian /or/ little-endian format.

OTOH, if (unlike you) your compiler believes that sizeof(int) == 8, then your
memcpy() of 4 bytes over an 8-byte int only placed data into four of the eight
bytes. The other four bytes are not touched. So, we now have an int in which
four bytes are known quantities, but that can be interpreted in one of 8! ways
(big-endian and little-endian being two of those ways). So, even knowing the 4
bytes (and by inference from the rules, all 8 bytes) we can't tell what the
value of your int is.
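One way to avoid the overrun in the sizeof(int) == 2 scenario is to let the size of the destination bound the copy. The following is only a sketch with a hypothetical helper name; it does not make the resulting value any more predictable, it just keeps the memcpy() inside the int.

```c
#include <string.h>

/* Hypothetical helper: copy at most sizeof(int) bytes into *dst and return
   how many bytes were copied. It never writes past the int, and any bytes
   of the int beyond the returned count keep whatever they held before. */
size_t copy_into_int(int *dst, const unsigned char *src, size_t src_len)
{
    size_t n = src_len < sizeof *dst ? src_len : sizeof *dst;

    memcpy(dst, src, n);
    return n;
}
```

With a 2-byte int only the first two source bytes are used; with an 8-byte int the return value of 4 tells the caller that four bytes of the int were left untouched.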

|> 2) the standard doesn't specify how an integer is to map into a
|> character array.
|> It doesn't specify a particular endianness for integers.
|
|


--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
Case

Lew said:

Case wrote:
| Lew Pitcher wrote:
|
|>
|> Case wrote:
|>
|>> #include <string.h>
|>>
|>> int i; /* 4-byte == 4-char */
|>> char data[] = { 0x78, 0x56, 0x34, 0x12 };
|>>
|>> int main()
|>> {
|>> memcpy(&i, data, 4);
|>
|>
|>
|> First off, sizeof(i) may not be equal to 4. So, this may or may not do
|> what you
|> expect it to do.
|
|
| Yes, I know. That's why I said i is '4-byte == 4-char'.

No. sizeof(int) is 4 if the *compiler* says it is. Your word doesn't count
here at all. And we haven't seen anything from the compiler to indicate
that
sizeof(int) == 4

Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

Assuming my question is clear now, how should I have coded my example
unambiguously (without the use of comments)?
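One possible way to state those assumptions in code rather than in a comment is a compile-time check, so that the example simply fails to build where the assumptions do not hold. This is a sketch, not the only option: the typedef names and load_i are hypothetical, and the negative-array-size idiom is used because it works in pre-C11 compilers (C11 and later offer _Static_assert for the same purpose).

```c
#include <limits.h>
#include <string.h>

/* Compile-time assertions: each array type gets a negative size, and so
   fails to compile, when its condition is false. */
typedef char assert_int_is_4_bytes[sizeof(int) == 4 ? 1 : -1];
typedef char assert_byte_is_8_bits[CHAR_BIT == 8 ? 1 : -1];

int i;
char data[sizeof i] = { 0x78, 0x56, 0x34, 0x12 };

/* Hypothetical wrapper around the copy from the original example; using
   sizeof i instead of a literal 4 keeps the length tied to the object. */
void load_i(void)
{
    memcpy(&i, data, sizeof i);
}
```

This pins down the sizes, but not the byte order or the absence of padding bits, so the value of i after load_i() is still implementation-specific.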
 
Case

Martin said:
Case said:
#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above?


If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.

So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?

I don't know what you mean by "C level" or "implementation level".

At "C level" the bits have a fixed position, for example 0x00000001
can be used to get least significant bit (bit 0) of a 4 byte int;
at implementation level there are (as I understand it from you) 32
possible positions this bit could be.
 
Richard Bos

Case said:
Yes, you are correct. All I meant was: 'Assuming that my compiler sees
an int as a 4-byte entity and a char as a 1-byte entity, what is the
result of ...' BTW, why doesn't anyone question the sizeof char in
my example? Is char perhaps *silently* assumed to be a byte?

No. It is _explicitly_ defined to be one byte by the Standard.

Richard

[ BTW, please learn to snip. ]
 
Martin Dickopp

Case said:
BTW, why doesn't anyone question the sizeof char in my example? Is
char perhaps *silently* assumed to be a byte?

Yes, `char' *always* has a size of one byte, so `sizeof(char) == 1' is
always true. However, a byte can have more than 8 bits.

Note that my other answer to you in this thread deals with the special
case that seems to apply to your implementation: 8 bit bytes, 4 byte
`int's with no padding bits.

Martin
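The number of bits in a byte is available as CHAR_BIT from <limits.h>, so the width of a type need not be guessed. A minimal sketch; BITS_IN and int_width_in_bits are hypothetical names, and the result counts any padding bits along with the value and sign bits.

```c
#include <limits.h>
#include <stddef.h>

/* Size of an object type in bits, padding bits included:
   bytes times bits per byte. */
#define BITS_IN(type) (sizeof(type) * CHAR_BIT)

/* Hypothetical helper: number of bits occupied by an int on this
   implementation. */
size_t int_width_in_bits(void)
{
    return BITS_IN(int);
}
```

sizeof(char) is 1 by definition, so BITS_IN(char) is always exactly CHAR_BIT, which the Standard requires to be at least 8.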
 
Martin Dickopp

Case said:
Martin said:
Case said:
Martin Dickopp wrote:



#include <string.h>

int i; /* 4-byte == 4-char */
char data[] = { 0x78, 0x56, 0x34, 0x12 };

int main()
{
memcpy(&i, data, 4);

/*
* Thinking about endianness, what can be said about
* the value of i according to the C-spec?
*/
}

/* Thanks for listening! Case */

A signed integer has a sign bit, a number of value bits (each of which
has a value that is an integral power of two), and possibly padding
bits. The standard does not impose any rule how the bits have to be
arranged.
For example, in the special case of `int' having 31 value bits and no
padding bits, there are 263130836933693530167218012160000000 (== 32!)
possibilities how to arrange the bits. Three are particularly popular
among implementors, so that they have special names: little, big, and
mixed endian. The remaining 263130836933693530167218012159999997 don't
have any endianness.
Therefore, not much can be said about the value of `i' from the
perspective of the C standard.

How many different values can i have given code above?
If type `int' has 31 value bits and no padding bits, and bytes have 8
bits, then `i' will have 13 one-bits and 19 zero-bits. The number of
values with this property is given by the binomial coefficient
"32 choose 13", which is 347373600. That's how many different values
`i' can have.

So this means that bit ordering, as defined in the C spec, can be
completely different for int and char (and other basic types)?

Yes. Although in reality, I have never seen a machine which didn't
either use big endian, little endian, or mixed endian bit order, the
C standard certainly allows others.

At "C level" the bits have a fixed position, for example 0x00000001
can be used to get least significant bit (bit 0) of a 4 byte int;
at implementation level there are (as I understand it from you) 32
possible positions this bit could be.

I see. These are usually referred to as "value" and "representation",
respectively. Note that the `memcpy' call sets the /representation/
of `i'.

Martin
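The value/representation distinction can be made visible by fixing the value and then looking at the bytes. In this sketch (hypothetical helper name), the value 1 always has its least significant bit set at the "C level", while the byte that carries that bit in the representation depends on the implementation.

```c
#include <stddef.h>

/* Hypothetical helper: find which byte of the object representation holds
   the bit that (x & 1) tests at the value level. Returns the index of the
   first non-zero byte, or -1 if every byte is zero. */
int index_of_low_bit_byte(void)
{
    unsigned int one = 1;   /* value: only the least significant bit set */
    const unsigned char *p = (const unsigned char *)&one;
    size_t k;

    for (k = 0; k < sizeof one; k++)
        if (p[k] != 0)
            return (int)k;
    return -1;
}
```

On common little-endian machines the result is typically 0, and on common big-endian machines sizeof(unsigned int) - 1; the expression one & 1 yields 1 on all of them.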
 
Sam Dennis

Richard said:
No. It is _explicitly_ defined to be one byte by the Standard.

<sarcasm> Well, that's really going to clear up the OP's confusion.

In C, a byte is a unit of storage large enough to hold a char. By this
definition, similar to that used in the Standard, sizeof(char) == 1

The meaning that many people incorrectly associate with `byte' actually
belongs with `octet'; the latter just happens to be a common choice for
size of the former.

Applying the sizeof operator directly to the `char' type is not harmful
but it is indicative of a grave misunderstanding of the meaning of byte
or character in C, and thus throws doubt on the correctness of all uses
of sizeof by that programmer.
 
