Use of memcpy() to transfer from memory to a variable

Martin · May 18, 2007

For reasons I won't go into, I need to transfer from 1 to 3 bytes to a
variable that I know is 4 bytes long. Bytes not written to in the 4-byte
target variable must be zero. Is the following use of memcpy() a
well-defined way of so doing? The code is written knowing that
sizeof(unsigned long) == 4 in this instance. The code is somewhat contrived
in order to provide a self-contained program that will compile and show the
use of memcpy() I am asking about.

The following code compiles cleanly using

gcc -Wall -ansi -pedantic

where

gcc -dumpversion displays

4.10.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long value;
    unsigned char *pAddress;
    unsigned char rem;

    pAddress = malloc(3);
    if ( pAddress == NULL )
    {
        puts("malloc failed");
        exit(EXIT_FAILURE);
    }

    /* load in some arbitrary values */
    pAddress[0] = 1;
    pAddress[1] = 2;
    pAddress[2] = 4;

    rem = 3; /* hard-wired for demo - normally calculated */

    if ( rem )
    {
        value = 0;
        memcpy( &value, pAddress, rem );
    }

    /* for demo - shows values in pAddress have been transferred */
    printf("value = 0x%0lX\n", value);

    return 0;
}

Richard Heathfield · May 18, 2007

[In a model post, much of which I've snipped for brevity, but please do
see the original rather than deducing fault on his part from omissions,
Martin said:]

For reasons I won't go into, I need to transfer from 1 to 3 bytes to a
variable that I know is 4 bytes long. Bytes not written to in the
4-byte target variable must be zero. Is the following use of memcpy()
a well-defined way of so doing?

No, but it's not exactly undefined either.

unsigned long value;
unsigned char *pAddress;
unsigned char rem;

pAddress = malloc(3);
if ( pAddress == NULL )
{

So far so good.
/* load in some arbitrary values */
pAddress[0] = 1;
pAddress[1] = 2;
pAddress[2] = 4;

rem = 3; /* hard-wired for demo - normally calculated */

if ( rem )
{
value = 0;
memcpy( &value, pAddress, rem );

Okay, this is certainly legal, given that sizeof(unsigned long) is 4, as
stated in your article.

But what do you actually get? Answer: it depends on the byte ordering
that pertains to your implementation. If you have little endian
integers, you'll get one result, and if you have big-endian, you'll get
another. And of course there are various flavours of middle-endian.

So if you're not too fussy about what value 'value' contains, or if
you're happy that it's correct on your implementation and you're not
worried about porting, you're fine.

HTH, HAND.

David Wade · May 19, 2007

Martin said:
For reasons I won't go into, I need to transfer from 1 to 3 bytes to a
variable that I know is 4 bytes long. Bytes not written to in the 4-byte
target variable must be zero. Is the following use of memcpy() a
well-defined way of so doing?

memcpy() copies bytes, so it will always produce the same bit pattern in the
result. As others have said, depending on the "endianness" this may yield
different integer values. I just wonder if using a struct with a union may
produce more understandable code?

Chris Dollin · May 19, 2007

Martin said:
For reasons I won't go into, I need to transfer from 1 to 3 bytes to a
variable that I know is 4 bytes long. Bytes not written to in the 4-byte
target variable must be zero. Is the following use of memcpy() a
well-defined way of so doing? The code is written knowing that
sizeof(unsigned long) == 4 in this instance. The code is somewhat
contrived in order to provide a self-contained program that will compile
and show the use of memcpy() I am asking about.

So what's wrong with something like

unsigned long value =
(unsigned long) byteA
| ((unsigned long) byteB << 8)
| ((unsigned long) byteC << 16)
;

where bytes A, B, and C are the bytes you want to transfer to
the low, low-middle, and high-middle parts of `value`?

[This assumes 8-bit bytes.]

Advantages (a) bypasses endianness issues (b) no need to muck
around with `memcpy`.

Martin · May 21, 2007

Richard Heathfield said:
Okay, this is certainly legal, given that sizeof(unsigned long) is 4, as
stated in your article.

But what do you actually get? Answer: it depends on the byte ordering
that pertains to your implementation. If you have little endian
integers, you'll get one result, and if you have big-endian, you'll get
another. And of course there are various flavours of middle-endian.

So if you're not too fussy about what value 'value' contains, or if
you're happy that it's correct on your implementation and you're not
worried about porting, you're fine.

Indeed, I'm not fussy about the actual value; it is correct on my
implementation and portability is not an issue. I really want to transfer a
sequence of bytes from memory into a "container" which holds four bytes. So
you have confirmed for me my use of memcpy() does not invoke undefined
behaviour.

Many thanks.

Peter 'Shaggy' Haywood · May 25, 2007

Subject: Re: Use of memcpy() to transfer from memory to a variable
From: Peter 'Shaggy' Haywood
Newsgroups: comp.lang.c
Date: Mon, 21 May 2007 13:32:23 +1000

Chris Dollin said:
Martin said:

For reasons I won't go into, I need to transfer from 1 to 3 bytes to a
variable that I know is 4 bytes long. Bytes not written to in the
4-byte target variable must be zero. Is the following use of memcpy() a
well-defined way of so doing? The code is written knowing that
sizeof(unsigned long) == 4 in this instance. The code is somewhat
contrived in order to provide a self-contained program that will
compile and show the use of memcpy() I am asking about.

So what's wrong with something like

unsigned long value =
(unsigned long) byteA
| ((unsigned long) byteB << 8)
| ((unsigned long) byteC << 16)
;

where bytes A, B, and C are the bytes you want to transfer to the low,
low-middle, and high-middle parts of `value`?

[This assumes 8-bit bytes.]

Advantages (a) bypasses endianness issues (b) no need to muck around
with `memcpy`.

Disadvantage: relies on 8 bit bytes.
A better solution would use CHAR_BIT from limits.h instead of hard
coding magic numbers.

#include <limits.h>
....
unsigned long value = (unsigned long)byteA |
(unsigned long)byteB << CHAR_BIT |
(unsigned long)byteC << 2 * CHAR_BIT;

Hallvard B Furuseth · May 25, 2007

Richard Heathfield said:
No, but it's not exactly undefined either.

It is if it produces a trap representation in the 'long' variable. Due
to either padding bits in the long type, or on a sign/magnitude host
where it can produce negative zero, which can be a trap representation.

Not exactly common, but it's possible. And if you use unsigned long
instead, it's possible to trap it at compile time by checking that
ULONG_MAX uses all the bits in a long.

Note that you need to look out for C's aliasing rules in code like that.
I _think_ it's safe when pAddress holds a malloced char array, but
otherwise the compiler would be allowed to "know" that pAddress does not
hold a long and 'value' thus is not set to a long after 'value = 0;'.
Then it could optimize away the memcpy, since it can tell the
destination value is not used (validly).

Compilers get smarter all the time, I remember a recent comment from the
gcc guys about some other hack: "We are working on a feature which will
break your hack."

Sigh. I guess we'll have to throw away hash functions which read
aligned data in larger chunks than byte by byte soon... Or maybe it
would help to declare the input values volatile.

Richard Heathfield · May 25, 2007

Hallvard B Furuseth said:

It is if it produces a trap representation in the 'long' variable.

That isn't possible in C89, of course, since there's no such thing as a
trap representation in C89.

Martin · May 25, 2007

Chris Dollin said:
So what's wrong with something like

unsigned long value =
(unsigned long) byteA
| ((unsigned long) byteB << 8)
| ((unsigned long) byteC << 16)
;

where bytes A, B, and C are the bytes you want to transfer to
the low, low-middle, and high-middle parts of `value`?

[This assumes 8-bit bytes.]

Advantages (a) bypasses endianness issues (b) no need to muck
around with `memcpy`.

Thanks for that Chris. I can see the byte order issue solved by it, but it
doesn't seem any easier (or worse) than using memcpy() - plus memcpy()'s
third argument specifies how many bytes I want to transfer whereas I'd have
to do some extra coding to accommodate that with your method.

Martin · May 25, 2007

Richard Heathfield replied:

That isn't possible in C89, of course, since there's no such thing as a
trap representation in C89.

I am using a C89 compiler so it seems well-defined to do this then,
excellent.

Keith Thompson · May 26, 2007

Martin said:
Richard Heathfield replied:

I am using a C89 compiler so it seems well-defined to do this then,
excellent.

I don't think it's well-defined in C89, though it may happen to be
safe under some particular implementation.

C89/C90 doesn't use the concept of "trap representation", but it does
have "indeterminately valued objects". Here's (part of) the C90
definition of "undefined behavior":

3.16 undefined behavior: Behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of indeterminately
valued objects, for which this International Standard imposes no
requirements.
[...]

It seems to me that a conforming C90 implementation can have the
equivalent of "trap representations", even though the C90 standard
doesn't use that term.

Richard Heathfield · May 26, 2007

Martin said:

Richard Heathfield replied:

I am using a C89 compiler so it seems well-defined to do this then,
excellent.

Well, I refer you to my earlier reply, in which I said that it is /not/
well-defined. See <[email protected]> for more details.

Richard Heathfield · May 26, 2007

CBFalconer said:

Richard Heathfield said:

Hallvard B Furuseth said:

It is [undefined] if it produces a trap representation in the
'long' variable.

That isn't possible in C89, of course, since there's no such thing
as a trap representation in C89.

How can you say that? A C89/C90 int can have trap values,
independent of the actual code employed.

Chapter and verse, please.

Keith Thompson · May 26, 2007

Richard Heathfield said:
Chapter and verse, please.

There is no direct C&V in C89/C90, since that standard doesn't define
the term "trap value". (Then again, neither does C99, but C99 does
define "trap representation", which is what we're really talking
about.)

But the concept is there implicitly, I think.

C90 6.5.7:

If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.

C90 3.16:

Undefined behavior: behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of
indeterminately-valued objects, for which the Standard imposes no
requirements.
[...]

In C99, a trap representation is one that causes undefined behavior if
the program attempts to access it. C90 had the same concept, but not
the same term. C99 didn't really change the semantics, it just made
it more explicit and nailed down the terminology.

Harald van Dĳk · May 26, 2007

Keith said:
There is no direct C&V in C89/C90, since that standard doesn't define
the term "trap value". (Then again, neither does C99, but C99 does
define "trap representation", which is what we're really talking
about.)

But the concept is there implicitly, I think.

C90 6.5.7:

If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.

C90 3.16:

Undefined behavior: behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of
indeterminately-valued objects, for which the Standard imposes no
requirements.
[...]

In C99, a trap representation is one that causes undefined behavior if
the program attempts to access it. C90 had the same concept, but not
the same term. C99 didn't really change the semantics, it just made
it more explicit and nailed down the terminology.

If you set the bytes that make up an object to specific values, the object
is initialised, so in C90, it is then allowed to be read, regardless of
what values you used for the representation, right?

Keith Thompson · May 26, 2007

Harald van Dĳk said:
Keith Thompson wrote: [...]

In C99, a trap representation is one that causes undefined behavior if
the program attempts to access it. C90 had the same concept, but not
the same term. C99 didn't really change the semantics, it just made
it more explicit and nailed down the terminology.

If you set the bytes that make up an object to specific values, the object
is initialised, so in C90, it is then allowed to be read, regardless of
what values you used for the representation, right?

As far as the wording of the C90 standard is concerned, I'm not sure.
As far as actual implementations are concerned, a floating-point
object could contain a signalling NaN representation, and accessing it
could cause Bad Things to happen. Similar things could happen for
some pointer representations on some systems, or even integers.

I'd suggest that this needs to be clarified, but it already was when
the C99 standard came out, and they're not doing DRs for C90.

Richard Heathfield · May 26, 2007

Keith Thompson said:

There is no direct C&V in C89/C90, since that standard doesn't define
the term "trap value".

Quite so. In fact, it doesn't even mention the word "trap".

(Then again, neither does C99, but C99 does
define "trap representation", which is what we're really talking
about.)

Which is why I referred specifically to C89 rather than C99.

But the concept is there implicitly, I think.

The concept of "indeterminate value" exists.

C90 6.5.7:

If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.

Look at the OP's code again. None of the objects concerned had its value
read without that value first being set. Therefore, it's hard to argue
that any of the values in the OP's code are indeterminate.

Harald van Dĳk · May 26, 2007

CBFalconer said:
No. The only type that is immune from trap representation is the
unsigned char.

Please keep in mind that I was asking about C90, not C99. As Keith Thompson
pointed out, real-world implementations aiming to conform to C90 do have
trap representations, but where does C90 allow them to?

Hallvard B Furuseth · May 26, 2007

Keith said:
There is no direct C&V in C89/C90, since that standard doesn't define
the term "trap value". (Then again, neither does C99, but C99 does
define "trap representation", which is what we're really talking
about.)

But the concept is there implicitly, I think.
(...)

A real-world example I vaguely remember from earlier discussions, I
think before C99, which the OP's example can produce: Sign bit 1, all
other bits 0, when LONG_MIN == -LONG_MAX, on a two's complement machine.

Richard Heathfield · May 26, 2007

CBFalconer said:

I don't have a C90 std, but the following (para. 5) is from N869:

....and it is irrelevant to C89, which *pre-dates* N869 by a substantial
number of years.
