memcpy() with uninitialised memory

Spiros Bousbouras

#include <string.h>

int main(void) {
    int a[10] , b[10] ;
    memcpy(a,b,10) ;
    return 0 ;
}

Is this undefined behaviour ? If yes how does it follow from the
standard ?
 
qarnos

#include <string.h>

int main(void) {
    int a[10] , b[10] ;
    memcpy(a,b,10) ;
    return 0 ;

}

Is this undefined behaviour ? If yes how does it follow from the
standard ?

If you are having problems with that code, it's because you are
forgetting to use the sizeof operator.

#include <string.h>

int main(void) {
    int a[10], b[10];
    memcpy(a, b, sizeof(int) * 10);
    return 0;
}
 
Ben Pfaff

Spiros Bousbouras said:
#include <string.h>

int main(void) {
int a[10] , b[10] ;
memcpy(a,b,10) ;
return 0 ;
}

Is this undefined behaviour ? If yes how does it follow from the
standard ?

No, it is not undefined behavior. memcpy copies an object as an
array of unsigned char. The values of b's elements are
indeterminate, but unsigned char has no trap representation, so
their values are merely unspecified.
 
Tomás Ó hÉilidhe

#include <string.h>

int main(void) {
    int a[10] , b[10] ;
    memcpy(a,b,10) ;
    return 0 ;

}

Is this undefined behaviour ? If yes how does it follow from the
standard ?


This is one of the places where I get lazy and don't bother
thinking about it because I never do it. Same goes for bitwise
operations on signed integers -- I haven't bothered learning about it
because I'll never do it.

For what it's worth though, I've seen very proficient programmers
on this newsgroup do stuff like copy uninitialised arrays of unsigned
char and say it's OK, so my first guess would be that it's OK. Not
that you'd have a reason to do it, of course.

Of course, if you try to access the uninitialised data as if it
were ints, it'd be UB.
 
user923005

#include <string.h>

int main(void) {
    int a[10] , b[10] ;
    memcpy(a,b,10) ;
    return 0 ;

}

Is this undefined behaviour ? If yes how does it follow from the
standard ?

6.2.6 Representations of types
6.2.6.1 General

5 Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.41) Such a representation is called a trap
representation.

Footnote 41) Thus, an automatic variable can be initialized to a trap
representation without causing undefined behavior, but the value of
the variable cannot be used until a proper value is stored in it.

And I guess you meant:

/* No undefined behavior here. */
#include <string.h>
int main(void) {
    int a[10], b[10]={0};
    memcpy(a,b,sizeof a) ;
    return 0;
}
 
Sjouke Burry

qarnos said:
#include <string.h>

int main(void) {
int a[10] , b[10] ;
memcpy(a,b,10) ;
return 0 ;

}

Is this undefined behaviour ? If yes how does it follow from the
standard ?

If you are having problems with that code, it's because you are
forgetting to use the sizeof operator.

int main(void) {
int a[10], b[10];
memcpy(a, b, sizeof(int) * 10);
return 0;
}
Although you are still copying uninitialized data,
which is as useful as carrying water to the sea.
 
user923005

Spiros Bousbouras said:
#include <string.h>
int main(void) {
    int a[10] , b[10] ;
    memcpy(a,b,10) ;
    return 0 ;
}
Is this undefined behaviour ? If yes how does it follow from the
standard ?
No, it is not undefined behavior.  memcpy copies an object as an
array of unsigned char.  The values of b's elements are
indeterminate, but unsigned char has no trap representation, so
their values are merely unspecified.

I see nothing about the mechanics of the copy operation here:
7.21.2 Copying functions
7.21.2.1 The memcpy function
Synopsis
1 #include <string.h>
void *memcpy(void * restrict s1,
const void * restrict s2,
size_t n);
Description
2 The memcpy function copies n characters from the object pointed to
by s2 into the object pointed to by s1. If copying takes place between
objects that overlap, the behavior is undefined.
Returns
3 The memcpy function returns the value of s1.

The reference to "n characters" only implies size.

Mechanically, most library source I know of does not actually use char
anyway.  If there is some rule that says the memcpy() function must
behave as if it is moving unsigned characters, then you are right.

See, for instance:
http://www.pell.portland.or.us/~orc/Code/libc/libc-current/string/memcpy.c
http://www.koders.com/c/fidE40953362C44848125DB7B62E480E2E1675F7166.aspx?s=mdef:insert
 
Ben Pfaff

user923005 said:
I see nothing about the mechanics of the copy operation here: ....
2 The memcpy function copies n characters from the object pointed to
by s2 into the object pointed to by s1. If copying takes place between
objects that overlap, the behavior is undefined. ....
The reference to "n characters" only implies size.

Why do you think so? It seems pretty clear to me that it copies
characters, since that is what the plain text of the standard
says.

The memmove description is even more explicit about characters
being involved:

Copying takes place as if the n characters from the
object pointed to by s2 are first copied into a
temporary array of n characters that does not overlap
the objects pointed to by s1 and s2, and then the n
characters from the temporary array are copied into the
object pointed to by s1.

TC2 adds this paragraph to 7.21.1 "String function conventions":

For all functions in this subclause, each character
shall be interpreted as if it had the type unsigned
char (and therefore every possible object
representation is valid and has a different value).

This is a result of DR 274:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm

Mechanically, most library source I know of does not actually use char
anyway. If there is some rule that says the memcpy() function must
behave as if it is moving unsigned characters, then you are right.

This falls under the "as if" rule.
 
Ben Pfaff

Han from China - Master Troll said:
So those three tell us that we need only bother exploring further to see
what these pesky trap representations imply for us.


7.21.1{3}:
For all functions in this subclause, each character shall be interpreted
as if it had the type unsigned char (and therefore every possible object
representation is valid and has a different value).

Does your copy of C99 have TC1 and TC2 pre-applied, then? Where
did you get it?
 
user923005

Why do you think so?  It seems pretty clear to me that it copies
characters, since that is what the plain text of the standard
says.

The memmove description is even more explicit about characters
being involved:

         Copying takes place as if the n characters from the
         object pointed to by s2 are first copied into a
         temporary array of n characters that does not overlap
         the objects pointed to by s1 and s2, and then the n
         characters from the temporary array are copied into the
         object pointed to by s1.

TC2 adds this paragraph to 7.21.1 "String function conventions":

          For all functions in this subclause, each character
          shall be interpreted as if it had the type unsigned
          char (and therefore every possible object
          representation is valid and has a different value).

I guess it is time for me to get a copy of TC2. The above is very
clear.
It looks like n1256.pdf incorporates TC1, TC2 and TC3.
 
Bartc

Sjouke Burry said:
qarnos said:
#include <string.h>

int main(void) {
int a[10] , b[10] ;
memcpy(a,b,10) ;
return 0 ;

}

Is this undefined behaviour ? If yes how does it follow from the
standard ?

If you are having problems with that code, it's because you are
forgetting to use the sizeof operator.

int main(void) {
int a[10], b[10];
memcpy(a, b, sizeof(int) * 10);
return 0;
}
Although you are still copying uninitialized data,
which is as useful as carrying water to the sea.

It ensures both a and b have the same contents; this could possibly be
significant.

(Now of course someone will say this is not guaranteed by the C standard,
but that wouldn't surprise me.)
 
Wolfgang Draxinger

qarnos said:
If you are having problems with that code, it's because you are
forgetting to use the sizeof operator.

int main(void) {
int a[10], b[10];
memcpy(a, b, sizeof(int) * 10);
return 0;
}

Since the C standard states that

sizeof(char) <= sizeof( any_other_type )

the only thing that might happen is that "too few" elements are
copied. Otherwise the code shows no undefined behaviour:

* 'a' and 'b' don't overlap
* 'a' and 'b' are allocated (automatically)

Surely the contents are uninitialized, but sometimes one might
_want_ to read out the contents of uninitialized memory (either
to initialize some entropy pool^1, or for data forensics^2).

Wolfgang Draxinger

[1]: OpenSSL does this - and the Debian folks "corrected" it away
resulting in the Debian-OpenSSL disaster.

[2]: like in: Inject some shell code into an application, call
the forensics function, exploiting knowledge about the
implementation, e.g. that this certain implementation uses a
stack and with the following code

void foo()
{
    unsigned char test[32];
    /* do something on test */
}

void bar()
{
    unsigned char gotcha[1024];
}

void baz()
{
    foo(); /* somehow inject a call to bar after foo here */
    /* -> */ bar(); /* On certain architectures utilizing a stack,
                       like the x86, bar's 'gotcha' will now
                       contain the last contents of
                       foo's 'test' */
}

This technique is useful if you can't run a debugger on the
system, but can inject shellcode (through some exploit, e.g.)
 
jameskuyper

Bartc said:
Sjouke Burry said:
qarnos wrote: ....
int main(void) {
int a[10], b[10];
memcpy(a, b, sizeof(int) * 10);
return 0;
}
Although you are still copying uninitialized data,
which is as useful as carrying water to the sea.

Or more precisely, it's as useful as replacing sea water with other sea
water.

It ensures both a and b have the same contents; this could possibly be
significant.

(Now of course someone will say this is not guaranteed by the C standard,
but that wouldn't surprise me.)

That the contents are identical is indeed guaranteed. But I still
don't see why it would be important for a correctly written program
that two objects contain identical copies of uninitialized memory.
 
Ben Pfaff

Han from China - Master Troll said:
I'm surprised you've missed Chuck Fucking Falconer dumping the
damn link in every thread.

I usually just read threads that have very few posts. There's
rarely anything left to contribute to popular threads.
 
jameskuyper

Wolfgang Draxinger wrote:
....
Surely the contents are uninitialized, but sometimes one might
_want_ to read out the contents of uninitialized memory (either
to initialize some entropy pool^1, or for data forensics^2).

Wolfgang Draxinger

[1]: OpenSSL does this - and the Debian folks "corrected" it away
resulting in the Debian-OpenSSL disaster.

It sounds like such code makes unwarranted assumptions about the
entropy of uninitialized memory. Let's put it this way: the compiler I
use most frequently has the "feature" that uninitialized memory is
always filled with zeros. This feature does not render it
non-conforming. Would that be a problem for such applications?
 
Falcon Kirtaran

Wolfgang Draxinger wrote:
...
Surely the contents are uninitialized, but sometimes one might
_want_ to read out the contents of uninitialized memory (either
to initialize some entropy pool^1, or for data forensics^2).

Wolfgang Draxinger

[1]: OpenSSL does this - and the Debian folks "corrected" it away
resulting in the Debian-OpenSSL disaster.

It sounds like such code makes unwarranted assumptions about the
entropy of uninitialized memory. Let's put it this way: the compiler I
use most frequently has the "feature" that uninitialized memory is
always filled with zeros. This feature does not render it
non-conforming. Would that be a problem for such applications?

It's kind of silly to try to use that as a source of entropy in the
first place. malloc() is often implemented in such a way that
unallocated memory may contain bookkeeping information used for memory
allocation within the program's heap, which makes the data there much
more predictable.

-- Falcon Darkstar Kirtaran
OpenPGP: (7902:4457) 9282:A431

 
Richard Tobin

jameskuyper said:
That the contents are identical is indeed guaranteed. But I still
don't see why it would be important for a correctly written program
that two objects contain identical copies of uninitialized memory.

Suppose a struct contains members that are not used in all cases. It
may be convenient to compare them without knowledge of which members
are used, so when making a copy you might choose to copy all the
members regardless of whether they have been initialised. Of course,
it might well be better to always initialise the data to zero.

-- Richard
 
