This version of bcopy() is implemented to behave more "correctly" when
memory blocks overlap. We know that according to the C89 standard,
memcpy() does not need to have this kind of "correct" behavior (maybe
bcopy() needs it for some dependency reasons), and if a programmer
calls memcpy() with two overlapping memory blocks, its behavior is
undefined. So I feel that this implementation of memcpy() is too
awful. The following implementation could be better:
I believe the "definition" of bcopy() (which is not ANSI C, but some
kind of old BSD de-facto non-standard) includes non-destructive
handling of overlapping areas. This is NOT true of memcpy() in
ANSI C but is true of memmove().
void *memcpy1(register void *des, register const void *src,
              register size_t len)
{
    void *pdes = des;
    register char *d = des;
    register const char *s = src;

    for (; len > 0; --len)
        *d++ = *s++;
    return pdes;
}
And it can be more efficient when copying a word at a time:
Warning: source code below appears to have been MIMEd to death.
void *memcpy2(register void *des, register const void *src,
              register size_t len)
{
    void *pdes = des;
    register char *d = des;
    register const char *s = src;

    switch (len % sizeof(int)) {
    case 3: *d++ = *s++;    /* fall through */
    case 2: *d++ = *s++;    /* fall through */
    case 1: *d++ = *s++;
    }
    for (len /= sizeof(int); len > 0;
         --len, d += sizeof(int), s += sizeof(int))
        *(int *)d = *(const int *)s;
I can see no reason why the above line won't smegfault on a
majority of calls to memcpy2() on a machine which enforces alignment
restrictions. Nasty example:
char buf[10240];
... something to put some data in buf ...
memcpy2(buf+3, buf, strlen(buf)+1);
Another possibility is that the machine doesn't enforce alignment
restrictions but comes up with the wrong answer. That is, assuming
4 byte ints,
*(int *) 0xdeadbee3
fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
*NOT* 0xdeadbee3 thru 0xdeadbee6.
    return pdes;
}
It could be much more efficient still if I copied several words,
rather than one word, from src to des in the "for" loop.
I don't consider "segmentation fault - core dumped" to be more
efficient than anything which doesn't core dump. There are ways
to copy words at a time in the presence of alignment restrictions.
This isn't it.
Anyhow, memcpy2() should run faster than memcpy() does when
processing large memory blocks, I believe.
I believe that any such statement about how performance otto-be is
made *BECAUSE* it is wrong.
But when I test them (copying between two 10240-byte memory blocks),
I am surprised to find that memcpy() runs the fastest. This result
makes me completely confused. Do you know the reason? Would you be so
kind as to explain it to me? Thank you!
I don't see any measurement methodologies or test results here.
Any performance measurements where the difference between two
ways of doing something are less than 1% or less than 10 times
the granularity of the clock being used to measure the time are
likely crap. And multitasking screws things up even worse.
The best performance demonstrations are those where you can
easily measure the difference in time with a wrist watch, *IF*
throwing the test in a loop and repeating it a million times
doesn't screw up what you are trying to measure (e.g. maybe
you don't want the test run completely from cache).
Also, are you sure you are using the memcpy() from the libiberty
directory? (As opposed to one in libc?) On FreeBSD the two
are very different.
Gordon L. Burditt