libiberty directory in gcc-3.1.1 source code package

Discussion in 'C Programming' started by Liang Chen, Aug 14, 2004.

  1. Liang Chen

    Liang Chen Guest

    Is the file "bcopy.c" in the "libiberty" directory the implementation of the
    GNU C library function "bcopy"? If so, how can it run so fast (copying byte
    by byte rather than word by word)? When I copy the code of
    "libiberty/bcopy.c" into my own code, I find that it is very slow, even if
    I turn on "-O3" and define "NDEBUG". Why?
    Liang Chen, Aug 14, 2004
    #1

  2. > Is the file "bcopy.c" in the "libiberty" directory the implementation of the
    >GNU C library function "bcopy"?


    Most likely. I'm not sure I'm looking at the same version you are,
    and that interface and functionality of "bcopy()" predates GNU, I
    believe, but it looks like it.

    >If so, how can it run so fast?


    Who claims that it does run fast, whether so or non-so? And with
    what evidence?

    >(copy-by-byte
    >rather than copy-by-word)
    >When I copy the code of "libiberty/bcopy.c" to my
    >own code, I find that it is so slow even if I turn on "-O3" and define
    >"NDEBUG". Why?


    That implementation of bcopy() (which seems to be portable to all
    platforms) still runs faster than a program that needs bcopy() but
    doesn't have any implementation of it at all (and therefore won't
    link).

    Often you have a tradeoff: pick 1:

    portable and mediocre performance
    unportable, good performance on some platforms,
    won't work or works incorrectly on others
    unportable, good performance on some platforms,
    terrible performance on others

    Gordon L. Burditt
    Gordon Burditt, Aug 14, 2004
    #2

  3. Liang Chen

    Liang Chen Guest

    Thank you for your reply. I apologize for phrasing my questions so unclearly; my poor English is mostly to blame :(

    In "libiberty", I found the file "memcpy.c". It contains the following code fragment:

    PTR
    DEFUN(memcpy, (out, in, length), PTR out AND const PTR in AND size_t length)
    {
      bcopy(in, out, length);
      return out;
    }


    It is clear that memcpy() calls bcopy(). So I opened "bcopy.c" in the same directory and found this code:

    void
    bcopy (src, dest, len)
      register char *src, *dest;
      int len;
    {
      if (dest < src)
        while (len--)
          *dest++ = *src++;
      else
        {
          char *lasts = src + (len-1);
          char *lastd = dest + (len-1);
          while (len--)
            *(char *)lastd-- = *(char *)lasts--;
        }
    }


    This version of bcopy() is implemented to behave "correctly" when the memory blocks overlap. According to the C89 standard, memcpy() does not need this kind of "correct" behavior (maybe bcopy() needs it because other code depends on it), and if a programmer calls memcpy() with two overlapping memory blocks, the behavior is undefined. So I feel that this implementation of memcpy() is awful. The following implementation should be better:

    void* memcpy1 (register void* des, register void* src, register size_t len)
    {
        void* pdes = des;

        for(; len>0; --len)
            *(char*)des++ = *(char*)src++;

        return pdes;
    }

    And it can be more efficient to copy a whole word at a time:

    void* memcpy2 (register void* des, register void* src, register size_t len)
    {
        void* pdes = des;

        switch(len%sizeof(int))
        {
            case 3: *(char*)des++ = *(char*)src++;
            case 2: *(char*)des++ = *(char*)src++;
            case 1: *(char*)des++ = *(char*)src++;
        }
        for(len/=sizeof(int); len>0; --len)
            *(int*)des++ = *(int*)src++;

        return pdes;
    }

    It could be even more efficient if I copied several words, rather than one, in each iteration of the "for" loop. In any case, I believe memcpy2() should run faster than memcpy() when processing large memory blocks. But when I tested them (copying between two 10240-byte memory blocks), I was surprised to find that memcpy() ran the fastest. This result leaves me completely confused. Do you know the reason? Would you be kind enough to explain it to me? Thank you!


    Liang Chen
    Liang Chen, Aug 16, 2004
    #3
  4. >This version of bcopy() is implemented to behave more "correctly" when
    >memory blocks are overlapped. We know that according to the C89 standard,
    >function memcpy() does not need to have this kind of "correct"
    >behavior(maybe bcopy() needs for some dependence issues), and if a
    >programmer calls memcpy() with two overlapped memory blocks, its behavior
    >is not defined. So, I feel that this implementation of memcpy() is too
    >awful. The following implementation can be better,


    I believe the "definition" of bcopy() (which is not ANSI C, but some
    kind of old BSD de-facto non-standard) includes non-destructive
    handling of overlapping areas. This is NOT true of memcpy() in
    ANSI C but is true of memmove().
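
To make the overlap point concrete, here is a minimal sketch (not taken from libiberty or any libc; the names `forward_copy` and `shift_right` are made up for illustration). It contrasts a naive front-to-back byte loop with the standard memmove() when the destination overlaps the tail of the source.

```c
#include <stddef.h>
#include <string.h>

/* Naive front-to-back byte copy: fine for disjoint blocks, but when
   dest lies inside [src, src + len) it re-reads bytes that it has
   already overwritten. */
void forward_copy(char *dest, const char *src, size_t len)
{
    while (len--)
        *dest++ = *src++;
}

/* memmove() is specified to behave as if the source were first copied
   to a temporary buffer, so the overlap is harmless. */
void shift_right(char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}
```

With a buffer holding "abcdef" plus two spare bytes, shift_right(buf, 6, 2) leaves "ababcdef", while forward_copy(buf + 2, buf, 6) leaves "abababab".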

    >void* memcpy1 (register void* des, register void* src, register size_t =
    >len)
    >{
    > void* pdes =3D des;
    >
    > for(; len>0; --len)
    > *(char*)des++ =3D *(char*)src++;
    >
    > return pdes;
    >}
    >
    >And it can be more efficient when copy a word directly,


    Warning: source code below appears to have been MIMEd to death.

    >void* memcpy2 (register void* des, register void* src, register size_t =
    >len)
    >{
    > void* pdes =3D des;
    >
    > switch(len%sizeof(int))
    > {
    > case 3: *(char*)des++ =3D *(char*)src++;
    > case 2: *(char*)des++ =3D *(char*)src++;
    > case 1: *(char*)des++ =3D *(char*)src++;
    > }
    > for(len/=3Dsizeof(int); len>0; --len)
    > *(int*)des++ =3D *(int*)src++;


    I can see no reason why the above line won't smegfault on a
    majority of calls to memcpy2() on a machine which enforces alignment
    restrictions. Nasty example:
    char buf[10240];

    ... something to put some data in buf ...
    memcpy2(buf+3, buf, strlen(buf)+1);

    Another possibility is that the machine doesn't enforce alignment
    restrictions but comes up with the wrong answer. That is, assuming
    4 byte ints,
    *(int *) 0xdeadbee3
    fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
    *NOT* 0xdeadbee3 thru 0xdeadbee6.
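
One way to sidestep both failure modes when a word must be read from an arbitrary address is to copy its bytes into a properly aligned local object instead of dereferencing a cast pointer. A minimal sketch, assuming a C99 compiler that provides uint32_t in <stdint.h>; `load_u32` and `store_u32` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
   The value is assembled in an aligned local, so no misaligned
   word-sized lvalue is ever dereferenced. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

The byte order of the stored value is whatever the machine uses natively; the helpers only remove the alignment requirement.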

    > return pdes;
    >}
    >
    >It can be much more efficient if I copy more words rather than one word
    >from des to src in "for" loop.


    I don't consider "segmentation fault - core dumped" to be more
    efficient than anything which doesn't core dump. There are ways
    to copy words at a time in the presence of alignment restrictions.
    This isn't it.

    >Anyhow, memcpy2() should run faster than
    >memcpy() does when processing large memory blocks, I believe.


    I believe that any such statement about how performance otto-be is
    made *BECAUSE* it is wrong.

    >But, when
    >I test them(copy between two 10240 bytes memory blocks), I am surprised
    >to find that memcpy() runs the fastest. This result make me completely
    >confused. Do you know the reason? Would you kind to explain it to me?
    >Thank you!


    I don't see any measurement methodologies or test results here.
    Any performance measurements where the difference between two
    ways of doing something are less than 1% or less than 10 times
    the granularity of the clock being used to measure the time are
    likely crap. And multitasking screws things up even worse.
    The best performance demonstrations are those where you can
    easily measure the difference in time with a wrist watch, *IF*
    throwing the test in a loop and repeating it a million times
    doesn't screw up what you are trying to measure (e.g. maybe
    you don't want the test run completely from cache).

    Also, are you sure you are using the memcpy() from the libiberty
    directory? (As opposed to one in libc?) On FreeBSD the two
    are very different.

    Gordon L. Burditt
    Gordon Burditt, Aug 16, 2004
    #4
  5. Liang Chen

    CBFalconer Guest

    Liang Chen wrote:
    >
    > Part 1.1 Type: Plain Text (text/plain)
    > Encoding: quoted-printable


    Please do not use html or mime attachments in newsgroups.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Aug 16, 2004
    #5
  6. Liang Chen

    Liang Chen Guest

    > I can see no reason why the above line won't smegfault on a
    > majority of calls to memcpy2() on a machine which enforces alignment
    > restrictions. Nasty example:
    > char buf[10240];
    >
    > ... something to put some data in buf ...
    > memcpy2(buf+3, buf, strlen(buf)+1);


    Could I consider that memcpy2() is un-portable and hardware-sensitive?

    > Another possibility is that the machine doesn't enforce alignment
    > restrictions but comes up with the wrong answer. That is, assuming
    > 4 byte ints,
    > *(int *) 0xdeadbee3
    > fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
    > *NOT* 0xdeadbee3 thru 0xdeadbee6.


    I run and test my programs on a PC. The CPU is an Intel Pentium. The OS is
    Linux 2.4.18-12L, not xBSD. I used GDB to debug memcpy2(), and I found that
    the situation is not what you described above. *(int*)0xdeadbee3 does
    fetch and store the integer at, for example, the addresses 0xdeadbee3 thru
    0xdeadbee6 rather than 0xdeadbee0 thru 0xdeadbee3.

    > I don't consider "segmentation fault - core dumped" to be more
    > efficient than anything which doesn't core dump. There are ways
    > to copy words at a time in the presence of alignment restrictions.
    > This isn't it.


    I checked my program thoroughly last night. Now memcpy2() looks like
    this:

    void* memcpy2 (register void* dest, register void* src, register size_t len)
    {
        void* pdest = dest;

        for(; len%sizeof(int)!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
            *(char*)dest = *(char*)src;
        for(len/=sizeof(int); len>0; --len, dest=(int*)dest+1, src=(int*)src+1)
            *(int*)dest = *(int*)src;

        return pdest;
    }

    It is an ANSI C program this time. But my machine doesn't enforce alignment
    restrictions, and I do not know how to copy words at a time in the presence
    of alignment restrictions. Can you give me some examples or hints?

    > I don't see any measurement methodologies or test results here.
    > Any performance measurements where the difference between two
    > ways of doing something are less than 1% or less than 10 times
    > the granularity of the clock being used to measure the time are
    > likely crap. And multitasking screws things up even worse.
    > The best performance demonstrations are those where you can
    > easily measure the difference in time with a wrist watch, *IF*
    > throwing the test in a loop and repeating it a million times
    > doesn't screw up what you are trying to measure (e.g. maybe
    > you don't want the test run completely from cache).


    Now memcpy2() is as fast as the memcpy() in the library.

    > Also, are you sure you are using the memcpy() from the libiberty
    > directory? (As opposed to one in libc?) On FreeBSD the two
    > are very different.


    When I say "memcpy()", I mean the memcpy() in libc.
    Are they different? Do you mean the memcpy() in libiberty is not the code
    that gets compiled into libc? Is libc built when I run make on the
    GCC package? If it is, where is its source code, whether it is C
    or assembly?

    Chen L.
    Liang Chen, Aug 17, 2004
    #6
  8. Liang Chen

    Chris Torek Guest

    [someone noted possible alignment problems in some code variants]

    In article <news:cfrqn3$5fo$99.com>
    Liang Chen <> wrote:
    >I run and test my programmes on a PC. The CPU is Intel Pentium. ...


    Pentium-based systems never[%] enforce alignment constraints.
    Try a MIPS, ARM, or SPARC-based system, for instance (if you
    can get hold of one).

    >When I say "memcpy()", I mean the memcpy() in libc.
    >They are different? You mean the memcpy() in libiberty is not the real code
    >to be compiled to add into libc? But, does the libc be made when I MAKE a
    >GCC package? If it does, where is it's source codes, whatever they are C
    >codes or ASM codes?


    None of these are really questions about using Standard C, but rather
    about how to build GNU programs with nonstandard extensions.

    As it happens, the answer (based on your earlier mention of underlying
    OS -- which I snipped) is that they are indeed different, the source
    code is not in libiberty at all, and the source code *is* available
    somewhere (because of the nature of Linux) but it is difficult to
    say precisely where (again because of the nature of Linux :) ).
    The Linux C library is built when you build the Linux C library --
    which, unless you-the-reader rebuild Linux, is not something you-
    the-reader would normally do, even when installing various GNU
    software.

    As it also happens, if you use the GNU C compiler on a Pentium
    system and turn optimization up high, calls to memcpy() often never
    even call anything at all -- they turn into inline assembly code
    instead. The compiler is allowed to do this because the name
    "memcpy" is reserved, so the compiler can be sure precisely what
    any call to memcpy() is supposed to do. This in turn means that
    if you attempt to replace memcpy(), but do it by supplying a
    different memcpy() function, your new function may never get called
    at all!
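
As a small illustration of the kind of call this applies to, consider a copy whose size is a compile-time constant. The sketch below is only an example: `struct point` and `copy_point` are invented for it, and whether the call is expanded inline depends on the compiler, the target, and the options used. With GCC, `-fno-builtin` (or `-fno-builtin-memcpy`) asks the compiler not to treat the call specially.

```c
#include <string.h>

struct point { int x, y; };

/* The size is a small constant known at compile time, so an optimizing
   compiler is free to replace this call with a few move instructions
   instead of calling any memcpy() function. */
void copy_point(struct point *dst, const struct point *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Inspecting the generated assembly (for example with `gcc -O2 -S`) is one way to see whether a call to memcpy survives on a given setup.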

    The behavior described in the last paragraph above -- in which an
    attempt to replace a C library function with some other substitute
    fails -- is allowed by the C standard. If you want your programs
    to run on any system that supports Standard C, do not attempt to
    override library functions: if it works at all, it may not work
    correctly.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Aug 17, 2004
    #8
  9. Liang Chen

    Chris Torek Guest

    In article <news:> I wrote:
    >Pentium-based systems never[%] enforce alignment constraints.


    Gah, I forgot the footnote:

    [%] What, never?
    No, never!
    What, never?
    Well, hardly ever!

    (The SSE instructions require alignment.)
    Chris Torek, Aug 17, 2004
    #9
  10. Liang Chen

    Ben Pfaff Guest

    Chris Torek <> writes:

    > In article <news:> I wrote:
    >>Pentium-based systems never[%] enforce alignment constraints.

    >
    > Gah, I forgot the footnote:
    >
    > [%] What, never?
    > No, never!
    > What, never?
    > Well, hardly ever!
    >
    > (The SSE instructions require alignment.)


    Also, if you set bit 18, called "AC" or "Alignment Check", in
    EFLAGS, then most unaligned accesses in user mode will fault.
    --
    "I should killfile you where you stand, worthless human." --Kaz
    Ben Pfaff, Aug 17, 2004
    #10
  11. Liang Chen

    Dan Pop Guest

    In <> Ben Pfaff <> writes:

    >Chris Torek <> writes:
    >
    >> In article <news:> I wrote:
    >>>Pentium-based systems never[%] enforce alignment constraints.

    >>
    >> Gah, I forgot the footnote:
    >>
    >> [%] What, never?
    >> No, never!
    >> What, never?
    >> Well, hardly ever!
    >>
    >> (The SSE instructions require alignment.)

    >
    >Also, if you set bit 18, called "AC" or "Alignment Check", in
    >EFLAGS, then most unaligned accesses in user mode will fault.


    Unfortunately, no Pentium-based OS in wide use does it.

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
    Dan Pop, Aug 17, 2004
    #11
  12. >> I can see no reason why the above line won't smegfault on a
    >> majority of calls to memcpy2() on a machine which enforces alignment
    >> restrictions. Nasty example:
    >> char buf[10240];
    >>
    >> ... something to put some data in buf ...
    >> memcpy2(buf+3, buf, strlen(buf)+1);

    >
    >Could I consider that memcpy2() is un-portable and hardware-sensitive?


    Yes.

    >> Another possibility is that the machine doesn't enforce alignment
    >> restrictions but comes up with the wrong answer. That is, assuming
    >> 4 byte ints,
    >> *(int *) 0xdeadbee3
    >> fetches or stores the integer at the addresses 0xdeadbee0 thru 0xdeadbee3,
    >> *NOT* 0xdeadbee3 thru 0xdeadbee6.

    >
    >I run and test my programmes on a PC. The CPU is Intel Pentium.


    This is not a CPU that enforces alignment restrictions, in general.
    There's a bit you can turn on to try enforcing restrictions, but I
    don't think any major OS running on an i386 platform lets you use it.

    >The OS is
    >Linux 2.4.18-12L, not xBSD. I use GDB to debug memcpy2(), and I find that
    >the situation is not the same as you said above. *(int*)0xdeadbee3 does
    >fetch and store the integer, for example, at the addresses 0xdeadbee3 thru
    >0xdeadbee6 rather than 0xdeadbee0 thru 0xdeadbee3.


    It could behave that way on some CPU. I didn't say it would
    on the one you happen to use.

    >> I don't consider "segmentation fault - core dumped" to be more
    >> efficient than anything which doesn't core dump. There are ways
    >> to copy words at a time in the presence of alignment restrictions.
    >> This isn't it.

    >
    >I checked my programme thoroughly last night. Now, memcpy2() looks like
    >this,
    >
    >void* memcpy2 (register void* dest, register void* src, register size_t len)
    >{
    > void* pdest = dest;
    >
    > for(; len%sizeof(int)!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > for(len/=sizeof(int); len>0; --len, dest=(int*)dest+1, src=(int*)src+1)
    > *(int*)dest = *(int*)src;
    >
    > return pdest;
    >}


    I don't see any significant change: casting a void * pointer to
    int * and then dereferencing it can cause a segfault.

    >It is a ANSI C program this time.


    One which invokes the wrath of undefined behavior under many
    combinations of parameters which are perfectly acceptable to pass
    to memcpy().

    >But my machine doesn't enforce alignment
    >restrictions. I do not know how to copy words at a time in the presence of
    >alignment restrictions. Can you give me some examples or hints?


    Example: if you dereference an int pointer containing an address that
    is not a multiple of 4, you get a smegmentation fault.

    Question: how do you *PORTABLY* figure out whether a pointer
    is aligned to a multiple of 4?

    If dest and src are 3 apart, then this:
    > *(int*)dest = *(int*)src;

    is *GUARANTEED* to cause a smegmentation fault on such a machine, because
    one of them MUST be odd. If you increment both of them by the
    same amount first, you still have the same problem.

    >> I don't see any measurement methodologies or test results here.
    >> Any performance measurements where the difference between two
    >> ways of doing something are less than 1% or less than 10 times
    >> the granularity of the clock being used to measure the time are
    >> likely crap. And multitasking screws things up even worse.
    >> The best performance demonstrations are those where you can
    >> easily measure the difference in time with a wrist watch, *IF*
    >> throwing the test in a loop and repeating it a million times
    >> doesn't screw up what you are trying to measure (e.g. maybe
    >> you don't want the test run completely from cache).

    >
    >Now memcpy2() is as fast as memcpy() in library.


    If you don't tell me how you measured it, or at least establish
    credentials in knowing how to do benchmarks, I'm not going to believe
    any statement that X is faster than Y on platform Z. This could
    just as well mean "X is faster than Y on platform Z by
    0.0000000000000001%", which is a meaningless difference.

    >> Also, are you sure you are using the memcpy() from the libiberty
    >> directory? (As opposed to one in libc?) On FreeBSD the two
    >> are very different.

    >
    >When I say "memcpy()", I mean the memcpy() in libc.
    >They are different? You mean the memcpy() in libiberty is not the real code
    >to be compiled to add into libc? But, does the libc be made when I MAKE a
    >GCC package? If it does, where is it's source codes, whatever they are C
    >codes or ASM codes?


    When I make a GCC package, I do not make libc, as GCC does not include
    a C library at all (on platforms such as FreeBSD, Ultrix, Tru64 aka OSF,
    etc.). I believe that even on Linux the C library is not considered
    to be part of gcc.

    On FreeBSD, the memcpy() and bcopy() code under 'libiberty' is very
    different from the code under /usr/src/lib/libc.

    Gordon L. Burditt
    Gordon Burditt, Aug 18, 2004
    #12
  13. Liang Chen

    L. Chen Guest

    > Question: how do you *PORTABLY* figure out whether a pointer
    > is aligned to a multiple of 4?


    How about this one?

    void* memcpy3 (register void* dest, register void* src, register size_t len)
    {
        void* pdest = dest;

        if( ((unsigned int)dest)%4==((unsigned int)src)%4 )
        {
            for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
            for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
                *(int*)dest = *(int*)src;
            for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
        }
        else
        {
            for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
        }

        return pdest;
    }

    > When I make a GCC package, I do not make libc, as GCC does not include
    > a C library at all (on platforms such as FreeBSD, Ultrix, Tru64 aka OSF,
    > etc.). I believe that even on Linux the C library is not considered
    > to be part of gcc.
    >
    > On FreeBSD, the memcpy() and bcopy() code under 'libiberty' is very
    > different from the code under /usr/src/lib/libc.


    Oh, I see. (I always thought that building GCC would automatically recompile
    libc.)
    L. Chen, Aug 19, 2004
    #13
  15. Liang Chen

    CBFalconer Guest

    "L. Chen" wrote:
    >
    > > Question: how do you *PORTABLY* figure out whether a pointer
    > > is aligned to a multiple of 4?

    >
    > How about this one?
    >
    > void* memcpy3 (register void* dest, register void* src, register size_t len)
    > {
    > void* pdest = dest;
    >
    > if( ((unsigned int)dest)%4==((unsigned int)src)%4 )


    Nope. Casting a pointer to any form of integer is not guaranteed
    to be reversible, and the results are implementation defined.
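
For what it is worth, C99 provides the optional type uintptr_t in <stdint.h>: where it exists, a valid void * converted to uintptr_t and back compares equal to the original. What the integer value means is still implementation-defined, so reading alignment out of its low bits remains an assumption about the platform (one that holds on common flat-address machines). A minimal sketch; `misalignment` is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of p past the previous multiple of align, ASSUMING the
   integer value of the pointer is a linear byte address. That
   assumption is not guaranteed by the C standard. */
size_t misalignment(const void *p, size_t align)
{
    return (size_t)((uintptr_t)p % align);
}
```

Using uintptr_t rather than unsigned int also avoids truncating the pointer on machines where pointers are wider than int.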

    CBFalconer, Aug 19, 2004
    #15
  16. Liang Chen

    L. Chen Guest

    > Pentium-based systems never[%] enforce alignment constraints.
    > Try a MIPS, ARM, or SPARC-based system, for instance (if you
    > can get hold of one).




    > As it happens, the answer (based on your earlier mention of underlying
    > OS -- which I snipped) is that they are indeed different, the source
    > code is not in libiberty at all, and the source code *is* available
    > somewhere (because of the nature of Linux) but it is difficult to
    > say precisely where (again because of the nature of Linux :) ).
    > The Linux C library is built when you build the Linux C library --
    > which, unless you-the-reader rebuild Linux, is not something you-
    > the-reader would normally do, even when installing various GNU
    > software.


    I am a beginner in Linux. Sometimes, I lack the basic knowledge about it :(

    > As it also happens, if you use the GNU C compiler on a Pentium
    > system and turn optimization up high, calls to memcpy() often never
    > even call anything at all -- they turn into inline assembly code
    > instead. The compiler is allowed to do this because the name
    > "memcpy" is reserved, so the compiler can be sure precisely what
    > any call to memcpy() is supposed to do. This in turn means that
    > if you attempt to replace memcpy(), but do it by supplying a
    > different memcpy() function, your new function may never get called
    > at all!


    I find gcc is getting more and more clever.

    > The behavior described in the last paragraph above -- in which an
    > attempt to replace a C library function with some other substitute
    > fails -- is allowed by the C standard. If you want your programs
    > to run on any system that supports Standard C, do not attempt to
    > override library functions: if it works at all, it may not work
    > correctly.


    I am not going to override them. I was just surprised by the source code
    in libiberty. Of course, I know that it is not the source code of memcpy
    in libc. :p

    ---
    L. Chen
    L. Chen, Aug 19, 2004
    #16
  17. Liang Chen

    L. Chen Guest


    > Nope. Casting a pointer to any form of integer is not guaranteed
    > to be reversible, and the results are implementation defined.


    Sometimes I feel it is so difficult to make C programs portable. :(
    Here is the modified version:

    void* memcpy3 (register void* dest, register void* src, register size_t len)
    {
        void* pdest = dest;

        if( (dest-src)%4==0 )
        {
            for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
            for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
                *(int*)dest = *(int*)src;
            for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
        }
        else
        {
            for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
                *(char*)dest = *(char*)src;
        }

        return pdest;
    }

    (dest-src) yields a ptrdiff_t value, which I then treat as an integer. Is
    that OK?


    ---
    L. Chen
    L. Chen, Aug 19, 2004
    #17
  18. "L. Chen" <> writes:
    > > Nope. Casting a pointer to any form of integer is not guaranteed
    > > to be reversible, and the results are implementation defined.

    >
    > Sometimes I feel it is so difficult to make C programs portable. :(
    > Here is the modified one:
    >
    > void* memcpy3 (register void* dest, register void* src, register size_t len)
    > {
    > void* pdest = dest;
    >
    > if( (dest-src)%4==0 )
    > {
    > for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
    > *(int*)dest = *(int*)src;
    > for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > }
    > else
    > {
    > for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > }
    >
    > return pdest;
    > }
    >
    > (dest-src) yields a ptrdiff_t value, which I then treat as an integer. Is
    > that OK?


    You can certainly treat (dest-src) as an integer, but it doesn't
    buy you anything.

    For one thing, I think you're assuming that sizeof(int)==4.

    You're still computing dest%4; the "%" operator doesn't apply to
    pointers.

    To implement something like memcpy() efficiently, you pretty much have
    to make non-portable assumptions. There's no portable way to detect
    the alignment of a pointer, but there's almost always a reasonably
    efficient non-portable way to do it (such as examining the low-order
    bits of the pointer's representation).
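
    A minimal sketch of that non-portable approach, for illustration only.
    The name is_aligned is made up here. The code assumes the
    implementation provides uintptr_t (an optional C99 type) and that a
    pointer converted to it behaves like a linear byte address; the
    standard guarantees neither.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-portable: assumes that converting a pointer to uintptr_t yields
   something like the machine address, so that the low-order bits
   reveal the alignment. is_aligned is a hypothetical helper name. */
int is_aligned(const void *p, size_t align)
{
    /* The mask trick only works when align is a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Aligned means all bits below the alignment boundary are zero. */
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

    On a system with a different pointer representation (word addresses
    with the byte offset stored elsewhere in the pointer, say), the
    result of this test would be meaningless.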

    Assume the CPU traps on unaligned memory accesses.

    If the source and destination are both word-aligned, or are both
    misaligned by the same amount, you can probably save some time by
    copying a word at a time.

    If the source and destination address differ in alignment by 1 byte,
    you can't copy data from one to the other using chunks larger than 1
    byte; a 4-byte aligned chunk of the source corresponds to a misaligned
    4-byte chunk of the target. If they differ in alignment by 2 bytes,
    you can probably copy 2-byte chunks.
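
    The relative-alignment reasoning above can be sketched as follows.
    common_chunk is a hypothetical helper, not something taken from
    libiberty or glibc. It makes the same non-portable assumption that a
    pointer converted to uintptr_t looks like a linear byte address, and
    it returns the largest power-of-two chunk size (capped at max_chunk)
    that both pointers could share once the leading bytes have been
    copied one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-portable sketch: two addresses agree modulo 2^k exactly when the
   low k bits of their XOR are zero, so the lowest set bit of the XOR
   limits the usable chunk size. common_chunk is a hypothetical name. */
size_t common_chunk(const void *dest, const void *src, size_t max_chunk)
{
    uintptr_t diff = (uintptr_t)dest ^ (uintptr_t)src;
    size_t chunk = 1;

    /* max_chunk must be a nonzero power of two, e.g. the word size. */
    assert(max_chunk != 0 && (max_chunk & (max_chunk - 1)) == 0);

    /* Double the chunk while the next address bit is the same in both. */
    while (chunk < max_chunk && (diff & chunk) == 0)
        chunk *= 2;

    return chunk;
}
```

    With max_chunk set to 4, pointers whose alignments differ by 1 or 3
    bytes give 1, pointers that differ by 2 bytes give 2, and equally
    aligned (or equally misaligned) pointers give 4.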

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Aug 19, 2004
    #18
  19. Tim Rentsch

    Tim Rentsch Guest

    "L. Chen" <> writes:

    > > Question: how do you *PORTABLY* figure out whether a pointer
    > > is aligned to a multiple of 4?

    >
    > How about this one?
    >
    > void* memcpy3 (register void* dest, register void* src, register size_t len)
    > {
    > void* pdest = dest;
    >
    > if( ((unsigned int)dest)%4==((unsigned int)src)%4 )
    > {
    > for(; dest%4!=0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > for(; len>0; len-=sizeof(int), dest=(int*)dest+1, src=(int*)src+1)
    > *(int*)dest = *(int*)src;
    > for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > }
    > else
    > {
    > for(; len>0; --len, dest=(char*)dest+1, src=(char*)src+1)
    > *(char*)dest = *(char*)src;
    > }
    >
    > return pdest;
    > }


    First let's see if we can fix the definite bugs (not counting any
    possible problems with casting a pointer to an unsigned int, I count
    at least three), and clean the code up a bit:

    void *
    memcpy3( register void *dest, register void *src, register size_t len )
    {
        char *d = dest, *s = src;
        size_t n = len;
        const size_t K = sizeof(int);

        while( n > 0 && (unsigned int)d % K != 0 ) n--, *d++ = *s++;

        if( (unsigned int)s % K == 0 ){
            while( n >= K ) n -= K, *(int*)d = *(int*)s, d += K, s += K;
        }

        while( n > 0 ) n--, *d++ = *s++;

        return dest;
    }


    Now let's return to the question at the start of the posting.

    Even though the method of testing pointer alignment in the code above
    isn't guaranteed to work, the fact is that it will work on many
    architectures (probably most architectures, but I expect that depends
    on how the counting is done). Since this is so, why not provide a
    standard means of checking for it? There could be a C preprocessor
    symbol, eg, SINGLE_LINEAR_ADDRESS_SPACE, that could be used to mean
    that pointers look like integers. Something along these lines could
    be written into the standard to provide a conformant means of writing
    code to do this kind of pointer manipulation. Make sense?
    Tim Rentsch, Aug 19, 2004
    #19
  20. Tim Rentsch <> writes:
    [...]
    > First let's see if we can fix the definite bugs (not counting any
    > possible problems with casting a pointer to an unsigned int, I count
    > at least three), and clean the code up a bit:
    >

    [snip]
    > {

    [snip]
    > if( (unsigned int)s % K == 0 ){
    >
    > Now let's return to the question at the start of the posting.
    >
    > Even though the method of testing pointer alignment in the code above
    > isn't guaranteed to work, the fact is that it will work on many
    > architectures (probably most architectures, but I expect that depends
    > on how the counting is done). Since this is so, why not provide a
    > standard means of checking for it? There could be a C preprocessor
    > symbol, eg, SINGLE_LINEAR_ADDRESS_SPACE, that could be used to mean
    > that pointers look like integers. Something along these lines could
    > be written into the standard to provide a conformant means of writing
    > code to do this kind of pointer manipulation. Make sense?


    I suspect that would encourage programmers to write code that only
    works if SINGLE_LINEAR_ADDRESS_SPACE is true. (Too many programmers
    do that already, of course.)

    Of course you can implement such a preprocessor symbol yourself, and
    configure it for each system. It's a little extra work, but frankly
    it probably should be.
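
    A rough sketch of what such a hand-configured symbol might look like
    in practice. SINGLE_LINEAR_ADDRESS_SPACE is the hypothetical symbol
    from this thread, not something any compiler or standard defines, and
    copy_bytes is a made-up name. When the symbol is left undefined the
    function is a plain byte loop; when it is defined, the word-copy path
    also assumes that uintptr_t exists and that the compiler tolerates
    accessing the buffers through int.

```c
#include <assert.h>
#include <stddef.h>

/* Define SINGLE_LINEAR_ADDRESS_SPACE by hand (for example with
   -DSINGLE_LINEAR_ADDRESS_SPACE) only on systems where a pointer
   converted to an integer behaves like a linear byte address. */
#ifdef SINGLE_LINEAR_ADDRESS_SPACE
#include <stdint.h>
#endif

void *copy_bytes(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(len == 0 || (dest != NULL && src != NULL));

#ifdef SINGLE_LINEAR_ADDRESS_SPACE
    {
        const size_t K = sizeof(int);

        /* Copy leading bytes until the destination is int-aligned. */
        while (len > 0 && (uintptr_t)d % K != 0) {
            *d++ = *s++;
            len--;
        }
        /* Word copy only if the source ended up aligned as well.
           Assumption: the compiler tolerates this access through int. */
        if ((uintptr_t)s % K == 0) {
            while (len >= K) {
                *(int *)d = *(const int *)s;
                d += K;
                s += K;
                len -= K;
            }
        }
    }
#endif

    /* Portable path, and the tail of the fast path: one byte at a time. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dest;
}
```

    The point of the design is that the portable loop is always present,
    so the code still compiles and works where the symbol is not defined.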

    Currently, if I write portable code that will work even for a
    non-linear address space, I can recompile and run it on a
    "non-lineary" system and it should work.

    An example of a system where SINGLE_LINEAR_ADDRESS_SPACE would be
    undefined is a Cray vector system, where a machine address points to a
    64-bit word. The C compiler has CHAR_BIT==8 to allow for code
    portability, but a char* pointer has a 3-bit offset in the top of the
    word. Well written portable code works just fine. Code that makes
    assumptions about how pointers are represented doesn't. (The systems
    run a Unix-based OS, and most Unix-based software compiles and runs
    correctly, so the lack of ability to do that kind of low-level pointer
    manipulation hasn't been much of a problem.)

    Of course something like memcpy() can be made much more efficient if
    it can detect pointer alignment and copy word-by-word whenever
    possible. That's why memcpy() is in the standard library, where it
    can be implemented with non-portable code.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Aug 19, 2004
    #20