a = b or memset/cpy?

Discussion in 'C Programming' started by nroberts, Feb 7, 2012.

  1. nroberts

    nroberts Guest

    memset and memcpy are turning up in profiles a lot. I'd like to speed
    things up a bit.

    Sometimes it is clear that using = to initialize a local would be
    better than memset. I might not gain anything, but at least there's a
    chance.

    However, can I gain performance improvements when zeroing out say some
    global element in an array like so:

    typedef struct x { int var0; char var1[20]; } X;

    X gX[30];

    void f(int slot)
    {
    X init = {0};

    gX[slot] = init;

    ...
    }

    vs.
    void f(int slot)
    {
    memset(&gX[slot], 0, sizeof(X));

    ...
    }

    Normally I wouldn't look for a micro-optimization like this but I'm
    kind of stuck with the parameters I'm given.
     
    nroberts, Feb 7, 2012
    #1

  2. nroberts

    Jens Gustedt Guest

    Am 02/07/2012 06:02 PM, schrieb nroberts:
    > X gX[30];
    >
    > void f(int slot)
    > {
    > X init = {0};
    >
    > gX[slot] = init;
    >
    > ...
    > }


    make it

    X const init = { 0 };

    or even better use a compound literal

    gX[slot] = (X const){ 0 };

    > Normally I wouldn't look for a micro-optimization like this but I'm
    > kind of stuck with the parameters I'm given.


    On any decent compiler the assignment version should not be worse than
    the memset version, because the compiler must be able to see that it
    is an object only filled with 0.

    On the other hand the assignment version *may* be better, when the
    compiler can do a data flow analysis that shows, e.g., that part of what
    you initialize is overwritten before being read.

    So I'd always prefer the assignment version.

    Jens
     
    Jens Gustedt, Feb 7, 2012
    #2
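
    A minimal sketch of the three variants discussed so far, assuming a C99 compiler (the compound literal needs one). The function names and the helpers slot_is_zero() and dirty() are illustrative, not from the thread:

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

/* Variant 1: zero-initialized const local, then struct assignment. */
void clear_by_assignment(int slot)
{
    const X init = {0};
    gX[slot] = init;
}

/* Variant 2: assignment from a compound literal (C99 and later). */
void clear_by_literal(int slot)
{
    gX[slot] = (const X){0};
}

/* Variant 3: fill the object representation with zero bytes. */
void clear_by_memset(int slot)
{
    memset(&gX[slot], 0, sizeof gX[slot]);
}

/* Returns 1 if every member of gX[slot] compares equal to zero. */
int slot_is_zero(int slot)
{
    size_t i;

    if (gX[slot].var0 != 0)
        return 0;
    for (i = 0; i < sizeof gX[slot].var1; i++)
        if (gX[slot].var1[i] != 0)
            return 0;
    return 1;
}

/* Helper for experiments: put nonzero data into a slot first. */
void dirty(int slot)
{
    gX[slot].var0 = -1;
    memset(gX[slot].var1, 'x', sizeof gX[slot].var1);
}
```

    For this particular struct (an int and a char array) all three leave the slot with the same member values. Which one is faster is left to the compiler and has to be measured.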

  3. nroberts

    James Kuyper Guest

    On 02/07/2012 12:41 PM, Jens Gustedt wrote:
    > Am 02/07/2012 06:02 PM, schrieb nroberts:
    >> X gX[30];
    >>
    >> void f(int slot)
    >> {
    >> X init = {0};
    >>
    >> gX[slot] = init;
    >>
    >> ...
    >> }

    >
    > make it
    >
    > X const init = { 0 };
    >
    > or even better use a compound literal
    >
    > gX[slot] = (X const){ 0 };
    >
    >> Normally I wouldn't look for a micro-optimization like this but I'm
    >> kind of stuck with the parameters I'm given.

    >
    > On any decent compiler the assignment version should not be worse than


    This is initialization, not assignment.

    > the memset version, because the compiler must be able to see that it
    > is an object only filled with 0.


    I've used a compiler which, given the following code:

    double array[10][1354][3] = {0};

    generated the equivalent of the following:

    array[0][0][0] = 0;
    array[0][0][1] = 0;
    etc.
    The resulting executable was noticeably larger than I had expected it to
    be. I was a little annoyed when I figured out what was going on. I
    changed it to use memset(), and it got a lot smaller, and executed somewhat
    faster, too. The support person I talked with said that my use of {0}
    was unreasonable, not their compiler's code generation.
     
    James Kuyper, Feb 7, 2012
    #3
  4. nroberts

    nroberts Guest

    On Feb 7, 9:41 am, Jens Gustedt <> wrote:
    > Am 02/07/2012 06:02 PM, schrieb nroberts:
    >
    > > X gX[30];

    >
    > > void f(int slot)
    > > {
    > >   X init = {0};

    >
    > >   gX[slot] = init;

    >
    > >   ...
    > > }

    >
    > make it
    >
    > X const init = { 0 };
    >
    > or even better use a compound literal
    >
    > gX[slot] = (X const){ 0 };
    >
    > > Normally I wouldn't look for a micro-optimization like this but I'm
    > > kind of stuck with the parameters I'm given.

    >
    > On any decent compiler the assignment version should not be worse than
    > the memset version, because the compiler must be able to see that it
    > is an object only filled with 0.
    >
    > On the other hand the assignment version *may* be better, when the
    > compiler can do a data flow analysis that shows e.g that part of what
    > you initialize is overwritten before being read.
    >
    > So I'd always prefer the assignment version.
    >
    > Jens


    LOL!

    Never mind. I'm not allowed to use this language feature. It's too
    "complex". People won't know what it does.

    Not the '=' operator... Initializing a structure to all 0 with = {0}.

    :/

    I keep running into bosses like this. Is this normal in the
    programming field or am I just incredibly unlucky?
     
    nroberts, Feb 7, 2012
    #4
  5. nroberts

    Jens Gustedt Guest

    Am 02/07/2012 07:40 PM, schrieb James Kuyper:
    > On 02/07/2012 12:41 PM, Jens Gustedt wrote:
    >> On any decent compiler the assignment version should not be worse than

    >
    > This is initialization, not assignment.


    No, you are mistaken. The relevant part is the assignment to gX[slot]. The
    other part is just initialization of a const. In particular the
    initialization of the const qualified compound literal can be done at
    compile time if the compiler decides that it is beneficial (as if it
    were declared as a static variable).

    >> the memset version, because the compiler must be able to see that it
    >> is an object only filled with 0.

    >
    > I've used a compiler which, given the following code:
    >
    > double array[10][1354][3] = {0};
    >
    > generated the equivalent of the following:
    >
    > array[0][0][0] = 0;
    > array[0][0][1] = 0;
    > etc.
    > The resulting executable was noticeably larger than I had expected it to
    > be. I was a little annoyed when I figured out what was going on. I
    > changed it to use memset(), and got a lot smaller, and executed somewhat
    > faster, too. The support person I talked with said that my use of {0}
    > was unreasonable, not their compiler's code generation.


    How long ago and what compiler was that? My observation over the last
    few years is that a compiler like gcc is capable of optimizing assignments
    to struct fields or different array members as if all of these were
    different variables.

    (And double may be special: setting all bytes to 0 and initializing
    with 0 are not necessarily the same thing.)

    Jens
     
    Jens Gustedt, Feb 7, 2012
    #5
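
    On the remark that double may be special: a small check, using only standard C, of whether "all bytes zero" and "the value 0.0" coincide on a given implementation. On IEEE 754 platforms both functions are expected to return 1, but that is an assumption about the platform, not a promise of the C standard. The function names are hypothetical:

```c
#include <string.h>

/* Returns 1 if a double whose bytes are all zero compares equal to 0.0
   on this implementation. Assumes all-bytes-zero is a valid double here. */
int zero_bytes_give_zero_double(void)
{
    double by_memset;

    memset(&by_memset, 0, sizeof by_memset);
    return by_memset == 0.0;
}

/* Returns 1 if the value 0.0 is stored as all-zero bytes on this
   implementation. */
int zero_double_is_zero_bytes(void)
{
    const double by_init = 0.0;
    const unsigned char zeros[sizeof(double)] = {0};

    return memcmp(&by_init, zeros, sizeof by_init) == 0;
}
```

    If either function returns 0 on a target, memset() is not a drop-in replacement for = {0} on objects containing doubles there.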
  6. nroberts

    Ben Pfaff Guest

    nroberts <> writes:

    > Nevermind. I'm not allowed to use this language feature. It's too
    > "complex". People won't know what it does.
    >
    > Not the '=' operator... Initializing a structure to all 0 with = {0}.


    Look on the bright side: on that basis, you should have no
    trouble avoiding C++ entirely at that workplace.
    --
    char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
    ={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
    =b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
    2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
     
    Ben Pfaff, Feb 7, 2012
    #6
  7. On Feb 7, 6:40 pm, James Kuyper <> wrote:
    > The support person I talked with said that my use of {0}
    > was unreasonable, not their compiler's code generation.
    >

    Well, what can he say? He can't patch the compiler to replace a long
    initialisation with a call to memset().
     
    Malcolm McLean, Feb 7, 2012
    #7
  8. nroberts

    nroberts Guest

    On Feb 7, 11:14 am, (Ben Pfaff) wrote:
    > nroberts <> writes:
    > > Nevermind.  I'm not allowed to use this language feature.  It's too
    > > "complex".  People won't know what it does.

    >
    > > Not the '=' operator... Initializing a structure to all 0 with = {0}.

    >
    > Look on the bright side: on that basis, you should have no
    > trouble avoiding C++ entirely at that workplace.


    I don't consider that a good thing.
     
    nroberts, Feb 7, 2012
    #8
  9. nroberts

    Shao Miller Guest

    On 2/7/2012 14:37, Malcolm McLean wrote:
    > On Feb 7, 6:40 pm, James Kuyper<> wrote:
    >> The support person I talked with said that my use of {0}
    >> was unreasonable, not their compiler's code generation.
    >>

    > Well, what can he say? He can't patch the compiler to replace a long
    > initialisation with a call to memset().
    >


    Call == LOL. Good one. :)

    And the support person cannot patch the compiler to replace a 'struct'
    object assignment with a call to 'memcpy' either, presumably.

    I've used a Microsoft C implementation which actually will give you a
    linker error if you do:

    void func(void) {
    int array[42] = { 0 };
    return;
    }

    and choose not to link with the standard library... It complains about
    a missing 'memset' symbol...
     
    Shao Miller, Feb 7, 2012
    #9
  10. nroberts

    Shao Miller Guest

    On 2/7/2012 14:07, nroberts wrote:
    > LOL!
    >
    > Nevermind. I'm not allowed to use this language feature. It's too
    > "complex". People won't know what it does.
    >
    > Not the '=' operator... Initializing a structure to all 0 with = {0}.
    >
    > :/
    >
    > I keep running into bosses like this. Is this normal in the
    > programming field or am I just incredibly unlucky?


    That feature has been around since C89/C90. Perhaps you can find a
    clever way for your boss to find that out without losing face or without
    regretting disallowing its use.
     
    Shao Miller, Feb 7, 2012
    #10
  11. nroberts

    Shao Miller Guest

    On 2/7/2012 12:02, nroberts wrote:
    > memset and memcpy are turning up in profiles a lot. I'd like to speed
    > things up a bit.
    >


    You might find that the implementation actually translates a '= { 0
    };'-style initializer into a call to 'memset'. An experiment might
    reveal whether or not that's the case.

    > Sometimes it is clear that using = to initialize a local would be
    > better than memset. I might not gain anything, but at least there's a
    > chance.
    >


    I'm not sure how you could gain anything unless the call to 'memset'
    actually translates differently than a '= { 0 };'-style initializer.

    Did you know that after all subobjects that are explicitly initialized
    (by the initializer-list) have been so, the rest are initialized to what
    they would have been had the object been declared with 'static' storage
    duration? The whole containing object is thus "touched."

    > However, can I gain performance improvements when zeroing out say some
    > global element in an array like so:
    >
    > typedef struct x { int var0; char var1[20]; } X;
    >
    > X gX[30];
    >
    > void f(int slot)
    > {
    > X init = {0};
    >
    > gX[slot] = init;
    >
    > ...
    > }
    >
    > vs.
    > void f(int slot)
    > {
    > memset(&gX[slot], 0, sizeof(X));
    >
    > ...
    > }
    >


    Well these aren't the same. The former initializes all sub-objects to
    the "zeroey" values that would initialize a 'static'-storage-duration
    object having the same type as the sub-object and having no explicit
    initializer.

    The latter fills the object with bytes with the 'unsigned char' value
    '0', which is all-bits-zero.

    In your example, the 'struct' type 'X' has an 'int' member. The object
    representation of an 'int' can have padding bits that can be used any
    way the implementation pleases.

    If filling the padding bits with zeroes results in a trap representation
    for an 'int', then you might be in for a surprise.

    There are similar concerns for other types, including pointers, where a
    null pointer value might not be all-bits-zero.

    That is why I believe some people consider a '= { 0 };'-style
    initializer to be more portable than 'memset'. If portability isn't a
    concern, oh well.

    > Normally I wouldn't look for a micro-optimization like this but I'm
    > kind of stuck with the parameters I'm given.


    Optimizing and making portable might not always be compatible. If you
    have a particular set of implementations as your target(s), there might
    be "compiler intrinsics" that you can use which are
    implementation-defined extensions to C that could offer you speed
    advantages.

    For example, some Microsoft compilers offer '__movsd':

    http://msdn.microsoft.com/en-us/library/9d196b9h.aspx
     
    Shao Miller, Feb 7, 2012
    #11
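
    The rule about the remaining subobjects can be seen in a short sketch. Only var0 gets an explicit initializer; var1 is then initialized as an object of static storage duration would be, which for a char array means all zeros. make_partial() and tail_is_zero() are made-up names for illustration:

```c
#include <stddef.h>

typedef struct x { int var0; char var1[20]; } X;

/* Only var0 is explicitly initialized; the rest of the object is
   implicitly initialized the way static storage would be. */
X make_partial(void)
{
    X x = { 5 };
    return x;
}

/* Returns 1 if every element of x->var1 is zero. */
int tail_is_zero(const X *x)
{
    size_t i;

    for (i = 0; i < sizeof x->var1; i++)
        if (x->var1[i] != 0)
            return 0;
    return 1;
}
```

    So even a one-element initializer list causes the whole object to be written, which is the "touched" point made above.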
  12. nroberts

    Eric Sosman Guest

    On 2/7/2012 12:02 PM, nroberts wrote:
    > memset and memcpy are turning up in profiles a lot. I'd like to speed
    > things up a bit.
    >
    > Sometimes it is clear that using = to initialize a local would be
    > better than memset. I might not gain anything, but at least there's a
    > chance.
    >
    > However, can I gain performance improvements when zeroing out say some
    > global element in an array like so:
    >
    > typedef struct x { int var0; char var1[20]; } X;
    >
    > X gX[30];
    >
    > void f(int slot)
    > {
    > X init = {0};
    >
    > gX[slot] = init;
    >
    > ...
    > }
    >
    > vs.
    > void f(int slot)
    > {
    > memset(&gX[slot], 0, sizeof(X));
    >
    > ...
    > }


    The official answer is: The definition of the C language says
    nothing about which constructs are faster or slower than others.

    That said, I would expect memset() to be faster, usually, if
    the wind is not unfavorable and the Moon is in the right quarter.
    Argument: In the assignment version, the code must allocate the auto
    variable `init', zero it, and then copy all those zeroes to `gX[slot]';
    on the face of it, this sounds like more work than just zeroing
    `gX[slot]' to begin with.

    It is just possible that a very smart compiler could (1) realize
    that the `init' variable is not actually necessary, (2) decide to
    clear `gX[slot]' directly instead of clearing `init' and copying,
    and (3) clear `gX[slot]' more efficiently than memset() can, perhaps
    with in-line code. My suspicion, though, is that a compiler smart
    enough for (1,2,3) would not at the same time be so dumb as to
    implement memset() with an actual call to an actual external function;
    you'd need a strange combination of brilliance and stupidity to get
    an advantage for initialize-and-copy.

    ... and, of course, measurement is the only way to be sure.

    > Normally I wouldn't look for a micro-optimization like this but I'm
    > kind of stuck with the parameters I'm given.


    My prejudice (and I admit it's something of a prejudice) would be
    to take a hard look at those memset() and memcpy() calls, with a view
    toward eliminating at least some of them -- if you can eliminate a
    call you get an infinite speedup, as opposed to a mere hundredfold!
    Making copies of bits you've already computed usually doesn't advance
    the state of the computation very much; making many duplicates of a
    single byte is also not usually a great addition to the program's
    "knowledge." There are, of course, exceptions: qsort() just rearranges
    bits you already own, for example, but can be useful nonetheless.
    Still, if memset() and memcpy() are dominating the run time, it seems
    likely that there may be a lot of needless setting and copying going
    on. See what you can jettison.

    --
    Eric Sosman
     
    Eric Sosman, Feb 8, 2012
    #12
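
    Since measurement is the only way to be sure, here is a rough timing sketch using clock() from <time.h>. It is an assumption-laden harness, not a rigorous benchmark: the names time_assign(), time_memset() and time_clear() are invented here, the calls go through a function pointer (which may prevent inlining and so add call overhead to both variants), and results depend on compiler, flags and hardware:

```c
#include <string.h>
#include <time.h>

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

static void clear_assign(int slot)
{
    const X init = {0};
    gX[slot] = init;
}

static void clear_memset(int slot)
{
    memset(&gX[slot], 0, sizeof gX[slot]);
}

/* Calls fn `iterations` times, cycling over the slots, and returns the
   elapsed CPU time in seconds as reported by clock(). */
static double time_clear(void (*fn)(int), long iterations)
{
    volatile int sink = 0;   /* keeps the read-back from being optimized away */
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < iterations; i++) {
        int slot = (int)(i % 30);
        gX[slot].var0 = (int)i;   /* dirty the slot so the clear has work to do */
        fn(slot);
        sink += gX[slot].var0;    /* read the result back */
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

double time_assign(long iterations) { return time_clear(clear_assign, iterations); }
double time_memset(long iterations) { return time_clear(clear_memset, iterations); }
```

    Something like time_assign(100000000L) versus time_memset(100000000L), built with the same optimization flags as the real program, gives a first impression. A profiler run on the real workload is still the better evidence.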
  13. nroberts

    Shao Miller Guest

    On 2/7/2012 18:58, Shao Miller wrote:
    >>
    >> typedef struct x { int var0; char var1[20]; } X;
    >>
    >> X gX[30];
    >>
    >> void f(int slot)
    >> {
    >> X init = {0};
    >>
    >> gX[slot] = init;
    >>
    >> ...
    >> }
    >>
    >> vs.
    >> void f(int slot)
    >> {
    >> memset(&gX[slot], 0, sizeof(X));
    >>
    >> ...
    >> }
    >>

    >
    > Well these aren't the same. The former initializes all sub-objects to
    > the "zeroey" values that would initialize a 'static'-storage-duration
    > object having the same type as the sub-object and having no explicit
    > initializer.
    >
    > The latter fills the object with bytes with the 'unsigned char' value
    > '0', which is all-bits-zero.
    >
    > In your example, the 'struct' type 'X' has an 'int' member. The object
    > representation of an 'int' can have padding bits that can be used any
    > way the implementation pleases.
    >
    > If filling the padding bits with zeroes results in a trap representation
    > for an 'int', then you might be in for a surprise.
    >


    Ben Bacarisse proved in another thread that my claim for a potential
    surprise is false; there is no potential for all-zero-bits in an
    integer's object representation to be a trap representation. Sorry
    about that!

    > There are similar concerns for other types, including pointers, where a
    > null pointer value might not be all-bits-zero.
    >


    Still applies for other things, like pointers. :)
     
    Shao Miller, Feb 8, 2012
    #13
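
    A sketch of the pointer case, under the assumption of a hypothetical struct Rec that holds a pointer. The = {0} form is guaranteed to produce a null pointer; the helper only reports whether a null pointer happens to be stored as all-zero bytes on the implementation at hand, which the standard leaves open:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type containing a pointer member. */
typedef struct rec { struct rec *next; int id; } Rec;

/* Portable: next becomes a null pointer and id becomes 0, whatever the
   underlying representation of a null pointer is. */
Rec rec_zero_init(void)
{
    Rec r = {0};
    return r;
}

/* Returns 1 if a null pointer is stored as all-zero bytes on this
   implementation, 0 otherwise. */
int null_is_all_bytes_zero(void)
{
    struct rec *p = NULL;
    const unsigned char zeros[sizeof p] = {0};

    return memcmp(&p, zeros, sizeof p) == 0;
}
```

    Where null_is_all_bytes_zero() returns 1, memset() and = {0} agree for this struct; where it returns 0, only the initializer form yields a usable null pointer.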
  14. nroberts

    Jorgen Grahn Guest

    On Tue, 2012-02-07, nroberts wrote:
    > memset and memcpy are turning up in profiles a lot. I'd like to speed
    > things up a bit.
    >
    > Sometimes it is clear that using = to initialize a local would be
    > better than memset. I might not gain anything, but at least there's a
    > chance.


    For copying with memcpy(), I much prefer assignment since it doesn't
    bypass the type system, and is more readable.

    I won't comment on the memset() part.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Feb 8, 2012
    #14
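
    To illustrate the type-system point for copying, a small sketch with invented function names. Both functions copy an X, but only the assignment form is checked by the compiler:

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/* Struct assignment: both sides must have compatible types, so passing a
   pointer to some other struct type is a compile-time error. */
void copy_by_assignment(X *dst, const X *src)
{
    *dst = *src;
}

/* memcpy takes void pointers and a byte count: a wrong pointer type or a
   wrong size compiles without complaint and fails only at run time. */
void copy_by_memcpy(X *dst, const X *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

    With correct arguments the two produce the same member values. The difference is in what the compiler can reject before the program runs.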
  15. nroberts

    Joe keane Guest

    In article <>,
    nroberts <> wrote:
    >memset and memcpy are turning up in profiles a lot.


    Indeed.

    >Sometimes it is clear that using = to initialize a local would be
    >better than memset.


    It's a shame if you call a function with a size parameter, when in fact
    the size is a compile-time constant. You also probably know a bit about
    alignment, whereas those guys have to assume the worst.

    >I might not gain anything, but at least there's a chance.


    Please to use real data! 'gprof' is very good at this. It works [so
    far as I have seen] on stdlib calls as well as your functions.

    It can tell you where you're getting killed by function call overhead,
    and where the copy is taking a long time, such that you may go to more
    length to avoid it. It can also (by switching back to a function) tell
    you where your 'optimization' does nothing except increase code size.
     
    Joe keane, Feb 8, 2012
    #15
  16. nroberts

    Ian Collins Guest

    On 02/ 9/12 09:05 AM, Joe keane wrote:
    > In article<>,
    > nroberts<> wrote:
    >> memset and memcpy are turning up in profiles a lot.

    >
    > Indeed.
    >
    >> Sometimes it is clear that using = to initialize a local would be
    >> better than memset.

    >
    > It's a shame if you call a function with a size parameter, when in fact
    > the size is a compile-time constant. You also probably know a bit about
    > alignment, whereas those guys have to assume the worst.


    A decent compiler will inline the call to memset() in this case, so
    there is no call overhead. Whether the inline memset() is faster or
    slower than an assignment to a const initialiser is something the OP
    would have to measure.

    >> I might not gain anything, but at least there's a chance.

    >
    > Please to use real data! 'gprof' is very good at this. It works [so
    > far as i have seen] on stdlib calls as well as your functions.


    Assuming the OP uses GNU tools...

    > It can tell you where you're getting killed by function call overhead,
    > and where the copy is taking a long time, such that you may go to more
    > length to avoid it. It can also (by switching back to a function) tell
    > you where your 'optimization' does nothing except increase code size.


    Assuming there is a function call...

    --
    Ian Collins
     
    Ian Collins, Feb 8, 2012
    #16
  17. nroberts

    Jens Gustedt Guest

    Am 02/08/2012 12:58 AM, schrieb Shao Miller:
    > On 2/7/2012 12:02, nroberts wrote:


    > I'm not sure how you could gain anything unless the call to 'memset'
    > actually translates differently than a '= { 0 };'-style initializer.


    The gain is in the knowledge of the optimizer. If you have a memset
    initialization it is difficult (but not impossible) for the optimizer
    to keep track of initializations. If it knows of initializations and
    it encounters an assignment of a field of the struct before it is ever
    read, the optimizer is allowed to omit the initialization. Modern
    optimizers can be quite good in tracking individual struct or array
    members.

    Jens
     
    Jens Gustedt, Feb 9, 2012
    #17
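
    A sketch of the situation described here, with hypothetical function names. In both functions the zero written to var0 is overwritten before anything reads it, so an optimizer is allowed to drop that part of the clear. Whether a given compiler actually does so for either form is not something the language guarantees and would need to be checked in the generated code:

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

/* Assignment form: the compiler sees a store to each member, and the
   store of 0 to var0 is dead because var0 is assigned right after. */
void reset_assign(int slot, int value)
{
    gX[slot] = (const X){0};
    gX[slot].var0 = value;
}

/* memset form: the same end result, expressed as a byte fill followed
   by a member store. */
void reset_memset(int slot, int value)
{
    memset(&gX[slot], 0, sizeof gX[slot]);
    gX[slot].var0 = value;
}
```

    Comparing the assembly output of the two functions (for example with the compiler's -S option, where available) shows whether the redundant store was removed.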
  18. nroberts

    Tim Prince Guest

    On 02/07/2012 12:02 PM, nroberts wrote:
    > memset and memcpy are turning up in profiles a lot. I'd like to speed
    > things up a bit.
    >
    > Sometimes it is clear that using = to initialize a local would be
    > better than memset. I might not gain anything, but at least there's a
    > chance.
    >
    > However, can I gain performance improvements when zeroing out say some
    > global element in an array like so:
    >
    > typedef struct x { int var0; char var1[20]; } X;
    >
    > X gX[30];
    >
    > void f(int slot)
    > {
    > X init = {0};
    >
    > gX[slot] = init;
    >
    > ...
    > }
    >
    > vs.
    > void f(int slot)
    > {
    > memset(&gX[slot], 0, sizeof(X));
    >
    > ...
    > }
    >
    > Normally I wouldn't look for a micro-optimization like this but I'm
    > kind of stuck with the parameters I'm given.


    Certain compilers make such transformations automatically; for only 30
    elements, presumably with reasonable alignment (with compiler able to
    see it via in-lining), in-line code may be best, but compilers may
    prefer memset() to reduce code size. It may make a difference when one
    or the other applies a cache bypass (IA nontemporal) when the move is
    seen as large enough to need it, which 30 elements clearly is not.
     
    Tim Prince, Feb 14, 2012
    #18
