memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.
Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.
However, can I gain any performance when zeroing out, say, one
element of a global array, like so:
    typedef struct x { int var0; char var1[20]; } X;
    X gX[30];

    void f(int slot)
    {
        X init = {0};
        gX[slot] = init;
        ...
    }
vs.
    #include <string.h>   /* for memset() */

    void f(int slot)
    {
        memset(&gX[slot], 0, sizeof(X));
        ...
    }

The official answer is: The definition of the C language says
nothing about which constructs are faster or slower than others.
That said, I would expect memset() to be faster, usually, if
the wind is not unfavorable and the Moon is in the right quarter.
Argument: In the assignment version, the code must allocate the auto
variable `init', zero it, and then copy all those zeroes to `gX[slot]';
on the face of it, this sounds like more work than just zeroing
`gX[slot]' to begin with.
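For concreteness, here is a minimal sketch of the variants side by side. The function names (clear_by_copy, clear_by_literal, clear_by_memset, slot_is_clear) and the NSLOTS macro are mine, not the original poster's, and the compound-literal form assumes a C99 or later compiler. It only shows that all three leave the members zeroed; it says nothing about which is faster.

```c
#include <assert.h>
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;
X gX[30];

#define NSLOTS ((int)(sizeof gX / sizeof gX[0]))

/* Version 1: named temporary, as in the question. */
void clear_by_copy(int slot)
{
    X init = {0};
    assert(slot >= 0 && slot < NSLOTS);
    gX[slot] = init;
}

/* Version 2: C99 compound literal, no named temporary. */
void clear_by_literal(int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    gX[slot] = (X){0};
}

/* Version 3: memset, which clears every byte, padding included. */
void clear_by_memset(int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    memset(&gX[slot], 0, sizeof gX[slot]);
}

/* Returns 1 if every member of gX[slot] is zero, else 0.  Members are
   compared one by one rather than with memcmp(), because padding bytes
   need not be zero after a struct assignment. */
int slot_is_clear(int slot)
{
    size_t i;

    if (gX[slot].var0 != 0)
        return 0;
    for (i = 0; i < sizeof gX[slot].var1; i++)
        if (gX[slot].var1[i] != 0)
            return 0;
    return 1;
}
```

One difference worth keeping in mind: memset() zeroes any padding bytes in the struct, while the assignment versions are only required to zero the members. That matters only if something later compares or hashes the raw bytes.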
It is just possible that a very smart compiler could (1) realize
that the `init' variable is not actually necessary, (2) decide to
clear `gX[slot]' directly instead of clearing `init' and copying,
and (3) clear `gX[slot]' more efficiently than memset() can, perhaps
with in-line code. My suspicion, though, is that a compiler smart
enough for (1,2,3) would not at the same time be so dumb as to
implement memset() with an actual call to an actual external function;
you'd need a strange combination of brilliance and stupidity to get
an advantage for initialize-and-copy.
... and, of course, measurement is the only way to be sure.
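A rough sketch of such a measurement, using only standard clock(). The names time_clear and report are hypothetical, and the harness has an obvious bias: calling through a function pointer will usually prevent inlining, so it may understate what the compiler could do with either version in the real code. Treat the numbers as a hint, and rerun with the compiler and flags you actually ship with.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct x { int var0; char var1[20]; } X;
X gX[30];

#define NSLOTS ((int)(sizeof gX / sizeof gX[0]))

void clear_by_copy(int slot)   { X init = {0}; gX[slot] = init; }
void clear_by_memset(int slot) { memset(&gX[slot], 0, sizeof gX[slot]); }

/* Calls fn on every slot, `reps` times over, and returns the elapsed
   CPU time in seconds as reported by clock(). */
double time_clear(void (*fn)(int), long reps)
{
    clock_t start, stop;
    long r;
    int slot;

    assert(reps > 0);
    start = clock();
    for (r = 0; r < reps; r++)
        for (slot = 0; slot < NSLOTS; slot++)
            fn(slot);
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Prints one timing line per version. */
void report(long reps)
{
    printf("copy:   %.3f s\n", time_clear(clear_by_copy, reps));
    printf("memset: %.3f s\n", time_clear(clear_by_memset, reps));
}
```

Calling report(10000000L) from main() should give timings long enough to compare on most machines; pick the repetition count so each line takes at least a few tenths of a second.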

Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

My prejudice (and I admit it's something of a prejudice) would be
to take a hard look at those memset() and memcpy() calls, with a view
toward eliminating at least some of them -- if you can eliminate a
call you get an infinite speedup, as opposed to a mere hundredfold!
Making copies of bits you've already computed usually doesn't advance
the state of the computation very much; making many duplicates of a
single byte is also not usually a great addition to the program's
"knowledge." There are, of course, exceptions: qsort() just rearranges
bits you already own, for example, but can be useful nonetheless.
Still, if memset() and memcpy() are dominating the run time, it seems
likely that there may be a lot of needless setting and copying going
on. See what you can jettison.
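As one illustration of the kind of call that can often go, here is a made-up example (format_id_wasteful and format_id_lean are hypothetical names, not taken from the original poster's code): a buffer is cleared in full and then immediately overwritten by a function that terminates its own output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Before: clears all of buf, then overwrites the front of it. */
void format_id_wasteful(char *buf, size_t size, int id)
{
    assert(buf != NULL && size > 0);
    memset(buf, 0, size);               /* redundant for string use */
    snprintf(buf, size, "id=%d", id);
}

/* After: snprintf() by itself leaves a properly terminated string
   whenever size is nonzero. */
void format_id_lean(char *buf, size_t size, int id)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "id=%d", id);
}
```

The second version is only a safe replacement if nothing reads the bytes past the terminating '\0' (for example, by writing the whole buffer to a file or comparing it with memcmp()). Whether that holds is something only a look at the callers can tell you.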