a = b or memset/cpy?


nroberts

memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

However, can I gain performance improvements when zeroing out say some
global element in an array like so:

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

vs.
void f(int slot)
{
memset(&gX[slot], 0, sizeof(X));

...
}

Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.
 

Jens Gustedt

On 02/07/2012 06:02 PM, nroberts wrote:
X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

make it

X const init = { 0 };

or even better use a compound literal

gX[slot] = (X const){ 0 };
Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

On any decent compiler the assignment version should not be worse than
the memset version, because the compiler must be able to see that it
is an object only filled with 0.

On the other hand the assignment version *may* be better, when the
compiler can do a data flow analysis that shows, e.g., that part of what
you initialize is overwritten before being read.

So I'd always prefer the assignment version.
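
For the original example that would look something like the following
sketch (the compound literal requires C99 or later):

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
    /* C99: assign an unnamed, zero-valued X to the slot */
    gX[slot] = (X const){ 0 };

    /* pre-C99 alternative: a named const object */
    /* X const init = { 0 };  gX[slot] = init; */
}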

Jens
 

James Kuyper

On 02/07/2012 06:02 PM, nroberts wrote:
X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

make it

X const init = { 0 };

or even better use a compound literal

gX[slot] = (X const){ 0 };
Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

On any decent compiler the assignment version should not be worse than

This is initialization, not assignment.
the memset version, because the compiler must be able to see that it
is an object only filled with 0.

I've used a compiler which, given the following code:

double array[10][1354][3] = {0};

generated the equivalent of the following:

array[0][0][0] = 0;
array[0][0][1] = 0;
etc.
The resulting executable was noticeably larger than I had expected it to
be. I was a little annoyed when I figured out what was going on. I
changed it to use memset(); the executable got a lot smaller, and
executed somewhat faster, too. The support person I talked with said
that my use of {0} was unreasonable, not their compiler's code generation.
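
The memset() replacement was essentially something like this (the
function name is invented for illustration; note that it assumes
all-bits-zero is a representation of 0.0 for double, which IEEE 754
systems provide):

#include <string.h>

void g(void)
{
    double array[10][1354][3];       /* was: = {0} */

    /* one call instead of tens of thousands of element stores */
    memset(array, 0, sizeof array);

    /* ... */
}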
 

nroberts

On 02/07/2012 06:02 PM, nroberts wrote:
X gX[30];
void f(int slot)
{
  X init = {0};
  gX[slot] = init;

make it

X const init = { 0 };

or even better use a compound literal

gX[slot] = (X const){ 0 };
Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

On any decent compiler the assignment version should not be worse than
the memset version, because the compiler must be able to see that it
is an object only filled with 0.

On the other hand the assignment version *may* be better, when the
compiler can do a data flow analysis that shows, e.g., that part of what
you initialize is overwritten before being read.

So I'd always prefer the assignment version.

Jens

LOL!

Nevermind. I'm not allowed to use this language feature. It's too
"complex". People won't know what it does.

Not the '=' operator... Initializing a structure to all 0 with = {0}.

:/

I keep running into bosses like this. Is this normal in the
programming field or am I just incredibly unlucky?
 

Jens Gustedt

On 02/07/2012 07:40 PM, James Kuyper wrote:
This is initialization, not assignment.

No, you are mistaken. The relevant part is the assignment to gX[slot]. The
other part is just the initialization of a const. In particular the
initialization of the const-qualified compound literal can be done at
compile time if the compiler decides that it is beneficial (as if it
were declared as a static variable).
the memset version, because the compiler must be able to see that it
is an object only filled with 0.

I've used a compiler which, given the following code:

double array[10][1354][3] = {0};

generated the equivalent of the following:

array[0][0][0] = 0;
array[0][0][1] = 0;
etc.
The resulting executable was noticeably larger than I had expected it to
be. I was a little annoyed when I figured out what was going on. I
changed it to use memset(); the executable got a lot smaller, and
executed somewhat faster, too. The support person I talked with said
that my use of {0} was unreasonable, not their compiler's code generation.

How long ago and what compiler was that? My observation over recent
years is that a compiler like gcc is capable of optimizing assignments
to struct fields or to different array members as if all of these were
different variables.

(And double may be special: setting all bytes to 0 and initializing
with 0 need not be the same thing.)

Jens
 

Ben Pfaff

nroberts said:
Nevermind. I'm not allowed to use this language feature. It's too
"complex". People won't know what it does.

Not the '=' operator... Initializing a structure to all 0 with = {0}.

Look on the bright side: on that basis, you should have no
trouble avoiding C++ entirely at that workplace.
 

Malcolm McLean

The support person I talked with said that my use of {0}
was unreasonable, not their compiler's code generation.
Well, what can he say? He can't patch the compiler to replace a long
initialisation with a call to memset().
 

Shao Miller

Well, what can he say? He can't patch the compiler to replace a long
initialisation with a call to memset().

Call == LOL. Good one. :)

And the support person cannot patch the compiler to replace a 'struct'
object assignment with a call to 'memcpy' either, presumably.

I've used a Microsoft C implementation which actually will give you a
linker error if you do:

void func(void) {
    int array[42] = { 0 };
    return;
}

and choose not to link with the standard library... It complains about
a missing 'memset' symbol...
 

Shao Miller

LOL!

Nevermind. I'm not allowed to use this language feature. It's too
"complex". People won't know what it does.

Not the '=' operator... Initializing a structure to all 0 with = {0}.

:/

I keep running into bosses like this. Is this normal in the
programming field or am I just incredibly unlucky?

That feature has been around since C89/C90. Perhaps you can find a
clever way for your boss to find that out without losing face or
regretting having disallowed its use.
 

Shao Miller

memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

You might find that the implementation actually translates a '= { 0
};'-style initializer into a call to 'memset'. An experiment might
reveal whether or not that's the case.
Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

I'm not sure how you could gain anything unless the call to 'memset'
actually translates differently than a '= { 0 };'-style initializer.

Did you know that after all subobjects that are explicitly initialized
(by the initializer-list) have been so, the rest are initialized to what
they would have been had the object been declared with 'static' storage
duration? The whole containing object is thus "touched."
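
For instance, with the OP's type (the helper function is just for
illustration):

typedef struct x { int var0; char var1[20]; } X;

void g(void)
{
    /* var0 is explicitly initialized to 42; every element of var1 is
       then initialized as if X had static storage duration, i.e. to 0 */
    X y = { 42 };

    /* ... */
}
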
However, can I gain performance improvements when zeroing out say some
global element in an array like so:

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

vs.
void f(int slot)
{
memset(&gX[slot], 0, sizeof(X));

...
}

Well these aren't the same. The former initializes all sub-objects to
the "zeroey" values that would initialize a 'static'-storage-duration
object having the same type as the sub-object and having no explicit
initializer.

The latter fills the object with bytes with the 'unsigned char' value
'0', which is all-bits-zero.

In your example, the 'struct' type 'X' has an 'int' member. The object
representation of an 'int' can have padding bits that can be used any
way the implementation pleases.

If filling the padding bits with zeroes results in a trap representation
for an 'int', then you might be in for a surprise.

There are similar concerns for other types, including pointers, where a
null pointer value might not be all-bits-zero.
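
A made-up struct with a pointer member makes the difference concrete:

#include <string.h>

struct s { char *p; };

void g(void)
{
    struct s a = { 0 };          /* a.p is a null pointer, whatever bit
                                    pattern the implementation uses for it */

    struct s b;
    memset(&b, 0, sizeof b);     /* b.p holds all-bits-zero, which the
                                    implementation is not required to
                                    treat as a null pointer */
}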

That is why I believe some people consider a '= { 0 };'-style
initializer to be more portable than 'memset'. If portability isn't a
concern, oh well.
Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

Optimizing and making portable might not always be compatible. If you
have a particular set of implementations as your target(s), there might
be "compiler intrinsics" that you can use: implementation-defined
extensions to C that could offer you speed advantages.

For example, some Microsoft compilers offer '__movsd':

http://msdn.microsoft.com/en-us/library/9d196b9h.aspx
 

Eric Sosman

memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

However, can I gain performance improvements when zeroing out say some
global element in an array like so:

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

vs.
void f(int slot)
{
memset(&gX[slot], 0, sizeof(X));

...
}

The official answer is: The definition of the C language says
nothing about which constructs are faster or slower than others.

That said, I would expect memset() to be faster, usually, if
the wind is not unfavorable and the Moon is in the right quarter.
Argument: In the assignment version, the code must allocate the auto
variable `init', zero it, and then copy all those zeroes to `gX[slot]';
on the face of it, this sounds like more work than just zeroing
`gX[slot]' to begin with.

It is just possible that a very smart compiler could (1) realize
that the `init' variable is not actually necessary, (2) decide to
clear `gX[slot]' directly instead of clearing `init' and copying,
and (3) clear `gX[slot]' more efficiently than memset() can, perhaps
with in-line code. My suspicion, though, is that a compiler smart
enough for (1,2,3) would not at the same time be so dumb as to
implement memset() with an actual call to an actual external function;
you'd need a strange combination of brilliance and stupidity to get
an advantage for initialize-and-copy.

... and, of course, measurement is the only way to be sure.
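
For what it's worth, a crude harness along the following lines (standard
C only; function and variable names are mine, and an aggressive optimizer
may still see through the whole loop) would at least produce numbers to
argue about:

#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct x { int var0; char var1[20]; } X;

static X gX[30];

static void f_init(int slot)   { X init = {0}; gX[slot] = init; }
static void f_memset(int slot) { memset(&gX[slot], 0, sizeof(X)); }

int main(void)
{
    enum { ITERS = 50000000 };
    volatile int sink = 0;      /* read something back so the stores stay live */
    clock_t t0;
    long i;

    t0 = clock();
    for (i = 0; i < ITERS; i++) {
        f_init((int)(i % 30));
        sink += gX[i % 30].var0;
    }
    printf("init:   %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < ITERS; i++) {
        f_memset((int)(i % 30));
        sink += gX[i % 30].var0;
    }
    printf("memset: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    (void)sink;
    return 0;
}
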
Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

My prejudice (and I admit it's something of a prejudice) would be
to take a hard look at those memset() and memcpy() calls, with a view
toward eliminating at least some of them -- if you can eliminate a
call you get an infinite speedup, as opposed to a mere hundredfold!
Making copies of bits you've already computed usually doesn't advance
the state of the computation very much; making many duplicates of a
single byte is also not usually a great addition to the program's
"knowledge." There are, of course, exceptions: qsort() just rearranges
bits you already own, for example, but can be useful nonetheless.
Still, if memset() and memcpy() are dominating the run time, it seems
likely that there may be a lot of needless setting and copying going
on. See what you can jettison.
 

Shao Miller

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

vs.
void f(int slot)
{
memset(&gX[slot], 0, sizeof(X));

...
}

Well these aren't the same. The former initializes all sub-objects to
the "zeroey" values that would initialize a 'static'-storage-duration
object having the same type as the sub-object and having no explicit
initializer.

The latter fills the object with bytes with the 'unsigned char' value
'0', which is all-bits-zero.

In your example, the 'struct' type 'X' has an 'int' member. The object
representation of an 'int' can have padding bits that can be used any
way the implementation pleases.

If filling the padding bits with zeroes results in a trap representation
for an 'int', then you might be in for a surprise.

Ben Bacarisse proved in another thread that my claim of a potential
surprise is false; there is no potential for all-bits-zero in an
integer's object representation to be a trap representation. Sorry
about that!
There are similar concerns for other types, including pointers, where a
null pointer value might not be all-bits-zero.

Still applies for other things, like pointers. :)
 

Jorgen Grahn

memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

For copying with memcpy(), I much prefer assignment since it doesn't
bypass the type system, and is more readable.
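
For instance (types invented for illustration), the assignment is
checked by the compiler, while memcpy() will happily copy the wrong
thing:

#include <string.h>

struct point3 { double x, y, z; };
struct point2 { double x, y; };

void copy(struct point3 *dst, const struct point3 *src,
          const struct point2 *other)
{
    *dst = *src;                        /* checked: both sides must have
                                           compatible types */

    /* *dst = *other; */                /* would be rejected at compile time */

    memcpy(dst, other, sizeof *other);  /* compiles silently, yet copies the
                                           wrong type and only part of *dst */
}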

I won't comment on the memset() part.

/Jorgen
 

Joe keane

memset and memcpy are turning up in profiles a lot.
Indeed.

Sometimes it is clear that using = to initialize a local would be
better than memset.

It's a shame if you call a function with a size parameter, when in fact
the size is a compile-time constant. You also probably know a bit about
alignment, whereas those guys have to assume the worst.
I might not gain anything, but at least there's a chance.

Please use real data! 'gprof' is very good at this. It works [so
far as I have seen] on stdlib calls as well as your functions.

It can tell you where you're getting killed by function call overhead,
and where the copy is taking a long time, such that you may go to more
length to avoid it. It can also (by switching back to a function) tell
you where your 'optimization' does nothing except increase code size.
 

Ian Collins

It's a shame if you call a function with a size parameter, when in fact
the size is a compile-time constant. You also probably know a bit about
alignment, whereas those guys have to assume the worst.

A decent compiler will inline the call to memset() in this case, so
there is no call overhead. Whether the inline memset() is faster or
slower than an assignment to a const initialiser is something the OP
would have to measure.
I might not gain anything, but at least there's a chance.

Please use real data! 'gprof' is very good at this. It works [so
far as I have seen] on stdlib calls as well as your functions.

Assuming the OP uses GNU tools...
It can tell you where you're getting killed by function call overhead,
and where the copy is taking a long time, such that you may go to more
length to avoid it. It can also (by switching back to a function) tell
you where your 'optimization' does nothing except increase code size.

Assuming there is a function call...
 

Jens Gustedt

On 02/08/2012 12:58 AM, Shao Miller wrote:
On 2/7/2012 12:02, nroberts wrote:
I'm not sure how you could gain anything unless the call to 'memset'
actually translates differently than a '= { 0 };'-style initializer.

The gain is in what the optimizer knows. With a memset() initialization
it is difficult (though not impossible) for the optimizer to keep track
of what has been initialized. If it does know about the initialization
and then encounters an assignment to a field of the struct before that
field is ever read, it is allowed to omit the initialization of that
field. Modern optimizers can be quite good at tracking individual struct
or array members.
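
For example, in a slight variation of the OP's f() (the extra parameter
is just for illustration):

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot, int n)
{
    gX[slot] = (X){ 0 };    /* zero the whole slot */
    gX[slot].var0 = n;      /* var0 is overwritten before it is ever read,
                               so the optimizer may drop the zero store to
                               var0; a memset() call is harder to see
                               through in the same way */
}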

Jens
 

Tim Prince

memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

However, can I gain performance improvements when zeroing out say some
global element in an array like so:

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
X init = {0};

gX[slot] = init;

...
}

vs.
void f(int slot)
{
memset(&gX[slot], 0, sizeof(X));

...
}

Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

Certain compilers make such transformations automatically. For only 30
elements, presumably with reasonable alignment (which the compiler can
see via in-lining), in-line code may be best, but a compiler may still
prefer memset() to reduce code size. It can also make a difference when
one or the other applies a cache bypass (IA non-temporal stores) for
moves it judges large enough to need it, which 30 elements clearly is
not.
 
