Clearing structures with member functions.

roberts.noah

I ran across some code that called memset(this, 0, sizeof(*this)) in
the member function of a structure in C++. This just looked wrong to
me so I did some web searching and it does indeed seem like a
questionable thing to do. Looking around further took me to a
newsgroup discussion that said the better approach was to do something
like sv = Struct(). The thing is that both cases seem to be "ok" so
long as there are no virtual functions and no constructor in Struct or
any of its internals. Code like this does not initialize the object to
0:

struct SOut {
    int x;
    struct SIn {
        int y;
        SIn() : y() {} // or this could be a virtual function...
    } in;
};

....

SOut out = SOut(); // initializes y but not x.

out = SOut(); // clears y but not x...

So, since either approach breaks under certain circumstances,
the question of which is more correct seems unclear. Since this is a
performance issue, the real question is which is faster: memset(x,
0, sizeof(x)) or x = X()? Another question is whether the above code should
be clearing the value of x. So long as SIn has no constructor or
virtual functions it works, but as soon as it has one, the default
constructor of SOut no longer clears x. What law in C++ governs this
behavior?

Is using memset immoral even if you KNOW that the object will never
have any virts? The only reason why I would think it might be is that
these structs have member functions...and calling memset on this
/seems/ icky. I can definitely see an argument coming over this and I
would like to be sure of my position...and IANAL.

Thanks.
 
Josh Mcfarlane

struct SOut {
    int x;
    struct SIn {
        int y;
        SIn() : y() {} // or this could be a virtual function...
    } in;
};

Why not just write a constructor for SOut, then, and avoid memset
altogether?

-Josh McFarlane
 
Mike Wahler

I ran across some code that called memset(this, 0, sizeof(*this)) in
the member function of a structure in C++. This just looked wrong to
me so I did some web searching and it does indeed seem like a
questionable thing to do. Looking around further took me to a
newsgroup discussion that said the better approach was to do something
like sv = Struct(). The thing is that both cases seem to be "ok" so
long as there are no virtual functions and no constructor in Struct or
any of its internals. Code like this does not initialize the object to
0:

And no pointer or floating-point members. All-bits-zero is
not guaranteed to represent a null pointer or 0.0.

struct SOut {
    int x;
    struct SIn {
        int y;
        SIn() : y() {} // or this could be a virtual function...
    } in;
};

...

SOut out = SOut(); // initializes y but not x.

out = SOut(); // clears y but not x...

So, since using either approach breaks if given certain circumstances
the question of which is more correct seems unclear. Since this is a
performance issue


Are you sure? Have you proven so with valid measurements?

the real question is which is the faster, memset(x,
0, sizeof(x)) or x = X()?

One of them. Either of them. Neither of them. The language
says nothing about performance (other than the standard library
'big-O' specs).

Another question is if the above code should
be clearing the value of x...so long as SIn has no constructor or
virtual functions it works, but as soon as it does the default
constructor of SOut no longer clears x. What law in C++ governs this
behavior?

I'll let someone else look that up, and at the same time advise
against using memset in that context.
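On the rule itself, a hedged sketch: what governs this is value-initialization. Under the original C++98 wording, SOut() for a class containing a member with a constructor simply ran the implicit default constructor, which leaves x indeterminate, as the question describes. C++03 changed this so that a class with no user-declared constructor of its own has every member value-initialized, and C++11 words it as zero-initialization followed by the default constructor. The example below assumes a compiler that implements the C++11 wording; older compilers may still behave as the question describes.

```cpp
// Sketch, assuming a compiler that implements C++11 value-initialization.
struct SOut {
    int x;                  // no initializer of its own
    struct SIn {
        int y;
        SIn() : y() {}      // user-provided constructor, zeroes y
    } in;
};

// SOut has no user-provided constructor, so SOut() zero-initializes the
// whole object and then runs the implicit constructor (which calls SIn()).
inline void clear(SOut& s) {
    s = SOut();
}
```

The point is that adding a constructor to SOut itself changes the rule again: once SOut has a user-provided constructor, that constructor alone is responsible for x.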
Is using memset immoral even if you KNOW that the object will never
have any virts? The only reason why I would think it might be is that
these structs have member functions...and calling memset on this
/seems/ icky.

IMO not just 'icky', but a dangerous practice.

I can definitely see an argument coming over this and I
would like to be sure of my position...and IANAL.

Thanks.

I think a better question is why is there a need to 'zero-out' all
a class' data members in one sweep? IMO each data member has a specific
meaning and purpose, along with function(s) to manipulate it. If in the
course of processing, one or more data members need to be 'reset',
this should be done in that logic, unambiguously, e.g.

{
/* etc */
x.a = 0;
x.b = 0;
/* etc */
}

Many programmers think they can optimize code by hand better than
their compiler can. They're almost always wrong.

-Mike
 
roberts.noah

Mike said:
And no pointer or floating point members. All-bits-zero is
not guaranteed to evaluate to NULL or 0.0

We are not concerned with the behavior of some obscure, nonexistent
implementation in which NULL is not 0. In practice I have never seen
such a thing occur. As the compiler, hardware, OS, and all that is
known _I_ can guarantee that all bits zero is null and 0.0.

If I am to convince others that we need to change this practice I can't
make such arguments my sole point as for us and our targets they are
moot. Performance is given precedence over standards compliance here
so unless there is some *practical* reason the practice won't change.
If there is a piece of hardware out there that uses something other
than 0 to mean 0.0 or null then MAYBE, but afaik no such thing exists
and the x86 running Windows is certainly not one of them.

I agree that depending on an implementation is bad form, but I am
working in a Windows house and that is just plain the norm under such
situations. My OS, hardware, and compiler are not likely to change in
such a manner and nobody is concerned with compatibility with something
obscure or theoretical.

One of them. Either of them. Neither of them. The language
says nothing about performance (other than the standard library
'big-O' specs).

I was hoping for a practical answer.

I think a better question is why is there a need to 'zero-out' all
a class' data members in one sweep?

Note that I did specify structure and not class.
 
red floyd

Note that I did specify structure and not class.

So? The *ONLY* difference between a struct and a class is the default
access (struct => public, class => private).

A class can be a POD, a struct can be a non-POD. The key thing is that
while memset(this,0,sizeof(*this)) may be OK in certain circumstances on
a POD, using it leads to bad habits. And if it changes so that your
struct is no longer a POD, but has virtual functions, for example,
congrats! You just clobbered your VPTR.

Instead, you should create a constructor for your struct:

struct X {
    int x;
    int y;
    X(int x_ = 0, int y_ = 0) : x(x_), y(y_) { }
};
 
roberts.noah

Is using memset immoral even if you KNOW that the object will never
have any virts? The only reason why I would think it might be is that
these structs have member functions...and calling memset on this
/seems/ icky. I can definitely see an argument coming over this and I
would like to be sure of my position...and IANAL.

I can make my question clearer:

Assuming I have a Struct variable "s" that contains member functions,
which is the better way to clear its values and why?

1 - memset(&s, 0, sizeof(s));
2 - s = Struct();

If 1, is doing that from inside the structure really a big deal or is
it something that can be depended upon to work given that Struct will
never have virtual members?
 
Andre Kostur

We are not concerned with the behavior of some obscure, nonexistent
implementation in which NULL is not 0. In practice I have never seen
such a thing occur. As the compiler, hardware, OS, and all that is
known _I_ can guarantee that all bits zero is null and 0.0.

However... you're posting in comp.lang.c++. We're platform agnostic over
here. All we get to assume is the Standard. How a particular
implementation behaves belongs in a newsgroup dedicated to that platform.

If I am to convince others that we need to change this practice I
can't make such arguments my sole point as for us and our targets they
are moot. Performance is given precedence over standards compliance
here so unless there is some *practical* reason the practice won't
change. If there is a piece of hardware out there that uses something
other than 0 to mean 0.0 or null then MAYBE, but afaik no such thing
exists and the x86 running Windows is certainly not one of them.

Nor should you. However, you are posting in comp.lang.c++ where we don't
get to assume platform-specific behaviours. And whatever we do say that
works, will work on _any_ platform (assuming the usual things, like a
Standards-compliant compiler... a well-formed program that doesn't
exhibit Undefined Behaviour, etc...).

I agree that depending on an implementation is bad form, but I am
working in a Windows house and that is just plain the norm under such
situations. My OS, hardware, and compiler are not likely to change in
such a manner and nobody is concerned with compatibility with
something obscure or theoretical.

That's nice... but doesn't belong in comp.lang.c++.

I was hoping for a practical answer.

That would be platform-specific (and potentially implementation-
specific), and thus belongs in a newsgroup dedicated to your platform
and/or specific compiler.
 
Cy Edmunds

I can make my question clearer:

Assuming I have a Struct variable "s" that contains member functions,
which is the better way to clear its values and why?

1 - memset(&s, 0, sizeof(s));
2 - s = Struct();

If 1, is doing that from inside the structure really a big deal or is
it something that can be depended upon to work given that Struct will
never have virtual members?

In the current issue of C/C++ Users Journal, Koenig and Moo have an article
warning against violating abstractions. They also make the case that even
the simplest struct represents some level of abstraction unless it is
comprised of nothing but public data. I suggest you invest in a copy. (The
short form is that violating abstractions makes the code less maintainable
and less portable.)
 
Old Wolf

We are not concerned with the behavior of some obscure,
nonexistent implementation in which NULL is not 0. In
practice I have never seen such a thing occur.

Well, that's because you only "practice" on one implementation!

Performance is given precedence over standards compliance here

I am working in a Windows house and that is just plain the norm
under such situations. My OS, hardware, and compiler are not
likely to change in such a manner and nobody is concerned with
compatibility with something obscure or theoretical.

"standards compliance" = "robust".

Unix variants run stably on hundreds of different varieties of
hardware. Windows runs on one or two variants and crashes
often by comparison. Few people complain that Unix variants
run too slowly compared to Windows on the same hardware.

The correct attitude to take is: if you can do the exact same
thing in a portable manner, then you should.

I was hoping for a practical answer.

Unless your compiler is 20 years old, or in debug mode, it will
generate the same fill instructions it would for a memset, if your
actual code is equivalent to a memset.

If your code is not equivalent to a memset then it won't generate
those instructions (e.g. if your class has a vtable).

The compiler can do all of this automatically and produce the
most optimal solution. Yet, you want to make your code less
robust in an attempt to second-guess your compiler. Why?

If you don't believe it, try both forms in release mode and look
at the assembler output generated.

Note that I did specify structure and not class.

"struct" and "class" do the same thing in C++, except that
structs default to public access. I think you are talking about
"POD structs" and "non-POD structs" (POD meaning
roughly equivalent to a struct in C).

Unfortunately there is no way in the C++ language to
automatically detect if a struct is POD or not, although
I think this has been proposed for a future version.
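For readers on newer compilers: C++11 later added <type_traits>, which makes this check possible at compile time. A minimal sketch, assuming C++11 or later; zero_fill, Plain and WithVirtual are names invented for the example. Note that the guard only rejects types for which memset is outright undefined; it says nothing about whether all-bits-zero is the right value for pointer or floating-point members.

```cpp
#include <cstring>
#include <type_traits>

// Refuses to compile unless T is trivial and standard-layout (roughly "POD").
template <typename T>
void zero_fill(T& obj) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_fill requires a POD-like type");
    std::memset(&obj, 0, sizeof(obj));
}

struct Plain {            // hypothetical POD
    int a;
    int b;
};

struct WithVirtual {      // hypothetical non-POD: has a vptr
    int a;
    virtual ~WithVirtual() {}
};

// Calling zero_fill on a WithVirtual object fails the static_assert
// at compile time instead of clobbering the vptr at run time.
```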
 
Jim Langston

I ran across some code that called memset(this, 0, sizeof(*this)) in
the member function of a structure in C++. This just looked wrong to
me so I did some web searching and it does indeed seem like a
questionable thing to do. Looking around further took me to a
newsgroup discussion that said the better approach was to do something
like sv = Struct(). The thing is that both cases seem to be "ok" so
long as there are no virtual functions and no constructor in Struct or
any of its internals. Code like this does not initialize the object to
0:

All I can say is that I was working on a program that was originally C. It did
the memset thing for its structures and I didn't think much about it. I
made one of my own classes, which initialized some data to certain values.
An existing structure was the best place to put this (which would then
really be a class, but that's neither here nor there).

My code broke. Looking at my data, I discovered that the variables I had
initialized in my class, which was inside the structure, were 0, not the
values I had initialized them to. So I had to get rid of the memset for this
structure, but it turned out that just about everything depended on the
variables being initialized to 0.

So I then started writing a constructor initializing the values to 0, and
soon discovered that this structure contained a lot more structures, which
contained a lot more structures, etc. I wound up having to search out and
modify something like 15 structures and create constructors defaulting
values to zero, just so I could put my class, which needed to initialize data
in its constructor, inside this class.

It's not a good idea. Don't do it IMNSHO.
 
Greg

I can make my question clearer:

Assuming I have a Struct variable "s" that contains member functions,
which is the better way to clear its values and why?

1 - memset(&s, 0, sizeof(s));
2 - s = Struct();

If 1, is doing that from inside the structure really a big deal or is
it something that can be depended upon to work given that Struct will
never have virtual members?

Well it goes without saying that Struct will never have virtual methods
(that work) as long as there is code somewhere that is memsetting its
contents to 0. The problem with using memset in this way is that it
scatters such assumptions throughout the source code. In other words,
the definition of Struct does not express everything there is to know
about a Struct. Instead, the complete information can be found only by
examining every source file in the app, because the client has defined
the proper initialization state for a Struct.

In an object-oriented design, the object maintains its data, and
mediates both accesses and changes to it. And the object itself
implements its initialization, not its clients. The benefits from doing
so are clear. Anyone wanting to find (or to change) how a Struct is
initialized knows where to look for that code, and knows also that it
is all in one place. The clients of Struct have an easier job as well. A
client no longer has to "know" that it's OK to call memset for a
Struct, but not, say, on a StructTwo. Such arbitrary knowledge does
nothing to produce better code, it simply makes it more difficult to
write correct code. And besides, how certain is it that Struct will be
a POD forever? How long does "never" usually last when it comes to
software design? Often not as long as one might expect - at least in my
experience.
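A minimal sketch of that idea, with made-up names (Settings, retries and timeout_ms are assumptions for the example, not taken from the code under discussion): the type defines its own cleared state in one place, and callers ask for it instead of reaching for memset.

```cpp
// Hypothetical type; the cleared state is defined once, by the type itself.
struct Settings {
    int retries;
    int timeout_ms;

    Settings() : retries(3), timeout_ms(0) {}  // "cleared" need not mean all zeros

    // Resets every member to its constructed state, whatever that is.
    void clear() { *this = Settings(); }
};
```

If a member with a non-zero default is added later, only the constructor changes; a memset at the call sites would silently overwrite it.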

Greg
 
Pete Becker

Andre said:
However... you're posting in comp.lang.c++. We're platform agnostic over
here. All we get to assume is the Standard. How a particular
implementation behaves belongs in a newsgroup dedicated to that platform.

Nevertheless, while hypothetical difficulties that, in fact, don't exist
can make for lengthy, heated discussions, they should not be a
significant factor in design decisions.
 
roberts.noah

Old said:
Well, that's because you only "practice" on one implementation !

That is a pretty big assumption.

The compiler can do all of this automatically and produce the
most optimal solution. Yet, you want to make your code less
robust in an attempt to second-guess your compiler. Why?

You are preaching to the choir here. Go back and reread. I am not the
lead developer nor the one that decided to use memset. I want to
remove it. I am looking for some argument a little more concrete than
the remote possibility that a hardware component could be created in
some future, an unforeseeable one, where NULL is not 0. I can guarantee
I will be putting those memsets back if I can't come up with such an
argument and you guys bashing me is of 0 help.

So forget it.
 
Andrew Koenig

I ran across some code that called memset(this, 0, sizeof(*this)) in
the member function of a structure in C++. This just looked wrong to
me so I did some web searching and it does indeed seem like a
questionable thing to do.

Naah, it's not questionable -- it's just plain wrong.

Looking around further took me to a
newsgroup discussion that said the better approach was to do something
like sv = Struct(). The thing is that both cases seem to be "ok" so
long as there are no virtual functions and no constructor in Struct or
any of its internals.

I don't know what you mean by "ok". There is no guarantee that using memset
in this way will have any specific effect, because there is no guarantee
that setting the bytes of an object's representation to zero sets the
object's value to zero.
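To make the distinction concrete, here is a small sketch (the Record type and its members are made up for illustration). Value-initialization is defined in terms of values: pointers become null and floating-point members become 0.0 whatever bit patterns the platform uses for them, which is the guarantee memset cannot give.

```cpp
// Hypothetical struct for illustration; not from the code under discussion.
struct Record {
    Record* next;    // a null pointer need not be all-bits-zero
    double  weight;  // 0.0 need not be all-bits-zero either
    int     count;
};

// Record() value-initializes every member: next becomes a null pointer,
// weight becomes 0.0 and count becomes 0, regardless of representation.
inline void reset(Record& r) {
    r = Record();
}
```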
Is using memset immoral even if you KNOW that the object will never
have any virts?

Yes.
 
Andrew Koenig

We are not concerned with the behavior of some obscure, nonexistent
implementation in which NULL is not 0. In practice I have never seen
such a thing occur. As the compiler, hardware, OS, and all that is
known _I_ can guarantee that all bits zero is null and 0.0.

Ah, so you care only about writing programs that work on implementations
that are familiar to you? In that case, you can answer your own question
better than anyone else, as only you know what implementations are familiar.

If I am to convince others that we need to change this practice I can't
make such arguments my sole point as for us and our targets they are
moot. Performance is given precedence over standards compliance here
so unless there is some *practical* reason the practice won't change.

In that case, you had better measure the performance characteristics of each
piece of hardware you're using. If you don't do that, you have no accurate
way of knowing how your programs will perform. If you do do that, then you
don't need to ask anyone else for advice.

I agree that depending on an implementation is bad form, but I am
working in a Windows house and that is just plain the norm under such
situations. My OS, hardware, and compiler are not likely to change in
such a manner and nobody is concerned with compatibility with something
obscure or theoretical.

In that case, there is only one answer to your question: Measure the
various alternatives on the particular hardware you're using, then decide.

Don't pretend you're writing C++ programs, though. What you're doing is
using C++ as a tool for writing machine-language programs for a specific
machine or collection of machines.
 
Niklas Norrthon

That is a pretty big assumption.


You are preaching to the choir here. Go back and reread. I am not the
lead developer nor the one that decided to use memset. I want to
remove it. I am looking for some argument a little more concrete than
the remote possibility that a hardware component could be created in
some future, an unforeseeable one, where NULL is not 0. I can guarantee
I will be putting those memsets back if I can't come up with such an
argument and you guys bashing me is of 0 help.

My philosophy is: It might be ok to use nonstandard extensions or
assumptions on occasion, but every such usage must be strongly
motivated, in comments, and to the code reviewers.

For example, I frequently use the nonstandard assumption that
CHAR_BIT == 8 when dealing with networking code. I can motivate
this, and I have a safety net in a header saying something like:
#if CHAR_BIT != 8
#error "..."
#endif
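On a C++11 or later compiler the same safety net can also be written as a static_assert. A sketch, where the message text is an assumption made up for the example:

```cpp
#include <climits>

// Fails the build on any platform where a byte is not 8 bits wide.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```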

Just saying: "Nonstandard code might be more efficient" is not
a strong enough motivation. If I were in your shoes, I'd start
by getting the managers to accept my philosophy, that stepping
away from the standard needs to be motivated.

When that goal is accomplished, I'd go back and compile some
test code using memset, and other code, using constructors,
and then examine the generated assembly, and finally run some
heavy test runs, to get a feel of how much time is gained (or
lost), using memset instead of constructors.
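As a starting point for that experiment, a sketch with a hypothetical four-member struct (Sample and both function names are invented here). Compiling it with optimization and asking for assembly output, for example g++ -O2 -S, lets the two functions be compared side by side; what the compiler emits will vary by compiler and version, so no particular result is claimed.

```cpp
#include <cstring>

struct Sample {   // hypothetical POD used only for the comparison
    int a;
    int b;
    int c;
    int d;
};

// Variant 1: byte-wise clear.
void clear_with_memset(Sample& s) {
    std::memset(&s, 0, sizeof(s));
}

// Variant 2: assign a value-initialized temporary.
void clear_with_assignment(Sample& s) {
    s = Sample();
}
```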

I did some tests on my system. The result was that constructors
resulted in inlined mov instructions until the number of members
in the struct grew large enough, at which point it switched to a rep stosl
instruction. Memset resulted in an inlined rep stos independent
of the number of members.

I was not able to measure any difference in speed at all.

If your tests show the same result, I'd say your case is pretty
strong. (By the way another important factor besides "code
efficiency" should be "code readability"...)

/Niklas Norrthon
 
Mike Wahler

We are not concerned with the behavior of some obscure, nonexistent
implementation in which NULL is not 0.

You must live in a very small world indeed.

In practice I have never seen
such a thing occur.

I'm invisible, said the kitten with his head under the pillow.

As the compiler, hardware, OS, and all that is
known _I_ can guarantee that all bits zero is null and 0.0.

Perhaps you can for your current circumstances. If you feel
that's sufficient, that's your decision.

If I am to convince others that we need to change this practice I can't
make such arguments my sole point as for us and our targets they are
moot. Performance is given precedence over standards compliance here

Have you *proven* that coding in consideration of compliance actually
does reduce performance?

so unless there is some *practical* reason the practice won't change.

My practical reason is portability.

If there is a piece of hardware out there that uses something other
than 0 to mean 0.0 or null then MAYBE,

There is much such hardware, so not maybe, but certainly.

but afaik no such thing exists

That just means your knowledge is limited.

and the x86 running Windows is certainly not one of them.

For your information, x86 is not the only platform which
hosts Windows operating systems. Also, x86s are a tiny
minority of the systems which can host C++ programs.

I agree that depending on an implementation is bad form, but I am
working in a Windows house and that is just plain the norm under such
situations.

Yes, it's often necessary to make trade-offs. But I'm not convinced
that, even on x86 Windows, a memset will have noticeably better
performance than memberwise assignment other than in very extreme
cases. I'd have to see controlled measurements.

My OS, hardware, and compiler are not likely to change in
such a manner

Famous last words. :)

and nobody is concerned with compatibility with something
obscure or theoretical.

Nothing of what I wrote is 'obscure' or 'theoretical', but
quite practical. The issue is portability, and is of concern
to many. If it's not a concern of yours, so be it, I don't
think any less of you for that.

I was hoping for a practical answer.

It *is* a practical answer in the context of standard C++, the
topic here. Any meaningful practical answer depends upon the
platform and build tools, which are not topical here.

Note that I did specify structure and not class.

In terms of memory layout there is no difference
between a 'struct' and a 'class' other than the
spelling of the keywords.

-Mike
 
