vectors and user-defined objects

J

Jim

Hi,
Just wondering which is better

vector<record *> r;
r.push_back(new record(x,y));

or

vector<record> r;
r.push_back(record(x,y));

where record is just an object with a simple constructor. I prefer the
latter as it means I don't have to use pointers, but am I taking a hit
somewhere else?

Thanks
Jim
 
J

jkherciueh

Jim said:
Hi,
Just wondering which is better

vector<record *> r;
r.push_back(new record(x,y));

or

vector<record> r;
r.push_back(record(x,y));

where record is just an object with a simple constructor. I prefer the
latter as it means I don't have to use pointers, but am I taking a hit
somewhere else?

There is no "better" in the abstract for this problem. The following
considerations may enter the picture:

(a) If you need the records in the vector to be polymorphic, then you have
to go with the first way (or some smart-pointer based variation).

(b) If you use pointers, you have to be clear on ownership issues. The first
snippet implies (morally) that the vector owns the records. Now, vectors
make for poor owners since they do not automatically clean up when they get
destroyed (which might happen unexpectedly due to an exception!).

(c) The vector itself has value semantics and can be copied freely. This,
again, does not mesh all that nicely with storing pointers as soon as
ownership becomes an issue.

(d) If the records are large in size, you might have a performance hit due
to reallocation and you might waste memory: with a simple doubling
strategy, a vector filled by a sequence of push_back() operations will use
on average 72% of the allocated memory and will do 2.56 copy-constructor
calls per oject stored. But before you resort to snippet 1 for performance
reasons, however, make sure by measuring that they really do matter (also,
you might instead consider redesigning record in this case to use COW or
some such optimization trickery).

(e) Most importantly: you rarely ever have the choice. The semantics of both
snippets differ considerably.


Best

Kai-Uwe Bux
 
J

Jim

There is no "better" in the abstract for this problem. The following
considerations may enter the picture:

(a) If you need the records in the vector to be polymorphic, then you have
to go with the first way (or some smart-pointer based variation).

(b) If you use pointers, you have to be clear on ownership issues. The first
snippet implies (morally) that the vector owns the records. Now, vectors
make for poor owners since they do not automatically clean up when they get
destroyed (which might happen unexpectedly due to an exception!).

(c) The vector itself has value semantics and can be copied freely. This,
again, does not mesh all that nicely with storing pointers as soon as
ownership becomes an issue.

(d) If the records are large in size, you might have a performance hit due
to reallocation and you might waste memory: with a simple doubling
strategy, a vector filled by a sequence of push_back() operations will use
on average 72% of the allocated memory and will do 2.56 copy-constructor
calls per oject stored. But before you resort to snippet 1 for performance
reasons, however, make sure by measuring that they really do matter (also,
you might instead consider redesigning record in this case to use COW or
some such optimization trickery).

(e) Most importantly: you rarely ever have the choice. The semantics of both
snippets differ considerably.

Best

Kai-Uwe Bux

Thank you very much, I didn't realise it was so complicated. Is there
an alternative holder which owns objects better from a memory
allocation stand-point?
 
J

jkherciueh

Jim said:
Thank you very much, I didn't realise it was so complicated.

I did not intend to create that impression.
Is there an alternative holder which owns objects better from a memory
allocation stand-point?

Well, std::deque can be used when contiguity of memory is not an issue, and
std::list can be used if random access is not needed. I think, std::deque
might have the best memory efficiency when the number of objects is large.
(Note: when the number of objects is small, it probably does not matter --
unless the number of sequences is large, in which case you are very likely
doomed anyway:)

A good way of going about problems like this is to leave the decision to
measurement (if the conceptual requirements leave some freedom of choice,
that is). One can use typedefs:

typedef std::vector< record > record_sequece;
// typedef std::deque< record > record_sequence;

In the sequel, you only use record_sequence (this is: you avoid magic types
like you avoid magic numbers). Then, with the change of a typedef, you can
test which of the alternatives yields the best performance.


Best

Kai-Uwe Bux
 
Y

Yannick Tremblay

Although Kai-uwe talks about it in his point (b), I think this needs
to be highlighted as this is fundamentally wrong: the above is a
memory leak.

Unless you are using a very complex class, in C++ a "new"'ed object
can't own itself and can't free itself.
The only way this code can be used is if before r goes out of scope,
you individually read each of the stored pointers and delete them.
The above code should never exist in isolation of its deleting loop.
If attempting to compare the two approaches, you must include the
deleting loop in the comparaison.

STL containers have value semantic and may copy their own content
freely around. STL containers do not know if they are holding
pointers or object and will never call "delete" on their content.

IMNSHO, that should be your default solution. I.e. unless you know
that this is not suitable for this particular case, use this form.

The STL containers have been designed with value semantic in mind and
it is simpler to use them that way. By using the default "simpler"
method, you can concentrate your effort on what is important. If it
turns out that the copying of objects in the vector is becoming too
expensive (following profiling, don't try to guess, don't forget to
let the compiler optimise, don't forget .reserve()) then consider
storing pointers.

BTW: simple vs complex constructors do not necessarily map to cheap vs
expensive to copy.


That's correct, both approach have + and -. However, the question
seemed asked from the perspective of a inexperience C++ programmer and
I think RAII and storing values in standard containers should be the
initial approach until one knows about all the issues listed below.

Indeed, the doubling push_back copying issue can easily be solved with
calling reserve() if size can be known at runtime.

records can use COW internally so be cheaper to copy than what one
could guess.

If you let the compiler optimise, r.push_back(record(x,y)) might well
be optimised by RVO so might not involve copies.

Once one move to the more desirable storing shared_ptr in the vector
rather than raw pointers, one might find that a well design record
class copying is not necessarily _significantly_ more expensive than
copying shared_ptr. Again proper profiling required.

Lastly, I think it's worth repeating: don't use a complex pattern
because you are guessing that it might maybe be faster. Use a simpler
pattern (while making sure you don't paint yourself in a corner),
profile and if needed, change to pattern to a more complex one that
actually gain you the required measurable performance.
Thank you very much, I didn't realise it was so complicated. Is there
an alternative holder which owns objects better from a memory
allocation stand-point?

I would say, as first port of call, try:

vector< std::tr1::shared_ptr<record> >



Yan
 
D

Daniel T.

Jim said:
Just wondering which is better

vector<record *> r;
r.push_back(new record(x,y));

or

vector<record> r;
r.push_back(record(x,y));

where record is just an object with a simple constructor. I prefer the
latter as it means I don't have to use pointers, but am I taking a hit
somewhere else?

Use the latter if you can. When can't you? When 'record' isn't copy
constructible or assignable, if you need to handle derived-types, if you
need the objects to outlive the vector.

If there are performance issues, remember there are other containers
that may be more suitable. For example list and deque aren't constantly
shifting objects around to accommodate the "must be contiguous"
requirement.
 
J

James Kanze

<ea77bb89-125d-4d44-b447-de6cae4a6...@j78g2000hsd.googlegroups.com>,
Although Kai-uwe talks about it in his point (b), I think this
needs to be highlighted as this is fundamentally wrong: the
above is a memory leak.

It depends.
Unless you are using a very complex class, in C++ a "new"'ed object
can't own itself and can't free itself.

A lot do.
The only way this code can be used is if before r goes out of
scope, you individually read each of the stored pointers and
delete them.

In most cases where I've seen this, r would either have static
lifetime or be a singleton, never destructed. The constructor
of record would insert it into r, and the destructor removes it.
It's actually a very, very common idiom for entity objects.
The above code should never exist in isolation of its deleting
loop. If attempting to compare the two approaches, you must
include the deleting loop in the comparaison.
STL containers have value semantic and may copy their own
content freely around. STL containers do not know if they are
holding pointers or object and will never call "delete" on
their content.

And? That's usually exactly what is wanted. If the container
owns the objects, you'd usually use values (and the objects
cannot have identity, because the container copies them). If
the container doesn't own the objects, and the objects have
identity, then you need pointers. There are cases where you'd
want to delete the objects because the container is going out of
scope, but they are fairly rare.
IMNSHO, that should be your default solution. I.e. unless you
know that this is not suitable for this particular case, use
this form.

Let me see if I've got that straight. You should use this form
except when you shouldn't use it:). (I'm being a bit facetious
there---I basically agree with you. But fundamentally, at least
in a number of domains, the first thing you should do is define
the semantics of your types. If the type has value semantics,
you copy it---everywhere, not just when it is in a container.
And if the type has entity semantics, you usually can't copy it,
so the question doesn't arise.)

[...]
That's correct, both approach have + and -. However, the
question seemed asked from the perspective of a inexperience
C++ programmer and I think RAII and storing values in standard
containers should be the initial approach until one knows
about all the issues listed below.

I think that learning the various "standard" categories of
objects should have precedence. Until you understand the
difference between value types and entity types, it doesn't make
sense to discuss the question.

Just a note: most (but not all) polymorphic types are entity
types, which means that you would normally just use raw
pointers. The vector is a simple tool used for navigation.
 
Y

Yannick Tremblay

It depends.

In most cases where I've seen this, r would either have static
lifetime or be a singleton, never destructed. The constructor
of record would insert it into r, and the destructor removes it.
It's actually a very, very common idiom for entity objects.

Note that the op code was:

r.push_back(new record(x,y));

That's very different from what you are suggesting above with a static
vector and record inserting themselves in it for reference.
And? That's usually exactly what is wanted. If the container
owns the objects, you'd usually use values (and the objects
cannot have identity, because the container copies them). If
the container doesn't own the objects, and the objects have
identity, then you need pointers. There are cases where you'd
want to delete the objects because the container is going out of
scope, but they are fairly rare.

My point is that the code above should not be considered in
isolation. It is only correct under a set of specific circumstances
of which you have mentionned some examples or if somewhere in the code
there's a loop that deletes every single pointers in the vector before
it goes out of scope. (or if you are lazy and this is a short
lived program, you could rely on the OS freeing all the memory when
the program exit but that's not an approach I would recommend).

So the original question should be: "Which is better:"

vector<record *> r;
r.push_back(new record(x,y));
// ... do stuff
while(!r.empty())
{
delete(r.back());
r.pop_back();
}

OR:
// do stuff

These two code snippets are much more directly comparable although we
are still missing the definition of record and the first snippet is
not exception safe. Looking at them now, I would say that even a C++
newbie would notice the additional complexity required in the first
case. If we add exception safety to the first snippet, it becomes
even more complicated.

Let me see if I've got that straight. You should use this form
except when you shouldn't use it:). (I'm being a bit facetious
there---I basically agree with you.

Exactly. Use the default solution unless you know that you shouldn't
use it in this case. e.g.:

-Use standard containers with value semantic unless you know you must
use pointers.
-Don't (micro-)optimize unless you know you must.
-Use local variable unless you know you must use a global
-Use automatic object unless you know you must use free store.
-Make reference arguments const unless you know they must not be.
-Use standard containers unless you know you they don't meet your
requirements.
-Use vector by default unless you know you shouldn't in this case.
-Drive on the normal side of the road unless you know this is a
one-way street and you can safely drive in the other lane.

(I am sure someone will jump on me for these :)
Don't get me wrong. All of these default solutions can be broken.
In fact, they very often must be broken. But if one does, one should
be able to explain why.
I think that learning the various "standard" categories of
objects should have precedence. Until you understand the
difference between value types and entity types, it doesn't make
sense to discuss the question.

That's also what I mean. Kai-uwe answer was very extensive and quite
correct but for learners, I quite like a practical and simpler
approach of considering a "standard" solution first and deviate from
it if you know it you must. Actually, even for experienced
developpers, I think it's good to first consider the obvious default
solution before going for a more complex pattern.

A la prochaine

Yan
 
J

James Kanze

Note that the op code was:
r.push_back(new record(x,y));
That's very different from what you are suggesting above with
a static vector and record inserting themselves in it for
reference.

Yes. I feel fairly sure that the OP didn't understand the
difference between entity objects and value objects, for
example. My response wasn't really geared to his problem, but
to the more general suggestions (i.e. a delete must be
associated with the destruction of the vector). In my
experience, such is rarely the case. Precisely because you do
use value semantics if the object lifetime is dependent on the
container lifetime.
My point is that the code above should not be considered in
isolation. It is only correct under a set of specific
circumstances of which you have mentionned some examples or if
somewhere in the code there's a loop that deletes every single
pointers in the vector before it goes out of scope. (or if
you are lazy and this is a short lived program, you could rely
on the OS freeing all the memory when the program exit but
that's not an approach I would recommend).

And my point is that when the above isn't correct, you should
probably be using a container of values. I think we're just
looking at it from different angles.
So the original question should be: "Which is better:"
vector<record *> r;
r.push_back(new record(x,y));
// ... do stuff
while(!r.empty())
{
delete(r.back());
r.pop_back();
}

Which is, IMHO, simply wrong. If, for some reason, you need a
collection of pointers to dynamically allocated objects whose
lifetime is associated with that of the container (something
which is very, very rare), then the *only* correct solution
involves using a container of boost::shared_ptr, or something
similar. (Note, however, that in this case, the automatic
reaction shouldn't be: "use boost::shared_ptr", but "use values,
not pointers".)
// do stuff
These two code snippets are much more directly comparable
although we are still missing the definition of record and the
first snippet is not exception safe. Looking at them now, I
would say that even a C++ newbie would notice the additional
complexity required in the first case. If we add exception
safety to the first snippet, it becomes even more complicated.

OK. If I understand you correctly, you weren't suggesting the
first as a good solution, but simply showing why you should
favor values. I'll admit that I tend to consider the issue from
an even more fundamental level: if you have values (at the
conceptual level), then use values. And things that disappear
with the container are almost always values. Even without the
added complexity and the missing exception safety, using
pointers when you have values is conceptually wrong.
Exactly. Use the default solution unless you know that you
shouldn't use it in this case. e.g.:
-Use standard containers with value semantic unless you know you must
use pointers.
-Don't (micro-)optimize unless you know you must.
-Use local variable unless you know you must use a global
-Use automatic object unless you know you must use free store.
-Make reference arguments const unless you know they must not be.
-Use standard containers unless you know you they don't meet your
requirements.
-Use vector by default unless you know you shouldn't in this case.
-Drive on the normal side of the road unless you know this is a
one-way street and you can safely drive in the other lane.
(I am sure someone will jump on me for these :)

Not at all. I think it's a very good list, and very well
presented.

What I might add is that you should design first, and whether an
object is an entity or a value (or some other category) should
be determined by the design. In a certain sense, you don't need
the first rule, since the role of the container and the role of
what it contains has been determined by design. It's a good
rule, but it's a rule at the coding level, when the decision
should have been made upstream, at the design level.

I'm tempted to say the same thing for rules 3 and 4, except
there are "working variables" which aren't handled at the design
level, but are introduced during coding. For all others,
lifetime of the object is a design decision.
Don't get me wrong. All of these default solutions can be
broken. In fact, they very often must be broken. But if one
does, one should be able to explain why.
That's also what I mean. Kai-uwe answer was very extensive
and quite correct but for learners, I quite like a practical
and simpler approach of considering a "standard" solution
first and deviate from it if you know it you must. Actually,
even for experienced developpers, I think it's good to first
consider the obvious default solution before going for a more
complex pattern.

I think the problem with regards to learners is that there is
too much to learn at once. Logically, you can't write good C++
(or good anything) until you know something of design. But it's
very difficult to teach C++ and design at the same time---you're
throwing too much at them at once. And teaching design before
they know a programming language would be, I suspect, very
frustrating---the student learns a lot of principles that he
can't apply until the next semester (by which time he'll have
forgotten them if he hasn't used them).

As for experts, I agree that having some rules of thumb is a
good idea, but I suspect that they won't often have the occasion
to apply some of the rules very often, since they are coding
rules, and the decision will have been made at design time,
before coding starts.

(I might add that there is one that I think needs some
elaboration: use vector by default. I would argue that you
should never use a standard class for a fundamental abstraction
of your application. You should design a class with a narrow
interface, which provides exactly what you need, and no more.
That way, you maintain maximum freedom to modify the
implementation. You should, however, use vector by default in
the implementation, just as you should use it by default when
you incedentally need a container, i.e. the container is not
part of the design.

And of course, that generally, you should treat strings as a
more or less primitive type, and not a container. If you want a
container of char, then it's vector<char> by default, but most
of the time, conceptually, you don't want a container of char,
but a string.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top