STL vector

P

Peter Liedermann

Hi there,

Let "Jumbo" be the name of a class whose instances consume substantial
storage.

A vector of typically 1000 such Jumbo-instances is needed. It must be
sorted by a simple int criterium, but not often.

Is there a substantial performance difference between vector <Jumbo> and
vector <Jumbo *>, provided that everything else is adapted accordingly? Is
there a difference at all?

Thanx a lot,
Peter
 
M

Marcel Müller

Let "Jumbo" be the name of a class whose instances consume substantial
storage.

A vector of typically 1000 such Jumbo-instances is needed. It must be
sorted by a simple int criterium, but not often.

Is there a substantial performance difference between vector <Jumbo> and
vector <Jumbo *>, provided that everything else is adapted accordingly?
Is there a difference at all?

In general you need to copy the Jumbo objects to put them into the
vector. The copy constructor of Jumbo might be quite expensive if it is
not a POD like type. Is that what you want?

You should also think about the (re)allocations of vector<Jumbo>. A
vector will in general allocate more slots than needed. If Jumbo is
large, then this might be a reasonable amount of memory. Furthermore, a
vector might not start with it's final size. So there are some
reallocations necessary before the vector has it's final size. These
reallocations copy the entire content.

And last but not least, allocating very large objects could cause heap
fragmentation. This might be a reasonable impact, especially on 32 bit
platforms where the virtual memory may be out long before the physical
memory.

On the other side the allocation overhead of large objects is usually
negligible. So almost everything argue for a solution with some references.

As a general rule of thumb large objects should preferably used as
reference type. While small objects should be used as value types.

Personally I would prefer vector<intrusive_ptr<Jumbo>> or some other
smart pointer over vector<Jumbo*>, because it answers the ownership
question more clearly. However, details depend on your application.


Marcel
 
M

Marcel Müller

Surely allocating *small* objects is more likely to cause heap
fragmentation than allocating very large objects? It is more likely that
a very large object allocation will *fail* in a fragmented heap but this
is not what you said.

It depends on what you call 'much fragmentation'. If we are talking
about the number of free fragments, then you are right. But if we are
talking about the amount of unused memory, the large objects are often
more critical, because they likely do not to fit in any previously used
free space.

Especially if you have only one large object and if its final size is
obtained by several reallocations. If the next allocation block will
never fit into the previously freed blocks then you get almost 50%
unused address space. And this is not that unlikely. E.g. if the
allocation size grows exponentially by a factor of two in each step or
if smaller, longer lived objects are allocated between the large
objects. (A good old Matlab problem when building up large matrices
incrementally, or in REXX when concatenating a large number of small
strings.)


Marcel
 
J

Joe Gottman

Hi there,

Let "Jumbo" be the name of a class whose instances consume substantial
storage.

A vector of typically 1000 such Jumbo-instances is needed. It must be
sorted by a simple int criterium, but not often.

Is there a substantial performance difference between vector <Jumbo> and
vector <Jumbo *>, provided that everything else is adapted accordingly?
Is there a difference at all?

Thanx a lot,
Peter

1) If you know in advance the size of the vector, you can reserve the
right amount of capacity before you start filling it and thus avoid the
costs of reallocation.

2) If you can define an efficient swap() function for your class,
then sorting a vector<Jumbo> would not involve any copying whatever.


Joe Gottman.
 
P

Peter Liedermann

In general you need to copy the Jumbo objects to put them into the
vector. The copy constructor of Jumbo might be quite expensive if it is
not a POD like type. Is that what you want?

I think this is the key to my question: Jumbo * is a POD type and Jumbo
itself is clearly not. For illustration of the term POD, I found
http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
useful.

You should also think about the (re)allocations of vector<Jumbo>. A
vector will in general allocate more slots than needed. If Jumbo is
large, then this might be a reasonable amount of memory. Furthermore, a
vector might not start with it's final size. So there are some
reallocations necessary before the vector has it's final size. These
reallocations copy the entire content.

Since I am now determined to use Jumbo * and can exactly predict the
number of slots, this is not relevant to my situation although generally
good to know. Question: If I would use vector <Jumbo> instead and the
Jumbo objects were of varying length and I would start the vector with all
NUL entries, could I then get the reallocation/copy problem mentioned?

Personally I would prefer vector<intrusive_ptr<Jumbo>> or some other
smart pointer over vector<Jumbo*>, because it answers the ownership
question more clearly. However, details depend on your application.

Thank you for this suggestion. I will practice the use of the mentioned
smart pointers at a good opportunity. At the present project, though, I
will use vector <Jumbo*> because ownership is out opf qustion and it is
not the time to risk anything new.

Thanx,
Peter
 
P

Peter Liedermann

1) If you know in advance the size of the vector, you can reserve the
right amount of capacity before you start filling it and thus avoid the
costs of reallocation.

yes, I do
2) If you can define an efficient swap() function for your class,
then sorting a vector<Jumbo> would not involve any copying whatever.
I can certainly swap Jumbo * (pointers) easily, swapping Jumbo objects
themselves is certainly (much?) more difficult.
So I am determined to use Jumbo * as of now.
Thanx for the clarification
Peter
 
P

Peter Liedermann

Hi together,
I do not think that I presently have any problems with heap fragmentation.
Your discussion, however, is probably interesting not only to me. I would
appreciate continuation.
Regards
Peter
 
G

Goran

Hi together,
I do not think that I presently have any problems with heap fragmentation..
Your discussion, however, is probably interesting not only to me.  I would
appreciate continuation.

It's perhaps worth adding that sorting the vector<Jumbo> in itself
doesn't affect heap fragmentation, as sorting should be in-place.
Rather, any allocations that Jumbo copying makes affect it. It Jumbo
is big, but of small pieces, then it's likely you won't be affected.

Goran.
 
P

Peter Liedermann

It's perhaps worth adding that sorting the vector<Jumbo> in itself
doesn't affect heap fragmentation, as sorting should be in-place.

also if the Jumbo* instances are of varying length?

Rather, any allocations that Jumbo copying makes affect it. It Jumbo
is big, but of small pieces, then it's likely you won't be affected.

Wouldn't Jumbo big and of small pieces make fragmentation worse?

Peter
 
G

Goran

also if the Jumbo* instances are of varying length?

How can Jumbo* instances be of varying length? whatever* has pointer
size (e.g. 4 or 8). You mean, Jumbo instances are of varying sizes?
(As in, there's Jumbo-derived class instances that are bigger?) If so,
no problem, instances themselves won't be moved around, only pinters
to them. If you're using some trick to make Jumbo instances themselves
of varying size (e.g. strunc xyz { int size; data elements[]; } and
overloaded operator new etc.), then you can't put them in a
vector said:
Wouldn't Jumbo big and of small pieces make fragmentation worse?

Whoops, yes, likely. I rather meant "if Jumbo is big --because-- it's
made of many small pieces".

Goran.
 
W

Werner

Hi there,

Let "Jumbo" be the name of a class whose instances consume substantial
storage.

A vector of typically 1000 such Jumbo-instances is needed. It must be
sorted by a simple int criterium, but not often.

Is there a substantial performance difference between vector <Jumbo> and
vector <Jumbo *>, provided that everything else is adapted accordingly? Is
there a difference at all?

If, once the vector is populated, it does not change much,
one option may be to use an additional vector containing
iterators to the first. The vector containing the iterators
remain sorted (the other one not). It needs to be repopulated
under certain circumstances if items are inserted into the
first or re-allocation happens due to lack of capacity.

Another option is to look at boost::ptr_vector.
- It automatically manages memory.
- It uses a indirect interface that could simplify sorting.

Kind regards,

Werner
 
N

none

I think this is the key to my question: Jumbo * is a POD type and Jumbo
itself is clearly not. For illustration of the term POD, I found
http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
useful.



Since I am now determined to use Jumbo * and can exactly predict the
number of slots, this is not relevant to my situation although generally
good to know. Question: If I would use vector <Jumbo> instead and the
Jumbo objects were of varying length and I would start the vector with all
NUL entries, could I then get the reallocation/copy problem mentioned?

Typically Jumbo should be more like:

class Jumbo
{
public:
Jumbo();
// etc
private:
data * m_data;
}

So the "size" of Jumbo would not be sizeof(Jumbo) and Jumbo objects
containing containing varying amount of data would all present the
same sizeof(aJumboOject).

for such a class, as mentionned elsewhere, a very efficient swap()
function could be implemented by simply swapping the internal
pointers.
 
J

Joe Gottman

I think this is the key to my question: Jumbo * is a POD type and Jumbo
itself is clearly not. For illustration of the term POD, I found
http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
useful.

Depending on what Jumbo looks like, you could still sort a
vector<Jumbo> efficiently if you can define an efficient swap()
overload. For instance, suppose Jumbo is implemented as follows:

struct Jumbo
{
std::string foo;
std::vector<double> bar;
} ;


Then you can define swap() by first defining a public swap() member
function:

void swap(Jumbo &rhs)
{
//Almost all the types in the standard library have efficient
swap() overloads
swap(foo, rhs.foo);
swap(bar, rhs.bar);
}


Then, define a global swap function in the same namespace as Jumbo (this
is important!)

inline void swap(Jumbo &lhs, Jumbo &rhs)
{
lhs.swap(rhs);
}


If you follow these two steps, and Jumbo consists solely of built-in
types and standard library types, then sorting a vector of Jumbo becomes
extremely efficient, with no need to ever allocate or deallocate memory
for a temporary object.

Joe Gottman
 
P

Peter Liedermann

On 15/11/2011 19:50, Peter Liedermann wrote:
If you are going to allocate the Jumbo objects separately for use in
vector<Jumbo*> then you might as well just use std::set<Jumbo>; std::set
will keep the items sorted.

Thank you for the suggestion. However, I am not going to use set because
I occasionally have to re-sort. In my case, this resort would be quite
intense, that is, almost everything is repositioned each time. This is
definitely not a hot spot in the algorithm, but I do not want to swear
that it occurs at most so and so often. For this reason, I want to avoid
tree-like structures, which an STL set essentially is.

But BTW, this takes us back to a variation the original question: Would
you, given the use of a set, really use std::set<Jumbo> or std::set<Jumbo
*> instead and does it make a difference
(that is, does replacing "vector" by "set" in the original question make a
difference)

Regards.
Peter
 
J

Joshua Maurice

Thank you for the suggestion.  However, I am not going to use set because
I occasionally have to re-sort.

In my case, this resort would be quite
intense, that is, almost everything is repositioned each time.  This is
definitely not a hot spot in the algorithm, but I do not want to swear
that it occurs at most so and so often.  For this reason, I want to avoid
tree-like structures, which an STL set essentially is.

But BTW, this takes us back to a variation the original question: Would
you, given the use of a set, really use std::set<Jumbo> or std::set<Jumbo
*> instead and does it make a difference
(that is, does replacing "vector" by "set" in the original question make a
difference)

It depends on a lot of things. You really haven't given me enough
information to determine the answer.

If you question is "Given a pre-existing list of stuff that's
expensive to copy, how do I figure out what is the sorted sequence of
those objects?", then the answer is pretty straightforward. Define the
following class:
struct JumboPointerSortRule
{
bool operator() (Jumbo* x, Jumbo* y) const { return *x < *y; }
};
Then create a std::set<Jumbo*, JumboPointerSortRule>, and insert a
pointer of each element in the unsorted sequence to this set. Voila,
you now have a view on your original objects, where the view is in
sorted order, all with a minimum of fuss and CPU/memory usage. Unless
a profiler tells you differently, or you know up front some very
specific facts (you would know what those facts are if you knew them),
then don't worry about it and do it this way.

The answer may change if you have C++11. I'm currently too ignorant of
such things to comment.

If your question is more of a architecture question, then you haven't
given me enough information to work through how I'd do it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top