stl vector/memory question

  • Thread starter Kenneth W Del Signore
  • Start date
Kenneth W Del Signore

Hi,

I'm working on a scheme to store several millions of records (Subdata),
each of which contains a variable number of sub records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million Subdatas
that have an average of 7 Celldatas each, I'm using 85 MB. That's about 50 bytes per Subdata, whereas
the actual data is only about 14 bytes per Subdata on average (7 Celldatas at 2 bytes each).

Any insight is greatly appreciated!
Ken

A condensed version of the code is as follows:

#include <vector>

using std::vector;

class Celldata
{
public:
    short cellnum:10,
          time_stamp:6;

    Celldata(int cell, int time)
    {
        cellnum = cell;
        time_stamp = time;
    }
};


class Subdata
{
public:
    vector<Celldata> cells;

    int add_cell(int cell, int time);
};

class Esn_global
{
public:
    vector<Subdata> sublists;
};


int
Subdata::add_cell(int cell, int time)
{
    Celldata newcell(cell, time);
    cells.push_back(newcell);
    return 1;
}


Esn_global g_esn;

int main()
{
    for (;;)
    {
        /* - read record containing subscriber identifier and cell       */
        /* - add cell to subscriber's list (if not already on the list)  */
    }
}


I've trimmed some of the code out; there is another structure that takes a subscriber's
identifier (phone number) and returns the index into g_esn.sublists that corresponds to that user's
data.

I'm wondering if I should be allocating the class instances with 'new' and then pushing
pointers onto my vectors instead?

I've also played around with growing the vectors by only 2 entries when they fill, to prevent them from
doubling in size; this didn't seem to be a benefit....

I was pushing all the instances onto vectors by value to avoid any memory leaks, and I also thought it would be
better for daily audits of the data. Maybe this is my problem?

thanks
 
Andre Kostur

Hi,

I'm working on a scheme to store several millions of records
(Subdata),
each of which contains a variable number of sub records (Celldata).
My concern is that my prototype uses a lot more memory than I'm
expecting. To store 1.7 million Subdatas that have an average of 7
Celldatas, I'm using 85 MB. That's about 50 bytes/record, whereas
the actual data is about 14 bytes/record (average).
[snip]

I've also played around with resizing the vectors by 2 entries when
they are full to prevent them from doubling in size when they fill;
this didn't seem to be a benefit....

Note that when you resize a vector, it is not required to end up with
exactly the capacity you specify; it may allocate more memory than you
ask for. So if you .resize(2), the vector may actually have allocated
memory to hold 16.

Also a reminder that .reserve() and .resize() don't do the same thing: reserve()
only sets aside capacity, while resize() changes the number of elements....
I was pushing all the instances onto vectors to avoid any memory
leaking and also thought it would be better for daily audits of the
data. Maybe this is my problem?

I suppose if you want stricter control, you might try std::list ?
 
Denis Remezov

Kenneth said:
I'm working on a scheme to store several millions of records (Subdata),
each of which contains a variable number of sub records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million Subdatas
that have an average of 7 Celldatas, I'm using 85 MB. That's about 50 bytes/record, whereas
actual data is about 14 bytes/record (average).
[...]


class Celldata
{
public:
short cellnum:10,
time_stamp:6;

Celldata(int, int);
};

class Subdata
{
public:
vector<Celldata> cells;

int add_cell(int cell, int time);
};

class Esn_global
{
public:

vector<Subdata> sublists;
};
[...]

Let's assume pointer and integer size is 4 for the sake of calculations.

First, look at the dynamic part of
vector<Subdata> sublists;

You didn't account for
1.7 million * sizeof(Subdata)

where
sizeof(Subdata) is typically sizeof(vector<Celldata>) which
will possibly be 12 bytes (a vector needs to know where its reserved
storage begins and ends, as well as how many elements are actually used).

That's not the end of it yet. Unless you reserve the right number of
elements for sublists initially, the storage will need to be reallocated,
possibly multiple times. Reallocation of a large block may leave an
unused region not big enough for larger size allocation requests but
too big to be used up by all subsequently allocated small objects
combined. In addition, unless perhaps you know the element count
beforehand, the reserved storage of sublists will be larger than
the used part.

Now, the dynamic part of vector<Celldata> cells:
Add the overhead of heap allocation performed by the vector allocator
to what Andre has mentioned.

All things combined, 50 bytes per Subdata looks reasonable to me.
All of the above, of course, are just guesses. I would first find out
which part of that is actually the most problematic, then optimize
accordingly.

For example, for vector<Celldata>, you /might/ want to use some sort
of a small object allocator (instead of the default one).

For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages). It may also prove totally unnecessary depending
on the results of the profiling.

Denis
 
Derrick Coetzee

Andre said:
I suppose if you want stricter control, you might try std::list ?

With around 12 bytes of overhead per node, a list isn't about to solve the OP's memory
woes, except maybe with a little list unrolling. Handling all the
operations on unrolled lists is complicated, but if there isn't an
unrolled-list template out there, there should be.
 
Ivan Vecerina

Denis Remezov said:
For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages).
Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.

hth -Ivan
 
Denis Remezov

Ivan said:
Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.

Thanks for noting that. My only gripe is that the standard deque doesn't
provide any way of controlling the "page" (or "node") size, but that gripe
may well be moot: for many cases, including this one, the page size probably
isn't critical at all, and the default may be good enough.

Denis
 
