stl vector/memory question

  • Thread starter Kenneth W Del Signore
  • Start date
Kenneth W Del Signore

Hi,

I'm working on a scheme to store several millions of records (Subdata),
each of which contains a variable number of sub records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million Subdatas
that have an average of 7 Celldatas each, I'm using 85 MB. That's about 50 bytes per Subdata, whereas
the actual data is only about 14 bytes per Subdata on average (7 Celldatas at 2 bytes each).

Any insight is greatly appreciated!
Ken

A condensed version of the code is as follows:

#include <vector>

using std::vector;

class Celldata
{
public:
    short cellnum:10,
          time_stamp:6;

    Celldata(int cell, int time)
    {
        cellnum = cell;
        time_stamp = time;
    }
};


class Subdata
{
public:
    vector<Celldata> cells;

    int add_cell(int cell, int time);
};

class Esn_global
{
public:
    vector<Subdata> sublists;
};


int
Subdata::add_cell(int cell, int time)
{
    Celldata newcell(cell, time);
    cells.push_back(newcell);
    return 1;
}


Esn_global g_esn;

int main()
{
    for (;;)
    {
        /* - read record containing subscriber identifier and cell       */
        /* - add cell to subscriber's list (if not already on the list)  */
    }
}


I've trimmed some of the code out; there is another structure that takes a subscriber's
identifier (phone number) and returns the index into g_esn.sublists that corresponds to that user's
data.

I'm wondering if I should be allocating the class instances with 'new' and then pushing
pointers onto my vectors instead?

I've also played around with growing the vectors by only 2 entries when they fill, to prevent them from
doubling in size; this didn't seem to be a benefit....

I was pushing all the instances onto vectors by value to avoid any memory leaks, and I also thought it would be
better for daily audits of the data. Maybe this is my problem?

thanks
 
Andre Kostur

Hi,

I'm working on a scheme to store several millions of records
(Subdata),
each of which contains a variable number of sub records (Celldata).
My concern is that my prototype uses a lot more memory than I'm
expecting. To store 1.7 million Subdatas that have an average of 7
Celldatas, I'm using 85 MB. That's about 50 bytes/record, whereas
the actual data is about 14 bytes/record (average).
[snip]

I've also played around with resizing the vectors by 2 entries when
they are full to prevent them from doubling in size when they fill;
this didn't seem to be a benefit....

Note that when you resize a vector, it is not required to end up with
exactly the capacity you specify; it may allocate more memory than you
ask for. So if you .resize(2), the vector may actually have allocated
memory to hold 16.

Also a reminder that .reserve() and .resize() don't do the same thing: reserve()
only sets aside capacity, while resize() changes the number of elements....
I was pushing all the instances onto vectors to avoid any memory
leaking and also thought it would be better for daily audits of the
data. Maybe this is my problem?

I suppose if you want stricter control, you might try std::list ?
 
Denis Remezov

Kenneth said:
I'm working on a scheme to store several millions of records (Subdata),
each of which contains a variable number of sub records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million Subdatas
that have an average of 7 Celldatas, I'm using 85 MB. That's about 50 bytes/record, whereas
actual data is about 14 bytes/record (average).
[...]


class Celldata
{
public:
short cellnum:10,
time_stamp:6;

Celldata(int, int);
};

class Subdata
{
public:
vector<Celldata> cells;

int add_cell(int cell, int time);
};

class Esn_global
{
public:

vector<Subdata> sublists;
};
[...]

Let's assume pointer and integer size is 4 for the sake of calculations.

First, look at the dynamic part of
vector<Subdata> sublists;

You didn't account for
1.7 million * sizeof(Subdata)

where
sizeof(Subdata) is typically sizeof(vector<Celldata>) which
will possibly be 12 bytes (a vector needs to know where its reserved
storage begins and ends, as well as how many elements are actually used).

That's not the end of it yet. Unless you reserve the right number of
elements for sublists initially, the storage will need to be reallocated,
possibly multiple times. Reallocation of a large block may leave an
unused region not big enough for larger size allocation requests but
too big to be used up by all subsequently allocated small objects
combined. In addition, unless perhaps you know the element count
beforehand, the reserved storage of sublists will be larger than
the used part.

Now, the dynamic part of vector<Celldata> cells:
Add the overhead of heap allocation performed by the vector allocator
to what Andre has mentioned.

All things combined, 50 bytes per Subdata looks reasonable to me.
All of the above, of course, are just guesses. I would first find out
which part of that is actually the most problematic, then optimize
accordingly.

For example, for vector<Celldata>, you /might/ want to use some sort
of a small object allocator (instead of the default one).

For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages). It may also prove totally unnecessary depending
on the results of the profiling.

Denis
 
Derrick Coetzee

Andre said:
I suppose if you want stricter control, you might try std::list ?

With around 12 bytes of overhead per node, a list isn't about to solve the OP's memory
woes, except maybe with a little list unrolling. Handling all the
operations on unrolled lists is complicated, but if there isn't an
unrolled-list template out there, there should be.
 
Ivan Vecerina

Denis Remezov said:
For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages).
Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.

hth -Ivan
 
Denis Remezov

Ivan said:
Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.

Thanks for noting that. My only gripe is that the standard deque doesn't
provide any way of controlling the "page" (or "node") size, but that gripe
may well be moot: for many cases, including this one, the page size probably
isn't critical at all, and the default may be good enough.

Denis
 
