A Better Container Choice?

Mike Copeland · Aug 22, 2013

My current application has two large data sets that are combined into a
single data set, which I must access by (part of) a string value.
Currently I have the structure declared as a map object, but after
populating the basic information I add information from another, much
larger data set, in a many-to-one relationship (many history records
per finisher).
Here's the fundamental information I use:
```cpp
#include <map>
#include <string>
#include <vector>

struct Res_Struct            // Individual Event Finisher data
{
    int  resEvtNum;          // link to Events table
    int  resYear;            // Event Year
    int  resOAll;            // OverAll Finish position
    int  resD_P;             // Division Place
    long resTime;            // Finish Time
} resWork;

struct Hist_Fins             // individual Finisher's results
{
    int         evtNum;      // Result's Event # link
    std::string PRF;         // P/R indicator
    Res_Struct  histInfo;    // Finisher's result(s) info
} histWork;
std::vector<Hist_Fins>::iterator hIter;

struct Fin_Struct            // Individual Finisher data
{
    long        finLink;     // unique Finisher (link)
    char        finGender;   // gender
    int         finCount;    // # Finishes by this participant
    std::string finName;     // Finisher Name (Last, First M.)
    std::string finDoB;      // (derived) DoB from event Age/Year
    std::vector<Hist_Fins> histVect;  // this finisher's results (1-200 each)
} finWork;

std::map<int, Fin_Struct> finMap;     // key: 101 through ~160,000
std::map<int, Fin_Struct>::iterator fIter;
```

Yes, this seems a bit convoluted, but the application has been
growing in size and complexity, and I haven't had time to redesign it...
The important issue here is that I have ~160,000 records that make up
the basic information in Fin_Struct. My other data (~400,000 records)
populates the "histVect" member, with 1-200 vector items in each map
object. The input data files are flat text files (see some earlier
posts about file I/O efficiency).
Note that the map has an integer key, and the key values range from 101
through ~160,000. I don't use the name as a key because I normally
scan the entire map to find objects that match some part of the name
(e.g. all objects whose names start with "WAL").
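For the "starts with WAL" lookups, a full scan works but touches every entry. A minimal sketch of an alternative, assuming a separate index of (name, key) pairs is built once after loading and sorted by name: because all names sharing a prefix sit next to each other in sorted order, `std::lower_bound` finds the first match and the loop stops at the first non-match. `NameIndex`, `startsWith`, `sortIndex` and `keysWithPrefix` are hypothetical names, and the comparison is case-sensitive.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical secondary index: one (name, key) pair per finisher.
using NameIndex = std::vector<std::pair<std::string, int>>;

// Sort once after loading; pairs compare by name first, then key.
void sortIndex(NameIndex& index) {
    std::sort(index.begin(), index.end());
}

// True if s begins with prefix (case-sensitive); safe when s is shorter.
bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// All names with the prefix form one contiguous run in the sorted index,
// beginning at the first entry that is not less than the prefix itself.
std::vector<int> keysWithPrefix(const NameIndex& index, const std::string& prefix) {
    std::vector<int> keys;
    auto it = std::lower_bound(index.begin(), index.end(), prefix,
        [](const std::pair<std::string, int>& entry, const std::string& p) {
            return entry.first < p;
        });
    for (; it != index.end() && startsWith(it->first, prefix); ++it)
        keys.push_back(it->second);
    return keys;
}
```

The returned keys can then be used to fetch the full records from whatever container holds them; the index has to be rebuilt (or re-sorted) whenever names are added.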
Using an STL map doesn't seem ideal, because I don't use the map in
the traditional way, and loading the map takes a lot of time <sigh>.
Since the keys are consecutive integers, I wonder if another container
would be a better choice. I could use a vector (and reserve plenty of
space up front, rather than let it grow slowly at runtime), but I
suspect loading would get much slower without the map, because I'd
have to scan the vector 400,000 or more times while loading the second
file...
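Because the keys are consecutive integers, a vector does not have to be scanned at all: the key itself can serve as the index. A minimal sketch under that assumption (keys start at 101, occasional gaps allowed), using trimmed stand-ins for the structs above and hypothetical helper names (`FinTable`, `addFinisher`, `findFinisher`, `addHistory`); `std::optional` needs C++17.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Trimmed-down stand-ins for the post's structs (only the fields this sketch uses).
struct Hist_Fins {
    int evtNum = 0;
};

struct Fin_Struct {
    long finLink = 0;
    std::string finName;
    std::vector<Hist_Fins> histVect;
};

// Assumption taken from the post: keys start at 101 and are (nearly) consecutive.
constexpr int kFirstKey = 101;

// Slot i holds the finisher whose key is kFirstKey + i; an empty slot marks a gap.
using FinTable = std::vector<std::optional<Fin_Struct>>;

// Pass 1: store a finisher under its key, growing the table only when needed.
bool addFinisher(FinTable& table, int key, Fin_Struct fin) {
    if (key < kFirstKey) return false;
    const std::size_t idx = static_cast<std::size_t>(key - kFirstKey);
    if (idx >= table.size()) table.resize(idx + 1);
    table[idx] = std::move(fin);
    return true;
}

// Lookup is a range check plus one array access: no search, no scan.
Fin_Struct* findFinisher(FinTable& table, int key) {
    if (key < kFirstKey) return nullptr;
    const std::size_t idx = static_cast<std::size_t>(key - kFirstKey);
    if (idx >= table.size() || !table[idx]) return nullptr;
    return &*table[idx];
}

// Pass 2: attach one history record to its finisher; false means unknown key.
bool addHistory(FinTable& table, int key, const Hist_Fins& hist) {
    Fin_Struct* fin = findFinisher(table, key);
    if (fin == nullptr) return false;
    fin->histVect.push_back(hist);
    return true;
}
```

Calling `table.reserve(160000)` before the first pass avoids repeated reallocation; each of the ~400,000 history records then costs one subtraction and one array access instead of a map search. Pointers returned by `findFinisher` become invalid if a later `addFinisher` grows the table, so they should not be held across the first pass.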
Both files contain the integer value that links them, as well as the
"name" string.
Any thoughts? TIA

Ian Collins · Aug 23, 2013

Mike said:
Both files contain the integer value that links them, as well as the
"name" string.

Experiment, measure.
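In that spirit, a minimal timing harness built on `std::chrono::steady_clock`. The two loaders are hypothetical stand-ins that insert 160,000 consecutive keys; the numbers only mean something once the real parsing and loading code is wrapped instead, in an optimized build.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

// Runs a callable once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in: n consecutive keys starting at 101, stored in a map.
std::map<int, int> loadMap(int n) {
    std::map<int, int> m;
    for (int i = 0; i < n; ++i) m.emplace(101 + i, i);
    return m;
}

// Hypothetical stand-in: the same n values in a pre-reserved vector.
std::vector<int> loadVector(int n) {
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i);
    return v;
}
```

Timing each phase separately (file reading, parsing, container insertion) usually shows which one actually dominates the load.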

Or just use a database.

Mike Copeland · Aug 23, 2013

Ian said:
Experiment, measure.

Actually, I have been, for some months. Not measuring so much, but as
I experiment I find that how I use and combine containers
significantly affects performance. 8<{{
As I played with containers (and converted my access from file-based
indexing), loading my data has become slower and slower. So slow, in
fact, that I'm considering going back to file-based I/O on my base
files. My original implementation created byte seek addresses that
supported (fairly) fast access to individual data sets, but I thought
that was crude and inelegant compared to STL containers. So far, it
seems I was wrong...
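The byte-seek approach described above can coexist with containers. A minimal sketch, assuming newline-delimited records: one pass records where each line starts (`tellg`), and a later read jumps straight there (`seekg`). `buildLineIndex` and `readLine` are hypothetical names; for a real file the caller should open it with `std::ios::binary` so offsets are plain byte positions, and a trailing '\r' is stripped to tolerate CRLF files.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream also works, e.g. for testing
#include <string>
#include <vector>

// One pass over the stream, remembering the byte offset where each line starts.
std::vector<std::streampos> buildLineIndex(std::istream& in) {
    std::vector<std::streampos> offsets;
    std::string line;
    for (std::streampos pos = in.tellg(); std::getline(in, line); pos = in.tellg())
        offsets.push_back(pos);
    return offsets;
}

// Jump straight to line n (0-based) using its saved offset and read it.
bool readLine(std::istream& in, const std::vector<std::streampos>& offsets,
              std::size_t n, std::string& out) {
    if (n >= offsets.size()) return false;
    in.clear();                        // reset eof/fail flags left by earlier reads
    in.seekg(offsets[n]);
    if (!std::getline(in, out)) return false;
    if (!out.empty() && out.back() == '\r') out.pop_back();  // tolerate CRLF
    return true;
}
```

With a file this would be used as `std::ifstream in("results.txt", std::ios::binary);` (file name hypothetical). The offsets vector for 400,000 lines is small compared to the records themselves, and it can be saved and extended as the base files grow.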
That's why I asked here whether some other container would better
serve my needs; I don't have experience with any but maps, lists and
vectors.

Ian said:
Or just use a database.

I don't have that option. My base files are fixed format (but
constantly growing).

Jorgen Grahn · Aug 26, 2013

Mike said:
Which is why I asked here if there is some sort of container that
would better serve my needs - I don't have experience with any but maps,
lists and vectors.

Your question was hard to understand. I don't think I've seen any
evidence that anyone here has understood it. Can you explain the
fundamental problem you're trying to solve, without involving your
current or previous solutions?

- what your data looks like on a high level (the "classes", and how
they're interconnected)
- what lookups you need to do, and how often (once a microsecond, or
just once?)
- what modifications, if any, you need to do.

I think lots of people here are experienced in these things, once they
understand what problem to solve.

/Jorgen
