A Better Container Choice?


Mike Copeland

My current application has 2 large data sets that are combined into a
single data set that I must access by (part of) a string value.
Currently I have the structure declared as a map object, but after
populating the basic information I am adding information from another
database that's much larger - in a many-to-one situation.
Here's the fundamental information I use:
#include <map>
#include <string>
#include <vector>
using namespace std;

struct Res_Struct              // Individual Event Finisher data
{
    int  resEvtNum;            // link to Events table
    int  resYear;              // Event Year
    int  resOAll;              // OverAll Finish position
    int  resD_P;               // Division Place
    long resTime;              // Finish Time
} resWork;
struct Hist_Fins               // individual Finisher's results
{
    int        evtNum;         // Result's Event # link
    string     PRF;            // P/R indicator
    Res_Struct histInfo;       // Finisher's result(s) info
} histWork;
vector<Hist_Fins>::iterator hIter;
struct Fin_Struct              // Individual Finisher data
{
    long   finLink;            // unique Finisher (link)
    char   finGender;          // gender
    int    finCount;           // # Finishes by this participant
    string finName;            // Finisher Name (Last, First M.)
    string finDoB;             // (derived) DoB from event Age/Year
    vector<Hist_Fins> histVect; // this Finisher's results (many-to-one)
} finWork;
map<int, Fin_Struct> finMap;   // keyed by the integer Finisher link
map<int, Fin_Struct>::iterator fIter;

Yes, this seems a bit convoluted, but the application has been
growing in size and complexity, and I've not had time to redesign...
The important issue here is that I have ~160,000 records that
construct the basic information in the Fin_Struct. My other data (~
400,000 records) comprise the information that populates the "histVect"
object - 1-200 vector items in each map object. The input data files
are flat text data files (referencing some earlier posts about file I/O
efficiency).
Note that the map has an integer key value, and values range from 101
through ~160,000. I don't use the "name" as a key because I normally
scan the entire map object to look for objects that match some part of
the name value (e.g. I want to find all objects with names that start
with "WAL", etc.).
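
The name scan described above can be pictured with a minimal sketch like the following. It is only an illustration: `Finisher` is a hypothetical, trimmed-down stand-in for Fin_Struct, and `findByPrefix` is an assumed helper name, not code from the application. It visits every entry once and keeps the keys whose name begins with the given prefix.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Fin_Struct: only the field the scan needs.
struct Finisher {
    std::string finName;   // Finisher Name (Last, First M.)
};

// Return the keys of all entries whose name starts with `prefix`.
// This is a linear scan: each entry in the map is visited exactly once.
std::vector<int> findByPrefix(const std::map<int, Finisher>& finMap,
                              const std::string& prefix) {
    std::vector<int> hits;
    for (const auto& entry : finMap) {
        // compare(0, n, prefix) is 0 only when the first n characters
        // of the name equal the prefix; shorter names never match.
        if (entry.second.finName.compare(0, prefix.size(), prefix) == 0)
            hits.push_back(entry.first);
    }
    return hits;
}
```

Because the scan touches every element in key order anyway, it would behave the same over a vector; the map's ordering by integer key does nothing to speed up a search by name.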
The use of an STL map doesn't seem best, because I don't use the map
in a traditional way, and the loading of the map takes a lot of time
<sigh>. Since the data objects are consecutive in an integer range, I
wonder if another container would be a better choice. I could use a
vector (and reserve a good amount of space "going in", rather than let
slow runtime growth occur), but I think I'd lose significant "load time"
by not referencing a map as I'd have to scan the vector 400,000 or more
times during the 2nd file population...
Both files contain the integer value that links them, as well as the
"name" string.
Any thoughts? TIA
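
For illustration only, here is a minimal sketch of the vector alternative raised in the post, under the assumption that the keys really are (nearly) consecutive and that the lowest and highest key are known before loading. In that case the key itself, minus the lowest key, can serve as the vector index, so the second file would not need a scan per record. `FinisherTable`, `Finisher` and `HistEntry` are hypothetical names, not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-ins for Hist_Fins and Fin_Struct.
struct HistEntry {
    int  evtNum;
    long resTime;
};
struct Finisher {
    long finLink = 0;
    std::string finName;
    std::vector<HistEntry> histVect;
};

// Assumes keys are dense in [firstKey, lastKey]; gaps simply leave
// default-constructed slots behind.
class FinisherTable {
public:
    FinisherTable(int firstKey, int lastKey)
        : firstKey_(firstKey),
          slots_(static_cast<std::size_t>(lastKey - firstKey + 1)) {}

    // Constant-time lookup: returns the slot for `key`,
    // or nullptr when the key lies outside the table's range.
    Finisher* find(int key) {
        if (key < firstKey_) return nullptr;
        std::size_t idx = static_cast<std::size_t>(key - firstKey_);
        return idx < slots_.size() ? &slots_[idx] : nullptr;
    }

private:
    int firstKey_;
    std::vector<Finisher> slots_;   // sized once, never regrown
};
```

With a table like this, loading the first file would fill the slot returned by `find(key)`, and each record of the second file would do `find(key)->histVect.push_back(...)` after checking for nullptr. Whether this actually loads faster than the map is something only measurement on the real data can show.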
 

Mike Copeland

Experiment, measure.

Actually, I have (been - for some months). Not "measure" so much,
but as I experiment I find that using containers and combining them
significantly affects performance. 8<{{
So, as I played with containers (and converted my access from
file-based indexing), I encountered slower and slower loading of my data. So
slow, in fact, that I'm considering going back to file-based I/O on my
base files. My original implementation created byte seek addresses that
supported (fairly) fast access to individual data sets, but I thought
that was crude and inelegant when compared to STL containers. To date,
I was wrong...
Which is why I asked here if there is some sort of container that
would better serve my needs - I don't have experience with any but maps,
lists and vectors.

Or just use a database.

I don't have that option. My base files are fixed format (but
constantly growing).
 

Jorgen Grahn

Actually, I have (been - for some months). Not "measure" so much,
but as I experiment I find that using containers and combining them
significantly affects performance. 8<{{
So, as I played with containers (and converted my access from
file-based indexing), I encountered slower and slower loading of my data. So
slow, in fact, that I'm considering going back to file-based I/O on my
base files. My original implementation created byte seek addresses that
supported (fairly) fast access to individual data sets, but I thought
that was crude and inelegant when compared to STL containers. To date,
I was wrong...
Which is why I asked here if there is some sort of container that
would better serve my needs - I don't have experience with any but maps,
lists and vectors.

Your question was hard to understand -- I don't think I've seen any
evidence that anyone has. Can you explain the fundamental problem
you're trying to solve, without involving your current or current-1
solutions?

- what your data looks like on a high level (the "classes", and how
they're interconnected)
- what lookups you need to do, and how often (once a microsecond, or
just once?)
- what modifications, if any, you need to do.

I think lots of people here are experienced in these things, if they
just understand what problem to solve.

/Jorgen
 
