STL Container Choice

Discussion in 'C++' started by Mike Copeland, Mar 28, 2012.

  1. I have a data structure (similar to the one below) that contains a
    variety of data types, and I wish to store many objects of this type in
    a container that (1) allows fast retrieval access and (2) is reasonably
    easy to store and update. Its "access key" is an integer value (e.g.
    bibNum).
    It seems that a vector is my best choice here (fast direct access via
    the key and ease of storage), but map could be better - true? Are there
    guidelines for container selection that could help me decide, or is some
    other container type a better choice? Pleas advise. TIA

    struct FD_TYPE // .FIN/.CFF file data
    {
    int bibNum; // Bib #
    int finTime; // Finish Time
    short entDivNum; // Division
    short divFinPos; // division Finish Position
    short evtFinPos; // Event Finish Position
    char entGender; // Sex
    char phaseNum; // Phase #
    short entAge; // Age
    char rCode; // RCode
    char entType; // Entrant Type
    short numLaps; // # of Laps
    short waveNum; // Wave #
    char teamTypeCode; // Team Type Code
    short evtYear; // Event Year
    short penalty; // Penalty
    bool bOK; // usage flag
    string teamName; // Team name (or "NONE")
    string entName; // Entrant Name
    } extern FinData;
    Mike Copeland, Mar 28, 2012
    #1
    1. Advertising

  2. Mike Copeland

    Jorgen Grahn Guest

    On Wed, 2012-03-28, Mike Copeland wrote:
    > I have a data structure (similar to the one below) that contains a
    > variety of data types, and I wish to store many objects of this type in
    > a container that (1) allows fast retrieval access and (2) is reasonably
    > easy to store and update. Its "access key" is an integer value (e.g.
    > bibNum).
    > It seems that a vector is my best choice here (fast direct access via
    > the key and ease of storage), but map could be better - true?


    Not enough information.
    - if bibNum==4711 exists, do 0--4710 exist, too?
    - would it be useful for you to have them sorted?
    - how do you plan to modify the container?
    - what does "many objects" mean to you?

    The way you speak of the data in terms of <key, value> makes me think
    you really want std::map or std::tr1::unordered_map. Use the former if
    it's useful for you to access the entries sorted on bibNum.

    > Are there
    > guidelines for container selection that could help me decide, or is some
    > other container type a better choice? Pleas advise. TIA
    >
    > struct FD_TYPE // .FIN/.CFF file data
    > {
    > int bibNum; // Bib #
    > int finTime; // Finish Time
    > short entDivNum; // Division

    ....
    > string entName; // Entrant Name
    > } extern FinData;


    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Mar 28, 2012
    #2
    1. Advertising

  3. Mike Copeland

    Jorgen Grahn Guest

    On Wed, 2012-03-28, Sam wrote:
    ....
    > Mike Copeland writes:
    >
    >> I have a data structure (similar to the one below) that contains a
    >> variety of data types, and I wish to store many objects of this type in
    >> a container that (1) allows fast retrieval access and (2) is reasonably
    >> easy to store and update. Its "access key" is an integer value (e.g.
    >> bibNum).
    >> It seems that a vector is my best choice here (fast direct access via
    >> the key and ease of storage), but map could be better - true? Are there
    >> guidelines for container selection that could help me decide, or is some
    >> other container type a better choice? Pleas advise. TIA

    >
    > If your keys are generally sequential, without any large gaps, a vector
    > would be fine.


    It depends. Having to deal with elements which exist but yet don't
    exist can lead to messy code and bugs.

    > If your keys are sparse, a map is a better choice.
    >
    > You should also used a typedef to indirectly refer to your container, i.e.
    > typedef std::vector<FD_TYPE> FD_TYPE_cont_t, then always use FD_TYPE_cont_t
    > to declare the container type, and always use FD_TYPE_cont_t::iterator and
    > FD_TYPE_cont_t::const_iterator, to declare any iterators.
    >
    > In the event that you change your mind regarding the container later, having
    > an intermediate typedef saves a tremendous amount of pain.


    A matter of personal taste, of course, but I disagree. The containers
    are so different that if you switch, you would have to review all the
    code anyway.

    (I also don't see how a simple search-and-replace is a tremendous
    amount of pain. There are tools for such things, and the compiler will
    catch any mistakes.)

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Mar 29, 2012
    #3
  4. In article <>,
    says...
    > > I have a data structure (similar to the one below) that contains a
    > > variety of data types, and I wish to store many objects of this type in
    > > a container that (1) allows fast retrieval access and (2) is reasonably
    > > easy to store and update. Its "access key" is an integer value (e.g.
    > > bibNum).
    > > It seems that a vector is my best choice here (fast direct access via
    > > the key and ease of storage), but map could be better - true?

    >
    > Not enough information.
    > - if bibNum==4711 exists, do 0--4710 exist, too?

    No. Objects would randomly exist throughout, with non-existant
    objects unknown via any access process. The range of possible index
    values is 1..1,000,000,000.
    > - would it be useful for you to have them sorted?

    No. I simply want to store the object and easily/quickly retrieve
    them as needed. (A binary search is meaningless here, given the random
    distribution of object.)
    > - how do you plan to modify the container?

    I (may need to) modify individual data elements of the object.
    > - what does "many objects" mean to you?

    100-100,000 objects.
    >
    > The way you speak of the data in terms of <key, value> makes me think
    > you really want std::map or std::tr1::unordered_map. Use the former if
    > it's useful for you to access the entries sorted on bibNum.
    >
    > > Are there
    > > guidelines for container selection that could help me decide, or is some
    > > other container type a better choice? Pleas advise. TIA
    > >
    > > struct FD_TYPE // .FIN/.CFF file data
    > > {
    > > int bibNum; // Bib #
    > > int finTime; // Finish Time
    > > short entDivNum; // Division

    > ...
    > > string entName; // Entrant Name
    > > } extern FinData;
    Mike Copeland, Mar 29, 2012
    #4
  5. Mike Copeland

    Jorgen Grahn Guest

    On Wed, 2012-03-28, Mike Copeland wrote:
    > In article <>,
    > says...
    >> > I have a data structure (similar to the one below) that contains a
    >> > variety of data types, and I wish to store many objects of this type in
    >> > a container that (1) allows fast retrieval access and (2) is reasonably
    >> > easy to store and update. Its "access key" is an integer value (e.g.
    >> > bibNum).
    >> > It seems that a vector is my best choice here (fast direct access via
    >> > the key and ease of storage), but map could be better - true?

    >>
    >> Not enough information.
    >> - if bibNum==4711 exists, do 0--4710 exist, too?

    > No. Objects would randomly exist throughout, with non-existant
    > objects unknown via any access process. The range of possible index
    > values is 1..1,000,000,000.
    >> - would it be useful for you to have them sorted?

    > No. I simply want to store the object and easily/quickly retrieve
    > them as needed. (A binary search is meaningless here, given the random
    > distribution of object.)


    I don't understand that last part. Binary search in a std::vector
    sorted by key can be very useful (but insertion/removal would be very
    expensive so it's only worth considering if your container rarely
    changes.)

    Or are you saying the relative order of elements in your container
    carries information, which would be destroyed by sorting it?

    >> - how do you plan to modify the container?

    > I (may need to) modify individual data elements of the object.


    But never the container itself, apart from inserting into it?

    >> - what does "many objects" mean to you?

    > 100-100,000 objects.


    OK. Then

    >> The way you speak of the data in terms of <key, value> makes me think
    >> you really want std::map or std::tr1::unordered_map. Use the former if
    >> it's useful for you to access the entries sorted on bibNum.


    I think unordered_map is the most natural choice, closely followed by
    plain old map.

    If you know them, unordered map is pretty much the same thing as
    Perl's hash, Python's dict, and the non-standard STL hash_map.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Mar 29, 2012
    #5
  6. In article <>,
    says...
    > > No. I simply want to store the object and easily/quickly retrieve
    > > them as needed. (A binary search is meaningless here, given the random
    > > distribution of object.)

    >
    > I don't understand that last part. Binary search in a std::vector
    > sorted by key can be very useful (but insertion/removal would be very
    > expensive so it's only worth considering if your container rarely
    > changes.)


    There container is constructed at the start of the application, not
    changed during, but the data in the objects may be adjusted (counts
    updated, etc.). The container is static otherwise.

    > Or are you saying the relative order of elements in your container
    > carries information, which would be destroyed by sorting it?


    No, sorting wouldn't affect its information. However, the data
    objects would be (very) sparsely scattered across the container's range.
    Perhaps I don't understand how a vector is allocated, but it seems to
    me that, if sorted, such a container loses its viability for a binary
    search - because the sorting doesn't "compact" the data. Am I wrong
    here?

    > >> - how do you plan to modify the container?

    > > I (may need to) modify individual data elements of the object.

    >
    > But never the container itself, apart from inserting into it?


    True.
    >
    > >> - what does "many objects" mean to you?

    > > 100-100,000 objects.

    >
    > OK. Then
    >
    > >> The way you speak of the data in terms of <key, value> makes me think
    > >> you really want std::map or std::tr1::unordered_map. Use the former if
    > >> it's useful for you to access the entries sorted on bibNum.

    >
    > I think unordered_map is the most natural choice, closely followed by
    > plain old map.


    (Not being very familiar with STL containers,) how does one declare
    an "unordered_map"? I thought the options were on;y map and multumap...
    8<{{
    Mike Copeland, Mar 29, 2012
    #6
  7. Mike Copeland

    Jorgen Grahn Guest

    On Thu, 2012-03-29, Mike Copeland wrote:
    > In article <>,
    > says...
    >> > No. I simply want to store the object and easily/quickly retrieve
    >> > them as needed. (A binary search is meaningless here, given the random
    >> > distribution of object.)

    >>
    >> I don't understand that last part. Binary search in a std::vector
    >> sorted by key can be very useful (but insertion/removal would be very
    >> expensive so it's only worth considering if your container rarely
    >> changes.)

    >
    > There container is constructed at the start of the application, not
    > changed during, but the data in the objects may be adjusted (counts
    > updated, etc.). The container is static otherwise.
    >
    >> Or are you saying the relative order of elements in your container
    >> carries information, which would be destroyed by sorting it?

    >
    > No, sorting wouldn't affect its information. However, the data
    > objects would be (very) sparsely scattered across the container's range.
    > Perhaps I don't understand how a vector is allocated, but it seems to
    > me that, if sorted, such a container loses its viability for a binary
    > search - because the sorting doesn't "compact" the data. Am I wrong
    > here?


    Kind of. I was thinking of a vector<T> where the indices are irrelevant
    and the key is still a member of T (like in your example)

    [4, 32, 33, 44, 1567, 1600, 16218]

    You can do a binary search in such a vector (there's even
    std::lower_bound() etc so you don't have to implement it yourself).

    But I suspect this is superior to unordered_map only in special cases,
    when sizeof(T) is very small and so on ...

    >> >> - how do you plan to modify the container?
    >> > I (may need to) modify individual data elements of the object.

    >>
    >> But never the container itself, apart from inserting into it?

    >
    > True.
    >>
    >> >> - what does "many objects" mean to you?
    >> > 100-100,000 objects.

    >>
    >> OK. Then
    >>
    >> >> The way you speak of the data in terms of <key, value> makes me think
    >> >> you really want std::map or std::tr1::unordered_map. Use the former if
    >> >> it's useful for you to access the entries sorted on bibNum.

    >>
    >> I think unordered_map is the most natural choice, closely followed by
    >> plain old map.

    >
    > (Not being very familiar with STL containers,) how does one declare
    > an "unordered_map"? I thought the options were on;y map and multumap...
    > 8<{{


    Check e.g. the Wikipedia. unordered_map is not available in C++98, but
    came with TR1 and is part of C++11. Even my fairly old g++ installation
    has it. It's also similar to the good old STL hash_map, which never
    made it into the standard:

    http://www.sgi.com/tech/stl/hash_map.html

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Mar 29, 2012
    #7
  8. Jorgen Grahn <> wrote:
    > But I suspect this is superior to unordered_map only in special cases,
    > when sizeof(T) is very small and so on ...


    Why would std::unordered_map be better than std::vector if sizeof(T)
    is large?
    Juha Nieminen, Mar 30, 2012
    #8
  9. Mike Copeland <> wrote:
    > There container is constructed at the start of the application, not
    > changed during, but the data in the objects may be adjusted (counts
    > updated, etc.). The container is static otherwise.


    Then use a sorted std::vector.
    Juha Nieminen, Mar 30, 2012
    #9
  10. Mike Copeland

    Jorgen Grahn Guest

    On Fri, 2012-03-30, Juha Nieminen wrote:
    > Jorgen Grahn <> wrote:
    >> But I suspect this is superior to unordered_map only in special cases,
    >> when sizeof(T) is very small and so on ...

    >
    > Why would std::unordered_map be better than std::vector if sizeof(T)
    > is large?


    Why not? If you believe binary search in a vector is faster than
    unordered_map lookup in general, then that's what we should discuss, I
    think.

    In my unscientific benchmarks, binary search only wins if the elements
    are few (less than 100, or something like that).

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Mar 30, 2012
    #10
  11. Mike Copeland

    Öö Tiib Guest

    On Mar 29, 2:19 am, Jorgen Grahn <> wrote:
    > On Wed, 2012-03-28, Sam wrote:
    >
    > > In the event that you change your mind regarding the container later, having
    > > an intermediate typedef saves a tremendous amount of pain.

    >
    > A matter of personal taste, of course, but I disagree. The containers
    > are so different that if you switch, you would have to review all the
    > code anyway.


    Some may read it that you suggest to use something like
    "std::unordered_map<A,B>" explicitly everywhere instead of some
    "BsAtAs"?

    The question "what to take?" arises usually when each of the container
    candidates have some overhead compared to what is actually needed and
    the common interface can be used. For example the similarities of
    interfaces of std::map<X,Y>, std::unordered_map<X,Y> and
    std::vector<std::pair<X,Y>> feel quite substantial in context of
    current thread.

    > (I also don't see how a simple search-and-replace is a tremendous
    > amount of pain. There are tools for such things, and the compiler will
    > catch any mistakes.)


    People are different. Building a regexp to search-and-replace
    everything from "unordered_map<X,Y>" to "std :: unordered_map < X , Y
    >" is trivial to some and pain for others. It may depend how large the

    project is and how massively the type is used and how quickly it all
    compiles.
    Öö Tiib, Apr 2, 2012
    #11
  12. Mike Copeland

    Jorgen Grahn Guest

    On Mon, 2012-04-02, Öö Tiib wrote:
    > On Mar 29, 2:19 am, Jorgen Grahn <> wrote:
    >> On Wed, 2012-03-28, Sam wrote:
    >>
    >> > In the event that you change your mind regarding the container later, having
    >> > an intermediate typedef saves a tremendous amount of pain.

    >>
    >> A matter of personal taste, of course, but I disagree. The containers
    >> are so different that if you switch, you would have to review all the
    >> code anyway.

    >
    > Some may read it that you suggest to use something like
    > "std::unordered_map<A,B>" explicitly everywhere instead of some
    > "BsAtAs"?


    Yes. (I can't see how what I wrote can be interpreted any other way.)

    But "everywhere" tends to be rather few places, in my experience.
    Perhaps that's just my programming style, but my containers tend to
    be fairly well encapsulated inside classes. E.g. a Mouth rather than a
    std::vector<Tooth>.

    > The question "what to take?" arises usually when each of the container
    > candidates have some overhead compared to what is actually needed and
    > the common interface can be used. For example the similarities of
    > interfaces of std::map<X,Y>, std::unordered_map<X,Y> and
    > std::vector<std::pair<X,Y>> feel quite substantial in context of
    > current thread.
    >
    >> (I also don't see how a simple search-and-replace is a tremendous
    >> amount of pain. There are tools for such things, and the compiler will
    >> catch any mistakes.)

    >
    > People are different. Building a regexp to search-and-replace
    > everything from "unordered_map<X,Y>" to "std :: unordered_map < X , Y
    >>" is trivial to some and pain for others.


    That's exaggerated; you probably don't use whitespace that
    inconsistently, and it doesn't matter if you miss a few occurrences
    since (see above) the compiler will point it out for you.

    > It may depend how large the
    > project is and how massively the type is used and how quickly it all
    > compiles.


    To some degree, yes.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Apr 2, 2012
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mickey Segal
    Replies:
    0
    Views:
    856
    Mickey Segal
    Feb 2, 2004
  2. Maitre Bart
    Replies:
    2
    Views:
    512
    Maitre Bart
    Feb 11, 2004
  3. Replies:
    4
    Views:
    786
    Daniel T.
    Feb 16, 2006
  4. wolverine
    Replies:
    2
    Views:
    439
    Marcus Kwok
    Jul 24, 2006
  5. miles.jg
    Replies:
    16
    Views:
    857
    Alf P. Steinbach
    Nov 14, 2007
Loading...

Share This Page