STL vector

Discussion in 'C++' started by Peter Liedermann, Nov 15, 2011.

  1. Hi there,

    Let "Jumbo" be the name of a class whose instances consume substantial
    storage.

    A vector of typically 1000 such Jumbo-instances is needed. It must be
    sorted by a simple int criterium, but not often.

    Is there a substantial performance difference between vector <Jumbo> and
    vector <Jumbo *>, provided that everything else is adapted accordingly? Is
    there a difference at all?

    Thanx a lot,
    Peter
    Peter Liedermann, Nov 15, 2011
    #1
    1. Advertising

  2. On 15.11.2011 20:50, Peter Liedermann wrote:
    > Let "Jumbo" be the name of a class whose instances consume substantial
    > storage.
    >
    > A vector of typically 1000 such Jumbo-instances is needed. It must be
    > sorted by a simple int criterium, but not often.
    >
    > Is there a substantial performance difference between vector <Jumbo> and
    > vector <Jumbo *>, provided that everything else is adapted accordingly?
    > Is there a difference at all?


    In general you need to copy the Jumbo objects to put them into the
    vector. The copy constructor of Jumbo might be quite expensive if it is
    not a POD like type. Is that what you want?

    You should also think about the (re)allocations of vector<Jumbo>. A
    vector will in general allocate more slots than needed. If Jumbo is
    large, then this might be a reasonable amount of memory. Furthermore, a
    vector might not start with it's final size. So there are some
    reallocations necessary before the vector has it's final size. These
    reallocations copy the entire content.

    And last but not least, allocating very large objects could cause heap
    fragmentation. This might be a reasonable impact, especially on 32 bit
    platforms where the virtual memory may be out long before the physical
    memory.

    On the other side the allocation overhead of large objects is usually
    negligible. So almost everything argue for a solution with some references.

    As a general rule of thumb large objects should preferably used as
    reference type. While small objects should be used as value types.

    Personally I would prefer vector<intrusive_ptr<Jumbo>> or some other
    smart pointer over vector<Jumbo*>, because it answers the ownership
    question more clearly. However, details depend on your application.


    Marcel
    Marcel Müller, Nov 15, 2011
    #2
    1. Advertising

  3. On 15.11.2011 22:13, Leigh Johnston wrote:
    > On 15/11/2011 20:30, Marcel Müller wrote:
    >>
    >> And last but not least, allocating very large objects could cause heap
    >> fragmentation. This might be a reasonable impact, especially on 32 bit
    >> platforms where the virtual memory may be out long before the physical
    >> memory.

    >
    > Surely allocating *small* objects is more likely to cause heap
    > fragmentation than allocating very large objects? It is more likely that
    > a very large object allocation will *fail* in a fragmented heap but this
    > is not what you said.


    It depends on what you call 'much fragmentation'. If we are talking
    about the number of free fragments, then you are right. But if we are
    talking about the amount of unused memory, the large objects are often
    more critical, because they likely do not to fit in any previously used
    free space.

    Especially if you have only one large object and if its final size is
    obtained by several reallocations. If the next allocation block will
    never fit into the previously freed blocks then you get almost 50%
    unused address space. And this is not that unlikely. E.g. if the
    allocation size grows exponentially by a factor of two in each step or
    if smaller, longer lived objects are allocated between the large
    objects. (A good old Matlab problem when building up large matrices
    incrementally, or in REXX when concatenating a large number of small
    strings.)


    Marcel
    Marcel Müller, Nov 15, 2011
    #3
  4. Peter Liedermann

    Joe Gottman Guest

    On 11/15/2011 2:50 PM, Peter Liedermann wrote:
    > Hi there,
    >
    > Let "Jumbo" be the name of a class whose instances consume substantial
    > storage.
    >
    > A vector of typically 1000 such Jumbo-instances is needed. It must be
    > sorted by a simple int criterium, but not often.
    >
    > Is there a substantial performance difference between vector <Jumbo> and
    > vector <Jumbo *>, provided that everything else is adapted accordingly?
    > Is there a difference at all?
    >
    > Thanx a lot,
    > Peter
    >


    1) If you know in advance the size of the vector, you can reserve the
    right amount of capacity before you start filling it and thus avoid the
    costs of reallocation.

    2) If you can define an efficient swap() function for your class,
    then sorting a vector<Jumbo> would not involve any copying whatever.


    Joe Gottman.
    Joe Gottman, Nov 16, 2011
    #4
  5. On Tue, 15 Nov 2011 21:30:14 +0100, Marcel Müller
    <> wrote:

    > On 15.11.2011 20:50, Peter Liedermann wrote:
    >> Let "Jumbo" be the name of a class whose instances consume substantial
    >> storage.
    >>
    >> A vector of typically 1000 such Jumbo-instances is needed. It must be
    >> sorted by a simple int criterium, but not often.
    >>
    >> Is there a substantial performance difference between vector <Jumbo> and
    >> vector <Jumbo *>, provided that everything else is adapted accordingly?
    >> Is there a difference at all?

    >
    > In general you need to copy the Jumbo objects to put them into the
    > vector. The copy constructor of Jumbo might be quite expensive if it is
    > not a POD like type. Is that what you want?


    I think this is the key to my question: Jumbo * is a POD type and Jumbo
    itself is clearly not. For illustration of the term POD, I found
    http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
    useful.


    > You should also think about the (re)allocations of vector<Jumbo>. A
    > vector will in general allocate more slots than needed. If Jumbo is
    > large, then this might be a reasonable amount of memory. Furthermore, a
    > vector might not start with it's final size. So there are some
    > reallocations necessary before the vector has it's final size. These
    > reallocations copy the entire content.


    Since I am now determined to use Jumbo * and can exactly predict the
    number of slots, this is not relevant to my situation although generally
    good to know. Question: If I would use vector <Jumbo> instead and the
    Jumbo objects were of varying length and I would start the vector with all
    NUL entries, could I then get the reallocation/copy problem mentioned?


    > Personally I would prefer vector<intrusive_ptr<Jumbo>> or some other
    > smart pointer over vector<Jumbo*>, because it answers the ownership
    > question more clearly. However, details depend on your application.
    >


    Thank you for this suggestion. I will practice the use of the mentioned
    smart pointers at a good opportunity. At the present project, though, I
    will use vector <Jumbo*> because ownership is out opf qustion and it is
    not the time to risk anything new.

    Thanx,
    Peter
    Peter Liedermann, Nov 16, 2011
    #5
  6. On Wed, 16 Nov 2011 03:59:56 +0100, Joe Gottman
    <> wrote:

    > On 11/15/2011 2:50 PM, Peter Liedermann wrote:
    >> Hi there,
    >>
    >> Let "Jumbo" be the name of a class whose instances consume substantial
    >> storage.
    >>
    >> A vector of typically 1000 such Jumbo-instances is needed. It must be
    >> sorted by a simple int criterium, but not often.
    >>
    >> Is there a substantial performance difference between vector <Jumbo> and
    >> vector <Jumbo *>, provided that everything else is adapted accordingly?
    >> Is there a difference at all?
    >>
    >> Thanx a lot,
    >> Peter
    >>

    >
    > 1) If you know in advance the size of the vector, you can reserve the
    > right amount of capacity before you start filling it and thus avoid the
    > costs of reallocation.


    yes, I do

    >
    > 2) If you can define an efficient swap() function for your class,
    > then sorting a vector<Jumbo> would not involve any copying whatever.
    >

    I can certainly swap Jumbo * (pointers) easily, swapping Jumbo objects
    themselves is certainly (much?) more difficult.
    So I am determined to use Jumbo * as of now.
    Thanx for the clarification
    Peter
    Peter Liedermann, Nov 16, 2011
    #6
  7. Re: STL vector / Heap fragmentation

    Hi together,
    I do not think that I presently have any problems with heap fragmentation.
    Your discussion, however, is probably interesting not only to me. I would
    appreciate continuation.
    Regards
    Peter
    Peter Liedermann, Nov 16, 2011
    #7
  8. Peter Liedermann

    Goran Guest

    Re: STL vector / Heap fragmentation

    On Nov 16, 9:02 am, "Peter Liedermann" <> wrote:
    > Hi together,
    > I do not think that I presently have any problems with heap fragmentation..
    > Your discussion, however, is probably interesting not only to me.  I would
    > appreciate continuation.


    It's perhaps worth adding that sorting the vector<Jumbo> in itself
    doesn't affect heap fragmentation, as sorting should be in-place.
    Rather, any allocations that Jumbo copying makes affect it. It Jumbo
    is big, but of small pieces, then it's likely you won't be affected.

    Goran.
    Goran, Nov 16, 2011
    #8
  9. Re: STL vector / Heap fragmentation

    On Wed, 16 Nov 2011 09:39:29 +0100, Goran <> wrote:


    > It's perhaps worth adding that sorting the vector<Jumbo> in itself
    > doesn't affect heap fragmentation, as sorting should be in-place.


    also if the Jumbo* instances are of varying length?


    > Rather, any allocations that Jumbo copying makes affect it. It Jumbo
    > is big, but of small pieces, then it's likely you won't be affected.


    Wouldn't Jumbo big and of small pieces make fragmentation worse?

    Peter
    Peter Liedermann, Nov 16, 2011
    #9
  10. Peter Liedermann

    Goran Guest

    Re: STL vector / Heap fragmentation

    On Nov 16, 10:58 am, "Peter Liedermann" <> wrote:
    > On Wed, 16 Nov 2011 09:39:29 +0100, Goran <> wrote:
    > > It's perhaps worth adding that sorting the vector<Jumbo> in itself
    > > doesn't affect heap fragmentation, as sorting should be in-place.

    >
    > also if the Jumbo* instances are of varying length?


    How can Jumbo* instances be of varying length? whatever* has pointer
    size (e.g. 4 or 8). You mean, Jumbo instances are of varying sizes?
    (As in, there's Jumbo-derived class instances that are bigger?) If so,
    no problem, instances themselves won't be moved around, only pinters
    to them. If you're using some trick to make Jumbo instances themselves
    of varying size (e.g. strunc xyz { int size; data elements[]; } and
    overloaded operator new etc.), then you can't put them in a
    vector<Jumbo>.
    >
    > > Rather, any allocations that Jumbo copying makes affect it. It Jumbo
    > > is big, but of small pieces, then it's likely you won't be affected.

    >
    > Wouldn't Jumbo big and of small pieces make fragmentation worse?


    Whoops, yes, likely. I rather meant "if Jumbo is big --because-- it's
    made of many small pieces".

    Goran.
    Goran, Nov 16, 2011
    #10
  11. Peter Liedermann

    Werner Guest

    On Nov 15, 9:50 pm, "Peter Liedermann" <> wrote:
    > Hi there,
    >
    > Let "Jumbo" be the name of a class whose instances consume substantial
    > storage.
    >
    > A vector of typically 1000 such Jumbo-instances is needed. It must be
    > sorted by a simple int criterium, but not often.
    >
    > Is there a substantial performance difference between vector <Jumbo> and
    > vector <Jumbo *>, provided that everything else is adapted accordingly? Is
    > there a difference at all?


    If, once the vector is populated, it does not change much,
    one option may be to use an additional vector containing
    iterators to the first. The vector containing the iterators
    remain sorted (the other one not). It needs to be repopulated
    under certain circumstances if items are inserted into the
    first or re-allocation happens due to lack of capacity.

    Another option is to look at boost::ptr_vector.
    - It automatically manages memory.
    - It uses a indirect interface that could simplify sorting.

    Kind regards,

    Werner
    Werner, Nov 16, 2011
    #11
  12. Peter Liedermann

    none Guest

    In article <>,
    Peter Liedermann <> wrote:
    >On Tue, 15 Nov 2011 21:30:14 +0100, Marcel Müller
    ><> wrote:
    >
    >> On 15.11.2011 20:50, Peter Liedermann wrote:
    >>> Let "Jumbo" be the name of a class whose instances consume substantial
    >>> storage.
    >>>
    >>> A vector of typically 1000 such Jumbo-instances is needed. It must be
    >>> sorted by a simple int criterium, but not often.
    >>>
    >>> Is there a substantial performance difference between vector <Jumbo> and
    >>> vector <Jumbo *>, provided that everything else is adapted accordingly?
    >>> Is there a difference at all?

    >>
    >> In general you need to copy the Jumbo objects to put them into the
    >> vector. The copy constructor of Jumbo might be quite expensive if it is
    >> not a POD like type. Is that what you want?

    >
    >I think this is the key to my question: Jumbo * is a POD type and Jumbo
    >itself is clearly not. For illustration of the term POD, I found
    >http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
    >useful.
    >
    >
    >> You should also think about the (re)allocations of vector<Jumbo>. A
    >> vector will in general allocate more slots than needed. If Jumbo is
    >> large, then this might be a reasonable amount of memory. Furthermore, a
    >> vector might not start with it's final size. So there are some
    >> reallocations necessary before the vector has it's final size. These
    >> reallocations copy the entire content.

    >
    >Since I am now determined to use Jumbo * and can exactly predict the
    >number of slots, this is not relevant to my situation although generally
    >good to know. Question: If I would use vector <Jumbo> instead and the
    >Jumbo objects were of varying length and I would start the vector with all
    >NUL entries, could I then get the reallocation/copy problem mentioned?


    Typically Jumbo should be more like:

    class Jumbo
    {
    public:
    Jumbo();
    // etc
    private:
    data * m_data;
    }

    So the "size" of Jumbo would not be sizeof(Jumbo) and Jumbo objects
    containing containing varying amount of data would all present the
    same sizeof(aJumboOject).

    for such a class, as mentionned elsewhere, a very efficient swap()
    function could be implemented by simply swapping the internal
    pointers.
    none, Nov 16, 2011
    #12
  13. Peter Liedermann

    Joe Gottman Guest

    On 11/16/2011 2:55 AM, Peter Liedermann wrote:
    > On Tue, 15 Nov 2011 21:30:14 +0100, Marcel Müller
    > <> wrote:
    >
    >> On 15.11.2011 20:50, Peter Liedermann wrote:
    >>> Let "Jumbo" be the name of a class whose instances consume substantial
    >>> storage.
    >>>
    >>> A vector of typically 1000 such Jumbo-instances is needed. It must be
    >>> sorted by a simple int criterium, but not often.
    >>>
    >>> Is there a substantial performance difference between vector <Jumbo> and
    >>> vector <Jumbo *>, provided that everything else is adapted accordingly?
    >>> Is there a difference at all?

    >>
    >> In general you need to copy the Jumbo objects to put them into the
    >> vector. The copy constructor of Jumbo might be quite expensive if it
    >> is not a POD like type. Is that what you want?

    >
    > I think this is the key to my question: Jumbo * is a POD type and Jumbo
    > itself is clearly not. For illustration of the term POD, I found
    > http://www.fnal.gov/docs/working-groups/fpcltf/Pkg/ISOcxx/doc/POD.html
    > useful.


    Depending on what Jumbo looks like, you could still sort a
    vector<Jumbo> efficiently if you can define an efficient swap()
    overload. For instance, suppose Jumbo is implemented as follows:

    struct Jumbo
    {
    std::string foo;
    std::vector<double> bar;
    } ;


    Then you can define swap() by first defining a public swap() member
    function:

    void swap(Jumbo &rhs)
    {
    //Almost all the types in the standard library have efficient
    swap() overloads
    swap(foo, rhs.foo);
    swap(bar, rhs.bar);
    }


    Then, define a global swap function in the same namespace as Jumbo (this
    is important!)

    inline void swap(Jumbo &lhs, Jumbo &rhs)
    {
    lhs.swap(rhs);
    }


    If you follow these two steps, and Jumbo consists solely of built-in
    types and standard library types, then sorting a vector of Jumbo becomes
    extremely efficient, with no need to ever allocate or deallocate memory
    for a temporary object.

    Joe Gottman
    Joe Gottman, Nov 16, 2011
    #13
  14. On Wed, 16 Nov 2011 15:02:44 +0100, Leigh Johnston <> wrote:

    > On 15/11/2011 19:50, Peter Liedermann wrote:


    > If you are going to allocate the Jumbo objects separately for use in
    > vector<Jumbo*> then you might as well just use std::set<Jumbo>; std::set
    > will keep the items sorted.
    >


    Thank you for the suggestion. However, I am not going to use set because
    I occasionally have to re-sort. In my case, this resort would be quite
    intense, that is, almost everything is repositioned each time. This is
    definitely not a hot spot in the algorithm, but I do not want to swear
    that it occurs at most so and so often. For this reason, I want to avoid
    tree-like structures, which an STL set essentially is.

    But BTW, this takes us back to a variation the original question: Would
    you, given the use of a set, really use std::set<Jumbo> or std::set<Jumbo
    *> instead and does it make a difference
    (that is, does replacing "vector" by "set" in the original question make a
    difference)

    Regards.
    Peter
    Peter Liedermann, Nov 17, 2011
    #14
  15. On Nov 17, 7:45 am, "Peter Liedermann" <> wrote:
    > On Wed, 16 Nov 2011 15:02:44 +0100, Leigh Johnston <> wrote:
    > > On 15/11/2011 19:50, Peter Liedermann wrote:
    > > If you are going to allocate the Jumbo objects separately for use in
    > > vector<Jumbo*> then you might as well just use std::set<Jumbo>; std::set
    > > will keep the items sorted.

    >
    > Thank you for the suggestion.  However, I am not going to use set because
    > I occasionally have to re-sort.
    >
    > In my case, this resort would be quite
    > intense, that is, almost everything is repositioned each time.  This is
    > definitely not a hot spot in the algorithm, but I do not want to swear
    > that it occurs at most so and so often.  For this reason, I want to avoid
    > tree-like structures, which an STL set essentially is.
    >
    > But BTW, this takes us back to a variation the original question: Would
    > you, given the use of a set, really use std::set<Jumbo> or std::set<Jumbo
    > *> instead and does it make a difference
    > (that is, does replacing "vector" by "set" in the original question make a
    > difference)


    It depends on a lot of things. You really haven't given me enough
    information to determine the answer.

    If you question is "Given a pre-existing list of stuff that's
    expensive to copy, how do I figure out what is the sorted sequence of
    those objects?", then the answer is pretty straightforward. Define the
    following class:
    struct JumboPointerSortRule
    {
    bool operator() (Jumbo* x, Jumbo* y) const { return *x < *y; }
    };
    Then create a std::set<Jumbo*, JumboPointerSortRule>, and insert a
    pointer of each element in the unsorted sequence to this set. Voila,
    you now have a view on your original objects, where the view is in
    sorted order, all with a minimum of fuss and CPU/memory usage. Unless
    a profiler tells you differently, or you know up front some very
    specific facts (you would know what those facts are if you knew them),
    then don't worry about it and do it this way.

    The answer may change if you have C++11. I'm currently too ignorant of
    such things to comment.

    If your question is more of a architecture question, then you haven't
    given me enough information to work through how I'd do it.
    Joshua Maurice, Nov 17, 2011
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Dennis
    Replies:
    1
    Views:
    2,580
    Dennis
    Jun 7, 2004
  2. CD
    Replies:
    2
    Views:
    805
    Victor Bazarov
    Oct 5, 2004
  3. Replies:
    2
    Views:
    414
  4. Replies:
    8
    Views:
    1,914
    Csaba
    Feb 18, 2006
  5. Luca Risolia

    STL map to STL vector

    Luca Risolia, Jan 13, 2014, in forum: C++
    Replies:
    32
    Views:
    362
    Seungbeom Kim
    Jan 18, 2014
Loading...

Share This Page