Is it better idea to do resize of vector before calling push_back?

Discussion in 'C++' started by Tarun, Jan 16, 2007.

  1. Tarun

    Tarun Guest

    Hello,

    I am facing problem sometimes while I am trying to do push_back on a
    vector. Currently I am doing resize of the vector increasing the size
    by one and then push_back and seems like the code is working fine.

    Is it a better idea to do resize befoire calling push_back?

    Regards,
    Tarun
    Tarun, Jan 16, 2007
    #1
    1. Advertising

  2. Tarun

    peter koch Guest

    Tarun skrev:
    > Hello,
    >
    > I am facing problem sometimes while I am trying to do push_back on a
    > vector. Currently I am doing resize of the vector increasing the size
    > by one and then push_back and seems like the code is working fine.
    >
    > Is it a better idea to do resize befoire calling push_back?
    >


    No reason to do that. Use resize if you are going to push_back many
    elements and you know how many elements (approximately) you will add.

    /Peter
    peter koch, Jan 16, 2007
    #2
    1. Advertising

  3. Tarun

    Daniel T. Guest

    "peter koch" <> wrote:
    > Tarun skrev:


    > > I am facing problem sometimes while I am trying to do push_back on a
    > > vector. Currently I am doing resize of the vector increasing the size
    > > by one and then push_back and seems like the code is working fine.
    > >
    > > Is it a better idea to do resize befoire calling push_back?

    >
    > No reason to do that. Use resize if you are going to push_back many
    > elements and you know how many elements (approximately) you will add.


    I'm going to assume you guys are talking about reserve and not resize.
    Otherwise I agree with peter. Only use reserve when profiling has shown
    there is a bottleneck.
    Daniel T., Jan 16, 2007
    #3
  4. Daniel T. wrote:
    > "peter koch" <> wrote:
    >> Tarun skrev:

    >
    >>> I am facing problem sometimes while I am trying to do push_back on a
    >>> vector. Currently I am doing resize of the vector increasing the
    >>> size by one and then push_back and seems like the code is working
    >>> fine.
    >>>
    >>> Is it a better idea to do resize befoire calling push_back?

    >>
    >> No reason to do that. Use resize if you are going to push_back many
    >> elements and you know how many elements (approximately) you will add.

    >
    > I'm going to assume you guys are talking about reserve and not resize.
    > Otherwise I agree with peter. Only use reserve when profiling has
    > shown there is a bottleneck.


    Performance of the insertions is not necessarily the trouble when using
    'push_back'. Frequent reallocations from smaller size to larger can
    cause heap fragmentation which may impede the performance later and not
    even show up on profiling radars. Use 'reserve' any time you know (for
    sure) what the size will be after all insertions. And if that's the
    case (i.e. you know what the size is), then why not just resize and use
    the assignment instead?

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Jan 16, 2007
    #4
  5. Tarun

    peter koch Guest

    Daniel T. skrev:
    > "peter koch" <> wrote:
    > > Tarun skrev:

    >
    > > > I am facing problem sometimes while I am trying to do push_back on a
    > > > vector. Currently I am doing resize of the vector increasing the size
    > > > by one and then push_back and seems like the code is working fine.
    > > >
    > > > Is it a better idea to do resize befoire calling push_back?

    > >
    > > No reason to do that. Use resize if you are going to push_back many
    > > elements and you know how many elements (approximately) you will add.

    >
    > I'm going to assume you guys are talking about reserve and not resize.
    > Otherwise I agree with peter. Only use reserve when profiling has shown
    > there is a bottleneck.


    I certainly had reserve in mind. resize is an entirely different beast.

    /Peter
    peter koch, Jan 16, 2007
    #5
  6. On Tue, 16 Jan 2007 09:41:58 -0500, "Victor Bazarov" wrote:
    >Performance of the insertions is not necessarily the trouble when using
    >'push_back'.


    not the only trouble :)

    >Frequent reallocations from smaller size to larger can
    >cause heap fragmentation which may impede the performance later and not
    >even show up on profiling radars.


    To bolster your argument I wrote a small test application to check how
    (amazingly) often reallocation happens:

    #include <iostream>
    #include <vector>
    using namespace std;

    int main() {
    vector<int> v;
    v.push_back (0);
    int* p = &v[0];

    for (int i = 1; i < 100; ++i) {
    v.push_back (i);
    if (&v[0] != p) {
    cout << "reallocated at: " << i << endl;
    p = &v[0];
    }
    }
    }

    >Use 'reserve' any time you know (for
    >sure) what the size will be after all insertions.


    agreed.

    >And if that's the
    >case (i.e. you know what the size is), then why not just resize and use
    >the assignment instead?


    because the elements are expensive to create and/or assign? E.g.
    std::string.

    Best regards,
    Roland Pibinger
    Roland Pibinger, Jan 16, 2007
    #6
  7. Tarun

    Bo Persson Guest

    Roland Pibinger wrote:
    > On Tue, 16 Jan 2007 09:41:58 -0500, "Victor Bazarov" wrote:
    >> Performance of the insertions is not necessarily the trouble when
    >> using 'push_back'.

    >
    > not the only trouble :)
    >
    >> Frequent reallocations from smaller size to larger can
    >> cause heap fragmentation which may impede the performance later
    >> and not even show up on profiling radars.

    >
    > To bolster your argument I wrote a small test application to check
    > how (amazingly) often reallocation happens:
    >
    > #include <iostream>
    > #include <vector>
    > using namespace std;
    >
    > int main() {
    > vector<int> v;
    > v.push_back (0);
    > int* p = &v[0];
    >
    > for (int i = 1; i < 100; ++i) {
    > v.push_back (i);
    > if (&v[0] != p) {
    > cout << "reallocated at: " << i << endl;
    > p = &v[0];
    > }
    > }
    > }
    >


    And what did your implementation show?

    On my system I ran it for a million iterations instead:
    reallocated at: 1024
    reallocated at: 1536
    reallocated at: 2304
    reallocated at: 3456
    reallocated at: 5184
    reallocated at: 7776
    reallocated at: 11664
    reallocated at: 17496
    reallocated at: 26244
    reallocated at: 39366
    reallocated at: 59049
    reallocated at: 88573
    reallocated at: 132859
    reallocated at: 199288
    reallocated at: 298932
    reallocated at: 448398
    reallocated at: 672597

    >> Use 'reserve' any time you know (for
    >> sure) what the size will be after all insertions.

    >
    > agreed.


    Reserve helps even if you only have a good guess at the final size. The
    closer guess the better, of course, but any hint helps.

    With reserve(500000) I get:
    reallocated at: 500000
    reallocated at: 750000

    Not too bad!


    Bo Persson
    Bo Persson, Jan 16, 2007
    #7
  8. Tarun

    Philipp Reh Guest

    Re: Is it better idea to do resize of vector before callingpush_back?

    On Tue, 16 Jan 2007 09:41:58 -0500, Victor Bazarov wrote:

    > Daniel T. wrote:

    [snip]
    > And if that's the
    > case (i.e. you know what the size is), then why not just resize and use
    > the assignment instead?
    >
    > V


    Because resize has to construct all new elements to a default value.
    Philipp Reh, Jan 16, 2007
    #8
  9. On Tue, 16 Jan 2007 18:04:59 +0100, "Bo Persson" wrote:
    >Roland Pibinger wrote:
    >> #include <iostream>
    >> #include <vector>
    >> using namespace std;
    >>
    >> int main() {
    >> vector<int> v;
    >> v.push_back (0);
    >> int* p = &v[0];
    >>
    >> for (int i = 1; i < 100; ++i) {
    >> v.push_back (i);
    >> if (&v[0] != p) {
    >> cout << "reallocated at: " << i << endl;
    >> p = &v[0];
    >> }
    >> }
    >> }
    >>

    >
    >And what did your implementation show?
    >
    >On my system I ran it for a million iterations instead:
    >reallocated at: 1024
    >reallocated at: 1536
    >reallocated at: 2304
    >reallocated at: 3456
    >reallocated at: 5184
    >reallocated at: 7776
    >reallocated at: 11664
    >reallocated at: 17496
    >reallocated at: 26244
    >reallocated at: 39366
    >reallocated at: 59049
    >reallocated at: 88573
    >reallocated at: 132859
    >reallocated at: 199288
    >reallocated at: 298932
    >reallocated at: 448398
    >reallocated at: 672597


    The 'reallocation pattern' seems to be obvious. BTW, g++ 3.4 uses n*2.
    Sum the numbers and you get the amount of unnecessary copies.

    >>> Use 'reserve' any time you know (for
    >>> sure) what the size will be after all insertions.

    >>
    >> agreed.

    >
    >Reserve helps even if you only have a good guess at the final size. The
    >closer guess the better, of course, but any hint helps.
    >
    >With reserve(500000) I get:
    >reallocated at: 500000
    >reallocated at: 750000
    >
    >Not too bad!


    Hmm, 500000 + 750000 unnecessary copies for 1000000 entries! Better
    significantly overstimate the reserved space.

    Best wishes,
    Roland Pibinger
    Roland Pibinger, Jan 16, 2007
    #9
  10. Philipp Reh wrote:
    > On Tue, 16 Jan 2007 09:41:58 -0500, Victor Bazarov wrote:
    >
    >> Daniel T. wrote:

    > [snip]
    >> And if that's the
    >> case (i.e. you know what the size is), then why not just resize and
    >> use the assignment instead?
    >>
    >> V

    >
    > Because resize has to construct all new elements to a default value.


    So? 'push_back' has to copy-construct them. If you know for sure that
    the combination of default-construction with assignment is slower than
    copy-construction, then you still have to decide what's more important
    to you, speed *now* or less fragmented heap *later*. Keep in mind that
    even if your heap manager defragments the heap with deallocations,
    defragmenting the heap also requires time.

    I often found that for simple element types, like char or double, doing

    vector<double> blah(somany);
    for (int i = 0; i < somany; ++i)
    blah = ...

    is *better* than

    vector<double> blah;
    for (int i = 0; i < somany; ++i)
    blah(push_back(...

    [I already explained the reasons] Of course, it's all empirical, YMMV.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Jan 16, 2007
    #10
  11. Tarun

    Joe Gottman Guest

    Victor Bazarov wrote:
    > Performance of the insertions is not necessarily the trouble when using
    > 'push_back'. Frequent reallocations from smaller size to larger can
    > cause heap fragmentation which may impede the performance later and not
    > even show up on profiling radars. Use 'reserve' any time you know (for
    > sure) what the size will be after all insertions. And if that's the
    > case (i.e. you know what the size is), then why not just resize and use
    > the assignment instead?



    Two reasons to use reserve() and push_back() rather than resize()
    and assignment:

    1) reserve() and push_back() requires only using one copy constructor,
    while resize() and assignment requires using a default constructor
    followed by an assignment operator. The first is often more efficient.

    2) reserve() followed by push_back() is useful when you don't know the
    exact size of your vector but you have a definite upper bound on its size.

    Joe Gottman
    Joe Gottman, Jan 17, 2007
    #11
  12. Tarun

    Mirek Fidler Guest

    Roland Pibinger wrote:
    > On Tue, 16 Jan 2007 09:41:58 -0500, "Victor Bazarov" wrote:
    > >Performance of the insertions is not necessarily the trouble when using
    > >'push_back'.

    >
    > not the only trouble :)
    >
    > >Frequent reallocations from smaller size to larger can
    > >cause heap fragmentation which may impede the performance later and not
    > >even show up on profiling radars.

    >
    > To bolster your argument I wrote a small test application to check how
    > (amazingly) often reallocation happens:


    You need to see things in context. In reality, if vector has 100%
    growth factor, *average* number of copies per element is 2 - first is
    in push_back, second is due to reallocation.

    In fact, you are likely to spend a little bit more time in push_back
    (because batch operations are always faster). Especially if you are
    using Upp::Vector instead of std::vector (and reallocations are
    therefore performed by memmove).

    That is why using reserve has so little impact on performance (widely
    known fact).

    Mirek
    Mirek Fidler, Jan 17, 2007
    #12
  13. Tarun

    Mirek Fidler Guest

    Victor Bazarov wrote:
    > Philipp Reh wrote:
    > > On Tue, 16 Jan 2007 09:41:58 -0500, Victor Bazarov wrote:
    > >
    > >> Daniel T. wrote:

    > > [snip]
    > >> And if that's the
    > >> case (i.e. you know what the size is), then why not just resize and
    > >> use the assignment instead?
    > >>
    > >> V

    > >
    > > Because resize has to construct all new elements to a default value.

    >
    > So? 'push_back' has to copy-construct them. If you know for sure that
    > the combination of default-construction with assignment is slower than
    > copy-construction, then you still have to decide what's more important
    > to you, speed *now* or less fragmented heap *later*.


    If you know that, just use reserve - same results for heap
    framentation, default construction avoided.

    Mirek
    Mirek Fidler, Jan 17, 2007
    #13
  14. On 17 Jan 2007 03:44:24 -0800, "Mirek Fidler" wrote:
    >You need to see things in context. In reality, if vector has 100%
    >growth factor, *average* number of copies per element is 2 - first is
    >in push_back, second is due to reallocation.


    The context here is to compare push_back performance with and without
    calling reserve before. You have 1 copy per element with reserve and 3
    without (e.g. for Dinkumware with growth factor 1,5).

    >That is why using reserve has so little impact on performance (widely
    >known fact).


    That depends of course on the number and 'weight' of the vector
    elements.

    Best regards,
    Roland Pibinger
    Roland Pibinger, Jan 17, 2007
    #14
  15. Tarun

    Tarun Guest

    Thanks a lot friends. I am working on it with all of yours suggestion
    in mind. Will getback to you people again.

    Regards,
    Tarun.

    Roland Pibinger wrote:
    > On 17 Jan 2007 03:44:24 -0800, "Mirek Fidler" wrote:
    > >You need to see things in context. In reality, if vector has 100%
    > >growth factor, *average* number of copies per element is 2 - first is
    > >in push_back, second is due to reallocation.

    >
    > The context here is to compare push_back performance with and without
    > calling reserve before. You have 1 copy per element with reserve and 3
    > without (e.g. for Dinkumware with growth factor 1,5).
    >
    > >That is why using reserve has so little impact on performance (widely
    > >known fact).

    >
    > That depends of course on the number and 'weight' of the vector
    > elements.
    >
    > Best regards,
    > Roland Pibinger
    Tarun, Jan 18, 2007
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ganesh Gella
    Replies:
    5
    Views:
    1,175
    John Harrison
    Jul 14, 2003
  2. Hitesh Bhatiya
    Replies:
    4
    Views:
    4,524
    John Harrison
    Aug 13, 2003
  3. Avi Bercovich
    Replies:
    5
    Views:
    23,622
    Evan Carew
    Jan 14, 2004
  4. Alex Vinokur
    Replies:
    9
    Views:
    525
    Alex Vinokur
    Apr 11, 2004
  5. Replies:
    8
    Views:
    1,914
    Csaba
    Feb 18, 2006
Loading...

Share This Page