vector::push_back performance

Discussion in 'C++' started by Antonios Christofides, Sep 28, 2004.

  1. Hi,

    As I read in the archives, the performance problem caused by memory
    reallocations during vector::push_back is a common one. My first C++
    program is suffering from it: 300 thousand push_backs result,
    according to the profiler, in 20 reallocations; these 20 reallocations
    account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    execution time.

    What I don't understand: why is the reallocation code so complex? I
    studied the library source and I have a hard time understanding it,
    but it seems to be copying the vector item by item in each
    reallocation. Why wouldn't a "realloc" suffice?

    And, given that I don't know the vector size beforehand, is there
    anything else I can do other than trying deqeue or a guessed
    vector::reserve?

    In case it matters, I'm using gcc 3.3 with its standard c++ library on
    a Debian sarge, but portability is also an issue.

    Thanks!
     
    Antonios Christofides, Sep 28, 2004
    #1
    1. Advertising

  2. Antonios Christofides

    Rolf Magnus Guest

    Antonios Christofides wrote:

    > Hi,
    >
    > As I read in the archives, the performance problem caused by memory
    > reallocations during vector::push_back is a common one. My first C++
    > program is suffering from it: 300 thousand push_backs result,
    > according to the profiler, in 20 reallocations; these 20 reallocations
    > account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    > execution time.
    >
    > What I don't understand: why is the reallocation code so complex? I
    > studied the library source and I have a hard time understanding it,
    > but it seems to be copying the vector item by item in each
    > reallocation. Why wouldn't a "realloc" suffice?


    That's what a "realloc" does, too. You usually can't easily make an already
    allocated memory block bigger (what would you do with data after it?), so a
    new block must be allocated and the data be copied over to it, then the old
    one destroyed.

    > And, given that I don't know the vector size beforehand, is there
    > anything else I can do other than trying deqeue or a guessed
    > vector::reserve?


    Not much.

    > In case it matters, I'm using gcc 3.3 with its standard c++ library on
    > a Debian sarge, but portability is also an issue.
     
    Rolf Magnus, Sep 28, 2004
    #2
    1. Advertising

  3. Antonios Christofides wrote:
    > As I read in the archives, the performance problem caused by memory
    > reallocations during vector::push_back is a common one. My first C++
    > program is suffering from it: 300 thousand push_backs result,
    > according to the profiler, in 20 reallocations; these 20 reallocations
    > account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    > execution time.


    Three hundred thousand push_backs into a vector without reserve? Seems
    unjustified.

    > What I don't understand: why is the reallocation code so complex? I
    > studied the library source and I have a hard time understanding it,
    > but it seems to be copying the vector item by item in each
    > reallocation. Why wouldn't a "realloc" suffice?


    What would 'realloc' do? Place the objects each at a different memory
    location without letting the object know? That's not right. Objects
    may need to know where they have been constructed. They might want to
    let other classes or objects know of their location, etc.

    > And, given that I don't know the vector size beforehand, is there
    > anything else I can do other than trying deqeue or a guessed
    > vector::reserve?


    Nope. Using standard containers requires acknowledging the trade-offs.
    If you need fast push_back and you don't know the size, you should
    probably use 'std::list' or 'std::deque'. If you need random access
    afterwards, don't push-back without reserving. I bet any decent book
    on Standard containers talks about how to pick the container well-suited
    for your task. Of course, that assumes that you know what your task is.

    > In case it matters, I'm using gcc 3.3 with its standard c++ library on
    > a Debian sarge, but portability is also an issue.


    It doesn't matter, at least not here.

    V
     
    Victor Bazarov, Sep 28, 2004
    #3
  4. Antonios Christofides

    Phlip Guest

    Antonios Christofides wrote:

    > As I read in the archives, the performance problem caused by memory
    > reallocations during vector::push_back is a common one. My first C++
    > program is suffering from it: 300 thousand push_backs result,
    > according to the profiler, in 20 reallocations; these 20 reallocations
    > account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    > execution time.


    Do you think you could go to an algorithm where you push less back?

    > What I don't understand: why is the reallocation code so complex? I
    > studied the library source and I have a hard time understanding it,
    > but it seems to be copying the vector item by item in each
    > reallocation. Why wouldn't a "realloc" suffice?


    Read Herb Sutter's way-cool GOTW series, and his books /Exceptional C++/. He
    impugns the container class of choice should usually be std::deque<>, not
    std::vector<>. It frags not memory like std::list<>, and it's optimal to
    push things to both the beginning and end.

    --
    Phlip
    http://industrialxp.org/community/bin/view/Main/TestFirstUserInterfaces
     
    Phlip, Sep 29, 2004
    #4
  5. "Antonios Christofides" <> wrote in message
    news:dc5.4159da31.d0edf@voltaire...

    > As I read in the archives, the performance problem caused by memory
    > reallocations during vector::push_back is a common one. My first C++
    > program is suffering from it: 300 thousand push_backs result,
    > according to the profiler, in 20 reallocations; these 20 reallocations
    > account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    > execution time.


    I'm skeptical.

    Here's a little program:

    #include <vector>

    int main()
    {
    std::vector<int> v;
    for (std::vector<int>::size_type i = 0; i != 1000000; ++i)
    v.push_back(i);
    return 0;
    }

    When I run this program on my machine (admittedly faster than 1.3G, but no
    more than twice as fast), it runs in three *hundredths* of a second. And it
    calls push_back a million times, not 300,000 times.

    This behavior suggests to me that your vector must contain objects of a
    class that is much more expensive to copy than int.

    So I think we need to see more information about your program in order to
    understand the source of the performance problem.
     
    Andrew Koenig, Sep 29, 2004
    #5
  6. Antonios Christofides

    Method Man Guest

    > That's what a "realloc" does, too. You usually can't easily make an
    already
    > allocated memory block bigger (what would you do with data after it?), so

    a
    > new block must be allocated and the data be copied over to it, then the

    old
    > one destroyed.
    >


    All the texts I have read state that, when a dynamic array needs to be
    extended, it is better/faster to 'realloc' instead of creating a new array
    and copying the elements manually.

    So why is 'realloc' more efficient?
     
    Method Man, Sep 29, 2004
    #6
  7. "Method Man" <> wrote...
    >> That's what a "realloc" does, too. You usually can't easily make an

    > already
    >> allocated memory block bigger (what would you do with data after it?), so

    > a
    >> new block must be allocated and the data be copied over to it, then the

    > old
    >> one destroyed.
    >>

    >
    > All the texts I have read state that, when a dynamic array needs to be
    > extended, it is better/faster to 'realloc' instead of creating a new array
    > and copying the elements manually.


    You've been reading too many C texts, haven't you?

    > So why is 'realloc' more efficient?


    I am not sure how to answer this question. Imagine you have a tree which
    has grown too far up and needs trimming. You decide to trim it a foot off
    the ground because it's too inefficient to climb up and trim every branch
    that could use a trimming. You lose the tree. Why is cutting it down more
    efficient than doing it right? There is no answer. Another analogy: a TV
    set and a need to deliver it from the 7th floor to the truck parked outside.
    It can be done on an elevator, it could be done on the stairs. Or, somebody
    might decide that it's more efficient to lower it down through the window.
    Without ropes. Hey, a couple of seconds and it's down on the ground, no?
    Why is throwing it down more efficient than using the elevator (or stairs)?

    For POD you can do realloc. For non-POD classes (general case) realloc will
    simply not work. Efficient or not.

    V
     
    Victor Bazarov, Sep 29, 2004
    #7
  8. Antonios Christofides

    Siemel Naran Guest

    "Method Man" <> wrote in message news:eek:Jq6d.4788

    > All the texts I have read state that, when a dynamic array needs to be
    > extended, it is better/faster to 'realloc' instead of creating a new array
    > and copying the elements manually.
    >
    > So why is 'realloc' more efficient?


    Because if more space exists at the end of the existing array to yield a
    larger array, then realloc will just claim that space. Say you do
    malloc(512u) and the program reserves bytes 0x101 to 0x300 for your array.
    Say bytes 0x301 to 0x0500 are free for anyone to use. If no other objects
    request this space and you realloc the array to 1024u bytes, then the system
    may just mark bytes 0x0101 to 0x0500 as in use by your array (so no-one else
    can claim it).

    There's no garuantee that space is available, but if it is, we save copying
    lots of bytes.
     
    Siemel Naran, Sep 29, 2004
    #8
  9. Antonios Christofides

    Siemel Naran Guest

    "Antonios Christofides" <> wrote in message

    > What I don't understand: why is the reallocation code so complex? I
    > studied the library source and I have a hard time understanding it,
    > but it seems to be copying the vector item by item in each
    > reallocation. Why wouldn't a "realloc" suffice?


    For user types, especially those managing dynamic memory like std::string or
    std::deque, we have to call the overloaded copy constructor or operator= to
    the copy.

    > And, given that I don't know the vector size beforehand, is there
    > anything else I can do other than trying deqeue or a guessed
    > vector::reserve?


    What about std::list? What is wrong with std::deque?

    If you have a large object, it might be a good idea to make it reference
    counted. You can use boost::shared_ptr or the like.
     
    Siemel Naran, Sep 29, 2004
    #9
  10. Antonios Christofides

    Method Man Guest

    "Victor Bazarov" <> wrote in message
    news:tEr6d.172009$3l3.160912@attbi_s03...
    > "Method Man" <> wrote...
    > >> That's what a "realloc" does, too. You usually can't easily make an

    > > already
    > >> allocated memory block bigger (what would you do with data after it?),

    so
    > > a
    > >> new block must be allocated and the data be copied over to it, then the

    > > old
    > >> one destroyed.
    > >>

    > >
    > > All the texts I have read state that, when a dynamic array needs to be
    > > extended, it is better/faster to 'realloc' instead of creating a new

    array
    > > and copying the elements manually.

    >
    > You've been reading too many C texts, haven't you?
    >


    Well yea.

    I was talking about arrays of POD types, but I didn't make that clear in my
    post. Of course non-POD types can have constructors and overloaded
    assignment operators, so my question wouldn't make sense in that case.

    > > So why is 'realloc' more efficient?

    >
    > I am not sure how to answer this question. Imagine you have a tree which
    > has grown too far up and needs trimming. You decide to trim it a foot off
    > the ground because it's too inefficient to climb up and trim every branch
    > that could use a trimming. You lose the tree. Why is cutting it down

    more
    > efficient than doing it right? There is no answer. Another analogy: a TV
    > set and a need to deliver it from the 7th floor to the truck parked

    outside.
    > It can be done on an elevator, it could be done on the stairs. Or,

    somebody
    > might decide that it's more efficient to lower it down through the window.
    > Without ropes. Hey, a couple of seconds and it's down on the ground, no?
    > Why is throwing it down more efficient than using the elevator (or

    stairs)?
    >
    > For POD you can do realloc. For non-POD classes (general case) realloc

    will
    > simply not work. Efficient or not.
    >


    Your analogies didn't really help in my understanding of realloc. I was
    looking for something like -- 'realloc' is never/sometimes/always more
    efficient than malloc'ing a new array and manually copying from the old
    array (of PODs). Then justify the choice.
     
    Method Man, Sep 29, 2004
    #10
  11. > > All the texts I have read state that, when a dynamic array needs
    > > to be extended, it is better/faster to 'realloc' instead of
    > > creating a new array and copying the elements manually.
    > >
    > > So why is 'realloc' more efficient?


    Except for what was already said, I believe that even in the cases
    when realloc needs to allocate a new memory block rather than extend
    the existing one, it uses memmove or some similar operation which will
    be faster than manually copying the elements if its implementation
    uses specialized mass-byte-copying CPU instructions.
     
    Antonios Christofides, Sep 29, 2004
    #11
  12. Antonios Christofides

    Magnus Guest

    "Andrew Koenig" <> skrev i melding
    news:dop6d.644518$...
    >
    > I'm skeptical.
    >
    > Here's a little program:
    >
    > #include <vector>
    >
    > int main()
    > {
    > std::vector<int> v;
    > for (std::vector<int>::size_type i = 0; i != 1000000; ++i)
    > v.push_back(i);
    > return 0;
    > }
    >
    > When I run this program on my machine (admittedly faster than 1.3G, but no
    > more than twice as fast), it runs in three *hundredths* of a second. And

    it
    > calls push_back a million times, not 300,000 times.
    >
    > This behavior suggests to me that your vector must contain objects of a
    > class that is much more expensive to copy than int.
    >
    > So I think we need to see more information about your program in order to
    > understand the source of the performance problem.
    >
    >


    How do you time your execution time? Is there _one_ way to time this, or
    might your method differ from the OP method?

    - Magnus
     
    Magnus, Sep 29, 2004
    #12
  13. Siemel Naran wrote:
    > For user types, especially those managing dynamic memory like
    > std::string or std::deque, we have to call the overloaded copy
    > constructor or operator= to the copy.


    Thank you for your responses. The contained type is indeed a class
    that contains, among other things, a string and another user-class
    object. I'll go back to Stroustrup to re-read about shallow and deep
    copies and copy constructors and so on, and I'll come back either to
    summarize or to ask more questions (the latter seems more likely :)
     
    Antonios Christofides, Sep 29, 2004
    #13
  14. Antonios Christofides

    Rolf Magnus Guest

    Method Man wrote:

    >> That's what a "realloc" does, too. You usually can't easily make an
    >> already allocated memory block bigger (what would you do with data after
    >> it?), so a new block must be allocated and the data be copied over to it,
    >> then the old one destroyed.

    >
    > All the texts I have read state that, when a dynamic array needs to be
    > extended, it is better/faster to 'realloc' instead of creating a new array
    > and copying the elements manually.
    >
    > So why is 'realloc' more efficient?


    It might just behave like push_back() and allocate more memory than
    requested so that the next realloc doesn't need to copy the data to a new
    block.
     
    Rolf Magnus, Sep 29, 2004
    #14
  15. Antonios Christofides

    Rolf Magnus Guest

    Victor Bazarov wrote:

    > Antonios Christofides wrote:
    >> As I read in the archives, the performance problem caused by memory
    >> reallocations during vector::push_back is a common one. My first C++
    >> program is suffering from it: 300 thousand push_backs result,
    >> according to the profiler, in 20 reallocations; these 20 reallocations
    >> account for 3.6 seconds (Celeron 1.3G), which is 40% of total
    >> execution time.

    >
    > Three hundred thousand push_backs into a vector without reserve? Seems
    > unjustified.


    From all we know, the number of elements could be between 1 and 30 Million.
    After all, the OP said he doesn't know the number of elements beforehand.
    So how much would you reserve?
     
    Rolf Magnus, Sep 29, 2004
    #15
  16. Method Man wrote:
    > [...]
    > Your analogies didn't really help in my understanding of realloc.


    They were intended to show the pitfalls of using 'realloc' with generic
    types. Since now we're talking specifically POD, they are moot.

    > I was
    > looking for something like -- 'realloc' is never/sometimes/always more
    > efficient than malloc'ing a new array and manually copying from the old
    > array (of PODs). Then justify the choice.


    'realloc' works in an implementation-defined way. There is always some
    possibility that a memory block allocated for an array can be simply
    extended without the need to copy. There is always some possibility
    that mere calling malloc and then some kind of copying (even memcpy)
    can be less efficient when it's done from within your code than if it
    is done in the library written (and optimised) specifically for your
    hardware. So, in general 'realloc' will _always_ be at least as fast
    as you can emulate it with your own 'malloc' and 'memcpy'.

    More on C standard library - in comp.lang.c.

    Victor
     
    Victor Bazarov, Sep 29, 2004
    #16
  17. Antonios Christofides

    Jeff Flinn Guest

    "Victor Bazarov" <> wrote in message
    news:guy6d.3759$09.us.to.verio.net...
    > Method Man wrote:
    > > [...]
    > > Your analogies didn't really help in my understanding of realloc.

    >
    > They were intended to show the pitfalls of using 'realloc' with generic
    > types. Since now we're talking specifically POD, they are moot.


    Aah, but elsewhere in this thread to OP informs us that in fact he is
    dealing with a user defined class. so your analogies were very apt.

    Jeff F
     
    Jeff Flinn, Sep 29, 2004
    #17
  18. Method Man wrote:

    > All the texts I have read state that, when a dynamic array needs to be
    > extended, it is better/faster to 'realloc' instead of creating a new array
    > and copying the elements manually.
    >
    > So why is 'realloc' more efficient?



    Regarding vector, nowhere is required that all elements are copied in a
    new location after some vector::push_back(), vector::resize() or some
    other modifier. In all these case, an operation similar to realloc() is
    assumed.


    However, as with realloc(), objects may be moved.



    --
    Ioannis Vranos

    http://www23.brinkster.com/noicys
     
    Ioannis Vranos, Sep 29, 2004
    #18
  19. Method Man wrote:

    > Your analogies didn't really help in my understanding of realloc. I was
    > looking for something like -- 'realloc' is never/sometimes/always more
    > efficient than malloc'ing a new array and manually copying from the old
    > array (of PODs). Then justify the choice.



    As Victor said in a follow-up message,

    "So, in general 'realloc' will _always_ be at least as fast as you can
    emulate it with your own 'malloc' and 'memcpy'."


    However your messages are not comprehensible. I haven't understood
    exactly what you want to learn since the beginning of this thread.



    --
    Ioannis Vranos

    http://www23.brinkster.com/noicys
     
    Ioannis Vranos, Sep 29, 2004
    #19
  20. > How do you time your execution time? Is there _one_ way to time this, or
    > might your method differ from the OP method?


    A factor of 100 difference? Hardly likely.

    Try the program yourself and see.

    On my machine it runs so fast that I don't even have time to get my finger
    off the enter button before it finishes.
     
    Andrew Koenig, Sep 30, 2004
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ganesh Gella
    Replies:
    5
    Views:
    1,249
    John Harrison
    Jul 14, 2003
  2. Hitesh Bhatiya
    Replies:
    4
    Views:
    4,565
    John Harrison
    Aug 13, 2003
  3. Avi Bercovich
    Replies:
    5
    Views:
    23,703
    Evan Carew
    Jan 14, 2004
  4. Alex Vinokur
    Replies:
    9
    Views:
    535
    Alex Vinokur
    Apr 11, 2004
  5. Replies:
    8
    Views:
    1,983
    Csaba
    Feb 18, 2006
Loading...

Share This Page