reserve of vector

Discussion in 'C++' started by ab2305@gmail.com, Oct 14, 2008.

  1. Guest

    Does the standard mandate that a reserve() call initialize a
    container's values to their defaults? I.e., given vector<int> intVec;
    would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?

    Thanks
    , Oct 14, 2008
    #1

  2. Kai-Uwe Bux Guest

    wrote:

    > Does the standard mandate that a reserve() call initialize a
    > container's values to their defaults? I.e., given vector<int> intVec;
    > would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?


    No. In fact, size() remains unchanged so that referencing those elements is
    undefined behavior.

    The resize() method, in this case, will grow the vector and initialize the
    missing elements.


    Best

    Kai-Uwe Bux
    Kai-Uwe Bux, Oct 14, 2008
    #2

  3. On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <>
    wrote:

    > wrote:
    >
    >> Does the standard mandate that a reserve() call initialize a
    >> container's values to their defaults? I.e., given vector<int> intVec;
    >> would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?

    >
    >No. In fact, size() remains unchanged so that referencing those elements is
    >undefined behavior.


    Just to expand on that, reserve is for managing memory. If you know
    that you're about to add thousands of items to a vector, using a
    reserve to pre-grow the memory allocation can make it more efficient,
    since the memory allocation only gets grown once - not potentially
    dozens or hundreds of times.

    IIRC sequentially adding n items to the end of a vector takes O(n^2)
    time. This is because the memory gets grown O(n) times - every k
    items, say, where k is a constant - and each growing can involve
    copying the whole vector which is on average O(n) again.

    If you can reserve the required space first, sequentially adding n
    items is O(n). There is only one memory reallocation, and only one
    copying - at most - so it's the actual appending of values that
    determines the time taken, not repeated memory reallocation and
    copying.

    There's also a scheme where the reserved memory is doubled on each
    reallocation, and this also gives O(n), though it's still slower than
    full preallocation.
    Stephen Horne, Oct 14, 2008
    #3
  4. Guest

    On Oct 13, 9:44 pm, Stephen Horne <> wrote:

    > Just to expand on that, reserve is for managing memory. If you know
    > that you're about to add thousands of items to a vector, using a
    > reserve to pre-grow the memory allocation can make it more efficient,


    I've read that reserve() doesn't help much. (I remember reading from
    Bjarne Stroustrup himself that he stopped calling reserve() at some
    point; but cannot find any references for that.)

    > IIRC sequentially adding n items to the end of a vector takes O(n^2)
    > time. This is because the memory gets grown O(n) times - every k
    > items, say, where k is a constant - and each growing can involve
    > copying the whole vector which is on average O(n) again.


    The standard requires that push_back have amortized constant time, so
    implementers cannot use a growth strategy with the complexity that you
    describe.

    > There's also a scheme where the reserved memory is doubled on each
    > reallocation, and this also gives O(n), though it's still slower than
    > full preallocation.


    Nowadays implementers use a growth factor of 1.5 (i.e. 50% growth per
    reallocation), which reportedly works better with pool allocators.

    Ali
    , Oct 14, 2008
    #4
  5. Guest

    On Oct 13, 9:51 pm, wrote:

    > I've read that reserve() doesn't help much. (I remember reading from
    > Bjarne Stroustrup himself that he stopped calling reserve() at some
    > point; but cannot find any references for that.)


    I found it:

    http://www.research.att.com/~bs/bs_faq2.html#slow-containers

    There, Stroustrup says:

    <quote>
    People sometimes worry about the cost of std::vector growing
    incrementally. I used to worry about that and used reserve() to
    optimize the growth. After measuring my code and repeatedly having
    trouble finding the performance benefits of reserve() in real
    programs, I stopped using it except where it is needed to avoid
    iterator invalidation (a rare case in my code). Again: measure before
    you optimize.
    </quote>

    Ali
    , Oct 14, 2008
    #5
  6. Kai-Uwe Bux Guest

    Stephen Horne wrote:

    > On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <>
    > wrote:
    >
    >> wrote:
    >>
    >>> Does the standard mandate that a reserve() call initialize a
    >>> container's values to their defaults? I.e., given vector<int> intVec;
    >>> would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?

    >>
    >>No. In fact, size() remains unchanged so that referencing those elements
    >>is undefined behavior.

    >
    > Just to expand on that, reserve is for managing memory. If you know
    > that you're about to add thousands of items to a vector, using a
    > reserve to pre-grow the memory allocation can make it more efficient,
    > since the memory allocation only gets grown once - not potentially
    > dozens or hundreds of times.
    >
    > IIRC sequentially adding n items to the end of a vector takes O(n^2)
    > time. This is because the memory gets grown O(n) times - every k
    > items, say, where k is a constant - and each growing can involve
    > copying the whole vector which is on average O(n) again.


    No. Appending to a vector is amortized constant time. The vector only
    moves when size() and capacity() coincide. At those points, the capacity
    jumps by a certain factor and not by an additive constant (your k).

    BTW: this is not a quality of implementation issue. The standard makes this
    requirement in clause [23.1.1/12].


    > If you can reserve the required space first, sequentially adding n
    > items is O(n). There is only one memory reallocation, and only one
    > copying - at most - so it's the actual appending of values that
    > determines the time taken, not repeated memory reallocation and
    > copying.
    >
    > There's also a scheme where the reserved memory is doubled on each
    > reallocation, and this also gives O(n), though it's still slower than
    > full preallocation.


    What matters is not the precise factor but that the growth is
    multiplicative rather than additive. I think many implementations out
    there use the golden ratio or some other factor smaller than 2.

    Nonetheless, preallocation will beat naive sequential push_back()
    performance-wise.


    Best

    Kai-Uwe Bux
    Kai-Uwe Bux, Oct 14, 2008
    #6
  7. On Mon, 13 Oct 2008 21:55:04 -0700 (PDT), wrote:

    >On Oct 13, 9:51 pm, wrote:
    >
    >> I've read that reserve() doesn't help much. (I remember reading from
    >> Bjarne Stroustrup himself that he stopped calling reserve() at some
    >> point; but cannot find any references for that.)

    >
    >I found it:


    I agree - I didn't say anyone should use it ;-)

    Personally, I don't have a single use of it in my current working copy
    - I just checked. Plenty of resizes, but no reserves. If I need a big
    enough collection that it's an issue, I normally end up using a
    different kind of container anyway, though not specifically to avoid
    this.

    I have my own vector ADT based on a multiway tree, though only because
    it was convenient given the map, set etc ADTs based on the same
    underlying code - they all support log n subscripting. Writing the one
    extra wrapper seemed prudent at the time, but I don't think I've ever
    used it. If I wanted a huge vector (perhaps a very large audio file) I
    might use this since it supports efficient inserts anywhere in the
    container - though I'd have to add some bulk ops first.

    For video, I wouldn't use it - I'd use a normal vector of pointers,
    either to frames or GOPs. An hour's video is only on the order of
    100,000 frames anyway. And I might just use a vector of pointers to
    chunks even for the huge audio file case.

    The word "thousands" in my earlier post was out by several orders, I
    guess.

    I needed a 64 million item 3/4 GB array recently. But it was
    fixed-size, so I used a fixed size array, not a vector. If I had used
    a vector, I would probably have used reserve.

    Strictly I did use a vector with millions of items at one stage -
    underlying a priority queue that held up to approx. two million items.
    Copy overheads *were* an issue here, but not so much from memory
    allocation as from the items flowing through the queue. I solved it by
    using a more specialised queuing method, putting items into the
    location in the main array where they'd be needed (with some collision
    handling).

    But even that was for a programming challenge thing that I was just
    doing for fun.

    http://projecteuler.net/

    Problem 211 - it turns out I solved it the hard way, but what the
    hell.

    Wish I could justify more time for these problems. I've had a plan for
    problem 210 (but no code yet) for about a week. This time doing it the
    easy way ;-)
    Stephen Horne, Oct 14, 2008
    #7
  8. On Tue, 14 Oct 2008 01:23:56 -0400, Kai-Uwe Bux <>
    wrote:

    >No. Appending to a vector is amortized constant time. The vector only
    >moves when size() and capacity() coincide. At those points, the capacity
    >jumps by a certain factor and not by an additive constant (your k).


    That's good to know. Thanks.

    I could swear that there used to be a fixed increment - even that the
    interface allowed the increment to be changed - but there you go.

    I didn't think they'd impose that because of the 50% worst-case memory
    overhead (often 75% when you allow for deletes), but it's no bad thing.
    A memory overhead on that scale isn't usually a big deal these days.

    >What matters is not the precise factor but that the constant is
    >multiplicative rather than additive. I think, many implementations out
    >there use the golden ratio or some other constant smaller than 2.


    Reducing the worst-case memory overhead. Makes sense.
    Stephen Horne, Oct 14, 2008
    #8
  9. James Kanze Guest

    On Oct 14, 6:44 am, Stephen Horne <> wrote:
    > On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <>
    > wrote:


    > > wrote:


    > >> Does the standard mandate that a reserve() call
    > >> initialize a container's values to their defaults? I.e.,
    > >> given vector<int> intVec; would intVec.reserve(100) make
    > >> intVec[0] = intVec[1] = ... = intVec[99] = 0?


    > >No. In fact, size() remains unchanged so that referencing
    > >those elements is undefined behavior.


    > Just to expand on that, reserve is for managing memory. If you
    > know that you're about to add thousands of items to a vector,
    > using a reserve to pre-grow the memory allocation can make it
    > more efficient, since the memory allocation only gets grown
    > once - not potentially dozens or hundreds of times.


    It's rarely necessary for optimization reasons (although it
    doesn't hurt). The main reason for using reserve() is to avoid
    invalidating iterators, pointers and references into the vector.
    (Appending to a vector will not invalidate iterators unless the
    capacity() increases.) Another reason, particularly with very
    large vectors, is to reduce total memory consumption.

    > IIRC sequentially adding n items to the end of a vector takes
    > O(n^2) time. This is because the memory gets grown O(n) times
    > - every k items, say, where k is a constant - and each growing
    > can involve copying the whole vector which is on average O(n)
    > again.


    The standard requires that appending to a vector be "amortized
    constant time". In practice, this means that memory growth must
    be exponential---older implementations generally doubled the
    size each time, but I think the general consensus today is that
    1.5 times is preferable.

    > If you can reserve the required space first, sequentially adding n
    > items is O(n).


    Creating a vector of n objects using push_back is required by
    the standard to be O(n). All reserve() does is reduce the
    constant factor.

    > There is only one memory reallocation, and only one copying -
    > at most - so it's the actual appending of values that
    > determines the time taken, not repeated memory reallocation
    > and copying.


    > There's also a scheme where the reserved memory is doubled on
    > each reallocation, and this also gives O(n), though it's still
    > slower than full preallocation.


    The real problem with it is memory consumption. You can easily
    end up consuming three times more memory than you are actually
    using.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Oct 14, 2008
    #9
  10. James Kanze Guest

    On Oct 14, 7:23 am, Kai-Uwe Bux <> wrote:
    > Stephen Horne wrote:


    [...]
    > What matters is not the precise factor but that the constant
    > is multiplicative rather than additive. I think, many
    > implementations out there use the golden ratio or some other
    > constant smaller than 2.


    I'd be surprised if they used exactly the golden ratio, but 1.5
    is an easy-to-calculate approximation.

    The motivation for this is simple: if you double each time, the
    sum of the memory you've previously freed is always less than
    the amount of memory you're requesting. A little bit of
    mathematical analysis shows that this is the case any time the
    multiplier is greater than the golden ratio.

    > Nonetheless, preallocation will beat naive sequential
    > push_back() performance wise.


    Slightly.

    I have a couple of cases (maybe two in the last ten years) where
    I've used reserve(). Always to avoid invalidating iterators,
    however. (Most of the time, however, if the size of the vector
    is changing, I'll save the index, not an iterator.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Oct 14, 2008
    #10
  11. Kai-Uwe Bux Guest

    James Kanze wrote:

    > On Oct 14, 7:23 am, Kai-Uwe Bux <> wrote:
    >> Stephen Horne wrote:

    >
    > [...]
    >> What matters is not the precise factor but that the constant
    >> is multiplicative rather than additive. I think, many
    >> implementations out there use the golden ratio or some other
    >> constant smaller than 2.

    >
    > I'd be surprised if they used exactly the golden ratio, but 1.5
    > is an easy to calculate approximation.
    >
    > The motivation for this is simple: if you double each time, the
    > sum of the memory you've previously freed is always less than
    > the amount of memory you're requesting. A little bit of
    > mathematical analysis shows that this is the case any time the
    > multiplier is greater than the golden ratio.


    An implementation could easily store the capacity and the previous capacity.
    The new capacity would simply be the sum of both. That will satisfy the
    motivation and converge to the golden ratio.

    As for the motivation, I think that might be good for some allocators, but I
    find it hard to believe that it matters all that much. It is highly
    unlikely that the memory gets reused by the same vector. More likely, a
    different vector is using that block. In any case, that strategy is easy to
    turn into a policy, and one can experiment and measure whether it has any
    impact.


    [snip]


    Best

    Kai-Uwe Bux
    Kai-Uwe Bux, Oct 14, 2008
    #11
  12. wrote:
    > I found it:
    >
    > http://www.research.att.com/~bs/bs_faq2.html#slow-containers
    >
    > There, Stroustrup says:
    >
    > <quote>
    > People sometimes worry about the cost of std::vector growing
    > incrementally. I used to worry about that and used reserve() to
    > optimize the growth. After measuring my code and repeatedly having
    > trouble finding the performance benefits of reserve() in real
    > programs, I stopped using it except where it is needed to avoid
    > iterator invalidation (a rare case in my code). Again: measure before
    > you optimize.
    > </quote>


    I find it very odd that Stroustrup is only concerned about the *speed*
    of std::vector here.

    The major problem with std::vector is not its speed, but the memory
    fragmentation it causes. In the worst cases memory fragmentation can
    almost double the memory usage of the entire program (I do have actual
    personal experience of this). If you know the number of elements you are
    going to push_back(), reserve() will eliminate the memory fragmentation
    they would otherwise cause.
    Juha Nieminen, Oct 14, 2008
    #12
  13. wrote:
    > Does the standard mandate that a reserve() call initialize a
    > container's values to their defaults? I.e., given vector<int> intVec;
    > would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?


    No, the idea with reserve() is precisely that it allocates memory for
    that many elements *without* initializing them (because initialization
    can be heavy and you don't necessarily want to do that).

    If you want the elements to be initialized (and accessible) use
    resize() rather than reserve().
    Juha Nieminen, Oct 14, 2008
    #13
  14. On 2008-10-14 14:20, Juha Nieminen wrote:
    > wrote:
    >> I found it:
    >>
    >> http://www.research.att.com/~bs/bs_faq2.html#slow-containers
    >>
    >> There, Stroustrup says:
    >>
    >> <quote>
    >> People sometimes worry about the cost of std::vector growing
    >> incrementally. I used to worry about that and used reserve() to
    >> optimize the growth. After measuring my code and repeatedly having
    >> trouble finding the performance benefits of reserve() in real
    >> programs, I stopped using it except where it is needed to avoid
    >> iterator invalidation (a rare case in my code). Again: measure before
    >> you optimize.
    >> </quote>

    >
    > I find it very odd that Stroustrup is only concerned about the *speed*
    > of std::vector here.
    >
    > The major problem with std::vector is not its speed, but the memory
    > fragmentation it causes. In the worst cases memory fragmentation can
    > almost double the memory usage of the entire program (I do have actual
    > personal experience of this). If you know the amount of elements you are
    > going to push_back(), reserve() will eliminate the memory fragmentation
    > caused by them.


    And then there is the over-allocation; depending on the growth strategy
    used, I would guess that one would end up with somewhere around 25-50%
    over-allocation on average.

    --
    Erik Wikström
    Erik Wikström, Oct 14, 2008
    #14
  15. Kai-Uwe Bux Guest

    Erik Wikström wrote:

    > On 2008-10-14 14:20, Juha Nieminen wrote:
    >> wrote:
    >>> I found it:
    >>>
    >>> http://www.research.att.com/~bs/bs_faq2.html#slow-containers
    >>>
    >>> There, Stroustrup says:
    >>>
    >>> <quote>
    >>> People sometimes worry about the cost of std::vector growing
    >>> incrementally. I used to worry about that and used reserve() to
    >>> optimize the growth. After measuring my code and repeatedly having
    >>> trouble finding the performance benefits of reserve() in real
    >>> programs, I stopped using it except where it is needed to avoid
    >>> iterator invalidation (a rare case in my code). Again: measure before
    >>> you optimize.
    >>> </quote>

    >>
    >> I find it very odd that Stroustrup is only concerned about the *speed*
    >> of std::vector here.
    >>
    >> The major problem with std::vector is not its speed, but the memory
    >> fragmentation it causes. In the worst cases memory fragmentation can
    >> almost double the memory usage of the entire program (I do have actual
    >> personal experience of this). If you know the amount of elements you are
    >> going to push_back(), reserve() will eliminate the memory fragmentation
    >> caused by them.

    >
    > And then there is the over-allocation, depending on the growing strategy
    > used I would guess that one would end up with somewhere around 25-50%
    > over-allocation on average.


    For a reallocation strategy with growth factor d and a sequence of
    push_back() of random length, I get:

    expected memory usage =

              d - 1
        m = ---------      (for d = 2, approx. 0.72)
             d ln(d)

    expected number of internal copy constructions (not counting the
    initial one) =

               1
        c = -------        (for d = 2, approx. 1.44)
             ln(d)

    Here is a little table:

        d    1.5     1.6     2.0
        m    0.822   0.798   0.721
        c    2.466   2.127   1.443


    Best

    Kai-Uwe Bux
    Kai-Uwe Bux, Oct 14, 2008
    #15
  16. On Tue, 14 Oct 2008 02:42:59 -0700 (PDT), James Kanze
    <> wrote:

    >I have a couple of cases (maybe two in the last ten years) where
    >I've used reserve(). Always to avoid invalidating iterators,
    >however. (Most of the time, however, if the size of the vector
    >is changing, I'll save the index, not an iterator.)


    As I understand it, that may avoid invalidating the iterator in
    practice, but not according to the standard.

    That is, I thought that any modification of a standard library
    container (other than a simple in-place overwrite of a value) is
    considered to invalidate all iterators that refer into that container.

    It's one of the reasons I use my own associative containers for a lot
    of high-level work. High level code shouldn't be worrying about these
    kinds of safety issues - it should just get on with the task at hand.
    A (very small) performance penalty is worth paying to have safe
    iterators that remain valid when the container is modified.

    Not necessarily true in low level code, of course.
    Stephen Horne, Oct 14, 2008
    #16
  17. On Tue, 14 Oct 2008 02:33:57 -0700 (PDT), James Kanze
    <> wrote:

    >> If you can reserve the required space first, sequentially adding n
    >> items is O(n).

    >
    >Creating a vector of n objects using push_back is required by
    >the standard to be O(n). All reserve() does is reduce the
    >constant factor.


    There is a difference between O(n) and amortized O(n) which can be
    significant (e.g. for real-time), but I suspect this is genuine O(n)
    here, despite the amortized O(1) for each individual push_back call?

    >The real problem with it is memory consumption. You can easily
    >end up consuming three times more memory than you are actually
    >using.


    Yes, but as people keep telling me when I talk about (e.g.)
    implementing data structures for large mutable containers in Haskell
    where the data might need to be kept mainly on disk, what's wrong with
    letting the virtual memory deal with that? Memory is cheap - and disk
    space is even cheaper.

    Of course I know the answers to that, but even so, it's a compelling
    argument a lot of the time. The part of that vector that you're not
    using will (on *some* platforms) never take up any actual memory at
    all, and may never hit the disk either - though of course it's still
    wasting a big chunk of your addressing space which can be significant
    in itself.

    The question then becomes "just how many of these huge vectors do you
    have?"
    Stephen Horne, Oct 14, 2008
    #17
  18. Kai-Uwe Bux Guest

    Stephen Horne wrote:

    > On Tue, 14 Oct 2008 02:42:59 -0700 (PDT), James Kanze
    > <> wrote:
    >
    >>I have a couple of cases (maybe two in the last ten years) where
    >>I've used reserve(). Always to avoid invalidating iterators,
    >>however. (Most of the time, however, if the size of the vector
    >>is changing, I'll save the index, not an iterator.)

    >
    > As I understand it, that may avoid invalidating the iterator in
    > practice, but not according to the standard.
    >
    > That is, I thought that any modification of a standard library
    > container (other than a simple in-place overwrite of a value) is
    > considered to invalidate all iterators that refer into that container.

    [snip]

    Not true in general. The standard containers make very different guarantees
    about the validity of iterators, pointers, and references. For
    std::vector<>, [23.2.4.3/1] guarantees that, on insert, iterators and
    references to elements before the point of insertion remain valid as long
    as the capacity is greater than the new size.

    The best guarantees are made by std::list, which is one of the foremost
    reasons to use it.



    Best

    Kai-Uwe Bux
    Kai-Uwe Bux, Oct 14, 2008
    #18
  19. red floyd Guest

    Kai-Uwe Bux wrote:

    > The best guarantees are made by std::list, which is one of the foremost
    > reasons to use it.
    >



    We had a contract where the coding spec said flat-out "NO use of new
    whatsoever". So we used std::list, and returned pointers to the list
    elements when needed.
    red floyd, Oct 14, 2008
    #19
  20. peter koch Guest

    On 14 Okt., 23:38, red floyd <> wrote:
    > Kai-Uwe Bux wrote:
    > > The best guarantees are made by std::list, which is one of the
    > > foremost reasons to use it.

    >
    > We had a contract where the coding spec said flat-out "NO use of new
    > whatsoever". So we used std::list, and returned pointers to the list
    > elements when needed.


    I hope the contract did allow the library code to use new? ;-)
    Apart from that, such a contract is just silly - but you knew that
    already.

    /Peter
    peter koch, Oct 14, 2008
    #20
