reserve of vector


ab2305

Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?

Thanks
 

Kai-Uwe Bux

Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?

No. In fact, size() remains unchanged so that referencing those elements is
undefined behavior.

The resize() method, in this case, will grow the vector and initialize the
missing elements.
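
A minimal sketch of the difference (the exact capacity value is up to
the implementation, but it will be at least 100):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> intVec;

    intVec.reserve(100);                     // allocates storage only
    std::cout << intVec.size() << '\n';      // prints 0
    std::cout << intVec.capacity() << '\n';  // prints at least 100
    // intVec[0] here would be undefined behavior: no elements exist yet

    intVec.resize(100);                      // creates 100 ints, all 0
    std::cout << intVec.size() << '\n';      // prints 100
    std::cout << intVec[99] << '\n';         // prints 0
    return 0;
}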


Best

Kai-Uwe Bux
 

Stephen Horne

Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?

No. In fact, size() remains unchanged so that referencing those elements is
undefined behavior.

Just to expand on that, reserve is for managing memory. If you know
that you're about to add thousands of items to a vector, using a
reserve to pre-grow the memory allocation can make it more efficient,
since the memory allocation only gets grown once - not potentially
dozens or hundreds of times.

IIRC sequentially adding n items to the end of a vector takes O(n^2)
time. This is because the memory gets grown O(n) times - every k
items, say, where k is a constant - and each growing can involve
copying the whole vector which is on average O(n) again.

If you can reserve the required space first, sequentially adding n
items is O(n). There is only one memory reallocation, and only one
copying - at most - so it's the actual appending of values that
determines the time taken, not repeated memory reallocation and
copying.

There's also a scheme where the reserved memory is doubled on each
reallocation, and this also gives O(n), though it's still slower than
full preallocation.
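
A sketch that makes the difference visible by counting how often the
buffer actually moves (the exact counts depend on the library's growth
factor):

#include <cstddef>
#include <iostream>
#include <vector>

// Count how often v's buffer is reallocated while appending n elements.
std::size_t count_reallocations(std::vector<int>& v, std::size_t n)
{
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {  // capacity changed => elements were copied
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}

int main()
{
    std::vector<int> grown;
    std::vector<int> reserved;
    reserved.reserve(100000);  // one allocation up front

    std::cout << "without reserve: "
              << count_reallocations(grown, 100000) << " reallocations\n";
    std::cout << "with reserve:    "
              << count_reallocations(reserved, 100000) << " reallocations\n";
    return 0;
}

(Without the reserve you typically see a few dozen reallocations at
most; with it, none.)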
 

acehreli

Just to expand on that, reserve is for managing memory. If you know
that you're about to add thousands of items to a vector, using a
reserve to pre-grow the memory allocation can make it more efficient,

I've read that reserve() doesn't help much. (I remember reading from
Bjarne Stroustrup himself that he stopped calling reserve() at some
point; but cannot find any references for that.)
IIRC sequentially adding n items to the end of a vector takes O(n^2)
time. This is because the memory gets grown O(n) times - every k
items, say, where k is a constant - and each growing can involve
copying the whole vector which is on average O(n) again.

The standard requires that push_back has amortized constant time; so
the implementers cannot use the complexity that you describe.
There's also a scheme where the reserved memory is doubled on each
reallocation, and this also gives O(n), though it's still slower than
full preallocation.

Nowadays implementers typically use a growth factor of 1.5 (i.e. 50%
growth), which reportedly works better with pool allocators.
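
If you are curious what your own implementation does, a quick sketch
like this prints the capacity each time it changes (the values you see
will differ between libraries):

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {   // a reallocation just happened
            cap = v.capacity();
            std::cout << cap << '\n';
        }
    }
    return 0;
}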

Ali
 

acehreli

I've read that reserve() doesn't help much. (I remember reading from
Bjarne Stroustrup himself that he stopped calling reserve() at some
point; but cannot find any references for that.)

I found it:

http://www.research.att.com/~bs/bs_faq2.html#slow-containers

There, Stroustrup says:

<quote>
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
</quote>

Ali
 

Kai-Uwe Bux

Stephen said:
Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?

No. In fact, size() remains unchanged so that referencing those elements
is undefined behavior.

Just to expand on that, reserve is for managing memory. If you know
that you're about to add thousands of items to a vector, using a
reserve to pre-grow the memory allocation can make it more efficient,
since the memory allocation only gets grown once - not potentially
dozens or hundreds of times.

IIRC sequentially adding n items to the end of a vector takes O(n^2)
time. This is because the memory gets grown O(n) times - every k
items, say, where k is a constant - and each growing can involve
copying the whole vector which is on average O(n) again.

No. Appending to a vector is amortized constant time. The vector only
moves when size() and capacity() coincide. At those points, the capacity
jumps by a certain factor and not by an additive constant (your k).

BTW: this is not a quality of implementation issue. The standard makes this
requirement in clause [23.1.1/12].

If you can reserve the required space first, sequentially adding n
items is O(n). There is only one memory reallocation, and only one
copying - at most - so it's the actual appending of values that
determines the time taken, not repeated memory reallocation and
copying.

There's also a scheme where the reserved memory is doubled on each
reallocation, and this also gives O(n), though it's still slower than
full preallocation.

What matters is not the precise factor but that the constant is
multiplicative rather than additive. I think, many implementations out
there use the golden ratio or some other constant smaller than 2.

Nonetheless, preallocation will beat naive sequential push_back()
performance wise.


Best

Kai-Uwe Bux
 

Stephen Horne

I found it:

I agree - I didn't say anyone should use it ;-)

Personally, I don't have a single use of it in my current working copy
- I just checked. Plenty of resizes, but no reserves. If I need a big
enough collection that it's an issue, I normally end up using a
different kind of container anyway, though not specifically to avoid
this.

I have my own vector ADT based on a multiway tree, though only because
it was convenient given the map, set etc ADTs based on the same
underlying code - they all support log n subscripting. Writing the one
extra wrapper seemed prudent at the time, but I don't think I've ever
used it. If I wanted a huge vector (perhaps a very large audio file) I
might use this since it supports efficient inserts anywhere in the
container - though I'd have to add some bulk ops first.

For video, I wouldn't use it - I'd use a normal vector of pointers,
either to frames or GOPs. An hour's video is only on the order of
100,000 frames anyway. And I might just use a vector of pointers to
chunks even for the huge audio file case.

The word "thousands" in my earlier post was out by several orders, I
guess.

I needed a 64 million item 3/4 GB array recently. But it was
fixed-size, so I used a fixed size array, not a vector. If I had used
a vector, I would probably have used reserve.

Strictly I did use a vector with millions of items at one stage -
underlying a priority queue that held up to approx. two million items.
Copy overheads *were* an issue here, but not so much from memory
allocation as from the items flowing through the queue. I solved it by
using a more specialised queuing method, putting items into the
location in the main array where they'd be needed (with some collision
handling).

But even that was for a programming challenge thing that I was just
doing for fun.

http://projecteuler.net/

Problem 211 - it turns out I solved it the hard way, but what the
hell.

Wish I could justify more time for these problems. I've had a plan for
problem 210 (but no code yet) for about a week. This time doing it the
easy way ;-)
 

Stephen Horne

No. Appending to a vector is amortized constant time. The vector only
moves when size() and capacity() coincide. At those points, the capacity
jumps by a certain factor and not by an additive constant (your k).

That's good to know. Thanks.

I could swear that there used to be a fixed increment - even that the
interface allowed the increment to be changed - but there you go.

I didn't think they'd impose that because of the 50% worst-case memory
overhead (often 75% when you allow for deletes), but it's no bad thing.
A memory overhead on that scale isn't usually a big deal these days.
What matters is not the precise factor but that the constant is
multiplicative rather than additive. I think, many implementations out
there use the golden ratio or some other constant smaller than 2.

Reducing the worst-case memory overhead. Makes sense.
 

James Kanze

(e-mail address removed) wrote:
Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?
No. In fact, size() remains unchanged so that referencing
those elements is undefined behavior.
Just to expand on that, reserve is for managing memory. If you
know that you're about to add thousands of items to a vector,
using a reserve to pre-grow the memory allocation can make it
more efficient, since the memory allocation only gets grown
once - not potentially dozens or hundreds of times.

It's rarely necessary for optimization reasons (although it
doesn't hurt). The main reason for using reserve() is to avoid
invalidating iterators, pointers and references into the vector.
(Appending to a vector will not invalidate iterators unless the
capacity() increases.) Another reason, particularly with very
large vectors, is to reduce total memory consumption.
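
A sketch of that use; it relies on the subsequent appends staying
within the reserved capacity:

#include <cassert>
#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(10);        // no reallocation until size() would exceed 10

    v.push_back(1);
    int* first = &v[0];   // pointer into the vector

    v.push_back(2);       // still within capacity, so no reallocation:
    v.push_back(3);       // 'first' remains valid
    assert(*first == 1);

    return 0;
}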
IIRC sequentially adding n items to the end of a vector takes
O(n^2) time. This is because the memory gets grown O(n) times
- every k items, say, where k is a constant - and each growing
can involve copying the whole vector which is on average O(n)
again.

The standard requires that appending to a vector be "amortized
constant time". In practice, this means that memory growth must
be exponential---older implementations generally doubled the
size each time, but I think the general consensus today is that
1.5 times is preferable.
If you can reserve the required space first, sequentially adding n
items is O(n).

Creating a vector of n objects using push_back is required by
the standard to be O(n). All reserve() does is reduce the
constant factor.
There is only one memory reallocation, and only one copying -
at most - so it's the actual appending of values that
determines the time taken, not repeated memory reallocation
and copying.
There's also a scheme where the reserved memory is doubled on
each reallocation, and this also gives O(n), though it's still
slower than full preallocation.

The real problem with it is memory consumption. You can easily
end up consuming three times more memory than you are actually
using: while a doubling reallocation is in progress, both the old
buffer and the new, twice-as-large one are live at the same time.
 

James Kanze

Stephen Horne wrote:

[...]
What matters is not the precise factor but that the constant
is multiplicative rather than additive. I think, many
implementations out there use the golden ratio or some other
constant smaller than 2.

I'd be surprised if they used exactly the golden ratio, but 1.5
is an easy to calculate approximation.

The motivation for this is simple: if you double each time, the
sum of the memory you've previously freed is always less than
the amount of memory you're requesting. A little bit of
mathematical analysis shows that this is the case any time the
multiplier is greater than the golden ratio.
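
Spelling out the arithmetic behind that (assuming, optimistically, that
all the freed blocks could be coalesced into one region, which real
allocators usually cannot guarantee): when the vector grows from
capacity d^k to d^(k+1), the blocks freed so far have sizes
1, d, ..., d^(k-1), which sum to (d^k - 1)/(d - 1). The new request is
d^(k+1), so the freed space can only ever cover it when

    (d^k - 1)/(d - 1) >= d^(k+1),

which for large k reduces to d*(d - 1) <= 1, i.e.
d <= (1 + sqrt(5))/2 ~ 1.618, the golden ratio.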
Nonetheless, preallocation will beat naive sequential
push_back() performance wise.

Slightly.

I have a couple of cases (maybe two in the last ten years) where
I've used reserve(). Always to avoid invalidating iterators,
however. (Most of the time, however, if the size of the vector
is changing, I'll save the index, not an iterator.)
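
For instance, something along these lines (my own sketch of the idea):

#include <cstddef>
#include <vector>

int main()
{
    std::vector<int> v;
    v.push_back(42);

    std::size_t pos = 0;             // remember the position by index

    for (int i = 0; i < 1000; ++i)   // may reallocate many times;
        v.push_back(i);              // a saved iterator would be invalidated

    int value = v[pos];              // the index still names the right element
    (void)value;
    return 0;
}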
 

Kai-Uwe Bux

James said:
Stephen Horne wrote:
[...]
What matters is not the precise factor but that the constant
is multiplicative rather than additive. I think, many
implementations out there use the golden ratio or some other
constant smaller than 2.

I'd be surprised if they used exactly the golden ratio, but 1.5
is an easy to calculate approximation.

The motivation for this is simple: if you double each time, the
sum of the memory you've previously freed is always less than
the amount of memory you're requesting. A little bit of
mathematical analysis shows that this is the case any time the
multiplier is greater than the golden ratio.

An implementation could easily store the capacity and the previous capacity.
The new capacity would simply be the sum of both. That will satisfy the
motivation, and the ratio of successive capacities converges to the golden
ratio.
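
A toy illustration of that scheme (not taken from any real library):

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t prev = 1, cap = 1;
    for (int i = 0; i < 15; ++i) {
        std::size_t next = cap + prev;   // new capacity = sum of the last two
        std::cout << next << "  (ratio "
                  << static_cast<double>(next) / cap << ")\n";
        prev = cap;
        cap = next;
    }
    return 0;
}

The printed ratios settle near 1.618.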

As for the motivation, I think that might be good for some allocators but I
find it hard to believe that it matters all that much. It is highly
unlikely that memory gets reused by the same vector. More likely, a
different vector is using that block. In any case, that strategy is easy to
turn into a policy and one can play and measure whether it has any impact.


[snip]


Best

Kai-Uwe Bux
 

Juha Nieminen

I found it:

http://www.research.att.com/~bs/bs_faq2.html#slow-containers

There, Stroustrup says:

<quote>
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
</quote>

I find it very odd that Stroustrup is only concerned about the *speed*
of std::vector here.

The major problem with std::vector is not its speed, but the memory
fragmentation it causes. In the worst cases memory fragmentation can
almost double the memory usage of the entire program (I do have actual
personal experience of this). If you know the number of elements you are
going to push_back(), reserve() will eliminate the memory fragmentation
caused by them.
 

Juha Nieminen

Does the standard mandate that a call to reserve() initialize a
container's values to their defaults? That is, after
vector<int> intVec; intVec.reserve(100);
would intVec[0] = intVec[1] = ... = intVec[99] = 0?

No, the idea with reserve() is precisely that it allocates memory for
that many elements *without* initializing them (because initialization
can be heavy and you don't necessarily want to do that).

If you want the elements to be initialized (and accessible) use
resize() rather than reserve().
 

Erik Wikström

I find it very odd that Stroustrup is only concerned about the *speed*
of std::vector here.

The major problem with std::vector is not its speed, but the memory
fragmentation it causes. In the worst cases memory fragmentation can
almost double the memory usage of the entire program (I do have actual
personal experience of this). If you know the number of elements you are
going to push_back(), reserve() will eliminate the memory fragmentation
caused by them.

And then there is the over-allocation, depending on the growing strategy
used I would guess that one would end up with somewhere around 25-50%
over-allocation on average.
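
For what it's worth, a quick way to check on a given implementation
(the exact figures will vary):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    for (int i = 0; i < 1000000; ++i)
        v.push_back(i);

    std::cout << "size:     " << v.size() << '\n'
              << "capacity: " << v.capacity() << '\n'
              << "unused:   " << v.capacity() - v.size() << " elements\n";
    return 0;
}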
 

Kai-Uwe Bux

Erik said:
And then there is the over-allocation, depending on the growing strategy
used I would guess that one would end up with somewhere around 25-50%
over-allocation on average.

For a reallocation strategy with growth factor d and a sequence of
push_back() of random length, I get:

expected memory usage:

    m = (d - 1) / (d ln d)    (for d = 2, approx. 0.72)

expected number of internal copy constructions (not counting the
initial one):

    c = 1 / ln(d)             (for d = 2, approx. 1.44)

Here is a little table:

    d    1.5      1.6      2.0
    m    0.822    0.798    0.721
    c    2.466    2.127    1.443


Best

Kai-Uwe Bux
 

Stephen Horne

I have a couple of cases (maybe two in the last ten years) where
I've used reserve(). Always to avoid invalidating iterators,
however. (Most of the time, however, if the size of the vector
is changing, I'll save the index, not an iterator.)

As I understand it, that may avoid invalidating the iterator in
practice, but not according to the standard.

That is, I thought that any modification of a standard library
container (other than a simple in-place overwrite of a value) is
considered to invalidate all iterators that refer into that container.

It's one of the reasons I use my own associative containers for a lot
of high-level work. High level code shouldn't be worrying about these
kinds of safety issues - it should just get on with the task at hand.
A (very small) performance penalty is worth paying to have safe
iterators that remain valid when the container is modified.

Not necessarily true in low level code, of course.
 

Stephen Horne

Creating a vector of n objects using push_back is required by
the standard to be O(n). All reserve() does is reduce the
constant factor.

There is a difference between O(n) and amortized O(n) which can be
significant (e.g. for real-time), but I suspect this is genuine O(n)
here, despite the amortized O(1) for each individual push_back call?
The real problem with it is memory consumption. You can easily
end up consuming three times more memory than you are actually
using.

Yes, but as people keep telling me when I talk about (e.g.)
implementing data structures for large mutable containers in Haskell
where the data might need to be kept mainly on disk, what's wrong with
letting the virtual memory deal with that? Memory is cheap - and disk
space is even cheaper.

Of course I know the answers to that, but even so, it's a compelling
argument a lot of the time. The part of that vector that you're not
using will (on *some* platforms) never take up any actual memory at
all, and may never hit the disk either - though of course it's still
wasting a big chunk of your addressing space which can be significant
in itself.

The question then becomes "just how many of these huge vectors do you
have?"
 

Kai-Uwe Bux

Stephen said:
As I understand it, that may avoid invalidating the iterator in
practice, but not according to the standard.

That is, I thought that any modification of a standard library
container (other than a simple in-place overwrite of a value) is
considered to invalidate all iterators that refer into that container.
[snip]

Not true in general. The standard containers make very different guarantees
about the validity of iterators, pointers, and references. For std::vector<>,
[23.2.4.3/1] guarantees for insert() that iterators and references to
elements before the point of insertion remain valid as long as the new size
does not exceed the capacity.
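
A small illustration of that vector guarantee:

#include <cassert>
#include <vector>

int main()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.reserve(4);                                // capacity now at least 4

    std::vector<int>::iterator it = v.begin();   // refers to the 1
    v.insert(v.end(), 3);                        // new size 3 <= capacity:
    assert(*it == 1);                            // 'it' is still valid

    return 0;
}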

The best guarantees are made by std::list, which is one of the foremost
reasons to use it.



Best

Kai-Uwe Bux
 

red floyd

Kai-Uwe Bux said:
The best guarantees are made by std::list, which is one of the foremost
reasons to use it.


We had a contract where the coding spec said flat-out "NO use of new
whatsoever". So we used std::list instead, and returned pointers to
the list elements when needed instead.
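
Presumably something along these lines (a sketch of the idea, not their
actual code):

#include <cassert>
#include <list>

int main()
{
    std::list<int> storage;

    storage.push_back(42);
    int* p = &storage.back();        // hand this pointer out to callers

    for (int i = 0; i < 1000; ++i)   // further growth never moves existing
        storage.push_back(i);        // nodes, so 'p' stays valid

    assert(*p == 42);
    return 0;
}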
 

peter koch

We had a contract where the coding spec said flat-out "NO use of new
whatsoever".   So we used std::list instead, and returned pointers to
the list elements when needed instead.

I hope the contract did allow the library code to use new? ;-)
Apart from that, such a contract is just silly - but you knew that
already.

/Peter
 
