Is it better idea to do resize of vector before calling push_back?

T

Tarun

Hello,

I am facing problem sometimes while I am trying to do push_back on a
vector. Currently I am doing resize of the vector increasing the size
by one and then push_back and seems like the code is working fine.

Is it a better idea to do resize befoire calling push_back?

Regards,
Tarun
 
P

peter koch

Tarun skrev:
Hello,

I am facing problem sometimes while I am trying to do push_back on a
vector. Currently I am doing resize of the vector increasing the size
by one and then push_back and seems like the code is working fine.

Is it a better idea to do resize befoire calling push_back?

No reason to do that. Use resize if you are going to push_back many
elements and you know how many elements (approximately) you will add.

/Peter
 
D

Daniel T.

peter koch said:
Tarun skrev:

No reason to do that. Use resize if you are going to push_back many
elements and you know how many elements (approximately) you will add.

I'm going to assume you guys are talking about reserve and not resize.
Otherwise I agree with peter. Only use reserve when profiling has shown
there is a bottleneck.
 
V

Victor Bazarov

Daniel said:
I'm going to assume you guys are talking about reserve and not resize.
Otherwise I agree with peter. Only use reserve when profiling has
shown there is a bottleneck.

Performance of the insertions is not necessarily the trouble when using
'push_back'. Frequent reallocations from smaller size to larger can
cause heap fragmentation which may impede the performance later and not
even show up on profiling radars. Use 'reserve' any time you know (for
sure) what the size will be after all insertions. And if that's the
case (i.e. you know what the size is), then why not just resize and use
the assignment instead?

V
 
P

peter koch

Daniel T. skrev:
I'm going to assume you guys are talking about reserve and not resize.
Otherwise I agree with peter. Only use reserve when profiling has shown
there is a bottleneck.

I certainly had reserve in mind. resize is an entirely different beast.

/Peter
 
R

Roland Pibinger

Performance of the insertions is not necessarily the trouble when using
'push_back'.

not the only trouble :)
Frequent reallocations from smaller size to larger can
cause heap fragmentation which may impede the performance later and not
even show up on profiling radars.

To bolster your argument I wrote a small test application to check how
(amazingly) often reallocation happens:

#include <iostream>
#include <vector>
using namespace std;

int main() {
vector<int> v;
v.push_back (0);
int* p = &v[0];

for (int i = 1; i < 100; ++i) {
v.push_back (i);
if (&v[0] != p) {
cout << "reallocated at: " << i << endl;
p = &v[0];
}
}
}
Use 'reserve' any time you know (for
sure) what the size will be after all insertions.
agreed.

And if that's the
case (i.e. you know what the size is), then why not just resize and use
the assignment instead?

because the elements are expensive to create and/or assign? E.g.
std::string.

Best regards,
Roland Pibinger
 
B

Bo Persson

Roland said:
Performance of the insertions is not necessarily the trouble when
using 'push_back'.

not the only trouble :)
Frequent reallocations from smaller size to larger can
cause heap fragmentation which may impede the performance later
and not even show up on profiling radars.

To bolster your argument I wrote a small test application to check
how (amazingly) often reallocation happens:

#include <iostream>
#include <vector>
using namespace std;

int main() {
vector<int> v;
v.push_back (0);
int* p = &v[0];

for (int i = 1; i < 100; ++i) {
v.push_back (i);
if (&v[0] != p) {
cout << "reallocated at: " << i << endl;
p = &v[0];
}
}
}

And what did your implementation show?

On my system I ran it for a million iterations instead:
reallocated at: 1024
reallocated at: 1536
reallocated at: 2304
reallocated at: 3456
reallocated at: 5184
reallocated at: 7776
reallocated at: 11664
reallocated at: 17496
reallocated at: 26244
reallocated at: 39366
reallocated at: 59049
reallocated at: 88573
reallocated at: 132859
reallocated at: 199288
reallocated at: 298932
reallocated at: 448398
reallocated at: 672597

Reserve helps even if you only have a good guess at the final size. The
closer guess the better, of course, but any hint helps.

With reserve(500000) I get:
reallocated at: 500000
reallocated at: 750000

Not too bad!


Bo Persson
 
P

Philipp Reh

Daniel T. wrote: [snip]
And if that's the
case (i.e. you know what the size is), then why not just resize and use
the assignment instead?

V

Because resize has to construct all new elements to a default value.
 
R

Roland Pibinger

Roland said:
#include <iostream>
#include <vector>
using namespace std;

int main() {
vector<int> v;
v.push_back (0);
int* p = &v[0];

for (int i = 1; i < 100; ++i) {
v.push_back (i);
if (&v[0] != p) {
cout << "reallocated at: " << i << endl;
p = &v[0];
}
}
}

And what did your implementation show?

On my system I ran it for a million iterations instead:
reallocated at: 1024
reallocated at: 1536
reallocated at: 2304
reallocated at: 3456
reallocated at: 5184
reallocated at: 7776
reallocated at: 11664
reallocated at: 17496
reallocated at: 26244
reallocated at: 39366
reallocated at: 59049
reallocated at: 88573
reallocated at: 132859
reallocated at: 199288
reallocated at: 298932
reallocated at: 448398
reallocated at: 672597

The 'reallocation pattern' seems to be obvious. BTW, g++ 3.4 uses n*2.
Sum the numbers and you get the amount of unnecessary copies.
Reserve helps even if you only have a good guess at the final size. The
closer guess the better, of course, but any hint helps.

With reserve(500000) I get:
reallocated at: 500000
reallocated at: 750000

Not too bad!

Hmm, 500000 + 750000 unnecessary copies for 1000000 entries! Better
significantly overstimate the reserved space.

Best wishes,
Roland Pibinger
 
V

Victor Bazarov

Philipp said:
Daniel T. wrote: [snip]
And if that's the
case (i.e. you know what the size is), then why not just resize and
use the assignment instead?

V

Because resize has to construct all new elements to a default value.

So? 'push_back' has to copy-construct them. If you know for sure that
the combination of default-construction with assignment is slower than
copy-construction, then you still have to decide what's more important
to you, speed *now* or less fragmented heap *later*. Keep in mind that
even if your heap manager defragments the heap with deallocations,
defragmenting the heap also requires time.

I often found that for simple element types, like char or double, doing

vector<double> blah(somany);
for (int i = 0; i < somany; ++i)
blah = ...

is *better* than

vector<double> blah;
for (int i = 0; i < somany; ++i)
blah(push_back(...

[I already explained the reasons] Of course, it's all empirical, YMMV.

V
 
J

Joe Gottman

Victor said:
Performance of the insertions is not necessarily the trouble when using
'push_back'. Frequent reallocations from smaller size to larger can
cause heap fragmentation which may impede the performance later and not
even show up on profiling radars. Use 'reserve' any time you know (for
sure) what the size will be after all insertions. And if that's the
case (i.e. you know what the size is), then why not just resize and use
the assignment instead?


Two reasons to use reserve() and push_back() rather than resize()
and assignment:

1) reserve() and push_back() requires only using one copy constructor,
while resize() and assignment requires using a default constructor
followed by an assignment operator. The first is often more efficient.

2) reserve() followed by push_back() is useful when you don't know the
exact size of your vector but you have a definite upper bound on its size.

Joe Gottman
 
M

Mirek Fidler

Roland said:
not the only trouble :)


To bolster your argument I wrote a small test application to check how
(amazingly) often reallocation happens:

You need to see things in context. In reality, if vector has 100%
growth factor, *average* number of copies per element is 2 - first is
in push_back, second is due to reallocation.

In fact, you are likely to spend a little bit more time in push_back
(because batch operations are always faster). Especially if you are
using Upp::Vector instead of std::vector (and reallocations are
therefore performed by memmove).

That is why using reserve has so little impact on performance (widely
known fact).

Mirek
 
M

Mirek Fidler

Victor said:
Philipp said:
Daniel T. wrote: [snip]
And if that's the
case (i.e. you know what the size is), then why not just resize and
use the assignment instead?

V

Because resize has to construct all new elements to a default value.

So? 'push_back' has to copy-construct them. If you know for sure that
the combination of default-construction with assignment is slower than
copy-construction, then you still have to decide what's more important
to you, speed *now* or less fragmented heap *later*.

If you know that, just use reserve - same results for heap
framentation, default construction avoided.

Mirek
 
R

Roland Pibinger

You need to see things in context. In reality, if vector has 100%
growth factor, *average* number of copies per element is 2 - first is
in push_back, second is due to reallocation.

The context here is to compare push_back performance with and without
calling reserve before. You have 1 copy per element with reserve and 3
without (e.g. for Dinkumware with growth factor 1,5).
That is why using reserve has so little impact on performance (widely
known fact).

That depends of course on the number and 'weight' of the vector
elements.

Best regards,
Roland Pibinger
 
T

Tarun

Thanks a lot friends. I am working on it with all of yours suggestion
in mind. Will getback to you people again.

Regards,
Tarun.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,763
Messages
2,569,562
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top