vector::push_back performance

  • Thread starter Antonios Christofides

Antonios Christofides

Hi,

As I read in the archives, the performance problem caused by memory
reallocations during vector::push_back is a common one. My first C++
program is suffering from it: 300 thousand push_backs result,
according to the profiler, in 20 reallocations; these 20 reallocations
account for 3.6 seconds (Celeron 1.3G), which is 40% of total
execution time.

What I don't understand: why is the reallocation code so complex? I
studied the library source and I have a hard time understanding it,
but it seems to be copying the vector item by item in each
reallocation. Why wouldn't a "realloc" suffice?

And, given that I don't know the vector size beforehand, is there
anything else I can do other than trying deque or a guessed
vector::reserve?

In case it matters, I'm using gcc 3.3 with its standard C++ library on
Debian sarge, but portability is also an issue.

Thanks!
 

Rolf Magnus

Antonios said:
Hi,

As I read in the archives, the performance problem caused by memory
reallocations during vector::push_back is a common one. My first C++
program is suffering from it: 300 thousand push_backs result,
according to the profiler, in 20 reallocations; these 20 reallocations
account for 3.6 seconds (Celeron 1.3G), which is 40% of total
execution time.

What I don't understand: why is the reallocation code so complex? I
studied the library source and I have a hard time understanding it,
but it seems to be copying the vector item by item in each
reallocation. Why wouldn't a "realloc" suffice?

That's what a "realloc" does, too. You usually can't easily make an already
allocated memory block bigger (what would you do with the data after it?), so a
new block must be allocated and the data copied over to it, then the old
one destroyed.
And, given that I don't know the vector size beforehand, is there
anything else I can do other than trying deque or a guessed
vector::reserve?

Not much.
 

Victor Bazarov

Antonios said:
As I read in the archives, the performance problem caused by memory
reallocations during vector::push_back is a common one. My first C++
program is suffering from it: 300 thousand push_backs result,
according to the profiler, in 20 reallocations; these 20 reallocations
account for 3.6 seconds (Celeron 1.3G), which is 40% of total
execution time.

Three hundred thousand push_backs into a vector without reserve? Seems
unjustified.
What I don't understand: why is the reallocation code so complex? I
studied the library source and I have a hard time understanding it,
but it seems to be copying the vector item by item in each
reallocation. Why wouldn't a "realloc" suffice?

What would 'realloc' do? Place the objects each at a different memory
location without letting the object know? That's not right. Objects
may need to know where they have been constructed. They might want to
let other classes or objects know of their location, etc.
And, given that I don't know the vector size beforehand, is there
anything else I can do other than trying deque or a guessed
vector::reserve?

Nope. Using standard containers requires acknowledging the trade-offs.
If you need fast push_back and you don't know the size, you should
probably use 'std::list' or 'std::deque'. If you need random access
afterwards, don't push_back without reserving. I bet any decent book
on Standard containers talks about how to pick the container well-suited
for your task. Of course, that assumes that you know what your task is.
In case it matters, I'm using gcc 3.3 with its standard c++ library on
a Debian sarge, but portability is also an issue.

It doesn't matter, at least not here.

V
 

Phlip

Antonios said:
As I read in the archives, the performance problem caused by memory
reallocations during vector::push_back is a common one. My first C++
program is suffering from it: 300 thousand push_backs result,
according to the profiler, in 20 reallocations; these 20 reallocations
account for 3.6 seconds (Celeron 1.3G), which is 40% of total
execution time.

Do you think you could go to an algorithm where you push back less?
What I don't understand: why is the reallocation code so complex? I
studied the library source and I have a hard time understanding it,
but it seems to be copying the vector item by item in each
reallocation. Why wouldn't a "realloc" suffice?

Read Herb Sutter's way-cool GOTW series, and his /Exceptional C++/ books. He
argues that the container class of choice should usually be std::deque<>, not
std::vector<>. It doesn't fragment memory the way std::list<> does, and it's
efficient for pushing things to both the beginning and the end.
 

Andrew Koenig

As I read in the archives, the performance problem caused by memory
reallocations during vector::push_back is a common one. My first C++
program is suffering from it: 300 thousand push_backs result,
according to the profiler, in 20 reallocations; these 20 reallocations
account for 3.6 seconds (Celeron 1.3G), which is 40% of total
execution time.

I'm skeptical.

Here's a little program:

#include <vector>

int main()
{
    std::vector<int> v;
    for (std::vector<int>::size_type i = 0; i != 1000000; ++i)
        v.push_back(i);
    return 0;
}

When I run this program on my machine (admittedly faster than 1.3G, but no
more than twice as fast), it runs in three *hundredths* of a second. And it
calls push_back a million times, not 300,000 times.

This behavior suggests to me that your vector must contain objects of a
class that is much more expensive to copy than int.

So I think we need to see more information about your program in order to
understand the source of the performance problem.
 

Method Man

That's what a "realloc" does, too. You usually can't easily make an already
allocated memory block bigger (what would you do with the data after it?), so a
new block must be allocated and the data copied over to it, then the old
one destroyed.

All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.

So why is 'realloc' more efficient?
 

Victor Bazarov

Method Man said:
All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.

You've been reading too many C texts, haven't you?
So why is 'realloc' more efficient?

I am not sure how to answer this question. Imagine you have a tree which
has grown too far up and needs trimming. You decide to trim it a foot off
the ground because it's too inefficient to climb up and trim every branch
that could use a trimming. You lose the tree. Why is cutting it down more
efficient than doing it right? There is no answer. Another analogy: a TV
set and a need to deliver it from the 7th floor to the truck parked outside.
It can be done on an elevator, it could be done on the stairs. Or, somebody
might decide that it's more efficient to lower it down through the window.
Without ropes. Hey, a couple of seconds and it's down on the ground, no?
Why is throwing it down more efficient than using the elevator (or stairs)?

For POD you can do realloc. For non-POD classes (general case) realloc will
simply not work. Efficient or not.

V
 

Siemel Naran

Method Man said:
All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.

So why is 'realloc' more efficient?

Because if free space exists at the end of the existing block, realloc can
simply claim that space to yield a larger array. Say you do malloc(512u) and
the allocator reserves bytes 0x0101 to 0x0300 for your array, and bytes
0x0301 to 0x0500 are free for anyone to use. If no other objects have claimed
this space when you realloc the array to 1024u bytes, then the system may
just mark bytes 0x0101 to 0x0500 as in use by your array (so no one else can
claim it).

There's no guarantee that the space is available, but if it is, we save
copying lots of bytes.
 

Siemel Naran

Antonios Christofides said:
What I don't understand: why is the reallocation code so complex? I
studied the library source and I have a hard time understanding it,
but it seems to be copying the vector item by item in each
reallocation. Why wouldn't a "realloc" suffice?

For user types, especially those managing dynamic memory like std::string or
std::deque, we have to call the overloaded copy constructor or operator= to
do the copy.
And, given that I don't know the vector size beforehand, is there
anything else I can do other than trying deqeue or a guessed
vector::reserve?

What about std::list? What is wrong with std::deque?

If you have a large object, it might be a good idea to make it reference
counted. You can use boost::shared_ptr or the like.
 

Method Man

Victor Bazarov said:
You've been reading too many C texts, haven't you?

Well yea.

I was talking about arrays of POD types, but I didn't make that clear in my
post. Of course non-POD types can have constructors and overloaded
assignment operators, so my question wouldn't make sense in that case.
I am not sure how to answer this question. Imagine you have a tree which
has grown too far up and needs trimming. You decide to trim it a foot off
the ground because it's too inefficient to climb up and trim every branch
that could use a trimming. You lose the tree. Why is cutting it down more
efficient than doing it right? There is no answer. Another analogy: a TV
set and a need to deliver it from the 7th floor to the truck parked outside.
It can be done on an elevator, it could be done on the stairs. Or, somebody
might decide that it's more efficient to lower it down through the window.
Without ropes. Hey, a couple of seconds and it's down on the ground, no?
Why is throwing it down more efficient than using the elevator (or stairs)?

For POD you can do realloc. For non-POD classes (general case) realloc will
simply not work. Efficient or not.

Your analogies didn't really help in my understanding of realloc. I was
looking for something like -- 'realloc' is never/sometimes/always more
efficient than malloc'ing a new array and manually copying from the old
array (of PODs). Then justify the choice.
 

Antonios Christofides

Method Man said:
All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.
Except for what was already said, I believe that even in the cases when
realloc needs to allocate a new memory block rather than extend the existing
one, it uses memmove or some similar operation, which will be faster than
copying the elements one by one if the implementation uses specialized
bulk-copy CPU instructions.
 

Magnus

Andrew Koenig said:
I'm skeptical.

Here's a little program:

#include <vector>

int main()
{
    std::vector<int> v;
    for (std::vector<int>::size_type i = 0; i != 1000000; ++i)
        v.push_back(i);
    return 0;
}

When I run this program on my machine (admittedly faster than 1.3G, but no
more than twice as fast), it runs in three *hundredths* of a second. And it
calls push_back a million times, not 300,000 times.

This behavior suggests to me that your vector must contain objects of a
class that is much more expensive to copy than int.

So I think we need to see more information about your program in order to
understand the source of the performance problem.

How do you time the execution? Is there _one_ way to time this, or
might your method differ from the OP's method?

- Magnus
 

Antonios Christofides

Siemel said:
For user types, especially those managing dynamic memory like
std::string or std::deque, we have to call the overloaded copy
constructor or operator= to the copy.

Thank you for your responses. The contained type is indeed a class
that contains, among other things, a string and another user-class
object. I'll go back to Stroustrup to re-read about shallow and deep
copies and copy constructors and so on, and I'll come back either to
summarize or to ask more questions (the latter seems more likely :)
 

Rolf Magnus

Method said:
All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.

So why is 'realloc' more efficient?

It might just behave like push_back() and allocate more memory than
requested so that the next realloc doesn't need to copy the data to a new
block.
 

Rolf Magnus

Victor said:
Three hundred thousand push_backs into a vector without reserve? Seems
unjustified.

From all we know, the number of elements could be between 1 and 30 million.
After all, the OP said he doesn't know the number of elements beforehand.
So how much would you reserve?
 

Victor Bazarov

Method said:
[...]
Your analogies didn't really help in my understanding of realloc.

They were intended to show the pitfalls of using 'realloc' with generic
types. Since now we're talking specifically POD, they are moot.
I was
looking for something like -- 'realloc' is never/sometimes/always more
efficient than malloc'ing a new array and manually copying from the old
array (of PODs). Then justify the choice.

'realloc' works in an implementation-defined way. There is always some
possibility that a memory block allocated for an array can simply be
extended without the need to copy. There is always some possibility that
merely calling malloc and then doing some kind of copying (even memcpy)
from within your code is less efficient than having it done in the library
written (and optimised) specifically for your hardware. So, in general
'realloc' will _always_ be at least as fast as you can emulate it with your
own 'malloc' and 'memcpy'.

More on C standard library - in comp.lang.c.

Victor
 

Jeff Flinn

Victor Bazarov said:
Method said:
[...]
Your analogies didn't really help in my understanding of realloc.

They were intended to show the pitfalls of using 'realloc' with generic
types. Since now we're talking specifically POD, they are moot.

Aah, but elsewhere in this thread the OP informs us that he is in fact
dealing with a user-defined class, so your analogies were very apt.

Jeff F
 

Ioannis Vranos

Method said:
All the texts I have read state that, when a dynamic array needs to be
extended, it is better/faster to 'realloc' instead of creating a new array
and copying the elements manually.

So why is 'realloc' more efficient?


Regarding vector, nowhere is it required that all elements be copied to a new
location after a vector::push_back(), vector::resize() or some other
modifier. In all these cases, an operation similar to realloc() is allowed.

However, as with realloc(), objects may be moved.
 

Ioannis Vranos

Method said:
Your analogies didn't really help in my understanding of realloc. I was
looking for something like -- 'realloc' is never/sometimes/always more
efficient than malloc'ing a new array and manually copying from the old
array (of PODs). Then justify the choice.


As Victor said in a follow-up message,

"So, in general 'realloc' will _always_ be at least as fast as you can
emulate it with your own 'malloc' and 'memcpy'."


However, your messages are hard to follow; I haven't understood exactly
what you have been trying to learn since the beginning of this thread.
 

Andrew Koenig

How do you time your execution time? Is there _one_ way to time this, or
might your method differ from the OP method?

A factor of 100 difference? Hardly likely.

Try the program yourself and see.

On my machine it runs so fast that I don't even have time to get my finger
off the enter button before it finishes.
 
