std::string and std::ostringstream performances

B

Bala2508

Hi,

I have a C++ application that extensively uses std::string and
std::eek:stringstream in somewhat similar manner as below

std::string msgHeader;

msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";

Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.

What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.

I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::eek:stringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?

I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::eek:stringstream formats.

Would like to have some suggestions to bring down the CPU utilization.

Thanks,
Bala
 
J

Jim Langston

Bala2508 said:
Hi,

I have a C++ application that extensively uses std::string and
std::eek:stringstream in somewhat similar manner as below

std::string msgHeader;

msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";

Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.

What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.

I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::eek:stringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?

I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::eek:stringstream formats.

Would like to have some suggestions to bring down the CPU utilization.

One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.
 
B

Bala

One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.- Hide quoted text -

- Show quoted text -

I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?
 
G

Guest

I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?

To clear the string use clear() instead, that is what it is meant for.
clear() will not affect the capacity of the string so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.
 
B

Bala

To clear the string use clear() instead, that is what it is meant for.
clear() will not affect the capacity of the string so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

msgHeader is local to the function. And the maximum size would not be
more than a 100 bytes. So I plan to modify my code to use reserve and
clear as you suggested and will try a hand on the performance. I hope
it helps.

BTW, a general question.
If i dont use reserve and my function looks somewhat like this below

void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";
}

How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.

Thanks,
Bala
 
G

Guest

msgHeader is local to the function. And the maximum size would not be
more than a 100 bytes. So I plan to modify my code to use reserve and
clear as you suggested and will try a hand on the performance. I hope
it helps.

BTW, a general question.
If i dont use reserve and my function looks somewhat like this below

void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";
}

How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.

I do not know, and I do not think the standard says anything about it.
But a good implementation will probably use a resizing scheme similar to
the one used for vectors, such as (at least) doubling the capacity every
time it resizes.
 
J

James Kanze

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.

In my own work, I tend to use std::vector<char> a lot for this
sort of thing, using ostrstream (initialized to use the space in
the vector) for formatting. My own experience is that the
implementations of std::vector tend to be better optimized that
those of std::string, and you have a lot more guarantees
concerning the allocation strategy. In my case, this is rather
simple, since I am dealing with fixed length records and fields
(so something like:

size_t pos = v.size() ;
v.resize( v.size() + fieldSize ) ;
ostrstream formatter( &v[0] + pos, fieldSize ) ;
formatter << ... ;

works perfectly). But it's something that may be worth
considering. (If called with arguments, ostrstream will format
directly in place, with no dynamic allocation.)

Although not currently guaranteed, something similar using
std::string will actually work with all current implementations,
and will be guaranteed in the next version of the standard.
 
J

James Kanze

[...]
If i dont use reserve and my function looks somewhat like this below
void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
}
How is the actual memory allocation done?

However the implementation wants. There are no real
requirements.

In practice, I think most implementations today do something
similar to what they do in std::vector (which requires some sort
of exponential growth strategy). Many implementations also use
the small string optimization---there is no dynamic allocation
whatsoever if the string is small enough (typically something
between 8 and 32 bytes).
My understanding is that the string library tries to
reallocate memory on every statement.

It might, but it probably doesn't.

You can find out by tracing the capacity of the string after
each +=. (If the capacity of the empty string immediatly after
construction is greater than 0, then the implementation probably
uses the small string optimization.)
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.

Anytime you can set a reasonable maximum for the length, reserve
is likely to help. How much depends largely on the
implementation, however.
 
J

James Kanze

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
 
R

Roland Pibinger

How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader =3D "<";", it
allocates say 1 byte to the msgHeader.

Probably yes.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.

Probably for string but you cannot call reserve() for ostringstream.
Both std::string and std::eek:stringstream are not meant to be utilized
"extensively". Consider to use a library for writing XML, e.g.
http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
 
A

Alf P. Steinbach

* James Kanze:
[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

Could you elaborate, specifically on where the golden ratio enters the
picture?

It seems like a number drawn out of thin air, like e.g. a factor 1.7 for
a hash table.

Generally it's an optimization question, and the answer for optimization
is to measure if it matters.

Cheers,

- Alf
 
B

Bala

* James Kanze:




[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.
The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

Could you elaborate, specifically on where the golden ratio enters the
picture?

It seems like a number drawn out of thin air, like e.g. a factor 1.7 for
a hash table.

Generally it's an optimization question, and the answer for optimization
is to measure if it matters.

Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?- Hide quoted text -

- Show quoted text -

Yeah it matters because the process uses 1 full CPU of 8 (12.5%).
Every message that it gets on its socket, it creates this kind of a
header and forwards it to a library for further processing.

The rates that we supplied were around a 100,000 messages per second
and the requirement is that it performs with say maximum 60%
utilization instead of 100%.
 
T

Tadeusz Kopec

James said:
If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.

On the other hand the function won't be thread safe.
 
B

Bala

On the other hand the function won't be thread safe.

Yes you are right, it wont be thread safe. In this current scenario, i
am not using it with a static lifetime just to avoid locking and
unlocking while creating the headers, hence its an automatic
variable. In the meanwhile, im gettin myself tuned to run the test
with the suggested changes for reserve. Hoping that it helps, because
cant think of anything more as a solution at this moment.
 
J

Jim Langston

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and added 1
character at a time then I would wind up with 4 extra memory reallocations.

I'm on Windows XP with Microsoft Visual C++ .net 2005. Not sure that the OS
matters, just the compiler.
 
G

Guest

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and added 1
character at a time then I would wind up with 4 extra memory reallocations.

And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.

To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.
 
J

Jim Langston

Erik Wikström said:
[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.

And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.

To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
 
B

Bala

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.
The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
=====
I deciced to test my implementation so I wrote this:
#include <iostream>
#include <string>
int main()
{
std::string Foo;
std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}
The output for my system is:
Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105
So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.
And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.
To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string..- Hide quoted text -

- Show quoted text -

I wrote a similar test application on Solaris 10 Sparc with g++3.4.3.
It clearly shows, that the reserve thing is surely going to help my
performance.

Thanks a ton for your inputs. Will let you know once the test is
successful.

I started appending 2 characters a time in a loop of 10.

This is the output without reserve

Start With :0
Current Size:2 Current Capacity:2
Current Size:4 Current Capacity:4
Current Size:6 Current Capacity:6
Current Size:8 Current Capacity:8
Current Size:10 Current Capacity:10
Current Size:12 Current Capacity:12
Current Size:14 Current Capacity:14
Current Size:16 Current Capacity:16
Current Size:18 Current Capacity:18
Current Size:20 Current Capacity:20

This is the output with reserve of 20

Start With :20

Thanks,
Bala
 
T

Tristan Wibberley

It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.

Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:

+-+
|X|
+-+

+---+-+
|X X| |
+---+-+

+-----+
|X X X|
+-----+

+---------+-----+
|X X X X X| |
+---------+-----+

+---------------+
|X X X X X X X X|
+---------------+

+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+

+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+

But I wonder how useful this is in practise. It seems to be a real
best-case thing - if you can't merely extend, then get the next
Fibonacci number size *before* the previous allocation and copy there.
If you don't happen to get the space before the current storage, then
you still get fragmentation - although I suppose being *strictly*
Fibonacci probably makes that average out to be the lowest of a general
purpose system as adjacent free blocks will tend to support Fibonacci
sized allocations more often.

The ratio of adjacent pairs in the Fibonacci sequence of course
approaches the golden ratio as the noise due to being rational
approaches zero. Being that memory allocations must have integer size -
would you be better storing the previous capacity and adding the current
capacity to determine the next one - and *really* go Fibonacci?

--
Tristan Wibberley

Any opinion expressed is mine (or else I'm playing devils advocate for
the sake of a good argument). My employer had nothing to do with this
communication.
 
J

Jim Langston

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.
The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
=====
I deciced to test my implementation so I wrote this:
#include <iostream>
#include <string>
int main()
{
std::string Foo;
std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}
The output for my system is:
Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105
So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.
And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.
To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

It was explained to me once. Consider a string with an initial capacity
of
100. There are 100 bytes of memory allocated. Then if you double it,
there
is a hole in the first 100 bytes, and the next 200 bytes are allocated.
Now
you double it again to 400. 100 is not enough so it can't use the hole,
so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have
to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the
string.- Hide quoted text -

- Show quoted text -

I wrote a similar test application on Solaris 10 Sparc with g++3.4.3.
It clearly shows, that the reserve thing is surely going to help my
performance.

Thanks a ton for your inputs. Will let you know once the test is
successful.

I started appending 2 characters a time in a loop of 10.

This is the output without reserve

Start With :0
Current Size:2 Current Capacity:2
Current Size:4 Current Capacity:4
Current Size:6 Current Capacity:6
Current Size:8 Current Capacity:8
Current Size:10 Current Capacity:10
Current Size:12 Current Capacity:12
Current Size:14 Current Capacity:14
Current Size:16 Current Capacity:16
Current Size:18 Current Capacity:18
Current Size:20 Current Capacity:20

This is the output with reserve of 20

Start With :20

Thanks,
Bala

======================

OUCH! No wonder you are having performance issues! Your platform isn't
preallocating ANY extra space! This means every time you add a character it
has to reallocate memory. Very bad. Yes, in your case .reserve() should
help TREMENDOUSLY.

That, in my opionion, is a very bad implementation of std::string.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top