std::string and std::ostringstream performance

Discussion in 'C++' started by Bala2508, Oct 31, 2007.

  1. Bala2508

    Bala2508 Guest

    Hi,

    I have a C++ application that makes extensive use of std::string and
    std::ostringstream, in a manner similar to the following:

    std::string msgHeader;

    msgHeader = "<";
    msgHeader += a;
    msgHeader += "><";

    msgHeader += b;
    msgHeader += "><";

    msgHeader += c;
    msgHeader += ">";

    It uses ostringstream in a similar way, and the function that does
    this is called for almost every message that my application receives
    on the socket. I use it to construct an XML message to be sent to
    another application.

    When we ran collect/analyzer on the application, it showed that the
    majority of the CPU time is spent dealing with these two datatypes:
    their memory allocation through std::allocator and related work. CPU
    usage sometimes goes as high as 100%.

    I would like advice on the following points:
    1. Is there a better way to use std::string / std::ostringstream than
    the way I have been using them?
    2. Am I using the wrong datatype for this kind of operation, and
    should I move to something else? Any suggestions as to what the
    datatype should be?

    I ultimately need these datatypes because the external library that I
    use to send this data out needs it in std::string /
    std::ostringstream format.

    I would like some suggestions for bringing down the CPU utilization.

    Thanks,
    Bala
    Bala2508, Oct 31, 2007
    #1

  2. Bala2508

    Jim Langston Guest

    One suggestion would be .reserve(), i.e.
    std::string msgHeader;
    msgHeader.reserve( 100 );

    That way the string msgHeader won't need to allocate more memory
    until it has used the initial 100 characters allocated. Some
    implementations are better at preallocating a default number of bytes
    than others; sometimes they have to be given a hint. Figure out a good
    size to reserve (big enough that you won't need reallocations, small
    enough that you're not running out of memory), then profile again and
    see if it helps.
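
The reserve() suggestion above can be sketched as follows. This is a minimal illustration, assuming the three fields are available as std::string values (the thread never shows the types of a, b and c); buildHeader is a hypothetical name. Instead of a guessed constant such as 100, it computes the exact length up front so that at most one allocation is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds "<a><b><c>" with a single up-front reservation.
std::string buildHeader(const std::string& a,
                        const std::string& b,
                        const std::string& c) {
    // Six bracket characters plus the three payloads.
    const std::size_t needed = 6 + a.size() + b.size() + c.size();

    std::string header;
    header.reserve(needed);  // at most one allocation; the appends below never regrow

    header += '<';
    header += a;
    header += "><";
    header += b;
    header += "><";
    header += c;
    header += '>';

    assert(header.size() == needed);  // sanity check on the size computation
    return header;
}
```

Whether this beats a plain sequence of += depends on the implementation, so it is worth profiling both variants.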
    Jim Langston, Oct 31, 2007
    #2

  3. Bala2508

    Bala Guest


    I also clear the string using the msgHeader.str("") method once I am
    done sending the message. Then, when this function gets called again,
    the same sequence of events happens. Wouldn't msgHeader.str("") free
    the allocated memory? How does reserving help in this scenario?
    Bala, Oct 31, 2007
    #3
  4. On 2007-10-31 19:48, Bala wrote:
    > I also clear the string using msgHeader.str("") method once i am done
    > with the sending of the message. Then again when this method gets
    > called, the same sequence of events happen. Wouldnt it clear the
    > allocated memory once i do a msgHeader.str("")? How do reserving
    > essentially help in this scenario?


    To clear the string use clear() instead, that is what it is meant for.
    clear() will not affect the capacity of the string so if you do
    something like

    std::string str;
    str.reserve(100);
    str.clear();

    you will still be able to put 100 characters into the string before it
    needs to reallocate.

    Of course, if msgHeader is declared in the function that gets called, it
    will go out of scope when the function returns and a new string will be
    constructed on the next call, so operations on the string have no effect
    across two different calls. If, on the other hand, msgHeader is external
    to the function, then you will probably benefit from using reserve. BTW,
    when calling reserve() with an argument that is smaller than or equal to
    the current capacity, no action is taken.
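
A rough sketch of the reuse pattern described above, assuming the buffer can live in an object that outlives the individual calls. HeaderBuilder and its members are hypothetical names. Note that str("") is a member of std::ostringstream; for std::string the equivalent reset is clear().

```cpp
#include <cstddef>
#include <string>

// Hypothetical builder that owns one long-lived buffer and reuses it.
class HeaderBuilder {
public:
    HeaderBuilder() { buffer_.reserve(100); }  // 100 is the size suggested in the thread

    // Rebuilds the header in place. The returned reference stays valid
    // only until the next call to build().
    const std::string& build(const std::string& a,
                             const std::string& b,
                             const std::string& c) {
        buffer_.clear();  // size drops to 0; implementations keep the capacity in practice
        buffer_ += '<';
        buffer_ += a;
        buffer_ += "><";
        buffer_ += b;
        buffer_ += "><";
        buffer_ += c;
        buffer_ += '>';
        return buffer_;
    }

    std::size_t capacity() const { return buffer_.capacity(); }

private:
    std::string buffer_;
};
```

An object like this is not safe to share between threads without synchronization, so each thread would need its own instance.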

    --
    Erik Wikström
    Erik Wikström, Oct 31, 2007
    #4
  5. Bala2508

    Bala Guest


    msgHeader is local to the function, and its maximum size would not be
    more than 100 bytes. So I plan to modify my code to use reserve and
    clear as you suggested, and will measure the performance. I hope
    it helps.

    BTW, a general question.
    If I don't use reserve and my function looks something like this:

    void somefunction ()
    {
        std::string msgHeader;
        msgHeader = "<";
        msgHeader += a;
        msgHeader += "><";

        msgHeader += b;
        msgHeader += "><";

        msgHeader += c;
        msgHeader += ">";
    }

    How is the actual memory allocation done? My understanding is that
    the string library tries to reallocate memory on every statement.
    That is, when it first encounters the statement msgHeader = "<";, it
    allocates, say, 1 byte for msgHeader.
    Then at the next statement it reallocates msgHeader as sizeof (a) +
    the current memory of msgHeader, and so on.
    Is this correct? If so, then I am sure using reserve would improve
    the performance dramatically.

    Thanks,
    Bala
    Bala, Oct 31, 2007
    #5
  6. On 2007-10-31 21:58, Bala wrote:
    > How is the actual memory allocation done? My understanding is that
    > the string library tries to reallocate memory on every statement.
    > That is, initially when it finds the statement "msgHeader = "<";", it
    > allocates say 1 byte to the msgHeader.
    > Then at the next statement it reallocates msgHeader as sizeof (a) +
    > current memory of msgHeader and so on.
    > Is this correct? If yes, then I am sure using reserve would improve
    > the performance dramatically.


    I do not know, and I do not think the standard says anything about it.
    But a good implementation will probably use a resizing scheme similar to
    the one used for vectors, such as (at least) doubling the capacity every
    time it resizes.

    --
    Erik Wikström
    Erik Wikström, Oct 31, 2007
    #6
  7. Bala2508

    James Kanze Guest

    On Oct 31, 8:21 pm, Erik Wikström <> wrote:
    [...]
    > Of course, if msgHeader is declared in the function that gets called it
    > will go out of scope when the function returns and will be reallocated
    > when it is called again, in which case a new string will be constructed
    > in which case the operations on the string will have not effect over two
    > different calls. If msgHeader on the other hand is external to the
    > function then you will probably benefit from using reserve. BTW, when
    > calling reserve() with an argument that is smaller than or equal to the
    > current capacity no action is taken.


    If you (re-)use a string with static lifetime, you probably
    don't need reserve. It will very quickly reach the capacity of
    the largest header, and never shrink.

    In my own work, I tend to use std::vector<char> a lot for this
    sort of thing, using ostrstream (initialized to use the space in
    the vector) for formatting. My own experience is that the
    implementations of std::vector tend to be better optimized than
    those of std::string, and you have a lot more guarantees
    concerning the allocation strategy. In my case, this is rather
    simple, since I am dealing with fixed length records and fields
    (so something like:

    size_t pos = v.size() ;
    v.resize( v.size() + fieldSize ) ;
    ostrstream formatter( &v[0] + pos, fieldSize ) ;
    formatter << ... ;

    works perfectly). But it's something that may be worth
    considering. (If called with arguments, ostrstream will format
    directly in place, with no dynamic allocation.)

    Although not currently guaranteed, something similar using
    std::string will actually work with all current implementations,
    and will be guaranteed in the next version of the standard.
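
For comparison, here is a sketch of the same in-place idea applied to std::string. It assumes C++11 or later, where the string's storage is guaranteed to be contiguous, and it swaps in std::snprintf for ostrstream, since the strstream classes are deprecated. appendField is a hypothetical name, and the zero-padded decimal format is only an example of a fixed-width field.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: appends `value` as a fixed-width, zero-padded decimal
// field, formatting directly into the string's own storage.
// If the value needs more than fieldSize digits, the output is truncated.
void appendField(std::string& out, unsigned value, std::size_t fieldSize) {
    const std::size_t pos = out.size();
    out.resize(pos + fieldSize + 1);  // +1 leaves room for snprintf's '\0'
    std::snprintf(&out[pos], fieldSize + 1, "%0*u",
                  static_cast<int>(fieldSize), value);
    out.resize(pos + fieldSize);      // drop the slot used by the terminator
}
```

If the string has enough capacity reserved beforehand, neither resize() call needs to allocate.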

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Nov 1, 2007
    #7
  8. Bala2508

    James Kanze Guest

    On Oct 31, 9:58 pm, Bala <> wrote:
    > On Oct 31, 3:21 pm, Erik Wikström <> wrote:
    > > On 2007-10-31 19:48, Bala wrote:


    [...]
    > If i dont use reserve and my function looks somewhat like this below


    > void somefunction ()
    > {
    > std::string msgHeader
    > msgHeader = "<";
    > msgHeader += a;
    > msgHeader += "><";


    > msgHeader += b;
    > msgHeader += "><";


    > msgHeader += c;
    > msgHeader += ">";
    > }


    > How is the actual memory allocation done?


    However the implementation wants. There are no real
    requirements.

    In practice, I think most implementations today do something
    similar to what they do in std::vector (which requires some sort
    of exponential growth strategy). Many implementations also use
    the small string optimization---there is no dynamic allocation
    whatsoever if the string is small enough (typically something
    between 8 and 32 bytes).

    > My understanding is that the string library tries to
    > reallocate memory on every statement.


    It might, but it probably doesn't.

    You can find out by tracing the capacity of the string after
    each +=. (If the capacity of the empty string immediately after
    construction is greater than 0, then the implementation probably
    uses the small string optimization.)
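
The tracing idea can be sketched like this. traceCapacity and countCapacityChanges are hypothetical names; the actual numbers depend entirely on the standard library in use, so no particular output is implied.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends `piece` to an empty string `count` times and records the
// capacity before the first append and after every append.
std::vector<std::size_t> traceCapacity(const std::string& piece, int count) {
    std::vector<std::size_t> trace;
    std::string s;
    trace.push_back(s.capacity());  // > 0 here suggests the small string optimization
    for (int i = 0; i < count; ++i) {
        s += piece;
        trace.push_back(s.capacity());
    }
    return trace;
}

// Counts how often the capacity changed between consecutive samples,
// which approximates the number of times the buffer had to grow.
std::size_t countCapacityChanges(const std::vector<std::size_t>& trace) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < trace.size(); ++i) {
        if (trace[i] != trace[i - 1]) {
            ++changes;
        }
    }
    return changes;
}
```

Printing the trace for a typical header shows whether the implementation grows geometrically or reallocates on every append.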

    > That is, initially when it finds the statement "msgHeader = "<";", it
    > allocates say 1 byte to the msgHeader.
    > Then at the next statement it reallocates msgHeader as sizeof (a) +
    > current memory of msgHeader and so on.
    > Is this correct? If yes, then I am sure using reserve would improve
    > the performance dramatically.


    Anytime you can set a reasonable maximum for the length, reserve
    is likely to help. How much depends largely on the
    implementation, however.

    James Kanze, Nov 1, 2007
    #8
  9. Bala2508

    James Kanze Guest

    On Oct 31, 10:55 pm, Erik Wikström <> wrote:

    [...]
    > I do not know, and I do not think the standard says anything
    > about it. But a good implementation will probably use a
    > resizing scheme similar to the one used for vectors, such as
    > (at least) doubling the capacity every time it resizes.


    Doubling is actually not a very good strategy; multiplying by
    say 1.5 is considerably better. (As a general rule, the
    multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    is close enough, and easy to calculate.) In memory tight
    situations, of course, the multiplier should be even smaller.

    The original STL implementation did use 2, and I suspect that
    many implementations still do, even though we now know that it
    isn't such a good idea.
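
One commonly cited argument for this bound can be modelled with a small simulation. It assumes an idealized allocator in which all previously freed blocks of a growing buffer are adjacent and can be merged, which real allocators do not guarantee. firstReusableStep is a hypothetical name.

```cpp
// Models repeated growth of one buffer under an idealized allocator in which
// all previously freed blocks are contiguous and can be merged.
// Returns the first growth step at which the freed space is large enough to
// hold the new block, or -1 if that never happens within maxSteps.
int firstReusableStep(double factor, int maxSteps) {
    double freed = 0.0;    // combined size of the blocks released so far
    double current = 1.0;  // size of the block currently in use
    for (int step = 1; step <= maxSteps; ++step) {
        const double next = current * factor;
        if (freed >= next) {
            return step;   // the new block fits into the hole left behind
        }
        // The old block stays alive while its contents are copied, so it
        // only joins the freed pool after the new block has been allocated.
        freed += current;
        current = next;
    }
    return -1;
}
```

Under these assumptions a factor of 2 never reaches that point, because the freed blocks always add up to less than the next request, while a factor of 1.5 reaches it on the fifth growth step. Whether this matters for a real program is still something to measure.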

    James Kanze, Nov 1, 2007
    #9
  10. On Wed, 31 Oct 2007 20:58:33 -0000, Bala wrote:
    >How is the actual memory allocation done? My understanding is that
    >the string library tries to reallocate memory on every statement.
    >That is, initially when it finds the statement "msgHeader = "<";", it
    >allocates say 1 byte to the msgHeader.


    Probably yes.

    >Then at the next statement it reallocates msgHeader as sizeof (a) +
    >current memory of msgHeader and so on.
    >Is this correct? If yes, then I am sure using reserve would improve
    >the performance dramatically.


    Probably for string, but you cannot call reserve() on an ostringstream.
    Neither std::string nor std::ostringstream is meant to be used
    "extensively". Consider using a library for writing XML, e.g.
    http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
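
Although ostringstream has no reserve(), the stream object itself can be reused, which at least avoids constructing a new stream for every message. The following is a sketch only: formatRecord and its parameters are hypothetical, and whether the stream keeps its buffer memory across str("") is implementation-dependent.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats "<id><name>" into a caller-owned stream
// that is reused from one message to the next.
std::string formatRecord(std::ostringstream& os, int id, const std::string& name) {
    os.str("");   // discard the previous contents
    os.clear();   // reset any error flags left over from earlier use
    os << '<' << id << "><" << name << '>';
    return os.str();  // copies the formatted text out of the stream
}
```

The caller keeps one std::ostringstream alive (per thread, if several threads format messages) and passes it to every call.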


    --
    Roland Pibinger
    "The best software is simple, elegant, and full of drama" - Grady Booch
    Roland Pibinger, Nov 1, 2007
    #10
  11. * James Kanze:
    > On Oct 31, 10:55 pm, Erik Wikström <> wrote:
    >
    > [...]
    >> I do not know, and I do not think the standard says anything
    >> about it. But a good implementation will probably use a
    >> resizing scheme similar to the one used for vectors, such as
    >> (at least) doubling the capacity every time it resizes.

    >
    > Doubling is actually not a very good strategy; multiplying by
    > say 1.5 is considerably better. (As a general rule, the
    > multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    > is close enough, and easy to calculate.) In memory tight
    > situations, of course, the multiplier should be even smaller.
    >
    > The original STL implementation did use 2, and I suspect that
    > many implementations still do, even though we now know that it
    > isn't such a good idea.


    Could you elaborate, specifically on where the golden ratio enters the
    picture?

    It seems like a number drawn out of thin air, like e.g. a factor 1.7 for
    a hash table.

    Generally it's an optimization question, and the answer for optimization
    is to measure if it matters.

    Cheers,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
    Alf P. Steinbach, Nov 1, 2007
    #11
  12. Bala2508

    Bala Guest

    On Nov 1, 8:37 am, "Alf P. Steinbach" <> wrote:
    > Generally it's an optimization question, and the answer for optimization
    > is to measure if it matters.

    Yeah, it matters because the process uses 1 full CPU of 8 (12.5%).
    For every message that it gets on its socket, it creates this kind of
    header and forwards it to a library for further processing.

    The rate we supplied was around 100,000 messages per second,
    and the requirement is that it runs at, say, a maximum of 60%
    utilization instead of 100%.
    Bala, Nov 1, 2007
    #12
  13. James Kanze wrote:
    > If you (re-)use a string with static lifetime, you probably
    > don't need reserve. It will very quickly reach the capacity of
    > the largest header, and never shrink.


    On the other hand the function won't be thread safe.
    Tadeusz Kopec, Nov 1, 2007
    #13
  14. Bala2508

    Bala Guest

    On Nov 1, 11:19 am, Tadeusz Kopec <> wrote:
    > James Kanze wrote:
    > > On Oct 31, 8:21 pm, Erik Wikström <> wrote:
    > > [...]
    > >> Of course, if msgHeader is declared in the function that gets called it
    > >> will go out of scope when the function returns and will be reallocated
    > >> when it is called again, in which case a new string will be constructed
    > >> in which case the operations on the string will have no effect over two
    > >> different calls. If msgHeader on the other hand is external to the
    > >> function then you will probably benefit from using reserve. BTW, when
    > >> calling reserve() with an argument that is smaller than or equal to the
    > >> current capacity no action is taken.

    >
    > > If you (re-)use a string with static lifetime, you probably
    > > don't need reserve. It will very quickly reach the capacity of
    > > the largest header, and never shrink.

    >
    > On the other hand the function won't be thread safe.


    Yes, you are right, it won't be thread safe. In the current scenario I
    am not using a string with static lifetime, precisely to avoid locking
    and unlocking while creating the headers; hence it is an automatic
    variable. In the meantime, I am getting set up to run the test with the
    suggested reserve() changes. I hope it helps, because I can't think of
    any other solution at the moment.
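A possible middle ground between the static string and the automatic variable, assuming a C++11 or later compiler (which the toolchains mentioned in this thread are not), is a per-thread buffer. The sketch below is illustrative only: the name buildHeaderReusingBuffer and its parameters are made up for this example, and it relies on clear() keeping the existing capacity, which common implementations do but the standard does not promise.

```cpp
#include <string>

// Hypothetical helper: builds "<a><b><c>" into a per-thread buffer.
// The returned reference is only valid until the next call on the same thread.
const std::string& buildHeaderReusingBuffer(const std::string& a,
                                            const std::string& b,
                                            const std::string& c)
{
    // One buffer per thread: no locking is needed, and the capacity reached
    // by earlier calls is normally kept for later ones.
    thread_local std::string buffer;

    buffer.clear();  // resets the size; common implementations keep the capacity
    buffer += '<';
    buffer += a;
    buffer += "><";
    buffer += b;
    buffer += "><";
    buffer += c;
    buffer += '>';
    return buffer;
}
```

The caller has to copy or consume the result before calling the function again on the same thread, since the next call overwrites the buffer.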
    Bala, Nov 1, 2007
    #14
  15. Bala2508

    Jim Langston Guest

    "James Kanze" <> wrote in message
    news:...
    On Oct 31, 10:55 pm, Erik Wikström <> wrote:

    [...]
    > I do not know, and I do not think the standard says anything
    > about it. But a good implementation will probably use a
    > resizing scheme similar to the one used for vectors, such as
    > (at least) doubling the capacity every time it resizes.


    Doubling is actually not a very good strategy; multiplying by
    say 1.5 is considerably better. (As a general rule, the
    multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    is close enough, and easy to calculate.) In memory tight
    situations, of course, the multiplier should be even smaller.

    The original STL implementation did use 2, and I suspect that
    many implementations still do, even though we now know that it
    isn't such a good idea.

    =====

    I decided to test my implementation so I wrote this:

    #include <iostream>
    #include <string>

    int main()
    {
        std::string Foo;

        // Print the size and capacity every time the capacity changes.
        std::string::size_type LastCapacity = Foo.capacity();
        std::cout << "Initial Capacity:" << LastCapacity << "\n";
        for ( int i = 0; i < 100; ++i )
        {
            Foo += "x";
            if ( Foo.capacity() != LastCapacity )
            {
                std::cout << "Size:" << Foo.size() << " " << "Capacity:"
                          << Foo.capacity() << "\n";
                LastCapacity = Foo.capacity();
            }
        }
    }

    The output for my system is:

    Initial Capacity:15
    Size:16 Capacity:31
    Size:32 Capacity:47
    Size:48 Capacity:70
    Size:71 Capacity:105

    So as we can see, on my system if I did not initially .reserve() and added 1
    character at a time then I would wind up with 4 extra memory reallocations.

    I'm on Windows XP with Microsoft Visual C++ .net 2005. I doubt the OS
    matters, only the compiler and its standard library.
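To compare the same loop with and without an up-front reserve(), something along these lines could be used. It is a minimal sketch: countCapacityChanges is a made-up name, and the count without reserve() depends entirely on the standard library in use.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends appendCount characters one at a time and
// returns how often capacity() changed along the way.
// A reserveHint of 0 means "do not call reserve()".
std::size_t countCapacityChanges(std::size_t appendCount, std::size_t reserveHint)
{
    std::string s;
    if (reserveHint > 0)
        s.reserve(reserveHint);

    std::size_t lastCapacity = s.capacity();  // baseline taken after reserve()
    std::size_t changes = 0;
    for (std::size_t i = 0; i < appendCount; ++i)
    {
        s += 'x';
        if (s.capacity() != lastCapacity)
        {
            ++changes;
            lastCapacity = s.capacity();
        }
    }
    return changes;
}
```

Calling countCapacityChanges(100, 100) should report no capacity changes, while countCapacityChanges(100, 0) reports however many reallocations the library needs (4 for the output shown above).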
    Jim Langston, Nov 1, 2007
    #15
  16. On 2007-11-01 21:53, Jim Langston wrote:
    > "James Kanze" <> wrote in message
    > news:...
    > On Oct 31, 10:55 pm, Erik Wikström <> wrote:
    >
    > [...]
    >> I do not know, and I do not think the standard says anything
    >> about it. But a good implementation will probably use a
    >> resizing scheme similar to the one used for vectors, such as
    >> (at least) doubling the capacity every time it resizes.

    >
    > Doubling is actually not a very good strategy; multiplying by
    > say 1.5 is considerably better. (As a general rule, the
    > multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    > is close enough, and easy to calculate.) In memory tight
    > situations, of course, the multiplier should be even smaller.
    >
    > The original STL implementation did use 2, and I suspect that
    > many implementations still do, even though we now know that it
    > isn't such a good idea.
    >
    > =====
    >
    > I decided to test my implementation so I wrote this:
    >
    > #include <iostream>
    > #include <string>
    >
    > int main()
    > {
    > std::string Foo;
    >
    > std::string::size_type LastCapacity = Foo.capacity();
    > std::cout << "Initial Capacity:" << LastCapacity << "\n";
    > for ( int i = 0; i < 100; ++i )
    > {
    > Foo += "x";
    > if ( Foo.capacity() != LastCapacity )
    > {
    > std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
    > Foo.capacity() << "\n";
    > LastCapacity = Foo.capacity();
    > }
    > }
    > }
    >
    > The output for my system is:
    >
    > Initial Capacity:15
    > Size:16 Capacity:31
    > Size:32 Capacity:47
    > Size:48 Capacity:70
    > Size:71 Capacity:105
    >
    > So as we can see, on my system if I did not initially .reserve() and added 1
    > character at a time then I would wind up with 4 extra memory reallocations.


    And for those too lazy to do the math themselves, the numbers mean that
    the capacity is multiplied by roughly 1.5 on each resize after the first.

    To James Kanze: I too would be interested in hearing about the source of
    the "less than 1.6 multiplication" rule. I have tried to google for it
    but I do not even know where to start.
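For anyone who wants to check the arithmetic on the reported capacities, a small helper can compute the ratio between each capacity and the one before it. This is only a sketch; growthRatios is a made-up name and the function assumes none of the capacities is zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns capacities[i] / capacities[i - 1] for each
// consecutive pair. Assumes every capacity is non-zero.
std::vector<double> growthRatios(const std::vector<std::size_t>& capacities)
{
    std::vector<double> ratios;
    for (std::size_t i = 1; i < capacities.size(); ++i)
    {
        ratios.push_back(static_cast<double>(capacities[i]) /
                         static_cast<double>(capacities[i - 1]));
    }
    return ratios;
}
```

For the capacities 15, 31, 47, 70 and 105 quoted above, this gives about 2.07, 1.52, 1.49 and exactly 1.5, so the first step is closer to a doubling and the later ones settle around 1.5.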

    --
    Erik Wikström
    Erik Wikström, Nov 1, 2007
    #16
  17. Bala2508

    Jim Langston Guest

    "Erik Wikström" <> wrote in message
    news:durWi.12684$...
    > On 2007-11-01 21:53, Jim Langston wrote:
    >> "James Kanze" <> wrote in message
    >> news:...
    >> On Oct 31, 10:55 pm, Erik Wikström <> wrote:
    >>
    >> [...]
    >>> I do not know, and I do not think the standard says anything
    >>> about it. But a good implementation will probably use a
    >>> resizing scheme similar to the one used for vectors, such as
    >>> (at least) doubling the capacity every time it resizes.

    >>
    >> Doubling is actually not a very good strategy; multiplying by
    >> say 1.5 is considerably better. (As a general rule, the
    >> multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    >> is close enough, and easy to calculate.) In memory tight
    >> situations, of course, the multiplier should be even smaller.
    >>
    >> The original STL implementation did use 2, and I suspect that
    >> many implementations still do, even though we now know that it
    >> isn't such a good idea.
    >>
    >> =====
    >>
    >> I decided to test my implementation so I wrote this:
    >>
    >> #include <iostream>
    >> #include <string>
    >>
    >> int main()
    >> {
    >> std::string Foo;
    >>
    >> std::string::size_type LastCapacity = Foo.capacity();
    >> std::cout << "Initial Capacity:" << LastCapacity << "\n";
    >> for ( int i = 0; i < 100; ++i )
    >> {
    >> Foo += "x";
    >> if ( Foo.capacity() != LastCapacity )
    >> {
    >> std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
    >> Foo.capacity() << "\n";
    >> LastCapacity = Foo.capacity();
    >> }
    >> }
    >> }
    >>
    >> The output for my system is:
    >>
    >> Initial Capacity:15
    >> Size:16 Capacity:31
    >> Size:32 Capacity:47
    >> Size:48 Capacity:70
    >> Size:71 Capacity:105
    >>
    >> So as we can see, on my system if I did not initially .reserve() and
    >> added 1
    >> character at a time then I would wind up with 4 extra memory
    >> reallocations.

    >
    > And for those too lazy to do the math themselves the numbers means that
    > the capacity is multiplied by 1.5 on each resize.
    >
    > To James Kanze: I too would be interested in hearing about the source of
    > the "less than 1.6 multiplication" rule. I have tried to google for it
    > but I do not even know about where to start.


    It was explained to me once. Consider a string with an initial capacity of
    100. There are 100 bytes of memory allocated. Then if you double it, there
    is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
    you double it again to 400. 100 is not enough so it can't use the hole, so
    it allocates 400 bytes, leaving a 300 byte hole. Double again, 800, 300 is
    not enough, it allocates 800, leaving a 700 byte hole. As we can see, it
    can never reuse the memory from the previous allocations since they have to
    be contiguous.

    So that's where the 1.6 comes in: with a multiplier below it, a future
    allocation will eventually fit into the memory previously freed by the
    string.
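The argument above can be played through with a toy model. The sketch below assumes, as the explanation does, that all blocks freed by the string sit next to each other and can be merged into one hole, and that the old block is only released after the new one has been allocated. The name firstReusableGrowth is made up, and real allocators are considerably messier than this.

```cpp
#include <cassert>
#include <cstddef>

// Toy model: returns the number of the first growth step (1-based) at which
// the new block would fit into the hole left by all previously freed blocks,
// or -1 if that never happens within maxGrowths steps.
int firstReusableGrowth(double factor, std::size_t initial, int maxGrowths)
{
    assert(factor > 1.0);

    std::size_t freed = 0;          // combined size of the blocks already released
    std::size_t current = initial;  // block currently holding the string

    for (int step = 1; step <= maxGrowths; ++step)
    {
        std::size_t next = static_cast<std::size_t>(current * factor);
        if (next <= current)
            next = current + 1;     // make sure tiny blocks still grow

        if (next <= freed)
            return step;            // the new block fits into the hole

        freed += current;           // old block is released after the copy
        current = next;
    }
    return -1;
}
```

Starting from 100 bytes, a factor of 2.0 never finds a usable hole in this model (the hole sizes are the 100, 300, 700 from the explanation), while a factor of 1.5 succeeds on the fifth growth: the new block of 757 bytes fits into the 100 + 150 + 225 + 337 = 812 bytes freed so far.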
    Jim Langston, Nov 1, 2007
    #17
  18. Bala2508

    Bala Guest

    On Nov 1, 5:37 pm, "Jim Langston" <> wrote:
    > "Erik Wikström" <> wrote in message
    >
    > news:durWi.12684$...
    >
    >
    >
    >
    >
    > > On 2007-11-01 21:53, Jim Langston wrote:
    > >> "James Kanze" <> wrote in message
    > >>news:...
    > >> On Oct 31, 10:55 pm, Erik Wikström <> wrote:

    >
    > >> [...]
    > >>> I do not know, and I do not think the standard says anything
    > >>> about it. But a good implementation will probably use a
    > >>> resizing scheme similar to the one used for vectors, such as
    > >>> (at least) doubling the capacity every time it resizes.

    >
    > >> Doubling is actually not a very good strategy; multiplying by
    > >> say 1.5 is considerably better. (As a general rule, the
    > >> multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    > >> is close enough, and easy to calculate.) In memory tight
    > >> situations, of course, the multiplier should be even smaller.

    >
    > >> The original STL implementation did use 2, and I suspect that
    > >> many implementations still do, even though we now know that it
    > >> isn't such a good idea.

    >
    > >> =====

    >
    > >> I decided to test my implementation so I wrote this:

    >
    > >> #include <iostream>
    > >> #include <string>

    >
    > >> int main()
    > >> {
    > >> std::string Foo;

    >
    > >> std::string::size_type LastCapacity = Foo.capacity();
    > >> std::cout << "Initial Capacity:" << LastCapacity << "\n";
    > >> for ( int i = 0; i < 100; ++i )
    > >> {
    > >> Foo += "x";
    > >> if ( Foo.capacity() != LastCapacity )
    > >> {
    > >> std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
    > >> Foo.capacity() << "\n";
    > >> LastCapacity = Foo.capacity();
    > >> }
    > >> }
    > >> }

    >
    > >> The output for my system is:

    >
    > >> Initial Capacity:15
    > >> Size:16 Capacity:31
    > >> Size:32 Capacity:47
    > >> Size:48 Capacity:70
    > >> Size:71 Capacity:105

    >
    > >> So as we can see, on my system if I did not initially .reserve() and
    > >> added 1
    > >> character at a time then I would wind up with 4 extra memory
    > >> reallocations.

    >
    > > And for those too lazy to do the math themselves the numbers means that
    > > the capacity is multiplied by 1.5 on each resize.

    >
    > > To James Kanze: I too would be interested in hearing about the source of
    > > the "less than 1.6 multiplication" rule. I have tried to google for it
    > > but I do not even know about where to start.

    >
    > It was explained to me once. Consider a string with an initial capacity of
    > 100. There are 100 bytes of memory allocated. Then if you double it, there
    > is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
    > you double it again to 400. 100 is not enough so it can't use the hole, so
    > it allocates 400 bytes, leaving a 300 byte hole. Double again, 800, 300 is
    > not enough, it allocates 800, leaving a 700 byte hole. As we can see, it
    > can never reuse the memory from the previous allocations since they have to
    > be contiguous.
    >
    > So that's where the 1.6 comes in. That with future allocations eventually
    > the system will be able to reuse previous memory allocated for the string.


    I wrote a similar test application on Solaris 10 SPARC with g++ 3.4.3.
    It clearly shows that reserve() is going to help my performance.

    Thanks a ton for your inputs. Will let you know once the test is
    successful.

    I appended 2 characters at a time in a loop of 10 iterations.

    This is the output without reserve

    Start With :0
    Current Size:2 Current Capacity:2
    Current Size:4 Current Capacity:4
    Current Size:6 Current Capacity:6
    Current Size:8 Current Capacity:8
    Current Size:10 Current Capacity:10
    Current Size:12 Current Capacity:12
    Current Size:14 Current Capacity:14
    Current Size:16 Current Capacity:16
    Current Size:18 Current Capacity:18
    Current Size:20 Current Capacity:20

    This is the output with reserve of 20

    Start With :20
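Applied to the header from the original question, the reserve() call could look roughly like this. It is a sketch under assumptions: buildHeader and its three string parameters are made-up names standing in for the real function and for a, b and c, whose actual types were not shown.

```cpp
#include <string>

// Hypothetical helper: builds "<a><b><c>" with a single up-front reservation.
std::string buildHeader(const std::string& a,
                        const std::string& b,
                        const std::string& c)
{
    std::string msgHeader;

    // The fixed parts "<", "><", "><" and ">" add up to 6 characters.
    msgHeader.reserve(a.size() + b.size() + c.size() + 6);

    msgHeader += '<';
    msgHeader += a;
    msgHeader += "><";
    msgHeader += b;
    msgHeader += "><";
    msgHeader += c;
    msgHeader += '>';
    return msgHeader;
}
```

Because msgHeader stays an automatic variable, this keeps the function free of shared state, at the cost of one allocation per call instead of one per append.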

    Thanks,
    Bala
    Bala, Nov 1, 2007
    #18
  19. That mystical magical sequence appears again! (Was: std::string

    On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:

    > It was explained to me once. Consider a string with an initial capacity of
    > 100. There are 100 bytes of memory allocated. Then if you double it, there
    > is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
    > you double it again to 400. 100 is not enough so it can't use the hole, so
    > it allocates 400 bytes, leaving a 300 byte hole. Double again, 800, 300 is
    > not enough, it allocates 800, leaving a 700 byte hole. As we can see, it
    > can never reuse the memory from the previous allocations since they have to
    > be contiguous.
    >
    > So that's where the 1.6 comes in. That with future allocations eventually
    > the system will be able to reuse previous memory allocated for the string.


    Sounds like something Knuth would have noticed, but I can't find it in
    TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
    looks like a Fibonacci thing:

    +-+
    |X|
    +-+

    +---+-+
    |X X| |
    +---+-+

    +-----+
    |X X X|
    +-----+

    +---------+-----+
    |X X X X X|     |
    +---------+-----+

    +---------------+
    |X X X X X X X X|
    +---------------+

    +-------------------------+---------------+
    |X X X X X X X X X X X X X|               |
    +-------------------------+---------------+

    +-----------------------------------------+
    |X X X X X X X X X X X X X X X X X X X X X|
    +-----------------------------------------+

    But I wonder how useful this is in practise. It seems to be a real
    best-case thing - if you can't merely extend, then get the next
    Fibonacci number size *before* the previous allocation and copy there.
    If you don't happen to get the space before the current storage, then
    you still get fragmentation - although I suppose being *strictly*
    Fibonacci probably makes that average out to be the lowest of a general
    purpose system as adjacent free blocks will tend to support Fibonacci
    sized allocations more often.

    The ratio of adjacent pairs in the Fibonacci sequence of course
    approaches the golden ratio as the noise due to being rational
    approaches zero. Being that memory allocations must have integer size -
    would you be better storing the previous capacity and adding the current
    capacity to determine the next one - and *really* go Fibonacci?
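The "really go Fibonacci" idea, where the next capacity is the sum of the previous and the current one, can be sketched as follows. The name fibonacciCapacities is made up for illustration, and this is not how any particular std::string implementation is known to grow.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: starting from two seed capacities, returns `count`
// capacities where each new one is the sum of the two before it.
std::vector<std::size_t> fibonacciCapacities(std::size_t previous,
                                             std::size_t current,
                                             std::size_t count)
{
    std::vector<std::size_t> capacities;
    for (std::size_t i = 0; i < count; ++i)
    {
        capacities.push_back(current);
        const std::size_t next = previous + current;  // integer-only growth rule
        previous = current;
        current = next;
    }
    return capacities;
}
```

Seeded with 1 and 1, the first seven capacities are 1, 2, 3, 5, 8, 13 and 21, which are the block sizes in the diagram above. Only integer addition is needed, at the price of storing one extra size per string.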

    --
    Tristan Wibberley

    Any opinion expressed is mine (or else I'm playing devils advocate for
    the sake of a good argument). My employer had nothing to do with this
    communication.
    Tristan Wibberley, Nov 2, 2007
    #19
  20. Bala2508

    Jim Langston Guest

    "Bala" <> wrote in message
    news:...
    On Nov 1, 5:37 pm, "Jim Langston" <> wrote:
    > "Erik Wikström" <> wrote in message
    >
    > news:durWi.12684$...
    >
    > > On 2007-11-01 21:53, Jim Langston wrote:
    > >> "James Kanze" <> wrote in message
    > >>news:...
    > >> On Oct 31, 10:55 pm, Erik Wikström <> wrote:

    >
    > >> [...]
    > >>> I do not know, and I do not think the standard says anything
    > >>> about it. But a good implementation will probably use a
    > >>> resizing scheme similar to the one used for vectors, such as
    > >>> (at least) doubling the capacity every time it resizes.

    >
    > >> Doubling is actually not a very good strategy; multiplying by
    > >> say 1.5 is considerably better. (As a general rule, the
    > >> multiplier should be less than (1+sqrt(5))/2---about 1.6. 1.5
    > >> is close enough, and easy to calculate.) In memory tight
    > >> situations, of course, the multiplier should be even smaller.

    >
    > >> The original STL implementation did use 2, and I suspect that
    > >> many implementations still do, even though we now know that it
    > >> isn't such a good idea.

    >
    > >> =====

    >
    > >> I decided to test my implementation so I wrote this:

    >
    > >> #include <iostream>
    > >> #include <string>

    >
    > >> int main()
    > >> {
    > >> std::string Foo;

    >
    > >> std::string::size_type LastCapacity = Foo.capacity();
    > >> std::cout << "Initial Capacity:" << LastCapacity << "\n";
    > >> for ( int i = 0; i < 100; ++i )
    > >> {
    > >> Foo += "x";
    > >> if ( Foo.capacity() != LastCapacity )
    > >> {
    > >> std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
    > >> Foo.capacity() << "\n";
    > >> LastCapacity = Foo.capacity();
    > >> }
    > >> }
    > >> }

    >
    > >> The output for my system is:

    >
    > >> Initial Capacity:15
    > >> Size:16 Capacity:31
    > >> Size:32 Capacity:47
    > >> Size:48 Capacity:70
    > >> Size:71 Capacity:105

    >
    > >> So as we can see, on my system if I did not initially .reserve() and
    > >> added 1
    > >> character at a time then I would wind up with 4 extra memory
    > >> reallocations.

    >
    > > And for those too lazy to do the math themselves the numbers means that
    > > the capacity is multiplied by 1.5 on each resize.

    >
    > > To James Kanze: I too would be interested in hearing about the source of
    > > the "less than 1.6 multiplication" rule. I have tried to google for it
    > > but I do not even know about where to start.

    >
    > It was explained to me once. Consider a string with an initial capacity
    > of
    > 100. There are 100 bytes of memory allocated. Then if you double it,
    > there
    > is a hole in the first 100 bytes, and the next 200 bytes are allocated.
    > Now
    > you double it again to 400. 100 is not enough so it can't use the hole,
    > so
    > it allocates 400 bytes, leaving a 300 byte hole. Double again, 800, 300 is
    > not enough, it allocates 800, leaving a 700 byte hole. As we can see, it
    > can never reuse the memory from the previous allocations since they have
    > to
    > be contiguous.
    >
    > So that's where the 1.6 comes in. That with future allocations eventually
    > the system will be able to reuse previous memory allocated for the
    > string.


    I wrote a similar test application on Solaris 10 Sparc with g++3.4.3.
    It clearly shows, that the reserve thing is surely going to help my
    performance.

    Thanks a ton for your inputs. Will let you know once the test is
    successful.

    I started appending 2 characters a time in a loop of 10.

    This is the output without reserve

    Start With :0
    Current Size:2 Current Capacity:2
    Current Size:4 Current Capacity:4
    Current Size:6 Current Capacity:6
    Current Size:8 Current Capacity:8
    Current Size:10 Current Capacity:10
    Current Size:12 Current Capacity:12
    Current Size:14 Current Capacity:14
    Current Size:16 Current Capacity:16
    Current Size:18 Current Capacity:18
    Current Size:20 Current Capacity:20

    This is the output with reserve of 20

    Start With :20

    Thanks,
    Bala

    ======================

    OUCH! No wonder you are having performance issues! Your platform isn't
    preallocating ANY extra space! This means every time you append to the
    string it has to reallocate memory. Very bad. Yes, in your case .reserve()
    should help TREMENDOUSLY.

    That, in my opinion, is a very bad implementation of std::string.
    Jim Langston, Nov 2, 2007
    #20