C++ strings & C strings

Bo Persson

Frederick said:
Bo Persson posted:
A std::string implementation can use the small string optimization,
to avoid dynamic allocation for some strings. If the string is
shorter than a certain size, and the size is known at compile time,
std::string can take advantage of that.


I've never heard of that... how would it be implemented? Something
like:

#include <cstddef>

template<std::size_t i>
std::string::string(char const (&str)[i])
{
    /* Something Funky... ? */
}


Something Funky. :)

Actually a combination of inlining, compiler intrinsics, and an
aggressive optimizer. Visual C++ Express does this!


The exact code looks like this (inside basic_string):

__forceinline
basic_string(const value_type* _String,
             const allocator_type& _Allocator = allocator_type() )
    : _Parent(_Allocator)
{
    const size_type _StringSize = traits_type::length(_String);

    if (_MySmallStringCapacity < _StringSize)
    {
        _Construct(_String, _StringSize);
    }
    else
    {
        traits_type::copy(_MySmallString._Buffer, _String,
                          _StringSize);

        _SetSmallStringCapacity();
        _SetSize(_StringSize);
    }
}


Here traits_type::length contains a call to std::strlen, which is a
compiler intrinsic. If _String is a literal, this call is evaluated at
compile time.

If so, the condition in the if-statement is also a constant
expression, evaluating to false, so the else part is selected.

The traits_type::copy contains a call to std::memcpy, which is also an
intrinsic if all parameters are constant. It is inlined as one or more
mov instructions.
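To make the branch above easier to picture, here is a minimal sketch of the small string optimization in isolation. It is not the Visual C++ code: the class name TinyString, the 15-character capacity, and the member names are all assumptions chosen for illustration, and copying is simply disabled to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified illustration of the small string optimization.
// The name TinyString and the 15-character capacity are assumptions.
class TinyString {
public:
    static constexpr std::size_t kSmallCapacity = 15;

    explicit TinyString(const char* s) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        if (kSmallCapacity < size_) {
            // Too long for the internal buffer: allocate dynamically.
            heap_ = new char[size_ + 1];
            std::memcpy(heap_, s, size_ + 1);
        } else {
            // Short string: copy into the buffer inside the object itself.
            std::memcpy(small_, s, size_ + 1);
        }
    }

    // Copying is left out to keep the sketch short.
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    ~TinyString() { delete[] heap_; }

    bool is_small() const { return heap_ == nullptr; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return is_small() ? small_ : heap_; }

private:
    std::size_t size_ = 0;
    char* heap_ = nullptr;            // used only for long strings
    char small_[kSmallCapacity + 1];  // used only for short strings
};
```

When the argument is a literal such as "abcd", an optimizer that can evaluate strlen at compile time knows which branch is taken, which is the effect described above. Whether a given compiler actually does so for this sketch is not something the sketch guarantees.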


Here is an example from a test program with this constructor, followed
by a copy construction to a second string:

std::string whatever = "abcd";
std::string whatever2 = whatever;

The compiler also takes advantage of the fact that register BL is
zero, and that EBP already contains the string length.

; 530 :
; 531 : std::string whatever = "abcd";

0080d a1 00 00 00 00 mov eax, DWORD PTR ??_C@_04EHKALCEN@abcd?$AA@
00 00 mov DWORD PTR _whatever$[esp+1792], eax
00 00 mov BYTE PTR _whatever$[esp+1819], bl
00 00 mov DWORD PTR _whatever$[esp+1820], ebp
00 00 mov BYTE PTR _whatever$[esp+1796], bl

; 532 :
; 533 : std::string whatever2 = whatever;

00 00 mov DWORD PTR _whatever2$[esp+1792], eax
00 00 mov BYTE PTR _whatever2$[esp+1819], bl
00 00 mov DWORD PTR _whatever2$[esp+1820], ebp
00 00 mov BYTE PTR _whatever2$[esp+1796], bl



This is of course a selected best case, but rather good
(understatement :).

I have used this example before, when arguing that well-tuned C++
library code not only defies the alleged template code bloat, but
actually can be both smaller and faster than portable C code. Not to
mention easier to use correctly than some combination of
strlen/malloc/free/strcpy/strcat.
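As a rough illustration of the ease-of-use point, here is the same concatenation written both ways. The function names join_c and join_cpp are made up for this sketch and do not come from any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// C style: the caller has to size the buffer by hand (including the
// terminating '\0'), check for allocation failure, and free() the result.
char* join_c(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    char* out = static_cast<char*>(
        std::malloc(std::strlen(a) + std::strlen(b) + 1));
    if (out == nullptr) {
        return nullptr;
    }
    std::strcpy(out, a);
    std::strcat(out, b);
    return out;
}

// C++ style: std::string handles sizing, allocation and cleanup.
std::string join_cpp(const std::string& a, const std::string& b) {
    return a + b;
}
```

The C version has three places to get wrong (the size computation, the null check, and the matching free), while the C++ version has none of them in user code.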


Bo Persson
 
arnuld

Noah said:
The bug is that it checks the string length of the string starting at
the second character in pc. What was actually wanted was strlen(pc) +
1. Later when you add one to the len variable you increase to the
actual length of pc and then try to copy it. This causes a buffer
overrun because the \0 character makes one more. The +1 is correct but
it is in all the wrong places.
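The code under discussion is not quoted in this part of the thread, so the following is only a sketch of the fix Noah describes, reusing his variable name pc. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns a heap copy of a C string.
// The caller must delete[] the result.
char* duplicate_c_string(const char* pc) {
    assert(pc != nullptr);
    // Wrong: std::strlen(pc + 1) measures the string that starts at the
    // second character, so it is one short.
    // Right: the length of pc itself, plus one for the terminating '\0'.
    const std::size_t buffer_size = std::strlen(pc) + 1;
    char* copy = new char[buffer_size];
    std::memcpy(copy, pc, buffer_size);  // copies the '\0' as well
    return copy;
}

// With std::string there is no +1 to misplace.
std::string duplicate_string(const char* pc) {
    assert(pc != nullptr);
    return std::string(pc);
}
```

Computing the buffer size once and using that same value for both the allocation and the copy keeps the +1 in a single place.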

whoops!, i am running "zero" on calling myself a "C++ beginning
programmer"

-- arnuld
http://arnuld.blogpsot.com
 
Yannick Tremblay

can't, heck i dont even know C, i was using C++ Primer & decided to
understand the problem by practically *trying* to write the code.
as another poster pointed, it should have been strlen(pc) + 1. Line
above creates a buffer overflow which is really dangerous and the
cause of countless number of bugs in the history of programming.
Yan, your point is really serious, that is why i want to know where &
what exactly is the bug.

Arnuld,

The point I am trying to make is: code correctness comes well before
optimisation of code execution speed. Code that is very fast at
generating a core dump (crashing) is not useful.


C++ strings offer an interface that makes it easier to write bug free
code. This IMO is much more important than a 5% execution speed
difference for either implementation.

If you are a beginner programmer, learn to write correct code that
is clear and maintainable first. Worry about optimising execution
speed after you have written a first version of the code that works
correctly and is bug free _*and*_ you have measured (profiling)
that this particular section of the code is having an impact on
performance.

90% of your execution time is spent on 10% of your code.

So write 100% of code correct and bug free then if performance
is not good enough, find the 10% of the code that needs to be
optimised.


Yan
 
arnuld.zero

Yannick said:
as another poster pointed, it should have been strlen(pc) + 1. Line
above creates a buffer overflow which is really dangerous and the
cause of countless number of bugs in the history of programming.

yes, i read that. it was Noah who corrected me & i did the correction.
If you are a beginner programmer, learn to write correct code that
is clear and maintainable first. Worry about optimising execution
speed after you have written a first version of the code that works
correctly and is bug free _*and*_ you have measured (profiling)
that this particular section of the code is having an impact on
performance.

90% of your execution time is spent on 10% of your code.

So write 100% of code correct and bug free then if performance
is not good enough, find the 10% of the code that needs to be
optimised.

well, that is really an advice from a "Matured Programmer", i can
"feel" the years of experience behind this *advice*, ok i am doing it.

thanks for your precious time Yan.
 
