C Style Strings

K

Kaz Kylheku

Bo said:
My favourite test involves these two C++ strings, construct one from a
literal, and copy it to another (only works for short literals):

std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

std::string whatever2 = whatever;

0040278B mov dword ptr [esp+33Ch],eax
00402792 mov byte ptr [esp+357h],bl
00402799 mov dword ptr [esp+358h],ebp
004027A0 mov byte ptr [esp+340h],bl

You haven't actually copied the string yet, and already you have sunk
in four instructions! Whoa. Five, if you count the initialization of
BL.

At this point, the two string objects are still sharing data, so the
job is not finished. You see, if you modify wherever2, that
modification must not change whatever. So additional logic will have to
do a copy-on-write duplication later. You cannot ignore this cost,
because the reason you created the new string wherever2 is so that you
could modify it independently of whatever, right? If you didn't want to
do that, you would make whatever2 a reference to whatever, an alias.

Copy-on-write logic is tricky when you can have iterators referencing
into objects. Modifications through iterators have to do the
copy-on-write, and accesses through iterators must work properly before
and after.

The following example appears in the C++ standard, 21.3:

[Note: These rules are formulated to allow, but not require, a
reference
counted implementation. A reference counted implementation must
have the same semantics as a non-reference counted implementation.

[Example:
string s1("abc");
string::iterator i = s1.begin();
string s2 = s1;
*i = 'a'; // Must modify only s1
-end example] -end note]

This brings up the point that you can work with C++ strings quite a bit
more efficiently if you treat them as immutable and use "const
std::string &" references wherever possible.
 
C

CBFalconer

Noah said:
No, it doesn't. Construction of an object in no way implies new.
Construction of an object will only make a call to new if the
object allocates something on the heap. This again, is no
different that in C. It has nothing to do with the compiler...
the programmer is in complete control here.

In other words, you are just plain wrong.

What possible excuse is there for posting this to c.l.c? F'ups set
to c.l.c++.

--
"Our enemies are innovative and resourceful, and so are we.
They never stop thinking about new ways to harm our country
and our people, and neither do we." -- G. W. Bush.
"The people can always be brought to the bidding of the
leaders. All you have to do is tell them they are being
attacked and denounce the pacifists for lack of patriotism
and exposing the country to danger. It works the same way
in any country." --Hermann Goering.
 
M

Michael Mair

Bo said:
Sorry about that. I really meant "Using the highly optimized functions
of the C standard library". :)

The theme of the paper is of course that C++ can (sometimes :) run 5
times faster than C, *without* resorting to application specific
coding, profiling, or hand-optimized code. You just use:

std::sort(buf.begin(), buf.end());

and the compiler will expand the sort template, and inline functions
as appropriate. You just cannot do that with qsort.

Indeed. The good points and the flaws of the paper already have
been beaten to death so many times that I'll resist the temptation
to post a real response ;-)

My claim was that the same effect can be seen for std::string.
Sometimes it is much faster that C-style string functions, because the
compiler does some of the work at compile time.

In my particular case, and with some lucky constant propagation from
preceding code, this string constructor results in 5 assembly
instructions:

__forceinline
basic_string(const value_type* _String,
const allocator_type& _Allocator =
allocator_type() ) : _Parent(_Allocator)
{
const size_type _StringSize = traits_type::length(_String);

if (_MySmallStringCapacity < _StringSize)
{
_Construct(_String, _StringSize);
}
else
{
traits_type::copy(_MySmallString._Buffer, _String,
_StringSize);

_SetSmallStringCapacity();
_SetSize(_StringSize);
}
}

Uh, whatever -- this seems somewhat unlike the standard C++ (98)
I recall.
Or is that a part of something implementation specific that is
supposed to impress me?

std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

Not bad for a code bloating template! :)

If you say so -- this is not exactly an assembler language I
am sufficiently fluent in to determine whether this is optimal
in each and every possible situation. The last time I counted
instructions and cycles was on the MC68000...


Cheers
Michael
 
W

websnarf

What is the distinction between literal size for?
std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

std::string whatever2 = whatever;

0040278B mov dword ptr [esp+33Ch],eax
00402792 mov byte ptr [esp+357h],bl
00402799 mov dword ptr [esp+358h],ebp
004027A0 mov byte ptr [esp+340h],bl

You haven't actually copied the string yet, and already you have sunk
in four instructions! Whoa. Five, if you count the initialization of
BL.

Yeah, I wasn't going to say anything, since there are already too many
languages being discussed here; if I brought up the real cost of this
assembly I don't know who I would have been talking to. Mixing dword
writes with byte writes, and ignoring the cost of loading ebp and bl
are amongst their other sins.
At this point, the two string objects are still sharing data, so the
job is not finished. You see, if you modify wherever2, that
modification must not change whatever. So additional logic will have to
do a copy-on-write duplication later. You cannot ignore this cost,
because the reason you created the new string wherever2 is so that you
could modify it independently of whatever, right? If you didn't want to
do that, you would make whatever2 a reference to whatever, an alias.

Well actually no ... and I think that's their point. The implicit lack
of concern for performance that you get from going to a higher level
language means that they actually might mean for this to be an alias by
implementation, but not by semantics. What they are doing is making as
many copies as they intend to use, so that as they are "destroyed",
they are done so properly (think of message passing systems). But its
no more legitimate than the numerous non-COW case; they just choose to
turn COW on.
Copy-on-write logic is tricky when you can have iterators referencing
into objects. Modifications through iterators have to do the
copy-on-write, and accesses through iterators must work properly before
and after.

Yes, one real problem is the accumulated cost. All operations that
perform a modify have to first check if a copy needs to be made first.
So now every "cheap" modification operation may be as bad as doubled in
cost. Copy-on-write tricks can be programmed at the option of the
developer who uses the strings; so it shouldn't cost you an
unrecoverable penalty unless its balanced by other program needs. Yet,
if a STL vendor decides to put that in their std::string, there is not
much you can do about it.

A much worse problem is, what are you supposed to do about
multithreaded situations? Multiple threads might easily be updating
the ref count at once; C++ does not include provisions for this AFAIK.

But this is all high level ... you aren't supposed to be thinking about
these costs.
The following example appears in the C++ standard, 21.3:

[Note: These rules are formulated to allow, but not require, a
reference
counted implementation. A reference counted implementation must
have the same semantics as a non-reference counted implementation.

[Example:
string s1("abc");
string::iterator i = s1.begin();
string s2 = s1;
*i = 'a'; // Must modify only s1
-end example] -end note]

Of course -- the cost of *i = 'a'; goes up (whether it has to copy or
not.)
This brings up the point that you can work with C++ strings quite a bit
more efficiently if you treat them as immutable and use "const
std::string &" references wherever possible.

Oh yeah, even in C, you should really put const everywhere. Even if
the compiler doesn't leverage the information for performance reasons,
it is worth it for the mistakes it prevents.
 
B

Bo Persson

Kaz Kylheku said:
Bo said:
My favourite test involves these two C++ strings, construct one
from a
literal, and copy it to another (only works for short literals):

std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

std::string whatever2 = whatever;

0040278B mov dword ptr [esp+33Ch],eax
00402792 mov byte ptr [esp+357h],bl
00402799 mov dword ptr [esp+358h],ebp
004027A0 mov byte ptr [esp+340h],bl

You haven't actually copied the string yet, and already you have
sunk
in four instructions! Whoa. Five, if you count the initialization of
BL.

If you look closely, the first two instructions actually do copy the
string! The compiler optimizes a four byte memcpy this way, a one
register load-store.

In terminating the string, we are lucky that BL aready contains a
null. Even luckier is the fact that EBP happens to contain the value
4, the size of the string. A couple of line earlier, the code uses an
ios_base::hex constant for I/O. Noticing that it too has the value 4,
the compiler reuses the EBP register.

Now, having it all in registers, the compiler can easily store a copy
of the string in whatever2.

At this point, the two string objects are still sharing data, so the
job is not finished.

No they are not. The job is finished!

There is no copy-on-write involved, it is an example of a "short
string optimization". When the string is very short, it is stored
inside the std::string object, avoiding dynamic allocation.


This is of course a best case, lucky circumstances, but it also shows
that using high level C++ constructs doesn't have to result in bad
code. It can even be remarkably good!

I told you it was my favorite example! :)


Bo Persson
 
B

Bo Persson

Michael Mair said:
Bo Persson schrieb:

Uh, whatever -- this seems somewhat unlike the standard C++ (98)
I recall.

You don't remember the

basic_string(const charT* s, const Allocator& a = Allocator());

constructor from the standard?

My example is from the implementation I use. The only specific part is
the __forceinline hint, that makes the compiler see that first
inlining the function will then let it optimize most of it away.
Or is that a part of something implementation specific that is
supposed to impress me?

You may not be easily impressed, but to me it seems pretty good to
construct a std::string object in five machine instructions. This is
an object that can later be dynamically extended, should that be
necessary.

It is also an example of how 10 or so lines of allegedly bloated C++
template code results in just a few machine instructions.
std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

Not bad for a code bloating template! :)

If you say so -- this is not exactly an assembler language I
am sufficiently fluent in to determine whether this is optimal
in each and every possible situation. The last time I counted
instructions and cycles was on the MC68000...

:)


Bo Persson
 
B

Bo Persson

What is the distinction between literal size for?

The std::string implementation uses the "small string optimization",
where short strings are stored inside the std::string object, thus
avoiding dynamic memory allocation.

On a 64 bit system, just storing a pointer would use 8 bytes - much
more than the string itself, in this case. Remembering the allocated
size could take another 8 bytes. Using this space, you could store up
to 16 chars without allocating anything.

(This is 32 bit code though).
std::string whatever = "abcd";

0040276A mov eax,dword ptr [string "abcd" (434734h)]
0040276F mov dword ptr [esp+28Ch],eax
00402776 mov byte ptr [esp+2A7h],bl
0040277D mov dword ptr [esp+2A8h],ebp
00402784 mov byte ptr [esp+290h],bl

std::string whatever2 = whatever;
The following example appears in the C++ standard, 21.3:

[Note: These rules are formulated to allow, but not require, a
reference
counted implementation. A reference counted implementation must
have the same semantics as a non-reference counted
implementation.

[Example:
string s1("abc");
string::iterator i = s1.begin();
string s2 = s1;
*i = 'a'; // Must modify only s1
-end example] -end note]

Of course -- the cost of *i = 'a'; goes up (whether it has to copy
or
not.)

Except that in my example it does not. The strings are already
distinct.


Bo Persson
 
W

websnarf

Bo said:
The std::string implementation uses the "small string optimization",
where short strings are stored inside the std::string object, thus
avoiding dynamic memory allocation.

I see. You understand, that std::string still ends up eating cost
spread throughout the class to test whether or not the string is in
"small string optimization" mode or not right? Are you able to find
real world examples where this is a useful idea?
On a 64 bit system, just storing a pointer would use 8 bytes - much
more than the string itself, in this case. Remembering the allocated
size could take another 8 bytes. Using this space, you could store up
to 16 chars without allocating anything.

Yes and it also means that for all larger strings you end up eating an
additional penalty for the extra space in the header no matter what.
So a large array of strings will have a higher probability of thrashing
the cache twice (once for the header, and once for the contents.)
 
K

Kaz Kylheku

I see. You understand, that std::string still ends up eating cost
spread throughout the class to test whether or not the string is in
"small string optimization" mode or not right? Are you able to find
real world examples where this is a useful idea?


Yes and it also means that for all larger strings you end up eating an
additional penalty for the extra space in the header no matter what.
So a large array of strings will have a higher probability of thrashing
the cache twice (once for the header, and once for the contents.)

Actually, one way to represent std::string would be this (ignoring the
fact that it's actually a specialization of basic_string and all that
crap);

class string {
private:
__some_word_type word;
};

Word could have a small bitfield in it, say two bits. On a 32 bit
system, it would look like this:

value tag
[ 2 - 31 ] [ 0 - 1]

If the tag value is 01, then the value is a pointer to an object, which
consists of a record (with a reference count in it and possibly other
things) followed by string data. The pointer is computed simply by
masking the tag part to 00.

If the tag value is 10, then the value is a pointer to a statically
allocated C string. Of course, you mask out the tag to produce the
pointer, which is four byte aligned.

If the tag value is 00, then the top three bytes of the value hold
string data. The least significant byte is 0 and serves as a null
terminator. The fact that the tag value is 00 makes this possible.
This representation relies on big endian, of course. But since this is
library code, architecture dependencies are fine! On little endian,
something else would be done.

With this representation, creating a string out of literal data would
just be two instructions: OR'ing the pointer with 1, and then storing
it.

Copying a string would just be an assignment of the word, followed by
an update of the reference count --- only if the tag is 01.

So as you can see, there doesn't have to be any size overhead for the
header: just make use of bits in the pointer that are not needed,
thanks to alignment.

There is still overhead for checking the tag of course.

There is no way around that other than making additional (non standard,
of course) static types for string, and using those explicitly in the
program. E.g. you could have a static_string whose representation is
exactly like the static variant of string. The construction of a
regular string from a static_string would just copy the word without
checking the tag, thanks to a conversion constructor that is overloaded
for static_string.

class string {
private:
__some_word_type word;
public:
// friendship access to rhs.word
// blindly copy the word; we know it's an 10 type, no refcount to
deal with
string(static_string rhs) : word(rhs.word) { }

// inline string, i.e. the 00 variant with data in the word itself
// blindly copy also
string(inline_string rhs) : word(rhs.word) { }

string(string rhs) : word(rhs.word)
{
if (my_tag() == 1)
bump_refcount();
}

const char *c_str()
{
switch (my_tag()) {
case 0:
return (const char *) &word; // requires big endian.
case 1:
return fetch_string_from_object((string_impl *)
mask_word_to_pointer());
case 2:
return (const char *) mask_word_to_pointer();
}
}
};

As an alternate strategy, storage for short strings be placed in a
special memory heap. They would be implemented as pointers into that
heap. In this heap, all allocation would be fixed-size, with strings
being null terminated. There would be no sharing of identical strings,
so no refcounting or garbage collection. The destructor would simply
return the string into a free list.

These kinds of type tagging schemes are the norm for the implementation
of languages like Common Lisp. It's very slick. All values are word
sized. A single 32 bit (or whatever size) word has all the information
for the run-time to deduce what it's dealing with: it could be (for
instance) a 29 byte fixed integer, a pointer to a bignum integer, a
class object, a string, a UNICODE character, you name it. A function
call with 5 argument passes 5 equal-sized words, always. No struct
bullshit. :) Assignment from one memory location to another is just a
word-sized copy. No refcounts have to be updated, because there is
garbage collection.
 
B

Bo Persson

I see. You understand, that std::string still ends up eating cost
spread throughout the class to test whether or not the string is in
"small string optimization" mode or not right?

Yes, that is part of how things work, as always it's a compromise. On
the other hand, if you use dynamic allocation always, you either have
to allocate a buffer even for the empty string, or constantly check
for a null pointer. Nothing is free!
Are you able to find
real world examples where this is a useful idea?

No benchmarks comparing real applications, no.
Yes and it also means that for all larger strings you end up eating
an
additional penalty for the extra space in the header no matter what.
So a large array of strings will have a higher probability of
thrashing
the cache twice (once for the header, and once for the contents.)

A std::string has to provide size(), capacity(), and the string value
itself. Conceptually, it has to store a combination of three pointers
and/or size_t values. In short string mode, the capacity and buffer
location is known, so their space can be overlaid by the string
content.

Not allocating heap space for short strings saves some on
fragmentation and cache use.


So, if your application uses lots of long strings, allocates them
mostly at startup, and seldom modifies them, checking the mode flag
will be a loss. If the application uses mostly shorter strings, some
of which are initialized from literals, the savings in fewer dynamic
allocations will be a clear win. If you are in between, it will vary,
of course.


Bo Persson
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top