I see. You understand, that std::string still ends up eating cost
spread throughout the class to test whether or not the string is in
"small string optimization" mode or not right? Are you able to find
real world examples where this is a useful idea?
Yes and it also means that for all larger strings you end up eating an
additional penalty for the extra space in the header no matter what.
So a large array of strings will have a higher probability of thrashing
the cache twice (once for the header, and once for the contents.)
Actually, one way to represent std::string would be this (ignoring the
fact that it's actually a specialization of basic_string and all that
crap);
class string {
private:
__some_word_type word;
};
Word could have a small bitfield in it, say two bits. On a 32 bit
system, it would look like this:
value tag
[ 2 - 31 ] [ 0 - 1]
If the tag value is 01, then the value is a pointer to an object, which
consists of a record (with a reference count in it and possibly other
things) followed by string data. The pointer is computed simply by
masking the tag part to 00.
If the tag value is 10, then the value is a pointer to a statically
allocated C string. Of course, you mask out the tag to produce the
pointer, which is four byte aligned.
If the tag value is 00, then the top three bytes of the value hold
string data. The least significant byte is 0 and serves as a null
terminator. The fact that the tag value is 00 makes this possible.
This representation relies on big endian, of course. But since this is
library code, architecture dependencies are fine! On little endian,
something else would be done.
With this representation, creating a string out of literal data would
just be two instructions: OR'ing the pointer with 1, and then storing
it.
Copying a string would just be an assignment of the word, followed by
an update of the reference count --- only if the tag is 01.
So as you can see, there doesn't have to be any size overhead for the
header: just make use of bits in the pointer that are not needed,
thanks to alignment.
There is still overhead for checking the tag of course.
There is no way around that other than making additional (non standard,
of course) static types for string, and using those explicitly in the
program. E.g. you could have a static_string whose representation is
exactly like the static variant of string. The construction of a
regular string from a static_string would just copy the word without
checking the tag, thanks to a conversion constructor that is overloaded
for static_string.
class string {
private:
__some_word_type word;
public:
// friendship access to rhs.word
// blindly copy the word; we know it's an 10 type, no refcount to
deal with
string(static_string rhs) : word(rhs.word) { }
// inline string, i.e. the 00 variant with data in the word itself
// blindly copy also
string(inline_string rhs) : word(rhs.word) { }
string(string rhs) : word(rhs.word)
{
if (my_tag() == 1)
bump_refcount();
}
const char *c_str()
{
switch (my_tag()) {
case 0:
return (const char *) &word; // requires big endian.
case 1:
return fetch_string_from_object((string_impl *)
mask_word_to_pointer());
case 2:
return (const char *) mask_word_to_pointer();
}
}
};
As an alternate strategy, storage for short strings be placed in a
special memory heap. They would be implemented as pointers into that
heap. In this heap, all allocation would be fixed-size, with strings
being null terminated. There would be no sharing of identical strings,
so no refcounting or garbage collection. The destructor would simply
return the string into a free list.
These kinds of type tagging schemes are the norm for the implementation
of languages like Common Lisp. It's very slick. All values are word
sized. A single 32 bit (or whatever size) word has all the information
for the run-time to deduce what it's dealing with: it could be (for
instance) a 29 byte fixed integer, a pointer to a bignum integer, a
class object, a string, a UNICODE character, you name it. A function
call with 5 argument passes 5 equal-sized words, always. No struct
bullshit.
Assignment from one memory location to another is just a
word-sized copy. No refcounts have to be updated, because there is
garbage collection.