sizeof(std::string) seems too small

jl_post

Dear C++ community,

I have a question regarding the size of C++ std::strings.
Basically, I compiled the following code under two different compilers:

std::string someString = "Hello, world!";
int size1 = sizeof(std::string);
int size2 = sizeof(someString);

and printed out the values of size1 and size2. size1 and size2 always
matched in value (in other words, size1 == size2). That makes sense to
me.
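
For anyone who wants to repeat the experiment, a complete version of the snippet above might look like the sketch below. The helper name report_sizes is made up for illustration, and std::size_t replaces int because that is the type sizeof yields. The values printed depend on the compiler and its library, so none are shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper: prints the sizes discussed in this thread and
// returns sizeof(std::string) so callers can inspect it.
std::size_t report_sizes(std::ostream& out) {
    std::string someString = "Hello, world!";

    const std::size_t size1 = sizeof(std::string);  // sizeof applied to the type
    const std::size_t size2 = sizeof(someString);   // sizeof applied to an object

    // Both forms are compile-time constants and never depend on the
    // characters the string currently holds.
    assert(size1 == size2);

    out << "sizeof(std::string) = " << size1 << '\n'
        << "sizeof(someString)  = " << size2 << '\n'
        << "sizeof(char*)       = " << sizeof(char*) << '\n';
    return size1;
}
```

Calling report_sizes(std::cout) from main prints the three values for whatever compiler is in use.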

Under the Visual C++ 6.0, size1 and size2 both equalled 16, but
under a GNU C++ compiler (under Linux), size1 and size2 were both 4. I
understand that different compilers are allowed to implement
std::string differently which allows for the differences between the
results of sizeof(std::string) by the different compilers.

What I don't understand is why sizeof(std::string) returns 4 with
any compiler. I mean, a value of 4 just seems too small for me. I
figure that any std::string implementation should have at least a
pointer (which points to the main string), an integer storing the
already allocated space for the main string (whose value gets returned
in the call to std::string::capacity()), and possibly even an integer
storing the length of the string.

Just the pointer alone would take up 4 bytes (I tested it and
sizeof(char*) does indeed equal 4), so I can't see how there could
possibly be any more room for anything else, like the integer that
holds the already allocated space (the one used in
std::string::capacity()). The fact that Visual C++ has a
sizeof(std::string) of 16 makes a lot more sense to me, as it clearly
has enough space to hold these integers.

So my main question is: Assuming that sizeof(char*) equals 4, how
is it possible that sizeof(std::string) can be 4 on any compiler?

Also, shouldn't sizeof(std::string) be AT LEAST sizeof(char*) +
sizeof(unsigned int) ? I'm curious why it isn't on the GNU C++
compiler that I'm using.

Thank-you in advance for any responses.

-- Jean-Luc
 
Victor Bazarov

> [...]
> What I don't understand is why sizeof(std::string) returns 4 with
> any compiler.

It does? Really? Wait, didn't you just say that "Under the Visual C++
6.0, size1 .. equalled 16"? And 'size1' _is_ 'sizeof(std::string)', no?
So, why do you say "sizeof(std::string) returns 4 with any compiler"? It
apparently does NOT in VC++ 6.0...
> [...]

> So my main question is: Assuming that sizeof(char*) equals 4, how
> is it possible that sizeof(std::string) can be 4 on any compiler?

It isn't.

> Also, shouldn't sizeof(std::string) be AT LEAST sizeof(char*) +
> sizeof(unsigned int) ? I'm curious why it isn't on the GNU C++
> compiler that I'm using.

"Use the Source, Luke!" Just look at their implementation. They
may have a simple thing like

class blah_internal; // defined elsewhere

class blah {
    blah_internal *pimpl;
public:
    /// all members simply forwarding the requests to 'pimpl'
};

V
 
mlimber

The library could implement it using the Pimpl idiom (cf.
http://www.gotw.ca/gotw/024.htm):

class stringImpl;

class string
{
public:
    // Forwarding functions
private:
    stringImpl* pImpl;
};

Thus you have only a pointer as a member.
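
To make the idea concrete, here is a minimal sketch of a Pimpl-style string. The names String and StringImpl, and the choice of std::vector<char> for storage, are assumptions made for this example; they are not taken from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class StringImpl;  // only declared here; users of String never see its layout

class String {
public:
    explicit String(const char* text);
    ~String();
    String(const String&) = delete;             // copying left out of the sketch
    String& operator=(const String&) = delete;

    // Forwarding functions
    std::size_t size() const;
    std::size_t capacity() const;
    const char* c_str() const;

private:
    StringImpl* pImpl;  // the only data member
};

// Holds on mainstream compilers: the object is exactly one pointer wide.
static_assert(sizeof(String) == sizeof(StringImpl*),
              "String holds nothing but a pointer");

// The length and capacity bookkeeping lives here, on the free store.
class StringImpl {
public:
    explicit StringImpl(const char* text) {
        assert(text != nullptr);
        chars.assign(text, text + std::strlen(text) + 1);  // keep the '\0'
    }
    std::vector<char> chars;
};

String::String(const char* text) : pImpl(new StringImpl(text)) {}
String::~String() { delete pImpl; }
std::size_t String::size() const { return pImpl->chars.size() - 1; }
std::size_t String::capacity() const { return pImpl->chars.capacity() - 1; }
const char* String::c_str() const { return pImpl->chars.data(); }
```

With sizeof(char*) equal to 4, sizeof(String) in this sketch would also be 4, even though size() and capacity() both work.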

Cheers! --M
 
benben

Any type in C++ has a fixed size. This is because the compiler needs to
know the exact size of each type to lay out the stack frame.

A std::string object typically does not contain the string data in the
object itself. Rather, it dynamically manages the string content somewhere
else in memory. Usually, and by default, it allocates, manages, and
eventually deallocates the string content on the free store.

A std::string only needs a pointer to the content and an integer to cache
the size of the string. Of course, more complex representations are
possible. In principle, an implementation could even manage its dynamic
string contents as a number of memory "chunks".
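
As a rough illustration of the "pointer plus integer" layout, the sketch below keeps the characters on the free store and only the bookkeeping in the object. TinyString is a made-up name, and copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class TinyString {
public:
    explicit TinyString(const char* text) {
        assert(text != nullptr);
        length_ = std::strlen(text);
        data_ = new char[length_ + 1];           // content lives on the free store
        std::memcpy(data_, text, length_ + 1);   // copy including the '\0'
    }
    ~TinyString() { delete[] data_; }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    std::size_t size() const { return length_; }
    const char* c_str() const { return data_; }

private:
    char* data_;          // pointer to the content
    std::size_t length_;  // cached size of the string
};

// The object is at least a pointer plus an integer, however long the text is.
static_assert(sizeof(TinyString) >= sizeof(char*) + sizeof(std::size_t),
              "pointer and cached length are both stored in the object");
```

A TinyString holding a thousand characters has the same sizeof as one holding none; only the free-store allocation differs.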

Ben
 
BigBrian

> So my main question is: Assuming that sizeof(char*) equals 4, how
> is it possible that sizeof(std::string) can be 4 on any compiler?

If std::string contains only a char*, then its size is 4.

> so I can't see how there could possibly be any more room for anything else
> Also, shouldn't sizeof(std::string) be AT LEAST sizeof(char*) +
> sizeof(unsigned int) ?

It could be implemented so that they're using some of the first few
bytes in the memory pointed to by this char * for something other than
the characters of the string. Then you only need one char *, you don't
need to store anything else as members of the object. Then,
std::string::c_str() could return this char * + (some number to skip
the non character data)*sizeof(char), std::string::capacity could
return *((int*)( pointer to char* + capacity location )),... I'm just
speculating, but this is at least one way that it could be implemented
with a single char *.
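
The speculation above can be sketched roughly as follows. HeaderString and its nested Header are hypothetical names; the block layout (a small header followed directly by the characters) is just one way to arrange it, and copying is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

class HeaderString {
    // Bookkeeping stored in the first bytes of the allocated block.
    struct Header {
        std::size_t capacity;
        std::size_t length;
    };

public:
    explicit HeaderString(const char* text) {
        assert(text != nullptr);
        const std::size_t len = std::strlen(text);
        void* raw = std::malloc(sizeof(Header) + len + 1);
        if (raw == nullptr) throw std::bad_alloc();
        new (raw) Header{len, len};              // header goes first...
        block_ = static_cast<char*>(raw);
        std::memcpy(block_ + sizeof(Header), text, len + 1);  // ...then the text
    }
    ~HeaderString() { std::free(block_); }
    HeaderString(const HeaderString&) = delete;
    HeaderString& operator=(const HeaderString&) = delete;

    // Skip the non-character data to reach the string itself.
    const char* c_str() const { return block_ + sizeof(Header); }
    std::size_t size() const { return header().length; }
    std::size_t capacity() const { return header().capacity; }

private:
    const Header& header() const {
        return *reinterpret_cast<const Header*>(block_);
    }
    char* block_;  // the single data member
};

// Holds on mainstream compilers: nothing but one char* in the object.
static_assert(sizeof(HeaderString) == sizeof(char*),
              "only one pointer is stored in the object");
```

Here capacity() and size() read their values from the block rather than from members, which is why the object itself stays one pointer wide.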

-Brian
 
Andre Kostur

> It does? Really? Wait, didn't you just say that "Under the Visual C++
> 6.0, size1 .. equalled 16"? And 'size1' _is_ 'sizeof(std::string)', no?
> So, why do you say "sizeof(std::string) returns 4 with any compiler"? It
> apparently does NOT in VC++ 6.0...

Uh, Victor... the OP is expressing his surprise that there exists at least
one compiler for which sizeof(std::string) is 4, not that every compiler
returns 4... (existential vs. universal quantifier....)
 
Karl Heinz Buchegger

> So my main question is: Assuming that sizeof(char*) equals 4, how
> is it possible that sizeof(std::string) can be 4 on any compiler?

I really don't know how the GNU people implemented std::string.
But who says that the pointer in std::string has to point to the
characters?
What about an intermediate structure which holds, amongst the other
things you mentioned, a reference counter? Then it would be possible
that in

std::string st1 = "hello world";
std::string st2 = "hello world";

both strings internally point to the very same memory area

   st1                        st2
+-------+                  +--------+
|   o-----------+  +-------------o  |
+-------+       |  |       +--------+
                |  |
                |  |
                v  v
            +----------+
            | cap: 12  |
            | len: 11  |
            | ref: 2   |
            | data: o--------+
            +----------+     |
                             |
   +-------------------------+
   |
   v
 +---+---+---+---+---+---+---+---+---+---+---+---+
 | h | e | l | l | o |   | w | o | r | l | d |   |
 +---+---+---+---+---+---+---+---+---+---+---+---+

> Also, shouldn't sizeof(std::string) be AT LEAST sizeof(char*) +
> sizeof(unsigned int) ? I'm curious why it isn't on the GNU C++
> compiler that I'm using.

As said: I don't know if the GNU people did it that way. But it would
be possible.
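
A minimal sketch of such a reference-counted layout is shown below. SharedString and Rep are invented names, the fields mirror the cap/len/ref/data boxes in the diagram, and in this sketch the sharing comes from copying one string into another (an assumption; the post above leaves open how two strings would come to share). It is not thread-safe and offers no mutation, so no copy-on-write step is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

class SharedString {
    // The intermediate structure: everything except the pointer to it.
    struct Rep {
        std::size_t cap;                // allocated chars, including '\0'
        std::size_t len;                // characters in the string
        std::size_t ref;                // how many SharedStrings point here
        std::unique_ptr<char[]> data;   // the characters themselves
    };

public:
    explicit SharedString(const char* text) {
        assert(text != nullptr);
        const std::size_t len = std::strlen(text);
        std::unique_ptr<char[]> chars(new char[len + 1]);
        std::memcpy(chars.get(), text, len + 1);
        rep_ = new Rep{len + 1, len, 1, std::move(chars)};
    }
    // Copying shares the representation and bumps the counter.
    SharedString(const SharedString& other) : rep_(other.rep_) { ++rep_->ref; }
    SharedString& operator=(SharedString other) {
        std::swap(rep_, other.rep_);    // old Rep is released by 'other'
        return *this;
    }
    ~SharedString() {
        if (--rep_->ref == 0) delete rep_;  // last owner frees everything
    }

    const char* c_str() const { return rep_->data.get(); }
    std::size_t size() const { return rep_->len; }
    std::size_t capacity() const { return rep_->cap; }
    std::size_t use_count() const { return rep_->ref; }

private:
    Rep* rep_;  // the only data member
};

// Holds on mainstream compilers.
static_assert(sizeof(SharedString) == sizeof(void*),
              "the object is a single pointer to the shared Rep");
```

After SharedString st1("hello world"); SharedString st2(st1); both objects point at one Rep with cap 12, len 11 and ref 2, as in the diagram.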
 
jl_post

Victor Bazarov replied:
> It does? Really? Wait, didn't you just say that
> "Under the Visual C++ 6.0, size1 .. equalled 16"?
> And 'size1' _is_ 'sizeof(std::string)', no?
> So, why do you say "sizeof(std::string) returns 4
> with any compiler"? It apparently does NOT in VC++
> 6.0...

I apologize, Victor. When I said "it returns 4 with any compiler" I
did not mean "it returns 4 with EVERY compiler." By using the word
"any" I meant to say that "if there exists any compiler with which
sizeof(std::string) returns 4, then I have trouble understanding why 4
is returned."

And you are right in saying that it apparently does not return 4 in
VC++ 6.0. My point was that it made sense to me that VC++ returned a
value greater than 4, but I was confused that some compilers (by which
I mean GNU C++ and not VC++ 6.0) returned 4.

By using the word "any," I didn't mean "every."

Sorry for the misunderstanding, Victor.

-- Jean-Luc
 
Victor Bazarov

Andre said:
> Uh, Victor... the OP is expressing his surprise that there exists at least
> one compiler for which sizeof(std::string) is 4, not that every compiler
> returns 4... (existential vs. universal quantifier....)

My apologies. English is not my native tongue, I sometimes have trouble
with it.

V
 
Howard Hinnant

Karl Heinz Buchegger said:
> std::string st1 = "hello world";
> std::string st2 = "hello world";
>
> both strings internally point to the very same memory area
> [...]

Nice ASCII art! :)

Just fyi, "Effective STL" by Scott Meyers does a nice survey of std::string
layouts circa 2000 (Item 15). Things have changed since then in at
least one implementation I'm aware of (CodeWarrior) but it is still a
nice survey.

The above diagram is very close to "Implementation B" from this survey.
"Implementation C" from the survey gives an example where sizeof(string)
would be equal to sizeof(char*).

-Howard
 
jl_post

Wow! Thank you all for the extremely fast responses!

I never really thought of the pointer pointing to another structure
altogether (as opposed to pointing to an array of chars), but it makes
a lot more sense now, especially having peeked at the source (at
Victor's suggestion).

(To find the file it was using for the #include, I ran:

c++ -E source.cpp

and used the output to find the full pathname of the "string" header
file.)

Let me say that the header file is much, much larger than I thought
it would be! Yet, strangely enough, sizeof(std::string) is still just
a tiny value of 4. (I guess that's not so strange when you think about
it: the methods don't really contribute to the literal size of the
object -- they're only there to be used when they're needed.)
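
That observation can be checked with a small sketch like the one below. Bare and WithMethods are made-up types; the point is only that adding non-virtual member functions leaves the object size unchanged.

```cpp
#include <cstddef>

// A struct with one pointer and no member functions.
struct Bare {
    const char* p;
};

// The same data, plus a few non-virtual member functions.
struct WithMethods {
    const char* p;

    constexpr bool empty() const { return p == nullptr || *p == '\0'; }
    constexpr char front() const { return empty() ? '\0' : *p; }
    std::size_t length() const {
        std::size_t n = 0;
        if (p != nullptr) {
            while (p[n] != '\0') ++n;
        }
        return n;
    }
};

// Member functions are code, not per-object data, so on mainstream
// compilers both types occupy the same number of bytes.
static_assert(sizeof(WithMethods) == sizeof(Bare),
              "non-virtual methods do not enlarge the object");
```

The same reasoning applies to a large header such as the one for std::string: however many functions it declares, sizeof counts only the data members.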

Anyway, thanks for answering my questions. I'm a big proponent of
using std::string (which sometimes results in lots of friction with
fellow programmers who oppose them "because of all their extra baggage
and overhead"), so it's nice to see that they are quite small and not
really any larger than they need to be.

Thanks again.

-- Jean-Luc
 
Jeff Flinn

Victor Bazarov said:
> My apologies. English is not my native tongue, I sometimes have trouble
> with it.

Don't feel too bad, but I interpreted it the same way you originally did. :)

Jeff
 
Andre Kostur

> My apologies. English is not my native tongue, I sometimes have
> trouble with it.

This would be the first indication to me that English isn't your native
tongue. Your English is quite good! I've read many, many of your posts
over the years, and I didn't see any significant errors.... :)
 
