stringstream str(), buffer overflows

Robert J. Hansen · Feb 9, 2006

While writing some code to demonstrate different local search
strategies, I found something kind of unusual. I suspect it's a bug in
my understanding rather than a bug in GCC, and I'm hoping someone here
can help me out.

The task is simple: solve the game Boggle using a pure hill-climbing
strategy, returning a tab-separated list of words as a char*. (I'm
interfacing with Python, so the char* is necessary; that seems to be
the easiest way to get Python strings back from C++ code.) When
reading the data out I wind up overrunning the end of the allocated
space, despite the fact that at first blush it appears I'm doing things
right.

I'm retyping the offending code here. For clarity I'm leaving off the
std:: prefix from some calls, but these should be obvious. The
following is a method, not a function; it's declared inline in the
header file.

=====
char* words() const
{
stringstream ss;
ostream_iterator<string> oi(ss, "\t");
copy(_wordset->begin(), _wordset->end(), oi);
// The stringstream has a trailing '\t' at the end of it
// which we're not going to copy. This will turn into
// a trailing '\0'.
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}
=====

On OS X, this code works as expected. On Win32 and Debian, SIGSEGV (or
its Windows equivalent) is caught, with the offending line being the
call to copy. Replacing the last two lines of the method with

string t = ss.str().substr(0, ss.str().size() - 1);
copy(t.begin(), t.end(), rv);
return rv;

.... makes everything work just fine, though.

Can anyone give me a clear, concise description of my error in
understanding? Or is this a bug in GCC?

Dietmar Kuehl · Feb 9, 2006

Robert said:
stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

That's a good one

Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy. In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;
return rc;
}

...
return to_c_string(ss.str());

Robert J. Hansen · Feb 9, 2006

Note that the 'str()' method of the string stream returns

the string *by value*

Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference. In light of this, your remarks
about unnecessary construction of string objects is well-taken,
although it's not a performance hit in this problem (the code executes
in sub-second time).

Once more, thanks for the reply!

Roland Pibinger · Feb 9, 2006

Robert said:
Robert said:

stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

Click to expand...

That's a good one Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.

The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.

In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Don't know what the above line does ...

return rc;
}

...
return to_c_string(ss.str());

Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.

Best regards,
Roland Pibinger

Roland Pibinger · Feb 9, 2006

Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference.

You are right and the interface is wrong. There really should be a
'reference-iterface' instead of a 'value-interface'.

Best wishes,
Roland Pibinger

Dietmar Kuehl · Feb 10, 2006

Roland said:
Robert said:

stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

Click to expand...

That's a good one Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.

Click to expand...

The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.

Whether the mixture of code is problematic or not depends on the
user's requirements. The problem he is encountering, however, is
not related to this mixture but to the use of iterators to different
temporaries.

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Click to expand...

Don't know what the above line does ...

Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;

Sorry for that.

Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.

I generally agree on this. However, as was explained in the original
article, the author considered this to be mandatory.

Robert J. Hansen · Feb 10, 2006

the author considered this to be mandatory

I'd phrase it as "regrettably mandatory", myself. I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.

On the other hand, I haven't taken a look at Boost:

ython yet. Maybe
I should.

Roland Pibinger · Feb 10, 2006

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Click to expand...

Don't know what the above line does ...

Click to expand...

Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;

Nitpicking ... you need to check for str.size() > 0.

Roland Pibinger · Feb 10, 2006

I'd phrase it as "regrettably mandatory", myself.

When (on which occasion) do you delete the returned char[]?

I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.

At least, we have a common, non-proprietary, widely understood
'inter-language'.

Best wishes,
Roland Pibinger

Robert J. Hansen · Feb 10, 2006

When ... do you delete the returned char[]?

This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it. Or, rather,
that's my understanding of what SWIG is doing; it wouldn't be
impossible for me to be mistaken.

Roland Pibinger · Feb 10, 2006

When ... do you delete the returned char[]?

Click to expand...

This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it.

That would be a surprise to me. How can SWIG know in which way the
char[] has been created? With new[], malloc, a custom allocator, as
static, ...?

Or, rather, that's my understanding of what SWIG is doing; it
wouldn't be impossible for me to be mistaken.

Maybe a SWIG expert can comment ...

Robert J. Hansen · Feb 11, 2006

How can SWIG know in which way the char[]

has been created?

Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.

Dietmar Kuehl · Feb 11, 2006

Robert said:
How can SWIG know in which way the char[]
has been created?

Click to expand...

Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.

Which one though? Here are the built-in candidates but these are not
the only ones which might exist in a program:

delete p;
delete[] p;
operator delete(p);
operator delete[](p);

My guess would be that it actually uses 'malloc(p)' in which case
the memory is better allocated differently...

Exception handling problem	11	Jul 3, 2013
stringstream: rdbuf()->sgetn(buf, rdbuf()->in_avail()) contentsdiffer from str()	2	Feb 11, 2008
Strict aliasing and buffer handling	20	Jun 20, 2011
buffer size	3	May 4, 2009
how to empty a stringstream?	1	Jul 15, 2003
copy map to ostream_iterator	8	Oct 2, 2004
Rule of three and so forth...	5	Jun 28, 2006
Increase memory buffer with realloc or malloc?	7	Sep 7, 2009

stringstream str(), buffer overflows

Robert J. Hansen

Dietmar Kuehl

Robert J. Hansen

Roland Pibinger

Roland Pibinger

Dietmar Kuehl

Robert J. Hansen

Roland Pibinger

Roland Pibinger

Robert J. Hansen

Roland Pibinger

Robert J. Hansen

Dietmar Kuehl

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads