stringstream str(), buffer overflows

  • Thread starter Robert J. Hansen
  • Start date
R

Robert J. Hansen

While writing some code to demonstrate different local search
strategies, I found something kind of unusual. I suspect it's a bug in
my understanding rather than a bug in GCC, and I'm hoping someone here
can help me out.

The task is simple: solve the game Boggle using a pure hill-climbing
strategy, returning a tab-separated list of words as a char*. (I'm
interfacing with Python, so the char* is necessary; that seems to be
the easiest way to get Python strings back from C++ code.) When
reading the data out I wind up overrunning the end of the allocated
space, despite the fact that at first blush it appears I'm doing things
right.

I'm retyping the offending code here. For clarity I'm leaving off the
std:: prefix from some calls, but these should be obvious. The
following is a method, not a function; it's declared inline in the
header file.

=====
char* words() const
{
stringstream ss;
ostream_iterator<string> oi(ss, "\t");
copy(_wordset->begin(), _wordset->end(), oi);
// The stringstream has a trailing '\t' at the end of it
// which we're not going to copy. This will turn into
// a trailing '\0'.
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}
=====

On OS X, this code works as expected. On Win32 and Debian, SIGSEGV (or
its Windows equivalent) is caught, with the offending line being the
call to copy. Replacing the last two lines of the method with

string t = ss.str().substr(0, ss.str().size() - 1);
copy(t.begin(), t.end(), rv);
return rv;

.... makes everything work just fine, though.

Can anyone give me a clear, concise description of my error in
understanding? Or is this a bug in GCC?
 
D

Dietmar Kuehl

Robert said:
stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

That's a good one :) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy. In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;
return rc;
}

...
return to_c_string(ss.str());
 
R

Robert J. Hansen

Note that the 'str()' method of the string stream returns
the string *by value*

Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference. In light of this, your remarks
about unnecessary construction of string objects is well-taken,
although it's not a performance hit in this problem (the code executes
in sub-second time).

Once more, thanks for the reply!
 
R

Roland Pibinger

Robert said:
stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

That's a good one :) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.

The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.
In the above code I spotted a total
of four copies which can easily turn into an unnecessary performance
problem even if the code happens to work for whatever unfortunate
reason you encounter (a condidate could be the use of some form of
reference counted copy where the resulting sequence is actually a
valid one). As a result of having multiple copies, you try to
iterate over an invalid range by using the begin iterator of one
sequence and the end iterator of another. Apart from the resulting
range being a different size you also run the danger of trying to
access arbitrary memory in between.

Thus, you should get the string only once and copy it from there.
I'd write the above portion of copying the code something like
below which also avoids the unnecessary call to 'memset()' which
merely assigns values which will be overridden anyway:

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Don't know what the above line does ...
return rc;
}

...
return to_c_string(ss.str());

Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.

Best regards,
Roland Pibinger
 
R

Roland Pibinger

Thanks! This is exactly the thing I was missing--for some reason I
thought it was returning by reference.

You are right and the interface is wrong. There really should be a
'reference-iterface' instead of a 'value-interface'.

Best wishes,
Roland Pibinger
 
D

Dietmar Kuehl

Roland said:
Robert said:
stringstream ss; [...]
char* rv = new char[ss.str().size()];
memset((void*) rv, 0, ss.str().size());
copy(ss.str().begin(), ss.str().end() - 1, rv);
return rv;
}

That's a good one :) Note that the 'str()' method of the string
stream returns the string *by value* that is, for each call to
'str()' you get a fresh copy.

The main problem with the code is that it mixes low-level (memset,
char[]) and (insufficiently understood) high level constructs
(stringstream, string, copy). That mixture is always an indication for
problems in the code.

Whether the mixture of code is problematic or not depends on the
user's requirements. The problem he is encountering, however, is
not related to this mixture but to the use of iterators to different
temporaries.
char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Don't know what the above line does ...

Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;

Sorry for that.
Of course, returning a new-ed object to the caller (delete by caller)
is bad style. Very bad style, IMO.

I generally agree on this. However, as was explained in the original
article, the author considered this to be mandatory.
 
R

Robert J. Hansen

the author considered this to be mandatory

I'd phrase it as "regrettably mandatory", myself. I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.

On the other hand, I haven't taken a look at Boost::python yet. Maybe
I should.
 
R

Roland Pibinger

char* to_c_string(std::string const& str) {
char* rc = new char[str.size()];
*std::copy(str.begin(), str.end() - 1, rc) * 0;

Don't know what the above line does ...

Right... There is small but rather obvious typo: the last asterisk
should have been an assignment:

*std::copy(str.begin(), str.end() - 1, rc) = 0;

Nitpicking ... you need to check for str.size() > 0.
 
R

Roland Pibinger

I'd phrase it as "regrettably mandatory", myself.

When (on which occasion) do you delete the returned char[]?
I really dislike how
unsafe and obnoxious-to-use C ways of doing things are the lowest
common denominator for inter-language communications, but for now that
appears to be the only game in town.

At least, we have a common, non-proprietary, widely understood
'inter-language'.

Best wishes,
Roland Pibinger
 
R

Robert J. Hansen

When ... do you delete the returned char[]?

This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it. Or, rather,
that's my understanding of what SWIG is doing; it wouldn't be
impossible for me to be mistaken.
 
R

Roland Pibinger

When ... do you delete the returned char[]?

This is Officially Not My Problem(tm). SWIG creates a wrapper which
takes the char* and takes responsibility for deleting it.

That would be a surprise to me. How can SWIG know in which way the
char[] has been created? With new[], malloc, a custom allocator, as
static, ...?
Or, rather, that's my understanding of what SWIG is doing; it
wouldn't be impossible for me to be mistaken.

Maybe a SWIG expert can comment ...
 
R

Robert J. Hansen

How can SWIG know in which way the char[]
has been created?

Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.
 
D

Dietmar Kuehl

Robert said:
How can SWIG know in which way the char[]
has been created?

Just hazarding a guess, the "-c++" flag I pass to SWIG might be a good
clue to it that it should use delete.

Which one though? Here are the built-in candidates but these are not
the only ones which might exist in a program:

delete p;
delete[] p;
operator delete(p);
operator delete[](p);

My guess would be that it actually uses 'malloc(p)' in which case
the memory is better allocated differently...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top