Ill-formed code or a compiler bug (mingw-w64 g++)?


K. Frank

Hello Group!

We've been discussing, over on the mingw64 list, some unexpected
behavior in a 64-bit test program when built under the mingw-w64
version of g++.

We're trying to decide whether the test program is ill-formed, so that
the compiler is off the hook, or whether we might have stumbled on a
compiler bug.

Some context: The test program is logically incorrect in that the code
is not expected to do what it was nominally intended to do. That's not
the issue. Rather, the question is whether the code is legal,
well-formed C++, and whether the compiler is generating the correct
output. It seems that the key point is the comparisons between 32-bit
and 64-bit integral types (the variable pos, an unsigned, and
std::string::npos, a size_t) in the test code.

Here is the code:


#include <iostream>
#include <streambuf>
#include <string>

#include <cstdio>

class StdoutStream : public std::basic_streambuf<char>
{
public:
  StdoutStream(std::ostream &stream) : m_stream (stream) {
    m_old_buf = stream.rdbuf();
    stream.rdbuf (this);
  }
  ~StdoutStream() {
    // output anything that is left
    if (!m_string.empty()) printf ("%s", m_string.c_str());
    m_stream.rdbuf (m_old_buf);
  }

  virtual int_type overflow(int_type v) {
    if (v == '\n') {
      printf ("%s", m_string.c_str());
      m_string.erase (m_string.begin(), m_string.end());
    }
    else m_string += v;
    return v;
  }

  virtual std::streamsize xsputn (const char *p, std::streamsize n) {
    m_string.append (p, p + n);
    unsigned pos = 0;
    // size_t pos = 0;  // this is the correct declaration
    while (pos != std::string::npos) {  // 32-bit unsigned compared against 64-bit size_t
    // while (pos != ((unsigned) -1)) {  // this works
      printf ("xsputn: top of loop, pos = %d\n", pos);
      pos = m_string.find ('\n');
      if (pos != std::string::npos) {  // 32-bit unsigned compared against 64-bit size_t
      // if (pos != ((unsigned) -1)) {  // this works
        std::string tmp(m_string.begin(), m_string.begin() + pos + 1);
        printf ("%s", tmp.c_str());
        m_string.erase (m_string.begin(), m_string.begin() + pos + 1);
      }
      printf ("xsputn: bottom of loop, pos = %d\n", pos);
    }
    printf ("xsputn: before return, n = %d, pos = %d\n", (int) n, pos);
    return n;
  }

private:
  std::ostream &m_stream;
  std::streambuf *m_old_buf;
  std::string m_string;
};

int main (int argc, char *argv[]) {

  printf ("hello...\n");

  StdoutStream *out = new StdoutStream (std::cout);

  if (out) {  // to avoid unused variable warning
    // prints neither message to text1
    // std::cout << "Message 1 (with endl)..." << std::endl;
    // std::cout << "Message 2 (with endl)..." << std::endl;

    // prints only "Message 1" to text1
    try {
      std::cout << "Message 1 (with '\\n')...\n";
      std::cout << "Message 2 (with '\\n')...\n";
    }
    catch (...) {  // no exception thrown -- not the problem
      printf ("exception caught...\n");
    }

    // prints both messages to text1
    // std::cout << "Message 1 (with two '\\n's)...\nMessage 2 (with two '\\n's)...\n";
  }

  printf ("goodbye!\n");

  return 0;
}


It compiles without error.

Here is the unexpected output when it runs:

xsputn: top of loop, pos = 0
Message 1 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24

Note, in particular, that the second time through the loop, the "top
of loop" message is printed, but the "bottom of loop" message is not.
It does not appear that the loop (and the xsputn function) is exited
by virtue of an exception being thrown, as the catch block in main
does not appear to be executed.

I believe that, because of the logically incorrect 32-bit / 64-bit
comparison, the code should enter an infinite loop.

When compiled using a 32-bit version of g++ (so that the logic error
goes away), the program gives the following desired (and seemingly
correct) output:

xsputn: top of loop, pos = 0
Message 1 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24
xsputn: bottom of loop, pos = -1
xsputn: before return, n = 25, pos = -1
xsputn: top of loop, pos = 0
Message 2 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24
xsputn: bottom of loop, pos = -1
xsputn: before return, n = 25, pos = -1

Would anyone have some ideas about what might be going on here?

I apologize for the slightly complicated test program. I haven't been
able to reproduce the unexpected behavior in a simpler setting without
the basic_streambuf and xsputn stuff. (An attempt to keep the loop
structure and the 32-bit / 64-bit comparisons, but get rid of the
basic_streambuf-derived class, produces code that enters an infinite
loop.)

Thanks in advance for your insights.

K. Frank



K. Frank

Hi Paavo!

Thanks very much. I believe that your analysis is correct.

Some comments below...

> Your program is ill-formed.
> If find() does not find '\n', it returns npos (-1 cast to
> std::string::size_type). This will be truncated to 0xffffffff when
> assigned to unsigned pos.

> Here, if 'pos+1' were evaluated first, it would wrap around and yield
> zero, which is a legal operation for unsigned ints. However, as
> m_string.begin() returns a 64-bit iterator and the + operator
> associates left-to-right, all the additions are done in 64-bit, so you
> attempt to construct a 4G tmp string, which will most probably cause UB.

I think you're right.

Let me see if I can restate your argument:

    pos = m_string.find ('\n');
    if (pos != std::string::npos) {
      std::string tmp(m_string.begin(), m_string.begin() + pos + 1);

1) pos = m_string.find ('\n');

m_string.find ('\n') returns either a legal position in m_string or
npos. However, because of the 32-bit / 64-bit mismatch, npos is
converted to a 32-bit -1 when it is stored in pos, which is neither
npos nor a legal position in m_string.

2) if (pos != std::string::npos) {

Because the 32-bit pos is not actually equal to npos, this if does not
protect us from executing the next line. (But, so far, everything is
legal.)

3) std::string tmp(m_string.begin(), m_string.begin() + pos + 1);

As soon as we create the iterator "m_string.begin() + pos" (or perhaps
as soon as we use it), we have undefined behavior, because the iterator
points neither to a position within m_string nor to "one past the end."

I would say that the undefined behavior is due to indexing "into"
m_string past its end, rather than to trying to instantiate a 4GB
string. (I would think that the latter would be legal, but should throw
a bad_alloc, or something similar.)
> If you put some parens around 'pos+1' here and a couple of lines later,
> then you indeed get a well-formed infinite loop program.

Yes, I agree. To analyze a bit further, it's a bit of a happenstance
that adding the parentheses "(pos + 1)" makes the code legal.

pos is still a wacky value: an unsigned 32-bit -1. But, as you point
out, "(pos + 1)" is now zero, and "m_string.begin() + 0" is a legal
iterator into m_string, so no undefined behavior occurs. (But if pos
had been some other wacky value, such as an unsigned 32-bit -2, then
adding the parentheses would not have eliminated the undefined
behavior.)

Does this all sound right?

Thanks for clearing this up.

K. Frank
