Ill-formed code or a compiler bug (mingw-w64 g++)?

K

K. Frank

Hello Group!

We've been discussing over on the mingw64 list:

http://sourceforge.net/mailarchive/message.php?msg_id=28355621

some unexpected behavior in a 64-bit test program when built under the
64-bit
mingw-w64 version of g++.

We're trying to decide whether the test program is ill-formed, and
therefore
that the compiler is off the hook, or whether we might have stumbled
across
a compiler bug.

Some context: The test program is logically incorrect in that the
code is not
expected to do what it was nominally intended to do. That's not the
point.
Rather, the question is whether the code is legal, well-formed c++,
and whether
the compiler is generating the correct output. It seems that the key
point is
the comparisons between 32-bit and 64-bit integral types (the variable
pos, an
unsigned, and std::string::npos, a size_t) in the test code.

Here is the code:

====================

#include <iostream>
#include <streambuf>
#include <string>

#include <cstdio>

class StdoutStream : public std::basic_streambuf<char>
{
public:
StdoutStream(std::eek:stream &stream) : m_stream (stream) {
m_old_buf = stream.rdbuf();
stream.rdbuf (this);
}
~StdoutStream() {
// output anything that is left
if (!m_string.empty()) printf ("%s", m_string.c_str());
m_stream.rdbuf (m_old_buf);
}

protected:
virtual int_type overflow(int_type v) {
if (v == '\n') {
printf ("%s", m_string.c_str());
m_string.erase (m_string.begin(), m_string.end());
}
else m_string += v;
return v;
}

virtual std::streamsize xsputn (const char *p, std::streamsize n) {
m_string.append (p, p + n);
unsigned pos = 0;
// size_t pos = 0; // this is the correct declaration
while (pos != std::string::npos) { // 32-bit unsigned compared
against 64-bit size_t
// while (pos != ((unsigned) -1)) { // this works
printf ("xsputn: top of loop, pos = %d\n", pos);
pos = m_string.find ('\n');
if (pos != std::string::npos) { // 32-bit unsigned compared
against 64-bit size_t
// if (pos != ((unsigned) -1)) { // this works
std::string tmp(m_string.begin(), m_string.begin() + pos + 1);
printf ("%s", tmp.c_str());
m_string.erase (m_string.begin(), m_string.begin() + pos + 1);
}
printf ("xsputn: bottom of loop, pos = %d\n", pos);
}
printf ("xsputn: before return, n = %d, pos = %d\n", n, pos);
return n;
}

private:
std::eek:stream &m_stream;
std::streambuf *m_old_buf;
std::string m_string;
};

int main (int argc, char *argv[]) {

printf ("hello...\n");

StdoutStream *out = new StdoutStream (std::cout);

if (out) { // to avoid unused variable warning
// prints neither message to text1
// std::cout << "Message 1 (with endl)..." << std::endl;
// std::cout << "Message 2 (with endl)..." << std::endl;

// prints only "Message 1" to text1
try {
std::cout << "Message 1 (with '\\n')...\n";
std::cout << "Message 2 (with '\\n')...\n";
}
catch (...) { // no exception thrown -- not the ptoblem
printf ("exception caught...\n");
}

// prints both messages to text1
// std::cout << "Message 1 (with two '\\n's)...\nMessage 2 (with
two '\\n's)...\n";
}

printf ("goodbye!\n");

return 0;
}

====================


It compiles without error:

Here is the unexpected output when it runs:

C:\>stdout_stream_error
hello...
xsputn: top of loop, pos = 0
Message 1 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24
goodbye!

Note, in particular, that the second time through the loop, the "top
of loop"
message is printed, but the "bottom of loop" message is not.
Furthermore,
it does not appear that the loop (and xsputn function) is exited by
virtue of
an exception being thrown, as the catch block is main does not appear
to
be being executed.

I believe (that because of the logically incorrect 32-bit / 64-bit
comparison)
that the code should enter an infinite loop.

When compiled using a 32-bit version of g++ (so that the logic error
goes
away), the program gives the following desired (and seemingly correct)
output:

C:\>stdout_stream_error
hello...
xsputn: top of loop, pos = 0
Message 1 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24
xsputn: bottom of loop, pos = -1
xsputn: before return, n = 25, pos = -1
xsputn: top of loop, pos = 0
Message 2 (with '\n')...
xsputn: bottom of loop, pos = 24
xsputn: top of loop, pos = 24
xsputn: bottom of loop, pos = -1
xsputn: before return, n = 25, pos = -1
goodbye!

Would anyone have some ideas about what might be going on here?

I apologize for the slightly complicated test program. I haven't been
able to reproduce
the unexpected behavior in a simpler setting without the
basic_streambuf and xsputn
stuff. (An attempt to keep the loop structure and the 32-bit / 64-bit
comparisons, but
get rid of the basic_streambuf-derived class produces code that enters
an infinite loop.)


Thanks in advance for your insights.


K. Frank
 
K

K. Frank

Hi Paavo!

Thanks very much. I believe that your analysis is correct.

Some comments below...

Your program is ill-formed.
...
if find() does not find '\n', it returns npos (-1 casted to
std::string::size_type). This will be truncated to 0xffffffff when
assigned to unsigned pos.


Here, if 'pos+1' were evaluated first, it would overwrap and yield zero,
which is a legal op for unsigned ints. However, as m_string.begin()
returns a 64-bit iterator and the + operator associates left-to-right,
all the additions are done in 64-bit, so you attempt to construct a 4G
tmp string, which will most probably cause UB.

I think you're right.

Let me see if I can restate your argument:

pos = m_string.find ('\n');
if (pos != std::string::npos) {
std::string tmp(m_string.begin(), m_string.begin() + pos +
1);

1) pos = m_string.find ('\n');

m_string.find ('\n') either returns a legal position in m_stirng or
npos.
However, because of the 32-bit / 64-bit mismatch, pos is converted to
a
32-bit -1, which is neither npos, nor a legal position in m-string.

2) if (pos != std::string::npos) {

Because the 32-bit pos is not actually equal to npos, this if
statement
does not protect us from executing the next line. (But, so far,
everything
is legal.)

3) std::string tmp(m_string.begin(), m_string.begin() + pos + 1);

As soon as we create the iterator "m_string.begin() + pos" (or perhaps
as
soon as we use it) we have undefined behavior because the iterator
points
neither to a position within m_string nor to "one past the end."

I would say that the undefined behavior is due to indexing "into"
m_string past its end, rather than trying to instantiate a 4GB string.
(I would think that the latter would be legal, but should throw a
bad_alloc, or something.)
If you put some parens around 'pos+1' here and a couple of lines later,
then you indeed get a well-formed infinite loop program.

Yes, I agree. To analyze a bit further, it's a bit of a happenstance
that
adding the parentheses "(pos + 1)" makes the code legal.

pos is still a wacky value -- an unsigned 32-bit -1. But as you point
out
"(pos + 1)" is now zero, and "m_string.begin + 0" is a legal iterator
into
m-string, so no undefined behavior occurs. (But if pos had been some
other
wacky value, such as an unsigned 32-bit -2, then adding the
parentheses would
not have eliminated the undefined behavior.)

Does this all sound right?
Cheers
Paavo

Thanks for clearing this up.


K. Frank
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top