C++ teaser: Is this a compiler bug, or is this expected behavior?


Generic Usenet Account

Compile the following snippet of code and run it. If the program spits
out bat:bat instead of bat:zat, what would you say? Would you say that
the compiler has a problem, or would you lay the blame on "undefined
execution of function parameters" in the C/C++ standard and "sequence
points"?

/////// Code snippet begins ///////

#include <iostream>
char foo[10]="cat";
char* writestring()
{
foo[0]='b';
return foo;
}

char* write2()
{
foo[0]='z';
return foo;
}


int main(void)
{ std::cout << writestring() << ":" << write2() << std::endl; }



/////// Code snippet ends ///////

Thanks,
Bhat


[Purists who hold that this NG is meant to discuss compiler neutral,
standard C++ issues only may not proceed beyond this point;-)]










For those of you who are "trivially inclined", here's some
background......
I stumbled upon a "bug" in my C++ compiler (g++ 3.3.1), which I
promptly reported to Bugzilla. The code snippet above was actually
provided by someone from the GCC volunteer community. They attributed
the unexpected behavior to the undefined behavior of execution of
function parameters and sequence points. In my original code snippet,
I was maintaining an STL map between IP address and port pairs, e.g.
from 105.52.20.33, 5000 to 47.32.68.95, 6000.

When I displayed the entries in the map, the second IP address was
displayed incorrectly. So instead of the mapping:

105.52.20.33, 5000 >>-->> 47.32.68.95, 6000
I got

105.52.20.33, 5000 >>-->> 105.52.20.33, 6000

The bug does not manifest when the code is compiled using the native
Solaris C++ compiler, version "WorkShop Compilers 5.0 02/04/10 C++ 5.0
Patch 107311-17".

Here's my original code snippet:

/////// Code snippet begins ///////
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <functional>
#include <string>
#include <map>
#include <iostream>

using namespace std;


// Orders addresses by their dotted-quad text first, then by port.
struct addrLessThan : public binary_function<const struct sockaddr_in,
                                             const struct sockaddr_in, bool>
{
    bool operator()(const struct sockaddr_in addr1,
                    const struct sockaddr_in addr2) const
    {
        bool retVal = true;

        // Each result is copied into a string before the next call.
        string addrStr1 = inet_ntoa(addr1.sin_addr);
        string addrStr2 = inet_ntoa(addr2.sin_addr);

        if(addrStr1 > addrStr2)
            retVal = false;
        else if(addrStr1 == addrStr2)
            retVal = (addr1.sin_port < addr2.sin_port);

        return retVal;
    }
};



typedef map<struct sockaddr_in, struct sockaddr_in, addrLessThan> IpV4AddrMap;



int main()
{
    struct sockaddr_in actualAddress, mappedAddress;

    actualAddress.sin_port=5000;
    actualAddress.sin_addr.s_addr = inet_addr("105.52.20.33");

    mappedAddress.sin_port=6000;
    mappedAddress.sin_addr.s_addr = inet_addr("47.32.68.95");

    IpV4AddrMap map;

    map[actualAddress] = mappedAddress;

    IpV4AddrMap::iterator itor = map.find(actualAddress);

    if(itor != map.end())
    {
        // Both inet_ntoa() calls appear in one output expression.
        cout << "Key: " << inet_ntoa(itor->first.sin_addr)
             << ", " << itor->first.sin_port << endl
             << "Value: " << inet_ntoa(itor->second.sin_addr)
             << ", " << itor->second.sin_port << endl
             << endl;
    }
    return 0;
}

/////// Code snippet ends ///////


For more details, you can go to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22265
 
Victor Bazarov

Generic said:
Compile the following snippet of code and run it. If the program spits
out bat:bat instead of bat:zat, what would you say? Would you say that
the compiler has a problem, or would you lay the blame on "undefined
execution of function parameters" in the C/C++ standard and "sequence
points"?

/////// Code snippet begins ///////

#include <iostream>
char foo[10]="cat";
char* writestring()
{
foo[0]='b';
return foo;
}

char* write2()
{
foo[0]='z';
return foo;
}


int main(void)
{ std::cout << writestring() << ":" << write2() << std::endl; }
[..]

Yes, the latter; the correct term is "the order of evaluation of the
function arguments is unspecified". A simpler expression is

cout << writestring() << write2();

which is the same as

( cout.operator<<( writestring() ) ) . operator<< ( write2() );

in which 'write2()' is allowed to be evaluated before 'writestring()'
as I understand it. The part to the right of the second dot is the
function with its arguments. The left part is the object, which also
needs to be evaluated...

V
 
Old Wolf

Generic said:
Compile the following snippet of code and run it. If the program
spits out bat:bat instead of bat:zat, what would you say? Would
you say that the compiler has a problem, or would you lay the blame
on "undefined execution of function parameters" in the C/C++
standard and "sequence points"?

#include <iostream>
char foo[10]="cat";
char* writestring()
{
foo[0]='b';
return foo;
}

char* write2()
{
foo[0]='z';
return foo;
}

int main(void)
{ std::cout << writestring() << ":" << write2() << std::endl; }

The behaviour is unspecified (NOT undefined), and
"bat:bat", "bat:zat", and "zat:zat" are all valid outputs.
(But "zat:bat" is not.)

You must remember that writestring() can be called at any point
between the start of this statement's execution, and the point
where its return value is needed. The same goes for write2().

Sequence points are not an issue here, because there are
no instances of multiple side-effects occurring without an
intervening sequence point (a function call has a sequence
point after its arguments have been evaluated, and another one
as it returns).

The example has some similarities to:

foo( a(), b() );

where there is no reason to suspect that a() will be called
before b().

BTW, why are you posting to comp.sources.d?
The code snippet above was actually provided by someone from
the GCC volunteer community. They attributed the unexpected
behavior to the undefined behavior of execution of function
parameters and sequence points.

If that was their exact wording, then they are wrong (or
expressed their intention incorrectly).

The behaviour is only unexpected if you were expecting
the wrong thing :)
I stumbled upon a "bug" in my C++ compiler (g++ 3.3.1), which I
promptly reported to Bugzilla.
cout << "Key: " << inet_ntoa(itor->first.sin_addr)
<< ", " << itor->first.sin_port << endl
<< "Value: " << inet_ntoa(itor->second.sin_addr)
<< ", " << itor->second.sin_port << endl
<< endl;

Unfortunately you have wasted the time of the Bugzilla people.
You have correctly identified the essence of the "problem",
namely that inet_ntoa() returns a pointer into a static buffer.
In fact, on my system, the inet_ntoa manpage specifically says:
"The string is returned in a statically allocated buffer, which
subsequent calls will overwrite."

If you still think this is a bug, then what do you think the
'fix' should be? The most common suggestion that people make
on comp.lang.c (or c++) is to force left-to-right evaluation
of function parameters.

This has been discussed to death before, but the main reason
for opposing it is that it would force compilers to produce
slower code in many cases. For example, some calling conventions
feature parameters being pushed onto a stack, with the right-most
parameters pushed first. A function with this calling convention
would need the compiler to jump through some hoops, instead of
a few simple function calls followed by a stack push of the
return value.
 
Geo

Old said:
The behaviour is unspecified (NOT undefined), and
"bat:bat", "bat:zat", and "zat:zat" are all valid outputs.
(But "zat:bat" is not.)

Attempting to modify a literal value is undefined behaviour, surely?
 
msalters

Geo wrote:
Attempting to modify a literal value is undefined behaviour, surely?

It would be. However, char[10] is not a literal. It can be modified.
It's equivalent to { int foo = 10; ++foo; }, which doesn't modify 10.

HTH,
Michiel Salters
 
Geo

msalters said:
Geo wrote:
Attempting to modify a literal value is undefined behaviour, surely ?

It would be. However, char[10] is not a literal. It can be modified.
It's equivalent to { int foo = 10; ++foo; } That doesn't modify 10.

HTH,
Michiel Salters

No it's not equivalent at all,


char foo[10]="cat";

reserves 10 character slots and points char[0] at the address of "cat",
which is a literal. Later, foo[0] = 'z' is an attempt to modify the
first character of "cat", i.e. modify the literal, which is undefined
behaviour.
 
Karl Heinz Buchegger

Geo said:
Geo wrote:
Old Wolf wrote:

The behaviour is unspecified (NOT undefined), and
"bat:bat", "bat:zat", and "zat:zat" are all valid outputs.
(But "zat:bat" is not.)

Attempting to modify a literal value is undefined behaviour, surely ?

It would be. However, char[10] is not a literal. It can be modified.
It's equivalent to { int foo = 10; ++foo; } That doesn't modify 10.

HTH,
Michiel Salters

No it's not equivalent at all,

char foo[10]="cat";

reserves 10 character slots and points char[0] at the address of "cat",
which is a literal. Later, foo[0] = 'z' is an attempt to modify the
first character of "cat", i.e. modify the literal, which is undefined
behaviour.

You might want to reread your 'C++ beginning programmer's introduction'
to figure out what
char foo[10] = "cat";
really does.
Hint: It does not do what you describe above.
 
Ian Malone

Geo said:
Geo wrote:
Attempting to modify a literal value is undefined behaviour, surely ?

It would be. However, char[10] is not a literal. It can be modified.
It's equivalent to { int foo = 10; ++foo; } That doesn't modify 10.
No it's not equivalent at all,


char foo[10]="cat";

reserves 10 character slots and points char[0] at the address of "cat",
which is a literal. Later, foo[0] = 'z' is an attempt to modify the
first character of "cat", i.e. modify the literal, which is undefined
behaviour.

I may be wrong, but I was under the impression that:
char foo[10]="cat";
results in an array of size 10 in which members are initialised from
the string "cat" (including terminating \0). Whereas
char *bar="cat";
results in a pointer which points to the address of the literal "cat".
 
msalters

Geo wrote:
msalters said:
Geo wrote:
Old Wolf wrote:

The behaviour is unspecified (NOT undefined), and
"bat:bat", "bat:zat", and "zat:zat" are all valid outputs.
(But "zat:bat" is not.)

Attempting to modify a literal value is undefined behaviour, surely ?

It would be. However, char[10] is not a literal. It can be modified.
It's equivalent to { int foo = 10; ++foo; } That doesn't modify 10.

HTH,
Michiel Salters

No it's not equivalent at all,


char foo[10]="cat";

reserves 10 character slots and points char[0] at the address of "cat",
which is a literal. Later, foo[0] = 'z' is an attempt to modify the
first character of "cat", i.e. modify the literal, which is undefined
behaviour.

That's the description for { const char* foo = "cat"; }

You can't even point foo[0] to "cat". foo[0] is a char, check typeid()
or sizeof() if you don't believe me. A 'char' is not a 'char*', and
only the latter points.

Also, if you could point foo to "cat", you surely could later point
it to "dog". However, the compiler will tell you that

char foo[10]="cat";
foo = "dog";

is illegal. Of course,

const char* foo = "cat";
foo = "dog";

is legal.

HTH,
Michiel Salters
 
ri_wells

Old said:
The behaviour is unspecified (NOT undefined), and
"bat:bat", "bat:zat", and "zat:zat" are all valid outputs.
(But "zat:bat" is not.)


I find it somewhat hard to accept that "bat:bat" and "zat:zat" are
valid outputs. In fact I would be more willing to accept "zat:bat" as
valid output. The reason is that I can live with the fact that within
the same statement, the order of evaluating the functions is undefined.

Rick
 
msalters

(e-mail address removed) wrote:
I find it somewhat hard to accept that "bat:bat" and "zat:zat" are
valid outputs. In fact I would be more willing to accept "zat:bat" as
valid output. The reason is that I can live with the fact that within
the same statement, the order of evaluating the functions is undefined.

True. So if write2 is called first, the string is changed to 'zat' and
then later to 'bat'. The char* returned is the same in both cases. If
cout only looks at that char* after both write*s have returned, it
will see "bat" twice, since the char* returned from write2 points
to memory later overwritten by a 'b'.

The point to remember is that write2 doesn't return a char* pointing
to a historical state of memory. It points to some memory, and the
user of write2 has to be aware that the contents of that memory can
change even after write2 returns.

Regards
Michiel Salters
 
Mike Smith

I find it somewhat hard to accept that "bat:bat" and "zat:zat" are
valid outputs. In fact I would be more willing to accept "zat:bat" as
valid output. The reason is that I can live with the fact that within
the same statement, the order of evaluating the functions is undefined.

But writestring() and write2() both return the same value, which is the
address of foo[0], *regardless* of what foo[] contains at any given
point in time. So why wouldn't they be the same?
 
Rick N. Backer

(e-mail address removed) wrote:

True. So if write2 is called first, the string is changed to 'zat' and
then later to 'bat'. The char* returned is the same in both cases. If
cout only looks at that char* after both write*s have returned, it
will see "bat" twice, since the char* returned from write2 points
to memory later overwritten by a 'b'.

The point to remember is that write2 doesn't return a char* pointing
to a historical state of memory. It points to some memory, and the
user of write2 has to be aware that the contents of that memory can
change even after write2 returns.

Regards
Michiel Salters

Avoid the whole issue altogether. Make two independent char *
variables to capture the output from the functions and then use the
variables in the cout statement. Now, everything is defined, no
surprises about the running order of things, and the debate ceases.


I find the best practice for me, when I am not sure of any potential
side effects or questions, is to use function calls in-line in
statements only where I am absolutely sure of the consequences and
consign the return of all others to variables I can safely use
wherever and whenever I choose.

Ken Wilson

Amer. Dlx. Tele, Gary Moore LP, LP DC Classic w/P90s,
Jeff Beck Strat, Morgan OM Acoustic,
Rick 360/12, Std. Strat (MIM), Mesa 100 Nomad,
Mesa F-30

"Goodnight Austin, Texas, wherever you are."
 
