Who can explain this bug?

mathog

Platform: gcc 4.6.3 on AMD Athlon(tm) II X2 245 Processor on
Ubuntu 12.04

The bug I thought was due to a particularly hard to find memory access
problem (see other thread I started today) seems to be even odder than
that. It was tracked down to a particular conditional, which seems to
optimize in such a way that the result is not always correct. (But
possibly only when embedded in the rest of the program.)

The skeletal elements of the problem code are:

double function(double *ymax){
    double yheight;
    double baseline;
    double tmp=0.0;
    double fs;
    double yll, boff; /* these have values */
    /* bbox is a struct that includes yMin and yMax */

    /* set fs */

    /* start of loop */
    /* set yll, boff, bbox */
    yheight = bbox.yMax - bbox.yMin;
    printf("baseline %lf tmp %20.17lf %16llX ymax %20.17lf %16llX \n",
           baseline, tmp, *(uint64_t *)&tmp,
           (ymax ? *ymax : 666), (ymax ? *(uint64_t *)ymax : 666));

    if(ymax){
        tmp = fs * (((double)bbox.yMax)/yheight);
        // the next line would "fix" the code when compiled in:
        // printf("DEBUG ymax %lf tmp %lf\n", *ymax, tmp); fflush(stdout);
        if(*ymax <= tmp){ // <--------THIS IS THE PROBLEM
            *ymax = tmp;
            baseline = yll - boff;
        }
    }
    /* end of loop */
    return(baseline);
}

When this is compiled with -O2 or -O3, AND when it is part of the larger
program, the indicated test fails in the equals case; the output from
the printf statements is:

baseline 0.000000 tmp 0.00000000000000000 0 ymax 0.00000000000000000 0
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9

which is wrong: the baseline value should have been changing, because
yll and boff were different on each loop iteration.

Compile it with -O0 (or with -O2 or -O3 when it is not part of the
larger program, or with the commented-out printf uncommented) and
instead it emits:

baseline 0.000000 tmp 0.00000000000000000 0 ymax 0.00000000000000000 0
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 13.480004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 17.480006 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9

which is the correct result.

So, bizarrely, there is a problem with the conditional "<=" not giving
the right answer when the values of *ymax and tmp are equal when the
code is compiled in certain ways. And they are equal, down to the last
bit, as shown by the print statements!

This modification does not resolve the issue (when compiled in such a
way that it would be an issue):
if(*ymax <= (tmp * 1.0000000000000001)){

but this one does:
if(*ymax <= (tmp * 1.00000000001)){

So, what sort of "optimization" is it that gives the wrong result for
the "<=" condition when *ymax and tmp are equal????

Thanks,

David Mathog
 

mathog

Gordon said:
An optimization that keeps excess precision in a calculation.

A typical floating point unit uses registers that are 80 bits wide,
and only narrows it down to 64 bits when you store in a double.
The code may not bother storing the result, then re-loading it
before doing a comparison.

That would do it. In the example tmp is calculated and probably lives
in a register but the ymax value is stored. (If after the calculation
tmp is not stored to ymax, it is never used again, so there is no reason
it would not just be in the register.) So tmp may differ in the extra
16 bits out of the 80 bits that were not stored into ymax.
See the GCC -ffloat-store option.

Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.
That flag cannot be the one causing troubles here in any case, because
it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
not immediately evident which of the many changed flags might have
caused this.
Comparing for floating-point equality is treading on dangerous
ground. Where calculations are involved, there's almost always an
excuse for at least one bit of roundoff. Also, very few non-integer
decimal numbers can be exactly represented in finite-precision
binary floating point.

The odd thing here is effectively what is happening is

b = 10000;
a = 1.0/9.0; /* a calculation with no exact representation */
if(b <= a) b= a; /* comparison is true */
a = 1.0/9.0; /* the SAME calculation */
if(b <= a) b= a; /* comparison is false */

Seems like a compiler bug to me that a value _known to have been stored_
in memory at 64 bits would be compared directly to a recent calculation
at 80 bits. Possibly faster to do it that way, but, as here, it can
give the wrong answer.

Thanks,

David Mathog
 

James Kuyper

mathog said:
That would do it. In the example tmp is calculated and probably lives
in a register but the ymax value is stored. (If after the calculation
tmp is not stored to ymax, it is never used again, so there is no reason
it would not just be in the register.) So tmp may differ in the extra
16 bits out of the 80 bits that were not stored into ymax.


Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.

You're out of luck - the standard doesn't mandate the existence of any
feature that would deal with this. There's no portable way to do this,
just a variety of non-portable ways.

If you can restrict the use of your code to fully conforming
implementations of C99 or later, you can examine the value of
FLT_EVAL_METHOD (defined in <float.h>) to determine whether or not a
given compiler carries excess precision when evaluating expressions of
type float or double. However, if that value is -1, the evaluation
method is indeterminable, and if it's any other negative value, the
meaning is implementation-defined. If FLT_EVAL_METHOD >= 0, then float_t
and double_t (defined in <math.h>) are guaranteed to be wide enough that
intermediate expressions are evaluated in the same type, but the
standard imposes no requirements on float_t or double_t when
FLT_EVAL_METHOD < 0.
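
For example, a quick way to see what a given implementation reports
(a minimal sketch; the printed values are implementation-specific, and
on x87 you would typically see FLT_EVAL_METHOD == 2 with double_t being
long double):

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void){
    /* FLT_EVAL_METHOD (C99, <float.h>): 0 = evaluate in the operand's
       type, 1 = evaluate float in double, 2 = evaluate everything in
       long double, negative = indeterminable/implementation-defined */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    /* double_t (C99, <math.h>): the type double expressions are
       actually evaluated in when FLT_EVAL_METHOD >= 0 */
    printf("sizeof(double)   = %u\n", (unsigned)sizeof(double));
    printf("sizeof(double_t) = %u\n", (unsigned)sizeof(double_t));
    return 0;
}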
mathog said:
That flag cannot be the one causing troubles here in any case, because
it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
not immediately evident which of the many changed flags might have
caused this.

I couldn't find any documentation of a connection between -ffloat-store
and -O0 or -O3. It also seems odd that something would be disabled for
those two levels, but not for -O1 or -O2. Can you explain that comment a
little more?

mathog said: [snip]
The odd thing here is effectively what is happening is

b = 10000;
a = 1.0/9.0; /* a calculation with no exact representation */
if(b <= a) b= a; /* comparison is true */
a = 1.0/9.0; /* the SAME calculation */
if(b <= a) b= a; /* comparison is false */

It might seem to you that both comparisons are identical, but they can
look very different to an optimizer. One side in one of the comparisons
might be a value that the optimizer has decided to keep in a
high-precision register; the corresponding side in the other comparison
might be a value that it has decided to retrieve from a lower-precision
memory location.
mathog said:
Seems like a compiler bug to me that a value _known to have been stored_
in memory at 64 bits would be compared directly to a recent calculation
at 80 bits. Possibly faster to do it that way, but, as here, it can
give the wrong answer.

That's why -ffloat-store was invented, so you could express your
preference for "more consistent" vs. "faster and with greater but
inconsistent precision". That's precisely why the project I'm working on
has mandated -ffloat-store when compiling with gcc (SGI's MIPSpro
compiler used to be our main compiler, and we used to use several
others, but for several years now we've been using gcc exclusively - it
makes portability testing a bit difficult).
 

Nobody

mathog said:
Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.

Tough; floating-point calculations are effectively non-deterministic
without certain additional compiler flags.
 

Tim Rentsch

mathog said:
Platform: gcc 4.6.3 on AMD Athlon(tm) II X2 245 Processor on
Ubuntu 12.04

The bug I thought was due to a particularly hard to find memory
access problem (see other thread I started today) seems to be
even odder than that. It was tracked down to a particular
conditional, which seems to optimize in such a way that the
result is not always correct. (But possibly only when embedded
in the rest of the program.)

The skeletal elements of the problem code [white space added] are:

double function( double *ymax ){
    double yheight;
    double baseline;
    double tmp = 0.0;
    double fs;
    double yll, boff; /* these have values */
    /* bbox is a struct that includes yMin and yMax */

    /* set fs */

    /* start of loop */
    /* set yll, boff, bbox */
    yheight = bbox.yMax - bbox.yMin;
    printf( "baseline %lf tmp %20.17lf %16llX ymax %20.17lf %16llX \n",
            baseline, tmp, *(uint64_t *)&tmp,
            (ymax ? *ymax : 666), (ymax ? *(uint64_t *)ymax : 666) );

    if( ymax ){
        tmp = fs * ( ((double) bbox.yMax) / yheight );
        // the next line would "fix" the code when compiled in:
        // printf( "DEBUG ymax %lf tmp %lf\n", *ymax, tmp); fflush(stdout);
        if( *ymax <= tmp ){ // <--------THIS IS THE PROBLEM
            *ymax = tmp;
            baseline = yll - boff;
        }
    }
    /* end of loop */
    return(baseline);
}

[program behaves differently compiled with no optimization
and -O2 or -O3]

So, bizarrely, there is a problem with the conditional "<=" not
giving the right answer when the values of *ymax and tmp are
equal when the code is compiled in certain ways.
[snip elaboration]

The problem is gcc, which is non-conforming in how it handles
floating point. Changing the program with

volatile double tmp = 0.0;

or, if you prefer a more delicate touch, with

if( *ymax <= *(volatile double *)&tmp ){ ...

should remove the offending non-conforming behavior.
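
Applied to the skeleton above, the second form would look something
like this (a sketch only; the volatile-qualified read forces the
comparison to use the 64-bit value actually stored in tmp):

    tmp = fs * ( ((double) bbox.yMax) / yheight );
    if( *ymax <= *(volatile double *)&tmp ){ /* reload tmp from memory */
        *ymax = tmp;
        baseline = yll - boff;
    }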
 

Tim Rentsch

mathog said:
So, what sort of "optimization" is it that gives the wrong result for
the "<=" condition when *ymax and tmp are equal????

Gordon said:
An optimization that keeps excess precision in a calculation.

A typical floating point unit uses registers that are 80 bits wide,
and only narrows it down to 64 bits when you store in a double.
The code may not bother storing the result, then re-loading it
before doing a comparison. [snip]

Only in non-conforming implementations. What happens in a
conforming implementation must be as if the value were actually
stored and re-fetched, and the semantics of assignment include
removing any extra range and precision (footnote, 6.3.1.8 p2).
What gcc does is not a legitimate optimization but non-conforming
behavior.
 

Tim Rentsch

James Kuyper said:
That would do it. In the example tmp is calculated and probably lives
in a register but the ymax value is stored. (If after the calculation
tmp is not stored to ymax, it is never used again, so there is no reason
it would not just be in the register.) So tmp may differ in the extra
16 bits out of the 80 bits that were not stored into ymax.


Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.

You're out of luck - the standard doesn't mandate the existence of any
feature that would deal with this. There's no portable way to do this,
just a variety of non-portable ways. [snip elaboration]

This explanation is all messed up. There is a portable way of
doing what he wants to do, for _conforming_ implementations, and
he is doing it: storing the result of a floating point
calculation into a variable (ie, by assignment) is obliged to
remove any extra range and precision, which would eliminate the
unexpected behavior. That gcc does not behave this way is a bug
in gcc, not behavior that the Standard admits in conforming
implementations.
 

Tim Rentsch

Nobody said:
Tough; floating-point calculations are effectively non-deterministic
without certain additional compiler flags.

Floating-point operations can be effectively non-deterministic.
But in C, assigning and accessing floating-point variables is not
as non-deterministic as you seem to think it is. The culprit
here is gcc, which cheerfully generates non-conforming code for
floating point expressions involving assignment and subsequent
access.
 

glen herrmannsfeldt

It is commonly said, for good reasons, not to depend on equality
in floating point comparisons. Not even if you just assigned the
same value to a variable.
mathog said:
That would do it. In the example tmp is calculated and probably lives
in a register but the ymax value is stored. (If after the calculation
tmp is not stored to ymax, it is never used again, so there is no reason
it would not just be in the register.) So tmp may differ in the extra
16 bits out of the 80 bits that were not stored into ymax.

Actually, I believe it is 11 bits; you also get 4 more exponent bits.
The latter is convenient in not having to worry about overflow in
intermediate calculations when the result will fit. (And there is no
hidden 1 bit, unlike in the other formats.)
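
The <float.h> parameters show where those numbers come from (a sketch;
the values in the comments are the usual ones for a double plus x87
extended precision):

#include <float.h>
#include <stdio.h>

int main(void){
    printf("DBL_MANT_DIG  = %d\n", DBL_MANT_DIG);  /* 53 significand bits */
    printf("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG); /* 64 on x87: 11 extra */
    printf("DBL_MAX_EXP   = %d\n", DBL_MAX_EXP);   /* 1024: 11 exponent bits */
    printf("LDBL_MAX_EXP  = %d\n", LDBL_MAX_EXP);  /* 16384 on x87: 15 bits,
                                                      so 4 extra */
    return 0;
}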
mathog said:
Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.
That flag cannot be the one causing troubles here in any case, because
it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
not immediately evident which of the many changed flags might have
caused this.

The x87 floating point processor has eight registers, each 80 bits
wide. The original design was for a virtual stack such that when
it overflowed an interrupt routine would write some out to memory,
and on underflow read them back. Programs could be written without
worry about stack size. After the hardware (8087) was built, it
turned out that it was not possible to write such a routine.
As I remember it, you couldn't restore all the status bits back
to the appropriate values.

There are control bits that affect the precision of some operations,
but not all of them. Some systems (runtime start-up code) set those
bits, others don't.
mathog said:
The odd thing here is effectively what is happening is
b = 10000;
a = 1.0/9.0; /* a calculation with no exact representation */
if(b <= a) b= a; /* comparison is true */
a = 1.0/9.0; /* the SAME calculation */
if(b <= a) b= a; /* comparison is false */

If you really have a compiler where 10000 <= 1.0/9.0 then
it is a bug.

As far as I know, it is usual for -O0 to keep intermediates in
registers during evaluation of an expression, but store to the actual
variable at the end. With -O3, most compilers try to keep values in
registers between statements, until they run out of registers.

In any case, the complications of writing a register allocator
for arbitrarily complicated expressions and eight registers mean
that you will often be surprised by the results.
mathog said:
Seems like a compiler bug to me that a value _known to have been stored_
in memory at 64 bits would be compared directly to a recent calculation
at 80 bits. Possibly faster to do it that way, but, as here, it can
give the wrong answer.

If you don't depend on equality, or even being close, for floating
point then it isn't a problem. For even more fun, note that there
were (maybe still are) Cray computers where:

if(a*b!=b*a) printf("Ooops!\n");

might surprise you. That is, the floating point multiplier gave
different answers depending on the order of the operands.

The normal problem with the inconsistent extra precision is that it
causes some calculations designed to avoid problems with precision
loss to fail.

Another common problem is that constant expressions evaluated at
compile time give different results from the same expression
(with variables) evaluated at run time.
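
A sketch of that effect (illustrative only; whether the two results
actually differ depends on the implementation and the optimization
flags):

#include <stdio.h>

int main(void){
    double x = 10.0;
    /* likely folded to a plain double constant at compile time: */
    double folded = 1.0/10.0 + 2.0/10.0;
    /* evaluated at run time, possibly with x87 extra precision: */
    double runtime = 1.0/x + 2.0/x;
    printf("%s\n", folded == runtime ? "same" : "different");
    return 0;
}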

-- glen
 

glen herrmannsfeldt

Tim Rentsch said:
(e-mail address removed) (Gordon Burditt) writes:
So, what sort of "optimization" is it that gives the wrong result for
the "<=" condition when *ymax and tmp are equal????
An optimization that keeps excess precision in a calculation.
A typical floating point unit uses registers that are 80 bits wide,
and only narrows it down to 64 bits when you store in a double.
The code may not bother storing the result, then re-loading it
before doing a comparison. [snip]
Only in non-conforming implementations. What happens in a
conforming implementation must be as if the value were actually
stored and re-fetched, and the semantics of assignment include
removing any extra range and precision (footnote, 6.3.1.8 p2).
What gcc does is not a legitimate optimization but non-conforming
behavior.

I believe that is correct. Fortran is more lenient in what it allows
for floating point expression evaluation. But storing every intermediate
value can slow things down a lot!

Well, they should go to cache instead of all the way to main memory,
but it is still slow.

Does gcc always store intermediate values with -O0?

-- glen
 

gwowen

The culprit here is gcc, which cheerfully generates non-conforming code for
floating point expressions involving assignment and subsequent
access.

If you ask for conformance, with --std=c99, say, you get conformance.
I don't know of any compiler that is strictly conforming without being
specifically asked to be.
 

Noob

Tim said:
Floating-point operations can be effectively non-deterministic.
But in C, assigning and accessing floating-point variables is not
as non-deterministic as you seem to think it is. The culprit
here is gcc, which cheerfully generates non-conforming code for
floating point expressions involving assignment and subsequent
access.

You're quick to castigate gcc!

Can you provide the test-case showing the problem you speak of?

Regards.
 

James Kuyper

Noob said:
You're quick to castigate gcc!

Can you provide the test-case showing the problem you speak of?

According to its documentation, gcc enables -fexcess-precision=standard
automatically if you set -std=c99. Wording that addresses the excess
precision issue was only added in that version of the standard, so that
seems reasonable.

I would be very surprised if gcc generates any code which fails to
conform to the floating point accuracy requirements of the C standard,
when invoked with the options that promise such conformance. That's not
because gcc's developers are unusually competent, but because the C
standard imposes so few accuracy requirements on floating point code.
gcc's developers would have to be unusually incompetent to mis-handle
the few cases where it does impose requirements.

5.2.4.2.2p6 says: "The accuracy of the floating-point operations (+, -,
*, /) and of the library functions in <math.h> and <complex.h> that
return floating-point results is implementation defined, as is the
accuracy of the conversion between floating-point internal
representations and string representations performed by the library
functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation
may state that the accuracy is unknown."

If __STDC_IEC_559__ is pre-defined by the implementation, the accuracy
requirements of Annex F apply, which are essentially the same as those
of IEC 60559:1989 (== IEEE 754:1985). However, if that macro is not
pre-defined, about the only significant floating point accuracy
requirements imposed by the C Standard are those on the conversion of
floating point constants to floating point values (6.4.4.2p3-5), and of
conversions to floating point types (6.3.1.4p2, 6.3.1.5p1).
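
A quick check of which regime applies (a sketch; what it prints is up
to the implementation):

#include <stdio.h>

int main(void){
#ifdef __STDC_IEC_559__
    puts("Annex F applies: IEC 60559 accuracy requirements");
#else
    puts("Annex F does not apply: accuracy is implementation-defined");
#endif
    return 0;
}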

The standard does recommend that "The translation-time conversion of
floating constants should match the execution-time conversion of
character strings by library functions, such as strtod, given matching
inputs suitable for both conversions, the same result format, and
default execution-time rounding." (6.4.4.2p7). It also recommends that
the printf() family be fairly accurate when performing the opposite
conversion (7.21.6.1p12-13). However, in both cases, that's in the
"Recommended Practice" section; it's not an actual requirement.
 

Keith Thompson

gwowen said:
If you ask for conformance, with --std=c99, say, you get conformance.
I don't know of any compiler that is strictly conforming without being
specifically asked to be.

A quibble: the phrase "strictly conforming" applies to a subset
of conforming *programs*, not implementations. An implementation,
as far as the standard is concerned, is either conforming or not.

I wouldn't complain about the otherwise reasonable use of "strictly"
to modify "conforming" if not for the fact that the standard actually
uses "strictly conforming" as a defined technical term.
 

mathog

Tim said:
The culprit
here is gcc, which cheerfully generates non-conforming code for
floating point expressions involving assignment and subsequent
access.

This little test program shows the same issue:

#include <stdio.h>
#include <stdlib.h>

void atest(double *ymax, double a, double b){
    double tmp;
    tmp = a/b;
    if(*ymax <= tmp){
        *ymax = tmp;
        printf("True\n");
    }
    else {
        printf("False\n");
    }
}

int main(void){
    double ymax = 0;
    double a, b;

    while(1){
        (void) fscanf(stdin,"%lf %lf",&a,&b);
        if(!b) break;
        atest(&ymax,a,b);
    }
}
gcc -O3 -o test test.c
./test
11 23
True
11 23
False
11 19
True
11 19
False
etc.

As suggested elsewhere in this thread, making tmp "volatile" does
resolve the issue; a sketch of the change follows below. Volatile is
supposed to prevent compilers from optimizing away calculations that
need to be repeated. Here, for some reason, it forces the result of
the calculation into the variable tmp. Why? The compiler then knows
that "tmp" may change unpredictably between where it is set and where
it is read, but why does that imply that the desired value at the
second position is the calculated value rather than something written
into tmp from a register in a hardware device (or whatever) outside
of this program's control?
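
Here is the variant in question (a sketch of the change only; the
volatile qualifier forces every read of tmp to reload the 64-bit
value actually stored in memory):

void atest(double *ymax, double a, double b){
    volatile double tmp;  /* every read reloads the stored value */
    tmp = a/b;            /* the store narrows to a 64-bit double */
    if(*ymax <= tmp){     /* the comparison uses the stored value */
        *ymax = tmp;
        printf("True\n");
    }
    else {
        printf("False\n");
    }
}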

Regards,

David Mathog
 

James Kuyper

mathog said:
This little test program shows the same issue:

#include <stdio.h>
#include <stdlib.h>

void atest(double *ymax, double a, double b){
    double tmp;
    tmp = a/b;
    if(*ymax <= tmp){
        *ymax = tmp;
        printf("True\n");
    }
    else {
        printf("False\n");
    }
}

int main(void){
    double ymax = 0;
    double a, b;

    while(1){
        (void) fscanf(stdin,"%lf %lf",&a,&b);
        if(!b) break;
        atest(&ymax,a,b);
    }
}

11 23
True
11 23
False
11 19
True
11 19
False
etc.

If compiled with -fexcess-precision=standard (which is the default if
you choose -std=c99), then the results become consistent. All you need
to do, in order to get standard-conforming behavior, is to request
standard-conforming behavior (at least, in this case).
 

Noob

Tim: In your haste to blame gcc, you forget that gcc is NOT
a conforming compiler by default (it compiles gnu89).

http://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html

If you want conforming, it is the operator's responsibility
to explicitly ask for it.

gcc -std=c99 -pedantic-errors
mathog said:
This little test program shows the same issue:

#include <stdio.h>
#include <stdlib.h>

void atest(double *ymax, double a, double b){
    double tmp;
    tmp = a/b;
    if(*ymax <= tmp){
        *ymax = tmp;
        printf("True\n");
    }
    else {
        printf("False\n");
    }
}

int main(void){
    double ymax = 0;
    double a, b;

    while(1){
        (void) fscanf(stdin,"%lf %lf",&a,&b);
        if(!b) break;
        atest(&ymax,a,b);
    }
}

11 23
True
11 23
False
11 19
True
11 19
False
etc.

As suggested elsewhere in this thread, making tmp "volatile" does
resolve the issue. Volatile is supposed to prevent compilers from
optimizing away calculations that need to be repeated. Here for some
reason it forces the result of tmp into a variable. Why? The compiler
then knows that 'tmp' may change unpredictably between where it is set
and where it is read, but why does that imply that the desired value at
the second position is the calculated value rather than something
written into tmp from a register in a hardware device (or whatever)
outside of this program's control?

There have been, literally, entire books written on the subject
of floating-point arithmetic. If you follow comp.arch, you can read
old-timers (mostly 70s and 80s engineers) criticize several aspects
of IEEE 754, or even the very concept of "floating-point" arithmetic.

You have barely scratched the surface of the complexity in FP.

Maybe you'll enjoy David Goldberg's article:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Regards.
 

Tim Rentsch

gwowen said:
If you ask for conformance, with --std=c99, say, you get
conformance. I don't know of any compiler that is strictly
conforming without being specifically asked to be.

This may shock you, but I didn't just fall off the turnip
truck. :)

My tests with gcc were done using -std=c99 -pedantic-errors
(which is pretty much my default these days).
 

Tim Rentsch

James Kuyper said:
You're quick to castigate gcc!

Can you provide the test-case showing the problem you speak of?

According to its documentation, gcc enables -fexcess-precision=standard
automatically if you set -std=c99. Wording that addresses the excess
precision issue was only added in that version, [snip elaboration].

There's an implication there (perhaps unintended) that discarding
excess precision wasn't required until C99. That conclusion
doesn't jibe with other, pre-C99, documents.

In particular, if we look at comments in Defect Reports, and
also how the wording in this area has changed over time (note
for example n843.htm, and compare n1256 to C99), it's clear that
narrowing behavior was intended all along.

What's more, assignment has _always_ implied removing any extra
range and precision, because in the abstract machine we actually
store the value in an object, and then on subsequent accesses
convert the stored representation to a value. Under the as-if
rule, any information beyond what the object representation can
represent must be lost.

Bottom line: assignment of floating point types always requires
removing any extra range and precision, even in C90.
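
A sketch of that requirement in isolation (assuming an implementation
where long double is wider than double, as on x87):

#include <stdio.h>

int main(void){
    long double wide = 1.0L/3.0L; /* carries extended precision */
    double t = wide;              /* the assignment must narrow to double */
    /* t has lost the extra significand bits, so on such an
       implementation this prints "narrowed": */
    printf("%s\n", (long double)t == wide ? "not narrowed" : "narrowed");
    return 0;
}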
 
