Floating Point and Wide Registers

Robert Gamble

Robert Gamble schrieb:
<snip: Cast to double should force the value to type double
and appropriate truncation but does not>

Your interpretation is correct; in fact, even -ffloat-store is not
always enough to force gcc to conforming behaviour; see, for example, a
similar question by myself,
http://groups.google.de/group/comp.lang.c/browse_frm/thread/ed30d2f1380f0f1

This is a truly annoying feature of gcc.

Thanks for the response. It's unfortunate that gcc doesn't follow the
Standard in this regard; the -ffloat-store option inhibits keeping any
floating-point variables in registers, which is overkill for those who just
want explicit narrowing conversions to be honored. Hopefully this will be
addressed before C99 compliance is deemed complete, but I won't hold my
breath.

Robert Gamble
 
Robert Gamble

The output for the lcc-win32 compiler is:

d3 != d1 * d2 // OK
d3 != (double) (d1 * d2) // This is the problem
fdim == 0 // OK


Why?

Because the result of d1 * d2 is already of type double, and the
compiler (probably like gcc) thinks that any cast to the same type
is a NOP (no operation).

That's what I figured.
If you write:
int a,b;

b = (int)((int)a + (int)b);

the same code will be generated as if you had written:

b = a+b;

But in this case that's okay, because the relevant issues involved with
floating point don't apply to integer arithmetic.
You are correct that this is not what the compiler should do, and it is
a bug; a cast to double should perform the conversion.

Either gcc doesn't properly identify the fact that not applying the cast
can result in different behavior, or they just don't care. In any case it
would be nice if they either fixed this or at least better advertised the
fact that, even in a conforming mode, this isn't handled properly, and
directed users to the -ffloat-store option.
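
In the meantime, a workaround that seems to do the job in practice (this is
only a sketch; it relies on gcc not being able to keep a volatile object in
a register, not on anything the Standard promises beyond what is already
being discussed here) is to force the store yourself through a volatile
temporary:

volatile double product = d1 * d2;  /* the store narrows the value to a 64-bit double */
if (d3 == product)                  /* both operands have been rounded to 64 bits */
    puts("d3 == (double) (d1 * d2)");
else
    puts("d3 != (double) (d1 * d2)");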
If you change the comparison to

if (d3 == (double) (float)(d1 * d2))

THEN the result is

d3 == (double) (d1 * d2);

because lcc-win32 forces the cast since float != double.

Assuming that float has less precision than double, this shouldn't be the
case in general. The cast to float may cause precision to be permanently
lost, making its value no longer equal to d3, which was not cast to
float.

Robert Gamble
 
Robert Gamble

I was thinking briefly about this matter the other day, and trying
to decide whether sort() could be counted upon. If floating point
equality tests are dubious, are relative relationship tests not
dubious as well?

This is actually what precipitated my original post. I wrote a routine a
while ago to sort an array of doubles but performed arithmetic on the
values and used the result for the sort order. A simplified example would
look something like this:

double da1[10];
double da2[10];
int x, y;
....
if (da1[x]/da2[x] <= da1[y]/da2[y])
/* sorting code */
else
/* sorting code */

The problem was that, given all the same values, the result of
evaluating the conditional expression could be different depending on
which values were kept in the wide register and which were narrowed in a
particular evaluation. This of course would cause the sort function to fail
miserably. This is not at all unexpected and, given the range of the
possible values in the array, was fixed by replacing the if statement with
something like:

if (da1[y]/da2[y] - da1[x]/da2[x] > 0.001)
...

In general though, this isn't a great solution and I think that properly
placed casts should work, provided the implementation conforms to C99
(which gcc apparently does not in this case).

To sort an array of doubles based solely on the value of each element
though, I think a simple comparison would work properly regardless of
what values are stored where during the evaluation of the expression, given
that all the values are ordered. In other words, if "(da1[0] < da1[1])" is
true one time, it must be true every time it is evaluated if neither
element has been modified between evaluations. Using the fmin/fmax
functions provided in C99 should work for all values, ordered or not. I
can't think of any reason this wouldn't work.
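
For what it's worth, a minimal comparator sketch along those lines
(assuming the array contains no NaNs; the function name is just
illustrative):

#include <stdlib.h>

static int cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    if (a < b) return -1;   /* plain relational tests, no arithmetic */
    if (a > b) return  1;
    return 0;               /* equal (or unordered, if a NaN slips in) */
}

/* e.g. qsort(da1, 10, sizeof da1[0], cmp_double); */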

Robert Gamble
 
Robert Gamble

When the "same" mathematical value is computed twice, the
results need not be the same, so long as each result is one
of the two nearest representable values to the "true" value.
Thus, not even 0.1==0.1 is guaranteed. Presumably your
compiler takes advantage of that to retain the "spurious"
precision represented in the guard bits in some situations
and not in others.

This is something I wasn't sure about. Is the following guaranteed:

double d1 = 0.1;
double d2 = d1;
d1 == d2; /* always true? */

Robert Gamble
 
Robert Gamble

9899:1999 5.1.2.3 Example 4 reads:
"EXAMPLE 4 Implementations employing wide registers have to take care
to honor appropriate semantics. Values are independent of whether they
are represented in a register or in memory. For example, an implicit
spilling of a register is not permitted to alter the value. Also, an
explicit store and load is required to round to the precision of the
storage type. In particular, casts and assignments are required to
perform their specified conversion. For the fragment

double d1, d2;
float f;
d1 = f = expression;
d2 = (float) expression;

the values assigned to d1 and d2 are required to have been converted to
float."

The output of the following program is:

d3 != d1 * d2
d3 != (double) (d1 * d2)
fdim == 0

I expected an output of

d3 != d1 * d2
d3 == (double) (d1 * d2)
fdim == 0

Here is the program:

#include <math.h>
#include <stdio.h>

int main (void) {
    double d1, d2, d3;
    d1 = 0.1;
    d2 = 10.0;
    d3 = d1 * d2;

    /* First part */
    if (d3 == d1 * d2)
        puts("d3 == d1 * d2");
    else
        puts("d3 != d1 * d2");

    /* Second part */
    if (d3 == (double) (d1 * d2))
        puts("d3 == (double) (d1 * d2)");
    else
        puts("d3 != (double) (d1 * d2)");

    /* Third part */
    if (fdim(d3, d1 * d2) == 0)
        puts("fdim == 0");
    else
        puts("fdim != 0");

    return 0;
}

It was compiled with gcc using -Wall -W -std=c99 -pedantic

I understand the pitfalls of floating point arithmetic and I understand
what is going on here. On my machine (x86) floating point arithmetic
is performed in 80-bit registers and doubles are 64-bits. In the first
example the compiler is computing the result of the multiplication in
an 80-bit register and comparing the result to the double with less
precision. The result is not unexpected because d3 lost some precision
when it was stored into a 64-bit object but the result of the
multiplication did not undergo this loss. I don't have a problem with
this, it is AFAICT Standard conforming.
The part that is unexpected, to me, is the second part where the result
of the multiplication is explicitly cast to double which, according to
my interpretation of the above-quoted Standard verse, requires that the
result is converted to the narrower type of double before the test for
equality is performed. This does not appear to be happening. If I use
the gcc option -ffloat-store the result is as expected but this
shouldn't be required in a Standard-conforming mode.
The result of the last part of the program shows that when the result
of "d1 * d2" is actually converted to a double, it compares equal to
d3.

So my question is: Is my interpretation correct and are the results of
the second two parts guaranteed? If not, where did I go wrong?

Robert Gamble
 
Michael Mair

Robert Gamble schrieb:
<snip: Cast to double should force the value to type double
and appropriate truncation but does not>
It was compiled with gcc using -Wall -W -std=c99 -pedantic

I understand the pitfalls of floating point arithmetic and I understand
what is going on here. On my machine (x86) floating point arithmetic
is performed in 80-bit registers and doubles are 64-bits. In the first
example the compiler is computing the result of the multiplication in
an 80-bit register and comparing the result to the double with less
precision. The result is not unexpected because d3 lost some precision
when it was stored into a 64-bit object but the result of the
multiplication did not undergo this loss. I don't have a problem with
this, it is AFAICT Standard conforming.
The part that is unexpected, to me, is the second part where the result
of the multiplication is explicitly cast to double which, according to
my interpretation of the above-quoted Standard verse, requires that the
result is converted to the narrower type of double before the test for
equality is performed. This does not appear to be happening. If I use
the gcc option -ffloat-store the result is as expected but this
shouldn't be required in a Standard-conforming mode.
The result of the last part of the program shows that when the result
of "d1 * d2" is actually converted to a double, it compares equal to
d3.

So my question is: Is my interpretation correct and are the results of
the second two parts guaranteed? If not, where did I go wrong?

Your interpretation is correct; in fact, even -ffloat-store is not
always enough to force gcc to conforming behaviour; see, for
example, a similar question by myself,
http://groups.google.de/group/comp.lang.c/browse_frm/thread/ed30d2f1380f0f1

This is a truly annoying feature of gcc.


Cheers
Michael
 
jacob navia

Robert said:
9899:1999 5.1.2.3 Example 4 reads:
"EXAMPLE 4 Implementations employing wide registers have to take care
to honor appropriate semantics. Values are independent of whether they
are represented in a register or in memory. For example, an implicit
spilling of a register is not permitted to alter the value. Also, an
explicit store and load is required to round to the precision of the
storage type. In particular, casts and assignments are required to
perform their specified conversion. For the fragment

double d1, d2;
float f;
d1 = f = expression;
d2 = (float) expression;

the values assigned to d1 and d2 are required to have been converted to
float."

The output of the following program is:

d3 != d1 * d2
d3 != (double) (d1 * d2)
fdim == 0

I expected an output of

d3 != d1 * d2
d3 == (double) (d1 * d2)
fdim == 0

Here is the program:

#include <math.h>
#include <stdio.h>

int main (void) {
    double d1, d2, d3;
    d1 = 0.1;
    d2 = 10.0;
    d3 = d1 * d2;

    /* First part */
    if (d3 == d1 * d2)
        puts("d3 == d1 * d2");
    else
        puts("d3 != d1 * d2");

    /* Second part */
    if (d3 == (double) (d1 * d2))
        puts("d3 == (double) (d1 * d2)");
    else
        puts("d3 != (double) (d1 * d2)");

    /* Third part */
    if (fdim(d3, d1 * d2) == 0)
        puts("fdim == 0");
    else
        puts("fdim != 0");

    return 0;
}

It was compiled with gcc using -Wall -W -std=c99 -pedantic

I understand the pitfalls of floating point arithmetic and I understand
what is going on here. On my machine (x86) floating point arithmetic
is performed in 80-bit registers and doubles are 64-bits. In the first
example the compiler is computing the result of the multiplication in
an 80-bit register and comparing the result to the double with less
precision. The result is not unexpected because d3 lost some precision
when it was stored into a 64-bit object but the result of the
multiplication did not undergo this loss. I don't have a problem with
this, it is AFAICT Standard conforming.
The part that is unexpected, to me, is the second part where the result
of the multiplication is explicitly cast to double which, according to
my interpretation of the above-quoted Standard verse, requires that the
result is converted to the narrower type of double before the test for
equality is performed. This does not appear to be happening. If I use
the gcc option -ffloat-store the result is as expected but this
shouldn't be required in a Standard-conforming mode.
The result of the last part of the program shows that when the result
of "d1 * d2" is actually converted to a double, it compares equal to
d3.

So my question is: Is my interpretation correct and are the results of
the second two parts guaranteed? If not, where did I go wrong?

Robert Gamble

The output for the lcc-win32 compiler is:

d3 != d1 * d2 // OK
d3 != (double) (d1 * d2) // This is the problem
fdim == 0 // OK


Why?

Because the result of d1 * d2 is already of type double, and the
compiler (probably like gcc) thinks that any cast to the same type
is a NOP (no operation).

If you write:
int a,b;

b = (int)((int)a + (int)b);

the same code will be generated as if you had written:

b = a+b;

You are correct that this is not what the compiler should do, and it is
a bug; a cast to double should perform the conversion.

If you change the comparison to

if (d3 == (double) (float)(d1 * d2))

THEN the result is

d3 == (double) (d1 * d2);

because lcc-win32 forces the cast since float != double.
 
Robert Gamble

I was thinking briefly about this matter the other day, and trying
to decide whether sort() could be counted upon. If floating point
equality tests are dubious, are relative relationship tests not
dubious as well?

This is actually what precipitated my original post. I wrote a routine a
while ago to sort an array of doubles but performed arithmetic on the
values and used the result for the sort order. A simplified example would
look something like this:

double da1[10];
double da2[10];
int x, y;
...
if (da1[x]/da2[x] <= da1[y]/da2[y])
/* sorting code */
else
/* sorting code */

The problem was that, given all the same values, the result of
evaluating the conditional expression could be different depending on
which values were kept in the wide register and which were narrowed in a
particular evaluation. This of course would cause the sort function to fail
miserably. This is not at all unexpected and, given the range of the
possible values in the array, was fixed by replacing the if statement with
something like:

if (da1[y]/da2[y] - da1[x]/da2[x] > 0.001)
...

In general though, this isn't a great solution and I think that properly
placed casts should work, provided the implementation conforms to C99
(which gcc apparently does not in this case).

As Doug pointed out else-thread, this isn't even guaranteed if the
implementation isn't in IEEE-conforming mode (which it isn't required to
support). I consider this a quality-of-implementation (QOI) issue.
To sort an array of doubles based solely on the value of each element
though, I think a simple comparison would work properly regardless of
what values are stored where during the evaluation of the expression, given
that all the values are ordered. In other words, if "(da1[0] < da1[1])" is
true one time, it must be true every time it is evaluated if neither
element has been modified between evaluations. Using the fmin/fmax
functions provided in C99 should work for all values, ordered or not. I
can't think of any reason this wouldn't work.

I think this part still stands. What I really care about is that the
following never aborts for any ordered double values of d1 and d2:

if (d1 < d2) {
    if (d1 < d2)
        ;
    else
        abort();
}

This would allow for sorting arrays of floating-point values; is this
guaranteed?

Robert Gamble
 
Ancient_Hacker

I've found it best to NEVER expect floating-point equality or
non-equality.

A goodly bit of the time whoever writes such a test isn't up on all the
little jiggles that can happen with fp arithmetic. And one is rarely
actually looking for equality; more likely one would be satisfied with
being within some small delta.
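
For example, a common shape for such a test (only a sketch; the mixed
absolute/relative form and the tolerance parameters are illustrative
choices, not something any standard prescribes):

#include <math.h>

/* Nonzero if a and b differ by no more than abs_tol, or by no more than
   rel_tol relative to the larger magnitude, whichever is larger. */
static int nearly_equal(double a, double b, double abs_tol, double rel_tol)
{
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= fmax(abs_tol, rel_tol * scale);
}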
 
Walter Roberson

I've found it best to NEVER expect floating-point equality or
non-equality.
A goodly bit of the time whoever writes such a test isn't up on all the
little jiggles that can happen with fp arithmetic. And one is rarely
actually looking for equality; more likely one would be satisfied with
being within some small delta.

I was thinking briefly about this matter the other day, and trying
to decide whether sort() could be counted upon. If floating point
equality tests are dubious, are relative relationship tests not
dubious as well?
 
Keith Thompson

I was thinking briefly about this matter the other day, and trying
to decide whether sort() could be counted upon. If floating point
equality tests are dubious, are relative relationship tests not
dubious as well?

It *should* be safe to assume that, for any two floating-point values
x and y, exactly one of the following is true:

x < y
x == y
x > y

... as long as neither x nor y is a NaN or Infinity (or possibly some
other exotic pseudo-value).

For example, given:

double x = 1.0;
double y = 1.0;
x /= 7.0;
x *= 7.0;

I'd be comfortable assuming that exactly one of the above conditions
holds, but I wouldn't want to make any assumptions about which one.

On the other hand, there are some values (such as 1.0) that you can
reasonably assume can be represented exactly. I'd be very surprised
if, for example, (0.5 * 2.0 != 1.0). But I suspect most
floating-point calculations aren't limited to such well-behaved
values.

What the C standard actually guarantees is a different matter.
 
Douglas A. Gwyn

Robert said:
The part that is unexpected, to me, is the second part where the result
of the multiplication is explicitly cast to double which, according to
my interpretation of the above-quoted Standard verse, requires that the
result is converted to the narrower type of double before the test for
equality if performed. ...

When the "same" mathematical value is computed twice, the
results need not be the same, so long as each result is one
of the two nearest representable values to the "true" value.
Thus, not even 0.1==0.1 is guaranteed. Presumably your
compiler takes advantage of that to retain the "spurious"
precision represented in the guard bits in some situations
and not in others.

If you have set the compiler into an IEEE-conforming mode
(__STDC_IEC_559__) then the guard bits are not supposed to be
kept in the result of each operation (F.2: representation of
the type of each result is required to conform to the IEEE
spec, correctly rounded), and the default rounding mode is
round-to-nearest. (F.2 somewhat contradicts F.7.3 with
respect to dynamic rounding precision.)
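
A trivial sketch of testing for that at translation time (whether the
implementation actually defines the macro, and in which mode, is of course
up to the implementation):

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
    puts("implementation claims IEC 60559 (IEEE 754) arithmetic");
#else
    puts("no IEC 60559 conformance claimed");
#endif
    return 0;
}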

As somebody else pointed out, testing for exact equality is
generally unwise.
 
CBFalconer

Ancient_Hacker said:
I've found it best to NEVER expect floating-point equality or
non-equality.

A goodly bit of the time whoever writes such a test isn't up on all
the little jiggles that can happen with fp arithmetic. And one
is rarely actually looking for equality; more likely one would be
satisfied with being within some small delta.

With gcc one can even tell it to warn on any such usage (-Wfloat-equal).
 
Dik T. Winter

> I've found it best to NEVER expect floating-point equality or
> non-equality.
Depends.

> A goodly bit of the time whoever writes such a test isn't up on all the
> little jiggles that can happen with fp arithmetic.

I think that Robert Gamble knew what he was doing. And he encountered a
bug in gcc.

Sometimes it is critical to know what is happening precisely.

Note that problems like the ones Robert Gamble encountered are long-standing
in gcc. The standard tightened the requirements, but the compilers did not
follow. Getting results in a precision larger than expected can wreak havoc
in numerical mathematics. A long, long time ago, when I wrote some functions
to calculate elementary functions (like the arcsine), the code depended
heavily on results being rounded to the precision of the type used. But to
get to the point: it is simple to double the precision you are using when
computing in floating point. The algorithms are due to T. J. Dekker, and
work when floating point satisfies some requirements that IEEE does satisfy.
But they only work if indeed the precision of intermediate expressions
is the same as for the base type. Consider:
void xpadd(double a, double b, double *c, double *cc) {
    *c = a + b;                           /* rounded sum */
    /* error term of the rounded sum (Dekker): the operand of larger
       magnitude must come first; fabs() is from <math.h> */
    *cc = (fabs(a) > fabs(b) ? a - (double)(a + b) + b
                             : b - (double)(a + b) + a);
}
This will *not* work if a compiler does not honour the cast, but still
uses a wider precision. (And I wonder how Jacob Navia would handle this.)
(You may ponder what that routine means. Simply: after execution, c is
the sum of a and b properly rounded, and mathematically a + b = c + cc.)
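
A small usage sketch (purely illustrative, and it of course assumes the
compiler really does narrow the cast expression to double):

double c, cc;
xpadd(1.0, 1e-17, &c, &cc);
/* c is the correctly rounded sum (1.0) and cc is the part that rounding
   discarded (the double value of 1e-17), so c + cc equals a + b exactly. */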

Yes, some people do know what they mean when they require equality of
floating point numbers.
 
Dik T. Winter

> On the other hand, there are some values (such as 1.0) that you can
> reasonably assume can be represented exactly.

Required by the C standard.
> I'd be very surprised
> if, for example, (0.5 * 2.0 != 1.0).

Want to know about that ternary Russian machine? (Yes, there was a series
produced.)
 
Francis Glassborow

Robert Gamble said:
double da1[10];
double da2[10];
int x, y;
...
if (da1[x]/da2[x] <= da1[y]/da2[y])
/* sorting code */
else
/* sorting code */

The problem was that, given all the same values, the result of
evaluating the conditional expression could be different depending on
which values were kept in the wide register and which were narrowed in a
particular evaluation. This of course would cause the sort function to fail
miserably. This is not at all unexpected and, given the range of the
possible values in the array, was fixed by replacing the if statement with
something like:

if (da1[y]/da2[y] - da1[x]/da2[x] > 0.001)
...
I think there should be a call to fabs() in that expression.
 
Francis Glassborow

Keith Thompson said:
On the other hand, there are some values (such as 1.0) that you can
reasonably assume can be represented exactly. I'd be very surprised
if, for example, (0.5 * 2.0 != 1.0). But I suspect most
floating-point calculations aren't limited to such well-behaved
values.

In practice I agree with you, but as a language lawyer I feel it
necessary to point out that a perverse implementation could be using
something such as a base 3 representation of floating point values and
the above would then be false.
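
For instance, on such a hypothetical base-3 implementation 1/2 = 0.111..._3
is a non-terminating fraction, so 0.5 would be stored as a rounded
approximation and 0.5 * 2.0 need not compare equal to 1.0.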
 
Keith Thompson

Francis Glassborow said:
In practice I agree with you, but as a language lawyer I feel it
necessary to point out that a perverse implementation could be using
something such as a base 3 representation of floating point values and
the above would then be false.

Agreed.

As a fellow language lawyer, that's why I wrote that I'd be surprised
if they were unequal, not that they must be equal.
 
Robert Gamble

Keith said:
Where is that stated?

The number 1 can be exactly represented according to the model
described in 5.2.4.2.2 using any radix (b): take the exponent e = 1 and
a significand with f_1 = 1 and every other digit zero, so that
x = b^1 * (1 * b^-1) = 1. Since a value that can be represented exactly
in the model must be represented exactly, it is guaranteed that 1 will
always be represented exactly in a floating-point type. The same cannot
be said for 2.

Robert Gamble
 
