Floating point arithmetic: a-a != 0.0

Sean

What I will describe is an abstraction of a problem I encountered.
Let's say we have an array x[] of positive doubles.
Using a for loop I find the minimal value in x[],
and let y = min{ x[] };
then I use another for loop in which I do this:

z = x[i] - y;
if (z == 0.0) min_index = i;

min_index is initially set to -1.
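
Roughly, the code is structured like this (a simplified sketch of what I
described; the names are just illustrative, the real code is more
complicated):

    /* simplified sketch, not the real code */
    int find_min_index(const double x[], int n)
    {
        int i, min_index = -1;
        double y = x[0];
        for (i = 1; i < n; i++)        /* first loop: find the minimal value */
            if (x[i] < y)
                y = x[i];
        for (i = 0; i < n; i++) {      /* second loop: look for its index */
            double z = x[i] - y;
            if (z == 0.0)
                min_index = i;
        }
        return min_index;
    }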

I use Cygwin gcc 3.4.4.
When I wrote C code simply as I describe above, I had no problem
finding the right min_index.

But in a more complicated code which basically does the same thing,
a weird problem happens.
If I switch the optimization option off, I get the right result.
If I compile the code with -O2/3/4 then min_index = -1,
i.e. none of the values in z[] equals 0.
If I add a printf to print out the smallest value of z[],
I find that the smallest value is very close to zero (~10^-15 to 10^-18)
but not zero.

My question is: is it possible to get a-a != 0 but close to zero in FP
arithmetic?
Is FP arithmetic deterministic? Are there errors in FP arithmetic
other than rounding errors?


I found an article which seems relevant:
http://hal.archives-ouvertes.fr/hal-00128124/en/
In this paper, the following paragraph says something that is very
new to me.
Can anyone give me some quick explanations?

Despite, or perhaps because of, the prevalence of “IEEE-compliant”
systems, there exist a number of myths of what IEEE-compliance really
entails from the point of view of program semantics. We shall discuss
the following myths, among others:

• “Since C’s float (resp. double) type is mapped to IEEE single (resp.
double) precision arithmetic, arithmetic operations have a uniquely
defined meaning across platforms.”

• “Arithmetic operations are deterministic; that is, if I do z=x+y in
two places in the same program and my program never touches x and y in
the meantime, then the results should be the same.”

• A variant: “If x < 1 tests true at one point, then x < 1 stays true
later if I never modify x.”

• “The same program, strictly compliant with the C standard with no
“undefined behaviours”, should yield identical results if compiled on
the same IEEE-compliant platform by different compliant compilers.”
 
Malcolm McLean

• “The same program, strictly compliant with the C standard with no
“undefined behaviours”, should yield identical results if compiled on
the same IEEE-compliant platform by different compliant compilers.”
That's not the case. In the case of floating point, subtracting a
number from itself should always yield zero (unless the number is
NaN), whilst subtracting an integer from an integer of the same value
should also always give zero. However, subtracting two reals which are
mathematically identical but calculated differently (e.g. 0.5 * sqrt(2)
and sin(M_PI/4)) will often give a result that is close to but not
identical to zero. Whilst the results are deterministic, they can
vary from floating point unit to floating point unit or library to
library, and in practice it is best to regard them as a very tiny
random error.
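
Something along these lines, for instance (just a sketch; the exact
difference, if any, depends on the platform's math library):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = 0.5 * sqrt(2.0);
        double b = sin(M_PI / 4.0);   /* M_PI is a common extension, not ISO C */
        /* Mathematically a == b, but the two values are computed
           differently and may differ by a few units in the last place. */
        printf("a - b = %g\n", a - b);
        return 0;
    }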
 
Mark Dickinson

Sean said:
What I will describe is an abstraction of a problem I encountered.

[problem details snipped]
then I find that the smallest value is very close to zero
(~10^-15 to 10^-18)
but not zero.

My question is: is it possible to get a-a != 0 but close to zero in FP
arithmetic?
Is FP arithmetic deterministic? Are there errors in FP arithmetic
other than rounding errors?


I found an article which seems relevant
http://hal.archives-ouvertes.fr/hal-00128124/en/

That article is very relevant, and well worth reading thoroughly if
you have time!
In this paper, the following paragraph says something that is very
new to me.
Can anyone give me some quick explanations?
[...]

It's difficult to improve on what's in the article. But in short, the
number one cause of problems like the one you mention (assuming that
you're already happy with the basics of floating-point) is that
arithmetic calculations can legally be carried out in greater
precision than you expect, and that it can be hard to predict when
this is going to happen: a single expression might appear identically
in two different places in a piece of code, and be evaluated with
extended precision in one place but with the expected precision in the
second.

In practice, this is a particular problem when working with doubles on
x86 hardware, especially with 32-bit operating systems: doubles
typically have 53-bit precision on such hardware, but the x87 FPU
works in 64-bit precision (though this can be changed), and the x87
FPU registers all store values with a precision of 64 bits. When a
value of type double is 'spilled' from an x87 FPU register to memory,
it gets rounded from 64-bit precision to 53-bit precision, and this
rounding potentially changes the value. The main problem is that it's
hard to control or predict when such spilling occurs. Things are
slowly getting better, though: most 64-bit operating systems on x86-64
use the SSE2 instructions for floating-point arithmetic instead of the
x87 FPU, and the SSE2 instructions work with 53-bit precision
throughout, eliminating most of this type of problem (though you can
probably still have some fun when mixing floats and doubles).
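
As a rough illustration of the kind of thing that can happen (whether
this particular test actually fires depends on your compiler, options
and hardware, so treat it only as a sketch):

    #include <stdio.h>

    /* With x87 code generation (e.g. 32-bit gcc without -mfpmath=sse),
       the stored value z has been rounded to 53-bit double precision,
       while the recomputed x * y may still be held in an 80-bit register,
       so the two can compare unequal.  With SSE2 arithmetic they should
       always match. */
    void compare(double x, double y)
    {
        volatile double z = x * y;   /* volatile forces a store to memory */
        if (z != x * y)
            printf("x * y changed on its way through memory\n");
    }

    int main(void)
    {
        compare(1.0 / 3.0, 3.0);     /* operands chosen only for illustration */
        return 0;
    }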

There are various secondary problems, too: double rounding, double
rounding on underflow, special treatment of NaNs and subnormals on
some platforms, etc., but the above is probably the major thing to be
aware of.

There's lots missing above: read the article for details and
examples!
 
Ben Bacarisse

Sean said:
What I will describe is an abstraction of a problem I encountered.
Let's say we have an array x[] of positive doubles.
Using a for loop I find the minimal value in x[],
and let y = min{ x[] };
then I use another for loop in which I do this:

z = x[i] - y;
if (z == 0.0) min_index = i;

min_index is initially set to -1


I am sure you've worked this out already, but it would be better to
store the min_index while finding the minimum value.
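
Something like this, just as a sketch (names follow your description):

    /* find the index of the smallest element in a single pass, so there
       is no need to compare against a recomputed difference later */
    int min_index_of(const double x[], int n)
    {
        int i, min_index = 0;
        double y = x[0];
        for (i = 1; i < n; i++)
            if (x[i] < y) {
                y = x[i];
                min_index = i;
            }
        return min_index;
    }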
I use Cygwin gcc 3.4.4.
When I wrote C code simply as I describe above, I had no problem
finding the right min_index.

But in a more complicated code which basically does the same thing,
a weird problem happens.
If I switch the optimization option off, I get the right result.
If I compile the code with -O2/3/4 then min_index = -1,
i.e. none of the values in z[] equals 0.
If I add a printf to print out the smallest value of z[],
I find that the smallest value is very close to zero (~10^-15 to 10^-18)
but not zero.

My question is: is it possible to get a-a != 0 but close to zero in FP
arithmetic?

Yes, though that exact example (a-a) is unlikely because the optimiser
will probably replace the subtraction. On some hardware, temporary
results are held to greater precision than the values involved and I
suspect the optimiser has found a way to avoid a re-load by using a
value left in one of these extended-precision registers. The effect
is that the fresh value from, say, the array does not exactly equal
the old one that was kept in the register.

I found an article which seems relevant
http://hal.archives-ouvertes.fr/hal-00128124/en/
In this paper, the following paragraph says something that is very
new to me.
Can anyone give me some quick explanations?

The most helpful paper you can read is likely to be Goldberg's famous
paper that has already been cited. It is hard to know what to trust
when searching the web, but Goldberg is reliable.

<snip>
 
Ben Bacarisse

Malcolm McLean said:
That's not the case. In the case of floating point, subtracting a
number from itself should always yield zero (unless the number is
NaN),

Isn't this what the OP is doing? One of the x[i] is y (y was set
from it) yet none of the x[i] - y are equal to zero.
whilst subtracting an integer from an integer of the same value
should also always give zero. However, subtracting two reals which are
mathematically identical but calculated differently (e.g. 0.5 * sqrt(2)
and sin(M_PI/4)) will often give a result that is close to but not
identical to zero. Whilst the results are deterministic, they can
vary from floating point unit to floating point unit or library to
library, and in practice it is best to regard them as a very tiny
random error.

Because of the permission to use extended precision temporary results,
I think it is safer to assume that all values might have been
"calculated differently" even if they all come from the same
object by nothing more complex than assignment.
 
Tim Prince

What I will describe is an abstraction of a problem I encountered.
Let's say we have an array x[] of positive doubles.
Using a for loop I find the minimal value in x[],
and let y = min{ x[] };
then I use another for loop in which I do this:

z = x[i] - y;
if (z == 0.0) min_index = i;

min_index is initially set to -1.

I use Cygwin gcc 3.4.4.
When I wrote C code simply as I describe above, I had no problem
finding the right min_index.

But in a more complicated code which basically does the same thing,
a weird problem happens.
If I switch the optimization option off, I get the right result.
If I compile the code with -O2/3/4 then min_index = -1,
i.e. none of the values in z[] equals 0.
If I add a printf to print out the smallest value of z[],
I find that the smallest value is very close to zero (~10^-15 to 10^-18)
but not zero.


As others have pointed out, this probably means you have asked the
compiler to use extra precision in your compilation. This would happen by
default for 32-bit gcc, or even with the otherwise useful option
-march=pentium-m, unless you add -mfpmath=sse.
If you wish to discuss the effects of your gcc options, the gcc-help
mailing list would be more suitable than c.l.c, but you should be more
specific about the options you chose.
 
Ed Prochak

You might find the following article somewhat relevant.

What Every Computer Scientist Should Know About Floating-Point Arithmetic:
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.

One phrase to keep handy if looking for more information is:
applied numerical methods

HTH,
ed
 
James Dow Allen

I am sure you've worked this out already, but it would be better to
store the min_index while finding the minimum value.

As usual, Ben gives the correct answers. The mentioned change
would be correct even without this floating point issue: given
two ways to code something, choose the way whose correctness
is most obvious.
Yes, though that exact example (a-a) is unlikely because the optimiser
will probably replace the subtraction.

I'll guess that (a-a != 0) is impossible on almost all machines
even *without* optimization. The Subject line in the OP is misleading.
Instead, I'd guess the "minimum failing code example" may need only
'a == b' without any subtraction, but with enough intervention so
that either a or b has taken a trip through storage, as Ben implies:
On some hardware, temporary
results are held to greater precision than the values involved and I
suspect the optimiser has found a way to avoid a re-load by using a
value left in one of these extended precision registers.
The most helpful paper you can read is likely to be Goldberg's famous
paper that has already been cited.

It does look outstanding. The excitement makes me regret
that 99.9%+ of my programming was strictly fixed-point.

<anecdote>
In 1969, a "super-programmer" told me about a computer which
never got the same result twice, and for years this confused me.
Only much later did I realize he must have been talking about
the programming problem, somewhat related to OP, that
eventually led to Lorenz' 1972 paper:
"Predictability: Does the Flap of a Butterfly's Wings in
Brazil Set Off a Tornado in Texas?"
</anecdote>

James
 
Malcolm McLean

I'll guess that (a-a != 0) is impossible on almost all machines
even *without* optimization.  
Most floating point units allow NaN. So I'd hope that the optimiser
didn't replace the subtraction.
 
Ben Bacarisse

Malcolm McLean said:
Most floating point units allow NaN. So I'd hope that the optimiser
didn't replace the subtraction.

That's a good point.

Of course, if the compiler can prove that 'a' is not a NAN[1] it /can/
optimise, but that proof is probably impossible in most practical
situations. It might be relatively simple on systems that only have
signalling NANs but such systems are, I think, rare -- probably rarer
than those that have no NANs at all.

[1] I'd like to avoid the double negative but "is a number" is not
precise enough.
 
Eric Sosman

Malcolm McLean said:
Most floating point units allow nan. So I'd hope that the optimiser
didn't replace the call.

That's a good point.

Of course, if the compiler can prove that 'a' is not a NAN [...]

... and also not an infinity ...
 
Ben Bacarisse

Eric Sosman said:
Malcolm McLean said:
Yes, though that exact example (a-a) is unlikely because the optimiser
will probably replace the subtraction.

I'll guess that (a-a != 0) is impossible on almost all machines
even *without* optimization.

Most floating point units allow NaN. So I'd hope that the optimiser
didn't replace the subtraction.

That's a good point.

Of course, if the compiler can prove that 'a' is not a NAN [...]

... and also not an infinity ...

Another good point. In case anyone else is led astray, I was confused
by this passage in the Goldberg paper:

"The rule for determining the result of an operation that has
infinity as an operand is simple: replace infinity with a finite
number x and take the limit as x -> oo."

I thought this covered the case in question but I missed the "an
operand" -- it does not apply when both operands are infinity. Table D-3
applies instead: oo - oo is NaN, which is all perfectly reasonable.
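
For anyone who wants to check both special cases, a tiny sketch (needs
C99's NAN and INFINITY macros from <math.h>):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = NAN, b = INFINITY;   /* C99 macros */
        /* NaN - NaN and inf - inf are both NaN, and NaN compares unequal
           to everything, so both comparisons below print 1. */
        printf("%d %d\n", a - a != 0.0, b - b != 0.0);
        return 0;
    }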

You'd never think I used to know this stuff...
 
