Floating point roundoff error

Janna

If I am merely taking the sum of a number of doubles, say a few hundred of
them, to see if that sum is equal to a given value, should I be concerned
with floating point roundoff error? Doesn't that only occur when you start
multiplying and dividing floating point values, and not when you merely add
/ subtract them?
 
Matt Humphrey

Janna said:
If I am merely taking the sum of a number of doubles, say a few hundred of
them, to see if that sum is equal to a given value, should I be concerned
with floating point roundoff error? Doesn't that only occur when you start
multiplying and dividing floating point values, and not when you merely
add / subtract them?

No (finite) floating point system can represent every real number
exactly, so as soon as you use any non-representable value you will run
into errors. The error is introduced as soon as you create the value
in memory. Try the following:

double t = 0.1d;
double g = t + t + t;
System.out.println (g);
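// prints 0.30000000000000004, not 0.3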

Matt Humphrey http://www.iviz.com/
 
Patricia Shanahan

Janna said:
If I am merely taking the sum of a number of doubles, say a few hundred of
them, to see if that sum is equal to a given value, should I be concerned
with floating point roundoff error? Doesn't that only occur when you start
multiplying and dividing floating point values, and not when you merely add
/ subtract them?

You can get floating point rounding error on addition and subtraction.
The problem is biggest if the numbers are very different in magnitude.
Each intermediate result has to be stored as a floating point number, in
some cases in an extended precision form, but still with finite precision.

To think about this, consider three significant digit decimal
arithmetic. Suppose the objective is to add 123 + 0.000123 - 123. Do it
in that order, and the first partial sum, 123.000123, rounds to 123, and
the final answer is 0. On the other hand, if you add 123 - 123 +
0.000123 the first intermediate sum is 0, and the result is 0.000123,
the same as the infinite precision result.
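A minimal sketch of the same reordering effect in Java, using float so
the absorption is easy to trigger:

float big = 1.0e8f;    // the spacing between floats near 1e8 is 8.0
float small = 1.0f;
System.out.println( (big + small) - big ); // prints 0.0 -- small is absorbed
System.out.println( (big - big) + small ); // prints 1.0 -- the exact result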

Exactly the same thing can happen in Java double arithmetic, especially
if there are big differences in magnitude among the numbers. Even if
they are of similar magnitude you should generally not expect them to
sum to exactly the value you expect. Floating point comparisons are
normally range tests:

(x >= expected - epsilon || x <= expected + epsilon)

where "epsilon" is a small numbers.

The exception to all this is if all the intermediate results are exactly
representable in double. For example, any value of Java int multiplied
by any power of two in a wide range is exactly representable.
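A quick sketch of that exception (the values here are arbitrary, but
they stay well inside double's 53-bit significand):

int n = 123456789;            // needs only 27 bits
double a = n * 0.25;          // 0.25 is a power of two, so this is exact
double b = ( n * 0.5 ) * 0.5; // every intermediate result is exact too
System.out.println( a == b ); // true -- exact comparison is safe here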

Patricia
 
Philipp

Patricia said:
Floating point comparisons are
normally range tests:

(x >= expected - epsilon || x <= expected + epsilon)

You mean
(x >= expected - epsilon && x <= expected + epsilon)
I guess (if epsilon is positive).

Phil
 
Patricia Shanahan

Philipp said:
You mean
(x >= expected - epsilon && x <= expected + epsilon)
I guess (if epsilon is positive).

Thanks for the correction. I changed my mind part way through between
testing for inside the range and outside the range, and got it wrong as
a result.

Patricia
 
Roedy Green

Janna said:
If I am merely taking the sum of a number of doubles, say a few hundred of
them, to see if that sum is equal to a given value, should I be concerned
with floating point roundoff error? Doesn't that only occur when you start
multiplying and dividing floating point values, and not when you merely add
/ subtract them?

It depends:

1. If the numbers are all of the same order of magnitude, e.g. all
between 0 and 1, you will get fewer roundoff errors than if one is a
trillion and the other is 0.000001.

2. Many numbers cannot be exactly represented as binary fractions,
e.g. 0.01. They are like the repeating decimal 0.3333... from grade
school arithmetic. Unless you have a deep understanding of what is going
on inside IEEE arithmetic, it is best to pretend a demon comes along at
the end of your computation and adds a small signed error to any
result.

See http://mindprod.com/jgloss/floatingpoint.html
http://mindprod.com/jgloss/ieee754.html
 
Wayne

Patricia said:
Thanks for the correction. I changed my mind part way through between
testing for inside the range and outside the range, and got it wrong as
a result.

Patricia

Since x and expected have similar magnitude, and epsilon is typically
a much smaller value, isn't this a better test?

if ( Math.abs(x - expected) <= epsilon )

Or to avoid a method call:

if ( x - expected <= epsilon && x - expected >= -epsilon )

(Just asking; I don't know as much as I think you do. But it seems
to me as if there is a good chance that your test ends up as
if ( x >= expected && x <= expected ), since expected +/- epsilon
may end up equal to expected?)

-Wayne
 
Patricia Shanahan

Wayne said:
Since x and expected have similar magnitude, and epsilon is typically
a much smaller value, isn't this a better test?

if ( Math.abs(x - expected) <= epsilon )

Or to avoid a method call:

if ( x - expected <= epsilon && x - expected >= -epsilon )

(Just asking; I don't know as much as I think you do. But it seems
to me as if there is a good chance that your test ends up as
if ( x >= expected && x <= expected ), since expected +/- epsilon
may end up equal to expected?)

The objective in setting epsilon is to look for a value with the
following two properties:

1. If the infinite precision answer would be exactly expected, then the
rounding error in the calculation is no greater than epsilon. Ideally,
this is true under worst case assumptions. In many cases we have to be
content with very high probability. Unless the calculation is exact, any
value satisfying this condition will be big enough to ensure that
expected +/- epsilon is not equal to expected.

2. For purposes of the application, a value within epsilon of expected
might just as well be expected.
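One common way to aim for both properties, sketched here with an assumed
relative tolerance of 1e-9, is to scale epsilon to the magnitude of
expected rather than fixing it as a constant:

double relTol = 1e-9;  // assumed application-specific relative tolerance
double epsilon = Math.max( Math.ulp(expected), Math.abs(expected) * relTol );
boolean closeEnough = Math.abs(x - expected) <= epsilon;
// Math.ulp(expected) guarantees epsilon is at least one unit in the last
// place, so expected +/- epsilon cannot collapse back to expected.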

Patricia
 
Mark Thornton

Wayne said:
Since x and expected have similar magnitude, and epsilon is typically
a much smaller value, isn't this a better test?

if ( Math.abs(x - expected) <= epsilon )

Or to avoid a method call:

if ( x - expected <= epsilon && x - expected >= -epsilon )

Math.abs will usually be an intrinsic, so your avoidance of the method
call might actually be slower.

Mark Thornton
 
Arne Vajhøj

Janna said:
If I am merely taking the sum of a number of doubles, say a few hundred of
them, to see if that sum is equal to a given value, should I be concerned
with floating point roundoff error? Doesn't that only occur when you start
multiplying and dividing floating point values, and not when you merely add
/ subtract them?

You may very well get a difference between the floating point result
and the equivalent mathematical result.

The question is: does it matter?

If you add money then it probably does matter. And you should
use BigDecimal or similar.
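A minimal sketch of the BigDecimal approach, with made-up amounts;
constructing from String rather than double keeps each value exact:

import java.math.BigDecimal;

BigDecimal total = BigDecimal.ZERO;
for ( String amount : new String[] { "0.10", "0.10", "0.10" } ) {
    total = total.add( new BigDecimal(amount) ); // exact decimal arithmetic
}
System.out.println( total );           // prints 0.30, exactly
System.out.println( 0.1 + 0.1 + 0.1 ); // prints 0.30000000000000004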

If you are adding kilograms (or pounds) then it probably doesn't matter.

Arne
 
Wayne

Patricia said:
The objective in setting epsilon is to look for a value with the
following two properties:

1. If the infinite precision answer would be exactly expected, then the
rounding error in the calculation is no greater than epsilon. Ideally,
this is true under worst case assumptions. In many cases we have to be
content with very high probability. Unless the calculation is exact, any
value satisfying this condition will be big enough to ensure that
expected +/- epsilon is not equal to expected.

2. For purposes of the application, a value within epsilon of expected
might just as well be expected.

Patricia

I thought I'd post the results of comparing tests:

=================== Code ===================
// Comparison of FP equality testing methods
import static java.lang.System.out;
public class Epsilon {
    public static void main( String[] args ) {
        float x = 500000.0f, expected = 500000.05f, epsilon = 0.05f;
        out.printf( "x = %14.7f, expected = %14.7f, epsilon = %1.7f%n",
                    x, expected, epsilon );

        out.println( "Test1: x >= expected "
                     + "- epsilon && x <= expected + epsilon" );
        out.print( " " +
                   (x >= expected - epsilon && x <= expected + epsilon) );
        out.printf( " (expected - epsilon = %1.7f)%n",
                    (expected - epsilon) );

        out.println( "Test2: Math.abs(x - expected) <= epsilon" );
        out.println( " " + (Math.abs(x - expected) <= epsilon) );
        out.printf( " Math.abs(x - expected) = %1.7f%n",
                    Math.abs(x - expected) );
    }
}
================== Output ==================

C:\Temp>java Epsilon
x = 500000.0000000, expected = 500000.0625000, epsilon = 0.0500000
Test1: x >= expected - epsilon && x <= expected + epsilon
 true (expected - epsilon = 500000.0000000)
Test2: Math.abs(x - expected) <= epsilon
 false
 Math.abs(x - expected) = 0.0625000

=============== End =======================

As can be seen, Patricia's test results in true whenever x equals
expected to the limit of the precision of those values. That may mean
they compare equal even when the difference is greater than epsilon, as
long as both x and expected are >> (much greater than) epsilon. My test,
by contrast, fails whenever the roundoff error is greater than epsilon.

I think this means test2 (my test) is testing if the FP round-off
error is significant, whereas test1 (Patricia's test) is testing
if two FP numbers can be considered equal, to the limit of the
precision or to the value of epsilon, whichever is greater.

I think that for most applications test1 is what is wanted. (That
is, who cares what the error is as long as it is "insignificant"?)
When using type float, you only get 6 significant decimal digits, so
"x000000y" won't have a significant value for digit y no matter where
in the number you put the decimal point. So, if the two numbers
have the same 6 leftmost digits in a float, the two numbers should be
considered equal regardless of epsilon. Patricia's test has this
property, mine doesn't.

The only case where an epsilon smaller than the significant digits
matters is when errors can accumulate, say in code such as:

float total = 0.0f;
for ( int i = 0; i < manyIterations; i++ ) {       // placeholder loop bound
    float val = someExpressionWithRoundOffError(); // placeholder expression
    total += val;   // each addition can discard low-order bits of val
}
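
One standard remedy when such accumulation matters is Kahan compensated
summation; a minimal sketch, assuming the inputs are in a float[] values:

float sum = 0.0f;
float c = 0.0f;              // running compensation for lost low-order bits
for ( float val : values ) {
    float y = val - c;       // re-apply the error captured last iteration
    float t = sum + y;       // low-order bits of y may be lost here...
    c = (t - sum) - y;       // ...recover them algebraically
    sum = t;
}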

(Patricia, did I get it right this time? I certainly appreciate
your thoughtful and insightful posts in c.l.j.p.) I dimly remember
I once took a numerical methods course that dealt with this issue;
I wish I could remember the details.

-Wayne
 
Lew

Wayne said:
When using type float, you only get 6 significant decimal digits, so

That's not quite accurate, as Patricia Shanahan has pointed out on this
thread. On the average a float has about 6-7 significant decimal digits'
precision, but in fact it has a binary precision, not a decimal precision.
Some numbers that can be expressed precisely within the set of float numbers
will require more than 6-7 decimal digits to express precisely, possibly an
infinite number.
 
Patricia Shanahan

Lew said:
That's not quite accurate, as Patricia Shanahan has pointed out on this
thread. On the average a float has about 6-7 significant decimal
digits' precision, but in fact it has a binary precision, not a decimal
precision. Some numbers that can be expressed precisely within the set
of float numbers will require more than 6-7 decimal digits to express
precisely, possibly an infinite number.

Actually, any finite length binary fraction is also a finite length
decimal fraction. Using "^" for exponentiation, a finite length binary
fraction can be expressed as a/2^n, for integer a and non-negative
integer n. That is equal to (a*5^n)/10^n.

It works because 2 is a factor of 10. In general, conversion from radix
A to radix B can always be done exactly, in a finite number of radix B
digits, if each prime factor of A is also a factor of B. The problem
with conversion from decimal to binary is that 5 is a prime factor of 10
but not of 2.
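A quick way to see this from Java: the BigDecimal(double) constructor
preserves the double's exact value, so printing it shows the finite
decimal expansion of the underlying binary fraction:

import java.math.BigDecimal;

System.out.println( new BigDecimal(0.5) ); // 0.5 -- exactly representable
System.out.println( new BigDecimal(0.1) ); // a long, but finite, expansion
                                           // of the double nearest to 0.1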

Patricia
 
Patricia Shanahan

Andrew said:
Arne Vajhøj wrote:
...

I have heard that if dealing with $, one should use 'cents'.
Or rather, use an int representation of cents, and only format
to dollars on output. Thoughts?

Java int arithmetic offers only one rounding policy, rounding towards
zero. BigDecimal offers 8 rounding modes, including throwing an
exception if rounding is needed. You *may* need to manage rounding
manually in BigDecimal. It is quite unlikely that you can avoid it for a
cents-significant financial calculation in int.
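A minimal sketch with made-up figures: splitting $100.00 three ways
forces a rounding decision, and BigDecimal makes you name the mode
(omitting it would throw ArithmeticException, because the quotient does
not terminate):

import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal total = new BigDecimal( "100.00" );
BigDecimal share = total.divide( new BigDecimal("3"), 2,
                                 RoundingMode.HALF_EVEN );
System.out.println( share ); // 33.33 -- the odd cent must be allocated explicitly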

Patricia
 
Lew

Andrew said:
Arne Vajhøj wrote:
...

I have heard that if dealing with $, one should use 'cents'.
Or rather, use an int representation of cents, and only format
to dollars on output. Thoughts?

Others say to keep a long of mils (if we're talking U.S. currency units),
giving one extra decimal digit of accuracy at the expense of one in range.

int probably isn't wide enough for useful money calculations - even at a unit
of cents it will only give you up to a little over 21 million dollars.

BigDecimal handles scaling and has no upper limit, although the upper limit of
a long (approximately $10^16 if we base on mils) is probably large enough for
most purposes.
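
The arithmetic behind those limits, as a quick check:

System.out.println( Integer.MAX_VALUE / 100 ); // 21474836 dollars from an int of cents
System.out.println( Long.MAX_VALUE / 1000 );   // about 9.2e15 dollars from a long of mils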
 
Patricia Shanahan

Wayne wrote:
...
The only case where an epsilon smaller than the significant digits
matters is when errors can accumulate, say in code such as:

float total = 0.0f;
for ( int i = 0; i < manyIterations; i++ ) {
    float val = someExpressionWithRoundOffError();
    total += val;
}

For non-trivial calculations, I would indeed only count on at most 4 or
5 significant digits from float. In practice, float is definitely useful
when the following conditions hold:

1. You only want a few digits.

2. You have a very large set of numbers, so large that halving the space
for floating point arrays and files is a significant advantage.

3. You have a numerical analyst on the staff to monitor stability of the
calculation and determine the proper epsilon values at various points in
the calculation.

I have seen it used very effectively, with all three conditions met, by
a petroleum exploration consultancy processing seismic tapes. It also
has some graphics applications, but the number of bits needed to pick
the right pixel keeps increasing...

Otherwise, using float is far more trouble than it is worth compared to
double.

Patricia
 
