Tor said:
> Binary floating point numbers are not 100% accurate.
> Then approximate the calculator by using java.math.BigDecimal
This subject comes up frequently and I've got some problems with
the typical discussion. It tends to say things that are not
true or at least misleading. This includes Roedy's discussion
in the glossary as well as the first statement in the reply above.
My feeling is this tends to make floating point numbers
more mysterious than they should be.
Binary floating point numbers as used by Java *are*
precise/accurate/exact. Each one represents a unique point on the real
number line. However, the set of available numbers is not dense:
only an infinitesimally small fraction (literally) of the
points on the number line are included in either of the
floating point representations that Java uses.
There are two common cases where a Java process must choose one of
the numbers it can represent to stand in for a number that it cannot.
These approximations are well defined, but they give results different
from what we would get if Java floats/doubles could represent all
real numbers exactly.
The first is the conversion of finite decimal representations.
Most numbers that have a finite decimal representation
do not have a finite binary representation. E.g., the fraction 3/10 has
a finite decimal representation (0.3) but no finite
representation in binary floating point (it is the repeating
binary fraction 0.010011001100110011...B).
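You can see the double that actually stores "0.3" with BigDecimal. (A small sketch; the key point is that the `BigDecimal(double)` constructor converts the binary value exactly, unlike the string constructor or `BigDecimal.valueOf`.)

```java
import java.math.BigDecimal;

public class ExactValue {
    public static void main(String[] args) {
        // The BigDecimal(double) constructor converts the *binary* value
        // exactly, so it reveals the double that actually stores "0.3".
        System.out.println(new BigDecimal(0.3));
        // begins 0.29999999999999998889... (54 decimal digits in all)
    }
}
```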
So when a user enters a string "0.3" and asks to convert that string
to a binary floating point number -- at either compile time or during
the execution of the program -- there just isn't any such number
available. The rules state that the closest available number will be
chosen, but in the vast majority of cases there will not be an exact
match. The conversion is well defined, but it is many-to-one: many
real numbers (indeed, infinitely many) convert to each of the
available numbers.
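A sketch of the many-to-one behavior (the second literal below is just one of the infinitely many decimals whose closest double is the same one "0.3" maps to):

```java
public class ManyToOne {
    public static void main(String[] args) {
        // Two different decimal strings; each names a different real
        // number, but the nearest double to each is the same.
        double a = Double.parseDouble("0.3");
        double b = Double.parseDouble("0.29999999999999999");
        System.out.println(a == b); // true
    }
}
```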
Once Java has numbers in binary, it allows the user to do arithmetic
on them. There are specific rules for how this arithmetic is done.
While these rules try to match what we are familiar with, often
the 'exact' result is a number that is not in the representable
set. E.g., with
double x = 3; double y = 10; double z = x/y;
we again have a case where the value that would naively be expected
for z is not in the set of double values that Java supports. Java is
very explicit in specifying exactly which value will be selected, but
it's not quite the same value as one might anticipate. This is
of course what happened in the example above. The number
22.1004000000000002 is not representable in Java, so Java took
the nearest number in the representable set, which is
very close to 22.100399999999997. In fact, the exact value probably
requires over 50 digits to write out, but Java prints just
enough digits to distinguish it from every other representable
number. [All numbers with finite binary representations also have
finite decimal representations, usually with about as many decimal
digits as there are bits in the binary representation.]
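The shortest-distinguishing-digits behavior is easy to demonstrate. (A sketch using the classic 0.1 + 0.2 case rather than the 22.1004 example from the earlier post.)

```java
import java.math.BigDecimal;

public class ShortestDigits {
    public static void main(String[] args) {
        double z = 0.1 + 0.2;
        // Double.toString prints just enough digits to distinguish z
        // from its neighboring doubles.
        System.out.println(z); // 0.30000000000000004
        // The exact stored value takes many more digits:
        System.out.println(new BigDecimal(z));
    }
}
```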
Roedy's Web page implies that all calculations are fuzzy. This is not
the case. If the calculated value is in the set of representable
numbers, the calculation is performed exactly, e.g., for adding,
subtracting or multiplying small integer values.
If not, then the nearest representable value is chosen.
Arithmetic is precisely defined -- there
is almost no room for different machines to choose different values --
but Java gives slightly different results than would be expected if all
real numbers were representable in the set of Java floating point
numbers.
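For instance, small-integer arithmetic in doubles is exact (every integer of magnitude up to 2^53 is representable):

```java
public class ExactInts {
    public static void main(String[] args) {
        double a = 3.0, b = 4.0;
        // These results are all in the representable set,
        // so the arithmetic is performed exactly.
        System.out.println(a * b == 12.0); // true
        System.out.println(a + b == 7.0);  // true
        System.out.println(b - a == 1.0);  // true
        // Every integer up to 2^53 is exactly representable as a double.
        System.out.println((double) (1L << 53) == 9007199254740992.0); // true
    }
}
```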
Roedy's page also misleads users in the discussion of the
StrictMath library and the non-strictfp expressions. These have nothing
to do with guard bits in the normal sense of the word.
The StrictMath library is used to give the standard results from the
standard math functions (trig, log, etc). A Java implementation
is allowed to use a Math library which gives very slightly different
(but essentially as accurate) values, that uses a different
algorithm in the computation -- perhaps taking advantage of local
hardware. This has nothing to do with guard bits. I believe that in
Sun's JDKs the two are actually the same.
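A sketch of the relationship: the Math javadoc only requires results within about 1 ulp of the correctly rounded value, while StrictMath specifies the exact fdlibm algorithms, so the two may legitimately differ by a couple of ulps (and on many JDKs they don't differ at all).

```java
public class LibCompare {
    public static void main(String[] args) {
        double x = 1.2345;
        double m = Math.sin(x);       // allowed to vary slightly by platform
        double s = StrictMath.sin(x); // same bits on every conforming JVM
        System.out.println(m);
        System.out.println(s);
        // Per the Math spec the two can differ only very slightly:
        System.out.println(Math.abs(m - s) <= 2 * Math.ulp(s)); // true
    }
}
```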
There is also a strictfp qualifier that users can specify for methods
and blocks. Within a strictfp block, all intermediate results
must be in the representable set. However outside of strictfp,
intermediate results may have exponents which are outside the range
of those that are usually permitted. If no intermediate result
would overflow, underflow (or result in denormalized value), then
the strictfp qualifier makes no difference. So, for double precision
as long as the intermediate values have magnitudes between 1.e-300 and
1.e300 (or exactly 0), strictfp makes no difference. However
something like:
double x = 1.e-200;
x = x*x/x;
must return 0 in a strictfp expression: since 1.e-400 is smaller than
the smallest representable number it would underflow to 0. In a
non-strictfp expression, it is allowed to return 1.e-200 (but not
required to).
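The underflow case can be sketched as below. (Note one update: since Java 17, JEP 306 made all floating point arithmetic strict, so strictfp is redundant on current JVMs and this always prints 0.0 there.)

```java
public class UnderflowDemo {
    // strictfp forces strict IEEE 754 semantics for intermediates
    // (automatic on Java 17+, where the keyword is obsolete).
    public static strictfp double f(double x) {
        // For x = 1e-200, x*x is about 1e-400, far below the smallest
        // subnormal double (~4.9e-324), so it underflows to 0
        // before the division happens.
        return x * x / x;
    }

    public static void main(String[] args) {
        System.out.println(f(1e-200)); // 0.0 under strict semantics
    }
}
```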
While strictfp and StrictMath address quite distinct issues,
the overwhelming majority of users can completely
ignore the existence of both.
This isn't meant as a criticism of Roedy or the other posters. However
I think that in trying to simplify the discussion there's a tendency to
use language that makes it hard to understand how floating point
numbers really work and makes it seem as if floating point arithmetic
isn't well defined.
Tom