Tor said:
> Binary floating point numbers are not 100% accurate.
> Then approximate the calculator by using java.math.BigDecimal
This subject comes up frequently and I've got some problems with
the typical discussion. It tends to say things that are not
true or at least misleading. This includes Roedy's discussion
in the glossary as well as the first statement in the reply above.
My feeling is this tends to make floating point numbers
more mysterious than they should be.
Binary floating point numbers as used by Java *are*
precise/accurate/exact. Each one represents a unique point on the real
number line. However, the set of available numbers is not dense:
only an infinitesimally small fraction (literally) of the
points on the number line are included in either of the
floating point representations that Java uses.
There are two common cases where a Java process must choose one of
the numbers it can represent to stand in for a number that it cannot.
These approximations are well defined, but they give results different
from what we would get if Java floats/doubles could represent all
real numbers exactly.
The first is the conversion of finite decimal representations.
Most numbers that have a finite decimal representation
do not have a finite binary representation. E.g., the fraction 3/10 has
a finite decimal representation (0.3) but no finite
representation in binary floating point (it is the repeating
binary fraction 0.010011001100110011...B).
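You can see the double that actually stores "0.3" with BigDecimal. (A small sketch; the key point is that the `BigDecimal(double)` constructor converts the binary value exactly, unlike the string constructor or `BigDecimal.valueOf`.)

```java
import java.math.BigDecimal;

public class ExactValue {
    public static void main(String[] args) {
        // The BigDecimal(double) constructor converts the *binary* value
        // exactly, so it reveals the double that actually stores "0.3".
        System.out.println(new BigDecimal(0.3));
        // begins 0.29999999999999998889... (54 decimal digits in all)
    }
}
```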
So when a user enters a string "0.3" and asks to convert that string
to a binary floating point number -- at either compile time or during
the execution of the program -- there just isn't any such number
available. The rules state that the closest available number will be
chosen, but in the vast majority of cases there will not be an exact
match. The conversion is well defined, but it is many-to-one: many
real numbers (indeed, infinitely many) convert to each of the
available numbers.
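A sketch of the many-to-one behavior (the second literal below is just one of the infinitely many decimals whose closest double is the same one "0.3" maps to):

```java
public class ManyToOne {
    public static void main(String[] args) {
        // Two different decimal strings; each names a different real
        // number, but the nearest double to each is the same.
        double a = Double.parseDouble("0.3");
        double b = Double.parseDouble("0.29999999999999999");
        System.out.println(a == b); // true
    }
}
```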
Once Java has numbers in binary, it allows the user to do arithmetic
on them. There are specific rules for how this arithmetic is done.
While these rules try to match what we are familiar with, often
the 'exact' result is a number that is not in the representable
set. E.g., with
double x = 3; double y = 10; double z = x/y;
we again have a case where the value that would naively be expected
for z is not in the set of double values that Java supports. Java is
very explicit in specifying exactly which value will be selected, but
it's not quite the same value as one might anticipate. This is
of course what happened in the example above. The number
22.1004000000000002 is not representable in Java, so Java took
the nearest number in the representable set, which is
very close to 22.100399999999997. In fact, the exact value probably
requires over 50 digits to write out, but Java prints just
enough digits to distinguish it from every other representable
number. [All numbers with finite binary representations also have
finite decimal representations, usually with about as many decimal
digits as there are bits in the binary representation.]
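The shortest-distinguishing-digits behavior is easy to demonstrate. (A sketch using the classic 0.1 + 0.2 case rather than the 22.1004 example from the earlier post.)

```java
import java.math.BigDecimal;

public class ShortestDigits {
    public static void main(String[] args) {
        double z = 0.1 + 0.2;
        // Double.toString prints just enough digits to distinguish z
        // from its neighboring doubles.
        System.out.println(z); // 0.30000000000000004
        // The exact stored value takes many more digits:
        System.out.println(new BigDecimal(z));
    }
}
```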
Roedy's Web page implies that all calculations are fuzzy. This is not
the case. If the calculated value is in the set of representable
numbers, the calculation is performed exactly, e.g., for adding,
subtracting or multiplying small integer values.
If not, then the nearest representable value is chosen.
Arithmetic is precisely defined -- there
is almost no room for different machines to choose different values --
but Java gives slightly different results than would be expected if all
real numbers were representable in the set of Java floating point
numbers.
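For instance, small-integer arithmetic in doubles is exact (every integer of magnitude up to 2^53 is representable):

```java
public class ExactInts {
    public static void main(String[] args) {
        double a = 3.0, b = 4.0;
        // These results are all in the representable set,
        // so the arithmetic is performed exactly.
        System.out.println(a * b == 12.0); // true
        System.out.println(a + b == 7.0);  // true
        System.out.println(b - a == 1.0);  // true
        // Every integer up to 2^53 is exactly representable as a double.
        System.out.println((double) (1L << 53) == 9007199254740992.0); // true
    }
}
```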
Roedy's page also misleads users in the discussion of the
StrictMath library and the non-strictfp expressions. These have nothing
to do with guard bits in the normal sense of the word.
The StrictMath library is used to give the standard results from the
standard math functions (trig, log, etc). A Java implementation
is allowed to use a Math library which gives very slightly different
(but essentially as accurate) values, that uses a different
algorithm in the computation -- perhaps taking advantage of local
hardware. This has nothing to do with guard bits. I believe that in
Sun's JDKs the two are actually the same.
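A sketch of the relationship: the Math javadoc only requires results within about 1 ulp of the correctly rounded value, while StrictMath specifies the exact fdlibm algorithms, so the two may legitimately differ by a couple of ulps (and on many JDKs they don't differ at all).

```java
public class LibCompare {
    public static void main(String[] args) {
        double x = 1.2345;
        double m = Math.sin(x);       // allowed to vary slightly by platform
        double s = StrictMath.sin(x); // same bits on every conforming JVM
        System.out.println(m);
        System.out.println(s);
        // Per the Math spec the two can differ only very slightly:
        System.out.println(Math.abs(m - s) <= 2 * Math.ulp(s)); // true
    }
}
```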
There is also a strictfp qualifier that users can specify for methods
and blocks. Within a strictfp block, all intermediate results
must be in the representable set. However outside of strictfp,
intermediate results may have exponents which are outside the range
of those that are usually permitted. If no intermediate result
would overflow, underflow (or result in denormalized value), then
the strictfp qualifier makes no difference. So, for double precision
as long as the intermediate values have magnitudes between 1.e-300 and
1.e300 (or exactly 0), strictfp makes no difference. However
something like:
double x = 1.e-200;
x = x*x/x;
must return 0 in a strictfp expression: since 1.e-400 is smaller than
the smallest representable number it would underflow to 0. In a
non-strictfp expression, it is allowed to return 1.e-200 (but not
required to).
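The underflow case can be sketched as below. (Note one update: since Java 17, JEP 306 made all floating point arithmetic strict, so strictfp is redundant on current JVMs and this always prints 0.0 there.)

```java
public class UnderflowDemo {
    // strictfp forces strict IEEE 754 semantics for intermediates
    // (automatic on Java 17+, where the keyword is obsolete).
    public static strictfp double f(double x) {
        // For x = 1e-200, x*x is about 1e-400, far below the smallest
        // subnormal double (~4.9e-324), so it underflows to 0
        // before the division happens.
        return x * x / x;
    }

    public static void main(String[] args) {
        System.out.println(f(1e-200)); // 0.0 under strict semantics
    }
}
```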
While strictfp and StrictMath address quite distinct issues,
the overwhelming majority of users can completely
ignore the existence of both.
This isn't meant as a criticism of Roedy or the other posters. However
I think that in trying to simplify the discussion there's a tendency to
use language that makes it hard to understand how floating point
numbers really work and makes it seem as if floating point arithmetic
isn't well defined.
Tom