higher precision doubles


BGB

On 8/7/2011 8:24 AM, Jan Burse wrote:
...

IEEE 754 permits extended forms of the two precisions, 32 bit and 64
bit. Essentially, Java floating point arithmetic is a simplification of
IEEE 754 without e.g. user selection of rounding mode.

the newer IEEE 754 (2008) added several new types:
binary16, 16-bit binary floats (IIRC: 1 sign, 5 exponent bits, 10 mantissa
bits, bias=15);
binary128, 128-bit floats (IIRC: 1 sign, 15 exponent, 112 mantissa,
bias=16383).

decimal32, decimal64, decimal128: these are like the binary formats, but
use a fixed-width decimal representation and are power-of-10 based.
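
as a side note, here is a quick hand-rolled binary16 decoder in Java, just
to make the field widths above concrete (my own sketch, not from any
particular library):

public class Half {
    // decode IEEE 754 binary16: 1 sign, 5 exponent (bias 15), 10 mantissa bits
    static float halfToFloat(int bits) {
        int sign = (bits >> 15) & 0x1;
        int exp  = (bits >> 10) & 0x1F;
        int frac =  bits        & 0x3FF;
        float v;
        if (exp == 0x1F)            // all-ones exponent: infinity or NaN
            v = (frac == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        else if (exp == 0)          // zero or subnormal: frac * 2^-10 * 2^-14
            v = frac * (float) Math.pow(2, -24);
        else                        // normal: (1 + frac/2^10) * 2^(exp - 15)
            v = (1.0f + frac / 1024.0f) * (float) Math.pow(2, exp - 15);
        return sign == 0 ? v : -v;
    }

    public static void main(String[] args) {
        System.out.println(halfToFloat(0x3C00));   // 1.0
        System.out.println(halfToFloat(0xC000));   // -2.0
        System.out.println(halfToFloat(0x7BFF));   // 65504.0, largest finite binary16
    }
}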
One can often make some particular combination of calculations give a
more precise answer by some variations in the arithmetic. However, this
does not give much indication of the real requirements.

Do calculations involving trig functions of integer multiples of pi need
to be exact in some application, even if trig functions of other angles
have normal rounding behavior? If so, why? Or do you need trig functions
in general to be more precise?

yeah, I have little idea what the OP wants and why either.

more precision will not suddenly make the calculations exact...
 

Eric Sosman

[...]
So here I can restate my question:

I would like, for whatever reason, to work with 80-bit
floats as defined above in Java. I am interested
in the full set of arithmetic functions, I/O and
trigonometric functions. How could I do that?

You have already said "So pointing me to JLS is like turning
cycles, only confirming that Java has only float and double." That
is, the JLS has already given you an answer -- but you don't like
the answer, and keep asking for a different one. Do you really
think your persistent rejection of "No" is productive?

"I want to climb the rainbow." "The rainbow is an illusion of
refraction and angles, not a physical climbable thing." "Why don't
you tell me how to climb the rainbow instead of saying `No' all the
time?"
 

Jan Burse

Eric said:
You have already said "So pointing me to JLS is like turning
cycles, only confirming that Java has only float and double." That
is, the JLS has already given you an answer -- but you don't like
the answer, and keep asking for a different one. Do you really
think your persistent rejection of "No" is productive?


You are the first one that says "no". One could of
course do something along the following lines:

public class DoubleExt {
    private long mantissa;
    private short exponent;
}

And then make a package that interfaces with some
of the known C libs for 80bit floats.

But would this be considered the best practice?

Best Regards
 

Jan Burse

Hi

Just assume I want to put the functionality into an Applet. That means
the aspect of supporting a couple of platforms and architectures is
important.

According to stack overflow there are a couple of solutions around:
http://stackoverflow.com/questions/...n-doubles-and-trigonometric-functions-in-java
Apfloat and JScience.

I guess creating objects and then dropping them again is not so
much a problem in modern VMs. At least I saw recently that, for
example, the 64-bit JDK is very good at continuously reclaiming
such objects. And there are also papers that show that the Java
GC is more efficient than a typical C malloc/free, for some
Java-specific reasons in how the heap supplies objects.

Also one could implement mutable objects (instances) instead of
value objects (instances). So that instead of an API of the form:

myMath:
myImmutableObjectClass sin(myImmutableObjectClass x);

One could also do:

myMath:
sin(myMutableObjectClass x, myMutableObjectClass res);

And thus save some object creation programmatically by reusing
objects where appropriate.
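
For illustration, here is what the two shapes look like on a trivial
stand-in type (a plain double instead of an extended float; all names
here are made up):

// Immutable flavor: each call allocates a fresh result object.
final class ImmVal {
    final double v;
    ImmVal(double v) { this.v = v; }
}

// Mutable flavor: the caller owns a holder and the method writes into it.
final class MutVal {
    double v;
}

final class MyMath {
    static ImmVal sin(ImmVal x) {              // allocates on every call
        return new ImmVal(Math.sin(x.v));
    }
    static void sin(MutVal x, MutVal res) {    // no allocation, result goes into res
        res.v = Math.sin(x.v);
    }
}

public class ApiShapes {
    public static void main(String[] args) {
        MutVal x = new MutVal(), res = new MutVal();
        x.v = 1.0;
        for (int i = 0; i < 3; i++) {          // the same two holders are reused
            MyMath.sin(x, res);
            x.v = res.v;
        }
        System.out.println(res.v);                          // sin(sin(sin(1.0)))
        System.out.println(MyMath.sin(new ImmVal(1.0)).v);  // immutable flavor
    }
}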

Bye
 

Jan Burse

Jan said:
According to stack overflow there are a couple of solutions around:
http://stackoverflow.com/questions/...n-doubles-and-trigonometric-functions-in-java

Apfloat and JScience.
But I did not yet have time to get my hands on Apfloat,
and it is also not clear what forms the basis of the
algorithms there.

Usually Taylor expansion leads nowhere. For sin and
the like, for example, developing polynomial coefficients from
a set of orthogonal Chebyshev polynomials gives better
results.
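
As an illustration, here is a toy version in plain double precision (my
own sketch, nothing to do with Apfloat's internals): fit a Chebyshev
series to sin on [-1, 1] with the standard cosine-sampling formula and
evaluate it with the Clenshaw recurrence.

public class ChebySin {
    static final int N = 12;                   // number of Chebyshev coefficients
    static final double[] c = new double[N];

    static {
        // c_j = (2/N) * sum_k sin(cos(pi(k+0.5)/N)) * cos(pi j (k+0.5)/N)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++) {
                double theta = Math.PI * (k + 0.5) / N;
                sum += Math.sin(Math.cos(theta)) * Math.cos(j * theta);
            }
            c[j] = 2.0 * sum / N;
        }
    }

    // Clenshaw evaluation of sum_j c_j T_j(x), with the j=0 term counted half.
    static double eval(double x) {
        double b1 = 0.0, b2 = 0.0;
        for (int j = N - 1; j >= 1; j--) {
            double b0 = 2.0 * x * b1 - b2 + c[j];
            b2 = b1;
            b1 = b0;
        }
        return x * b1 - b2 + 0.5 * c[0];
    }

    public static void main(String[] args) {
        double maxErr = 0.0;
        for (double x = -1.0; x <= 1.0; x += 1.0 / 1024) {
            maxErr = Math.max(maxErr, Math.abs(eval(x) - Math.sin(x)));
        }
        System.out.println("max error vs Math.sin on [-1,1]: " + maxErr);
    }
}

With 12 coefficients the error against Math.sin should come out somewhere
around 1e-13, while a Taylor series around 0 needs noticeably more terms
for the same accuracy near the ends of the interval.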

But of course there is a trade-off between accuracy and
computation time. See for example page 17/18 of:
http://rnc7.loria.fr/astafiev.pdf

Bye
 

Jan Burse

BTW: I hope that the drop of indexes (shares, currencies, etc.)
is not a problem of float calculation in today's high-frequency
trading.

The following slides mention such a problem:
http://www.cl.cam.ac.uk/~jrh13/slides/jnao-02jun06/slides.pdf

Quote: Page 2
"Naive use of floating-point arithmetic
In 1982 the Vancouver stock exchange index was established
at a level of 1000. A couple of years later the index
was hitting lows of around 520. The cause was repeated
truncation of the index to 3 decimal digits on each
recalculation, several thousand times a day. On correction,
the stock index leapt immediately from 574.081 to 1098.882."
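
For what it's worth, the effect is easy to reproduce in plain Java (my own
toy simulation, not the actual index formula): truncating to 3 decimals
after every update loses about half a unit in the last place each time,
while rounding to nearest does not drift.

import java.util.Random;

public class TruncationDrift {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        double truncated = 1000.0, rounded = 1000.0;
        // roughly "several thousand recalculations a day" over a couple of years
        for (int i = 0; i < 1500000; i++) {
            double delta = (rnd.nextDouble() - 0.5) * 0.1;   // small zero-mean change
            truncated = Math.floor((truncated + delta) * 1000.0) / 1000.0;  // chop to 3 digits
            rounded   = Math.round((rounded   + delta) * 1000.0) / 1000.0;  // round to 3 digits
        }
        System.out.println("truncated each step: " + truncated);  // ends hundreds of points low
        System.out.println("rounded each step:   " + rounded);    // stays near 1000
    }
}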
 

BGB

You are the first one that says "no". One could of
course do something along the following lines:

public class DoubleExt {
    private long mantissa;
    private short exponent;
}

And then make a package that interfaces with some
of the known C libs for 80bit floats.

But would this be considered the best practice?


actually, if you really want 80-bit floats in Java, you can use the
above as a starting point (just add a boolean or similar for the sign),
and proceed to implement the rest directly in Java (calling into C would
also work, and probably be more efficient, but JNI or JNA add their own
levels of pain, among other issues...).

addition and subtraction are mostly just shifting (to align the exponents),
adding, and similar (some fudging required).

multiplication is mostly doing a 128-bit integer multiply and discarding
the low 64 bits, adding the exponents. there are ways to "cheat"
(avoiding performing a 128-bit multiply) but these cost accuracy (some
fudging required, as the output of z=1.x*1.y is in the range of 1<=z<4).
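
as a rough Java sketch of that step (my own layout choice, not the actual
Intel 80-bit format; sign handling and proper rounding omitted):

public class ExtFloat {
    // value = mantissa * 2^(exponent - 63), mantissa always has its top bit set
    long mantissa;
    int  exponent;

    ExtFloat(long m, int e) { mantissa = m; exponent = e; }

    // unsigned 64x64 -> high 64 bits of the product, via 32-bit halves
    static long mulHi(long a, long b) {
        long aHi = a >>> 32, aLo = a & 0xFFFFFFFFL;
        long bHi = b >>> 32, bLo = b & 0xFFFFFFFFL;
        long lohi = aLo * bHi;
        long cross = aHi * bLo + ((aLo * bLo) >>> 32) + (lohi & 0xFFFFFFFFL);
        return aHi * bHi + (cross >>> 32) + (lohi >>> 32);
    }

    static ExtFloat mul(ExtFloat x, ExtFloat y) {
        long hi = mulHi(x.mantissa, y.mantissa);  // 1.x * 1.y lies in [1, 4)
        if (hi < 0)     // top bit set: product in [2, 4), exponent goes up by one
            return new ExtFloat(hi, x.exponent + y.exponent + 1);
        else            // product in [1, 2): shift left to restore the top bit (truncates)
            return new ExtFloat(hi << 1, x.exponent + y.exponent);
    }

    double toDouble() {   // rough conversion, just for checking the sketch
        return (mantissa >>> 11) * Math.pow(2, exponent - 52);
    }

    public static void main(String[] args) {
        ExtFloat a = new ExtFloat(0xC000000000000000L, 0);   // 1.5
        System.out.println(ExtFloat.mul(a, a).toDouble());   // 2.25
    }
}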

division is mostly calculating the reciprocal of the second value, and
multiplying the first by this (can be done using Newton's Method).

square root is, again, Newton's Method.
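
e.g. on plain doubles the square-root iteration looks like this; the
extended-precision version is the same loop over the custom type:

public class NewtonSqrt {
    // Newton/Heron iteration x_{n+1} = (x_n + a/x_n) / 2, assumes a > 0
    static double sqrtNewton(double a) {
        double x = a < 1 ? 1 : a;              // crude but safe starting point
        for (int i = 0; i < 64; i++) {
            double xn = 0.5 * (x + a / x);
            if (xn == x) break;                // no further change at this precision
            x = xn;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sqrtNewton(2.0));   // within an ulp of Math.sqrt(2.0)
        System.out.println(Math.sqrt(2.0));    // 1.4142135623730951, for comparison
    }
}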

the trigonometric functions can, in turn, be implemented using the
Taylor Series (building on the former arithmetic operators).

yes, these operations are not exactly "fast"...
 

Jan Burse

BGB said:
division is mostly calculating the reciprocal of the second value, and
multiplying the first by this (can be done using Newton's Method).

I don't know exactly what you understand by
this algorithm. But if I understand this as
calculating the reciprocal first and then
multiplying, you will have as a result:

o(x*o(1/y))

Instead of

o(x/y).

You will usually get less accurate results,
since with the reciprocal method two
approximations (roundings) are involved.

This is just a shot from the hip. But here is
a simple Excel sheet verifying the claim:

1 2 3 4 5 6 7 8
1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00 0.00 -0.01 -0.01 0.00
6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8 0.00 0.00 0.01 0.00 0.00 -0.01 0.00 0.00

Used formula:
=ROUND($G6/H$5;2)-ROUND($G6*ROUND(1/H$5;3);2)

1 2 3 4 5 6 7 8
1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3 0.00 0.00 0.01 0.00 0.00 0.01 0.00 0.00
4 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00
5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
6 0.00 0.00 0.01 0.00 0.00 0.01 0.00 0.00
7 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00
8 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00

Used formula:
=FLOOR($G6/H$5;2)-FLOOR($G6*FLOOR(1/H$5;3);2)

Errors are also seen for larger numbers of digits.

Bye
 

Joshua Cranmer

And BTW the negative sign in front of the zero anyway indicates
that some rounding was going on, and not a true zero was returned
by the computation. And you could try x == 0 || x == -0, if this
is possible, and you would still get false.

Several notes:
1. 0 is an integer, so -0 == 0.
2. 0.0 == -0.0 is still true, since both positive and negative zero
compare as equals. Figuring out how to distinguish between the two can
get really aggravating if you have one of the few cases where it
actually, truly matters.
3. Java still prints -0.0 (floating point numbers are signed magnitude
and therefore have both +0.0 and -0.0). And there are cases where -0.0
can come up without rounding: -1/Infinity is a classic one, as is the
"other" boundary in trigonometric cases. You can get a -0.0 without it
being the result of an underflow in negative numbers.
4. Don't compare equal to a floating point number, with the possible
exception of 0/Infinity/NaN (that's a fun one [1]!) to test for possible
overflow/underflow/invalid input cases.

[1] NaN is defined in IEEE 754 such that it fails all comparison tests:
NaN < 0 is false, NaN > 0 is false, and, most bizarrely, NaN == NaN is
false (so NaN != NaN is true). Hence x == NaN will never return true.
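
A small demo of points 2-4 (my own snippet):

public class ZeroAndNaN {
    public static void main(String[] args) {
        double negZero = -1.0 / Double.POSITIVE_INFINITY;    // -0.0 without any rounding
        System.out.println(negZero);                          // -0.0
        System.out.println(0.0 == negZero);                   // true: == can't tell them apart
        System.out.println(Double.compare(0.0, negZero));     // 1: compare() can
        System.out.println(1.0 / negZero);                    // -Infinity: the sign does matter

        double nan = 0.0 / 0.0;
        System.out.println(nan == Double.NaN);                // false: never compare == NaN
        System.out.println(nan != nan);                       // true: the classic NaN test
        System.out.println(Double.isNaN(nan));                // true: the readable test
    }
}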
 

Joshua Cranmer

Just assume I want to put the functionality into an Applet. That means
the aspect of supporting a couple of platforms and architectures is
important.

Using native APIs in an applet is impossible without signing your
applet. This would either imply using BigDecimal, signing your code, or
making do with a pure software implementation.

Also note that not all architectures support an 80-bit
extended-precision floating point format. Not that ARM lacks an FPU,
but its most common FPU coprocessor does not implement an 80-bit width,
and I should mention that ARM processors are by far more common than
the install base of x86. In fact, I think the only processor with a
significant install base that has hardware support for a floating point
format larger than 64 bits is the x86.
Also one could implement mutable objects (instances) instead of
value objects (instances). So that instead of an API of the form:

Small mutable objects like this are likely to negatively impact GC.
One could also do:

myMath:
sin(myMutableObjectClass x, myMutableObjectClass res);

If you're only using the mutable object for the return value, you are
probably taking a very small performance hit.
 

supercalifragilisticexpialadiamaticonormalizeringe

In general, a Java implementation must make its floating point
arithmetic give the same results as the 32 and 64 bit basic formats. If
"strictfp" is off, the implementation can carry extra extra exponent
bits, avoiding some cases of overflow and underflow, but is not required
to do so. Even without strictfp, the mantissa sizes and therefore the
precision are fixed.

How does that interact with JIT, though? On x86, the simplest way for
JIT to make non-strictfp code use the FPU would be to just load the
initial values into the (80-bit-wide!) registers and perform FADDs,
FMULs, etc. on them. As long as the computation stayed in registers the
higher precision then ought to remain in effect -- for JITted code.
Adding extra code to mask off 16 of the register bits (or the mantissa
subset of the extra bits) after every FP op would slow things down. Is
the JLS interpreted to require the JIT do this (for non-strictfp code)?
And, if not, what does the HotSpot JIT do in actuality?
 

BGB

I don't know exactly what you understand by
this algorithm. But if I understand this as
calculating the reciprocal first and then
multiplying, you will have as a result:

o(x*o(1/y))

Instead of

o(x/y).

You will usually get less accurate results,
since with the reciprocal method two
approximations (roundings) are involved.

yes, but try to find a good way to directly calculate x/y...

it is a much easier problem to calculate x*(1.0/y)...


Newton's Method is a reasonably good way to calculate 1.0/y, and in my
case (with 128-bit floats), usually manages to converge in approx 12
iterations.


here is some code from my own implementation (in C):

fv128_t fv128_rcp(fv128_t a)
{
    fv128_t f, g, xo, xn;
    int i;

    if(fv128_eq1p(a))          /* presumably a == 1, so 1/a == a */
        return(a);

    /* initial guess 2^(-exponent of a), so that a*xo starts out near 1 */
    i=fv128_getExponent(a);
    xo=fv128_pow2x(fv128_const_one, -i);

    for(i=0; i<100; i++)
    {
        /* Newton step for f(x) = a - 1/x:  x_{n+1} = x_n*(2 - a*x_n) */
        f=fv128_mul(a, xo);
        g=fv128_sub(fv128_const_two, f);
        xn=fv128_mul(xo, g);
        if(fv128_eqp(xo, xn))  /* no change between iterations: converged */
            return(xn);
        xo=xn;
    }
    return(xn);
}

fv128_t fv128_div(fv128_t a, fv128_t b)
{
    return(fv128_mul(a, fv128_rcp(b)));
}



<snip>
 

Jan Burse

BGB said:
yes, but try to find a good way to directly calculate x/y...
it is a much easier problem to calculate x*(1.0/y)...


Newton's Method is a reasonably good way to calculate 1.0/y, and in my
case (with 128-bit floats), usually manages to converge in approx 12
iterations.


here is some code from my own implementation (in C):

fv128_t fv128_rcp(fv128_t a)
{
fv128_t f, g, xo, xn;
int i;

if(fv128_eq1p(a))
return(a);

i=fv128_getExponent(a);
xo=fv128_pow2x(fv128_const_one, -i);

for(i=0; i<100; i++)
{
f=fv128_mul(a, xo);
g=fv128_sub(fv128_const_two, f);
xn=fv128_mul(xo, g);
if(fv128_eqp(xo, xn))
return(xn);
xo=xn;
}
return(xn);
}

fv128_t fv128_div(fv128_t a, fv128_t b)
{
return(fv128_mul(a, fv128_rcp(b)));
}



<snip>

Yeah good old Newton.

f(x) = a - 1/x.
zero at: 1/a.

f'(x) = 1 / x^2.
x_n+1 = x_n - f(x_n)/f'(x_n)
= x_n - (a - 1/x_n) / (1/x_n^2)
= x_n - a * x_n^2 + x_n
= x_n * (2 - a * x_n)

But it diverges or stays constant for starting values in certain intervals.

Here for a=1.5:

x_1=-2/3: Asymptotically reaches -inf
x_1=-1/3: Asymptotically reaches -inf
x_1=0: Stays constant at 0
x_1=1/3: Converges to 2/3
x_1=2/3: Converges to 2/3
x_1=1: Converges to 2/3
x_1=1 1/3: Stays constant at 0 after x_2
x_1=1 2/3: Asymptotically reaches -inf

And for some starting values it does not converge very fast:

x_1=0.0001: Slowly recovers
x_1=1 1/3 + 0.0001: Slowly recovers

Hope your initial pick is good.
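
A quick check on plain doubles (my own snippet, not BGB's code) shows both
behaviours; the 2^(-exponent) style guess lands inside (0, 2/a), the other
starting point does not:

public class RecipDemo {
    static double recip(double a, double x0) {
        double x = x0;
        for (int i = 0; i < 100; i++) {
            double xn = x * (2.0 - a * x);    // Newton step for 1/a
            if (xn == x) break;               // converged (or stuck)
            x = xn;
        }
        return x;
    }

    public static void main(String[] args) {
        double a = 1.5;
        double guess = Math.pow(2.0, -Math.getExponent(a));  // 1.0 here, inside (0, 2/a)
        System.out.println(recip(a, guess));      // 0.6666666666666666
        System.out.println(recip(a, 5.0 / 3.0));  // outside the basin: ends at -Infinity
    }
}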

Bye
 

Jan Burse

Patricia said:
My understanding when strictfp was being added was that the x86
instruction set allowed for efficient rounding to the correct 32 and 64
bit mantissa widths, but did not allow for the overflow and underflow
processing to be done based on the correct exponent widths.


The typical overflow example is:

double x = Double.MAX_VALUE * 1.1 / 1.1;

When internally, during arithmetic expression evaluation, a
long double is used, Double.MAX_VALUE * 1.1 can be
fully represented thanks to the additional bits in mantissa
AND exponent. And eventually we would get back Double.MAX_VALUE
in x after rounding.

When internally no long doubles are used, then Double.MAX_VALUE * 1.1
already gives +inf (or some such), since Java does not throw an
exception but is required to return some value. And then division
by 1.1 does not help anymore. Another example is the following:

float f = Float.MAX_VALUE;
float g = Float.MAX_VALUE;
double x = f * g;

https://www.securecoding.cert.org/c...oint+calculation+consistency+across+platforms

The above document is a little bit sloppy, since it mentions
only additional bits in the exponent. But 80-bit doubles have
both: additional bits in the mantissa and in the exponent.
So differences might be more subtle.
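
For the record, plain Java (no long doubles anywhere) prints Infinity for
both examples above (my own snippet):

public class OverflowDemo {
    public static void main(String[] args) {
        double x = Double.MAX_VALUE * 1.1 / 1.1;
        System.out.println(x);       // Infinity: the intermediate product already overflowed

        float f = Float.MAX_VALUE;
        float g = Float.MAX_VALUE;
        double y = f * g;            // the multiply is done in float, then widened
        System.out.println(y);       // Infinity, not ~1.16E77
    }
}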

Bye
 

Joshua Cranmer

How does that interact with JIT, though? On x86, the simplest way for
JIT to make non-strictfp code use the FPU would be to just load the
initial values into the (80-bit-wide!) registers and perform FADDs,
FMULs, etc. on them. As long as the computation stayed in registers the
higher precision then ought to remain in effect -- for JITted code.
Adding extra code to mask off 16 of the register bits (or the mantissa
subset of the extra bits) after every FP op would slow things down. Is
the JLS interpreted to require the JIT do this (for non-strictfp code)?
And, if not, what does the HotSpot JIT do in actuality?

All modern x86 processors sport the SSE-style instructions, which can do
32-bit and 64-bit floating-point operations (also in SIMD form) without
touching the x87 FPU, and I suspect that these are slightly faster than using the x87
FPU instructions. I wouldn't be surprised if the JIT emitted SSE in the
vast majority of cases, so that JIT'd non-strictfp code would end up
returning the same results as JIT'd strictfp code.
 

BGB

Yeah good old Newton.

f(x) = a - 1/x.
zero at: 1/a.

f'(x) = 1 / x^2.
x_n+1 = x_n - f(x_n)/f'(x_n)
= x_n - (a - 1/x_n) / (1/x_n^2)
= x_n - a * x_n^2 + x_n
= x_n * (2 - a * x_n)

But it diverges or stays constant for starting values in certain intervals.

Here for a=1.5:

x_1=-2/3: Asymptotically reaches -inf
x_1=-1/3: Asymptotically reaches -inf
x_1=0: Stays constant at 0
x_1=1/3: Converges to 2/3
x_1=2/3: Converges to 2/3
x_1=1: Converges to 2/3
x_1=1 1/3: Stays constant at 0 after x_2
x_1=1 2/3: Asymptotically reaches -inf

And for some starting values it does not converge very fast:

x_1=0.0001: Slowly recovers
x_1=1 1/3 + 0.0001: Slowly recovers

Hope your initial pick is good.

dunno...

quick test, and the posted code gives me 0.66666666...
tried with 1.0/(-1.5), and got +Inf, so this may be a bug (I think the
code is not well suited to negatives, so I may need to add a sign hack
or similar).

so, yeah, some of this is not ideally well tested...


the initial pick for 1.5 should be 0.5 or similar.

initial pick is:
2^(-(exp-bias))
 

Arne Vajhøj

All modern x86 processors sport the SSE-style instructions, which can do
32-bit and 64-bit floating-point operations (also in SIMD form) without
touching the x87 FPU, and I suspect that these are slightly faster than using the x87
FPU instructions. I wouldn't be surprised if the JIT emitted SSE in the
vast majority of cases, so that JIT'd non-strictfp code would end up
returning the same results as JIT'd strictfp code.

Is that possible in 32 bit mode?

Practically all CPU's today are 64 bit capable, but many still run
32 bit desktop OS'es.

Arne
 

Joshua Cranmer

Is that possible in 32 bit mode?

Practically all CPU's today are 64 bit capable, but many still run
32 bit desktop OS'es.

I don't recall x86's mode-switching semantics off the top of my head,
but I do believe that it is possible to run 64-bit instructions in
32-bit mode. The problem is that the C ABI, particularly register saves
and restores, doesn't handle 64-bit stuff properly when compiling for
32-bit targets.

SSE2 dates back to something like the Pentium II, so it's not a 64-bit
mode thing. Although I think SSE2 itself may only be limited to single
precision floats.

Poking around the hotspot source code for Java 7 does indicate that sse2
support is in the JIT, although I don't know the entire set of
circumstances that triggers it.
 

Jan Burse

Patricia said:
On 8/9/2011 5:00 AM, Jan Burse wrote:
...
...

My understanding from discussions at the time is that they were not
being sloppy. They had already solved dealing with the extra mantissa
bits, and remember the JLS defines *exactly* how numbers must round.

Patricia

Don't get me wrong. I don't refer to the JLS or
the VM as being sloppy, only to the cert.org article; they
write in the introduction:

.. can provide extended floating-point support with
exponents that contain more bits ..

They should mention exponents AND mantissa. So we can
turn their example:

double x = Double.MAX_VALUE * 1.1 / 1.1;

Into a mantissa problem by replacing it with:

double x = 1 + 1.0E-nn - 1;

With nn sufficiently big but not too big. Under 80bit
x can receive the value 1.0E-nn whereas under 64bit
the result is zero.

So differences need not always be that dramatic
that an overflow/underflow occurs/does not occur.

Pity I don't have an example at hand where the
differences are dramatic without
reaching into overflow/underflow.

Well, doing it like that:

double x = 1 / (1 + 1.0E-nn - 1);

you get under 64-bit an Infinity (a nonzero value divided by zero gives
Infinity, not NaN) and under 80-bit no Infinity, just
something close to 1.0E+nn.
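
A quick check in plain Java (my own snippet) with nn = 17:

public class MantissaDemo {
    public static void main(String[] args) {
        double d = 1.0 + 1.0e-17 - 1.0;
        System.out.println(d);              // 0.0: 1.0e-17 is below half an ulp of 1.0
        System.out.println(1.0 / d);        // Infinity (nonzero divided by 0.0)
        System.out.println(Math.ulp(1.0));  // ~2.22e-16, for comparison
    }
}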

So the cert article and the Wikipedia article could
mention that under non-strict computation not only
can overflows and underflows disappear, but also
infinities (and NaNs) can suddenly disappear.

So if an application relies on infinities or NaNs appearing
in certain situations, this application might
behave differently under 64-bit and 80-bit. This could
result in different run times to reach some closure
condition, etc.

Best Regards
 

BGB

SSE2, yes, is available in 32-bit mode.
it is not used by default by C compilers though.
x86-64 made it mandatory, and 64-bit ABIs used it by default for doing
floating point.
I don't recall x86's mode-switching semantics off the top of my head,
but I do believe that it is possible to run 64-bit instructions in
32-bit mode. The problem is that the C ABI, particularly register saves
and restores, doesn't handle 64-bit stuff properly when compiling for
32-bit targets.

not exactly...

64-bit operations can only be used (at all) in "long mode", where 32-bit
operations remain available as well. otherwise, one is in "legacy mode",
which only has 32-bit instructions.

it is, however, possible to create a 32-bit OS in long mode (mostly the
same as before), which could in turn run 64-bit code in processes.
however, running in long mode, one can no longer make use of VM86
(virtual 8086) mode, segmented addressing, or several other rarely-used
features (TSS-based processes anyone?...), and several instructions are
officially dropped (IIRC, they were dropped from the Opteron, but Intel
partly re-added them in their implementation, and AMD followed Intel's
lead AFAIK).

I am not currently aware of any OS's which have done the above.

AFAIK, the OS would be mostly the same as in 32-bit legacy mode, apart
from needing to use new page-tables and a few other things.

SSE2 dates back to something like the Pentium II, so it's not a 64-bit
mode thing. Although I think SSE2 itself may only be limited to single
precision floats.

no. SSE was added in the Pentium3, and SSE2 and later IIRC in the
Pentium4 and AMD Athlon lines.

Pentium 2 only had MMX, which was considerably worse (64-bit byte and
short vectors aliased to FPU registers type stuff), which was rarely
used as most people would rather have a working FPU than lame byte vectors.

SSE added new registers and vector floating point operations.
SSE2 added scalar floats and doubles, double-vectors, and most MMX
instructions were retrofitted onto SSE (allowing byte and word-vector
operations and similar).

SSE3/... mostly added more modest extensions.

XOP (AMD) and AVX (Intel) basically add a much more drastic set of new
features (as well as allowing basically 3-5 register instruction forms
and 256-bit YMM registers, which can hold 4 doubles or 8 floats).

Poking around the hotspot source code for Java 7 does indicate that sse2
support is in the JIT, although I don't know the entire set of
circumstances that triggers it.

probably CPU support.
there are few good reasons not to use it anymore.

now as for performance (x87 vs SSE), there are tradeoffs either way:
naive scalar SSE can be slower than well-generated x87, but is generally
faster than naive x87.

vector SSE is, at this point, generally somewhat faster than trying to
use x87 (in most cases, a few operations are faster on x87 absent using
SSE3 or SSE4 instructions, such as vector dot-product, ...). vector SSE
was slightly slower than x87 in the Pentium3.


x87 still has a few features which SSE doesn't, such as the
trigonometric functions, ... so x87 can still be used for these (manual
calculation is slower).
 
