floating point calculation problem

bei · Jan 6, 2010

Hello,

I was bothered by the following problem for a while and could not
understand how it happened.

They are a lot of floating point calculation in the code.
Basically, i am doing an interpolation from a random position to the
closest grid point position. It is a 1d problem.
The domain is from (-L, L) and N is the number of grid point at a half
domain.
The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
in the code below).
L, dv are declared to be double type and N to be integer type.

Below is a part of the code that bothered me (where v is the position,
j is the closest grid point index number)

int j0 = (int)(floor((v/dv+(N-0.5))));
double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
(N-0.5))-j0), 3.0);
double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)

the output is
v = 2.818975726362143e+00;
dv = 4.394531250000000e-03;
N = 1146;
j0 = 1786;
a = -2.369681598890769e+00;
b = 1.384255444361780e+00;

nothing wrong until here, below is the problem
q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
3.0)
= -1.369681598890769e+00; (wrong answer)
q0 = 1.0 +a + b
= 1.457384547101137e-02; (correct answer)

It looked like if a formula involvs a lot of floating point
calculation, it is safer to save the temporary data into a variable.

Why this is happened and how could I fix this problem? Any suggestion
is appreciated. Thank you!

bei · Jan 6, 2010

Hello,

I was bothered by the following problem for a while and could not
understand how it happened.

They are a lot of floating point calculation in the code.
Basically, i am doing an interpolation from a random position to the
closest grid point position. It is a 1d problem.
The domain is from (-L, L) and N is the number of grid point at a half
domain.
The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
in the code below).
L, dv are declared to be double type and N to be integer type.

Below is a part of the code that bothered me (where v is the position,
j is the closest grid point index number)

int j0 = (int)(floor((v/dv+(N-0.5))));
double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
(N-0.5))-j0), 3.0);
double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)

the output is
v = 2.818975726362143e+00;
dv = 4.394531250000000e-03;
N = 1146;
j0 = 1786;
a = -2.369681598890769e+00;
b = 1.384255444361780e+00;

nothing wrong until here, below is the problem
q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
3.0)
= -1.369681598890769e+00; (wrong answer)
q0 = 1.0 +a + b
= 1.457384547101137e-02; (correct answer)

It looked like if a formula involvs a lot of floating point
calculation, it is safer to save the temporary data into a variable.

Why this is happened and how could I fix this problem? Any suggestion
is appreciated. Thank you!

btw, i am using g++ 4.3.3 with -O3 optimization turned on. The machine
is 64bits.

Rune Allnor · Jan 6, 2010

btw, i am using g++ 4.3.3 with -O3 optimization turned on. The machine
is 64bits.

The first place to look is data types. You have j0 as an int,
and all the other factors as doubles. That's the kind of thing
that almost always causes surprises and trouble. Declare j0 as
double first and try again. Repeat for all integers in the
computations.

Next have a look at the pow function. Since you use powers of
small integers, do something like

pbase = (((v/dv+(N-0.5))-j0);
p2 = pbase*pbase;
p3 = p2*pbase;

instead of using the pow() function. This would almost
certainly be significantly faster than pow().

As for why the problem occurs, I would *guess* that the
optimizer recognizes that pow() is called twice in the same
statement, with the same mantissa and different exponents.
If this is a template implementation (which might well be
the case), the optimizer might try to reduce the number of
repeated computations in the two calls, messing things up
in the process.

Rune

bei · Jan 6, 2010

The first place to look is data types. You have j0 as an int,
and all the other factors as doubles. That's the kind of thing
that almost always causes surprises and trouble. Declare j0 as
double first and try again. Repeat for all integers in the
computations.

Next have a look at the pow function. Since you use powers of
small integers, do something like

pbase = (((v/dv+(N-0.5))-j0);
p2 = pbase*pbase;
p3 = p2*pbase;

instead of using the pow() function. This would almost
certainly be significantly faster than pow().

As for why the problem occurs, I would *guess* that the
optimizer recognizes that pow() is called twice in the same
statement, with the same mantissa and different exponents.
If this is a template implementation (which might well be
the case), the optimizer might try to reduce the number of
repeated computations in the two calls, messing things up
in the process.

Rune

Rune, very helpful suggestion. Thank you.

bei · Jan 7, 2010

"bei" <[email protected]> ha scritto nel messaggio

there is no problem here:
#include <iostream>
using namespace std;

int main(void)
{double v = 2.818975726362143e+00;
double dv = 4.394531250000000e-03;
double a = -2.369681598890769e+00;
double b = 1.384255444361780e+00;
double q, q0;
unsigned j0 = 1786;
unsigned N = 1146;

q = 1.0-2.5*pow(v/dv+N-0.5-j0, 2.0)
+1.5*pow(v/dv+N-0.5-j0, 3.0);
q0= 1.0 +a + b;
cout.setf(ios::scientific);
cout.precision(15);

cout <<"q==" << q << " ; q0=" << q0 << "\n";

return 0;}

has result
q==1.457384547101137e-02 ; q0=1.457384547101115e-02

I got the same answer. I didn't get any problem if I cut the port of
the trouble code and ran it as above. Rune might be right. The problem
could be the pow() and v/dv+N-0.5-j0, 2.0, I modified the code as he
suggested, so far it looks ok.

floating point calculation problem	0	Jan 6, 2010
Java OpenJDK Floating Point Dare	3	Jan 17, 2023
C++ SSE and SSE2 compiler settings, and their Floating Point effects.	0	May 31, 2022
Portable marshalling of floating point data	3	Dec 10, 2009
Floating point linkage	37	Oct 13, 2013
template alike approach for floating point	2	Aug 29, 2007
Floating Point comparison problem	26	Feb 15, 2008
Dark-corners floating-point behavior in C++	12	Sep 7, 2005

floating point calculation problem

bei

bei

Rune Allnor

bei

bei

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads