# floating point calculation problem

Discussion in 'C++' started by bei, Jan 6, 2010.

1. ### beiGuest

Hello,

I was bothered by the following problem for a while and could not
understand how it happened.

They are a lot of floating point calculation in the code.
Basically, i am doing an interpolation from a random position to the
closest grid point position. It is a 1d problem.
The domain is from (-L, L) and N is the number of grid point at a half
domain.
The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
in the code below).
L, dv are declared to be double type and N to be integer type.

Below is a part of the code that bothered me (where v is the position,
j is the closest grid point index number)

int j0 = (int)(floor((v/dv+(N-0.5))));
double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
(N-0.5))-j0), 3.0);
double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)

the output is
v = 2.818975726362143e+00;
dv = 4.394531250000000e-03;
N = 1146;
j0 = 1786;
a = -2.369681598890769e+00;
b = 1.384255444361780e+00;

nothing wrong until here, below is the problem
q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
3.0)
q0 = 1.0 +a + b

It looked like if a formula involvs a lot of floating point
calculation, it is safer to save the temporary data into a variable.

Why this is happened and how could I fix this problem? Any suggestion
is appreciated. Thank you!

bei, Jan 6, 2010

2. ### beiGuest

On Jan 5, 7:09 pm, bei <> wrote:
> Hello,
>
> I was bothered by the following problem for a while and could not
> understand how it happened.
>
> They are a lot of floating point calculation in the code.
> Basically, i am doing an interpolation from a random position to the
> closest grid point position. It is a 1d problem.
> The domain is from (-L, L) and N is the number of grid point at a half
> domain.
> The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
> in the code below).
> L, dv are declared to be double type and N to be integer type.
>
> Below is a part of the code that bothered me (where v is the position,
> j is the closest grid point index number)
>
>  int j0 = (int)(floor((v/dv+(N-0.5))));
>  double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
>  double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
>  double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
> (N-0.5))-j0), 3.0);
>  double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)
>
> the output is
> v = 2.818975726362143e+00;
> dv = 4.394531250000000e-03;
> N = 1146;
> j0 = 1786;
> a = -2.369681598890769e+00;
> b = 1.384255444361780e+00;
>
> nothing wrong until here, below is the problem
> q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
> 3.0)
> q0 = 1.0 +a + b
>
> It looked like if a formula involvs a lot of floating point
> calculation, it is safer to save the temporary data into a variable.
>
> Why this is happened and how could I fix this problem? Any suggestion
> is appreciated. Thank you!

btw, i am using g++ 4.3.3 with -O3 optimization turned on. The machine
is 64bits.

bei, Jan 6, 2010

3. ### Rune AllnorGuest

On 6 Jan, 04:11, bei <> wrote:
> On Jan 5, 7:09 pm, bei <> wrote:
>
>
>
>
>
> > Hello,

>
> > I was bothered by the following problem for a while and could not
> > understand how it happened.

>
> > They are a lot of floating point calculation in the code.
> > Basically, i am doing an interpolation from a random position to the
> > closest grid point position. It is a 1d problem.
> > The domain is from (-L, L) and N is the number of grid point at a half
> > domain.
> > The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
> > in the code below).
> > L, dv are declared to be double type and N to be integer type.

>
> > Below is a part of the code that bothered me (where v is the position,
> > j is the closest grid point index number)

>
> >  int j0 = (int)(floor((v/dv+(N-0.5))));
> >  double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
> >  double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
> >  double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
> > (N-0.5))-j0), 3.0);
> >  double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)

>
> > the output is
> > v = 2.818975726362143e+00;
> > dv = 4.394531250000000e-03;
> > N = 1146;
> > j0 = 1786;
> > a = -2.369681598890769e+00;
> > b = 1.384255444361780e+00;

>
> > nothing wrong until here, below is the problem
> > q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
> > 3.0)
> >    = -1.369681598890769e+00; (wrong answer)
> > q0 = 1.0 +a + b
> >     = 1.457384547101137e-02; (correct answer)

>
> > It looked like if a formula involvs a lot of floating point
> > calculation, it is safer to save the temporary data into a variable.

>
> > Why this is happened and how could I fix this problem? Any suggestion
> > is appreciated. Thank you!

>
> btw, i am using g++ 4.3.3 with -O3 optimization turned on. The machine
> is 64bits.

The first place to look is data types. You have j0 as an int,
and all the other factors as doubles. That's the kind of thing
that almost always causes surprises and trouble. Declare j0 as
double first and try again. Repeat for all integers in the
computations.

Next have a look at the pow function. Since you use powers of
small integers, do something like

pbase = (((v/dv+(N-0.5))-j0);
p2 = pbase*pbase;
p3 = p2*pbase;

instead of using the pow() function. This would almost
certainly be significantly faster than pow().

As for why the problem occurs, I would *guess* that the
optimizer recognizes that pow() is called twice in the same
statement, with the same mantissa and different exponents.
If this is a template implementation (which might well be
the case), the optimizer might try to reduce the number of
repeated computations in the two calls, messing things up
in the process.

Rune

Rune Allnor, Jan 6, 2010
4. ### beiGuest

On Jan 5, 10:16 pm, Rune Allnor <> wrote:
> On 6 Jan, 04:11, bei <> wrote:
>
>
>
> > On Jan 5, 7:09 pm, bei <> wrote:

>
> > > Hello,

>
> > > I was bothered by the following problem for a while and could not
> > > understand how it happened.

>
> > > They are a lot of floating point calculation in the code.
> > > Basically, i am doing an interpolation from a random position to the
> > > closest grid point position. It is a 1d problem.
> > > The domain is from (-L, L) and N is the number of grid point at a half
> > > domain.
> > > The grid size is dv, where dv = L/N; The grid is cell centered (-0.5
> > > in the code below).
> > > L, dv are declared to be double type and N to be integer type.

>
> > > Below is a part of the code that bothered me (where v is the position,
> > > j is the closest grid point index number)

>
> > >  int j0 = (int)(floor((v/dv+(N-0.5))));
> > >  double a = -2.5*pow(((v/dv+(N-0.5))-j0), 2.0);
> > >  double b = 1.5*pow(((v/dv+(N-0.5))-j0), 3.0);
> > >  double q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+
> > > (N-0.5))-j0), 3.0);
> > >  double q0 = 1.0 +a + b; (q0 is equal to q in the language of math.)

>
> > > the output is
> > > v = 2.818975726362143e+00;
> > > dv = 4.394531250000000e-03;
> > > N = 1146;
> > > j0 = 1786;
> > > a = -2.369681598890769e+00;
> > > b = 1.384255444361780e+00;

>
> > > nothing wrong until here, below is the problem
> > > q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
> > > 3.0)
> > >    = -1.369681598890769e+00; (wrong answer)
> > > q0 = 1.0 +a + b
> > >     = 1.457384547101137e-02; (correct answer)

>
> > > It looked like if a formula involvs a lot of floating point
> > > calculation, it is safer to save the temporary data into a variable.

>
> > > Why this is happened and how could I fix this problem? Any suggestion
> > > is appreciated. Thank you!

>
> > btw, i am using g++ 4.3.3 with -O3 optimization turned on. The machine
> > is 64bits.

>
> The first place to look is data types. You have j0 as an int,
> and all the other factors as doubles. That's the kind of thing
> that almost always causes surprises and trouble. Declare j0 as
> double first and try again. Repeat for all integers in the
> computations.
>
> Next have a look at the pow function. Since you use powers of
> small integers, do something like
>
> pbase = (((v/dv+(N-0.5))-j0);
> p2 = pbase*pbase;
> p3 = p2*pbase;
>
> instead of using the pow() function. This would almost
> certainly be significantly faster than pow().
>
> As for why the problem occurs, I would *guess* that the
> optimizer recognizes that pow() is called twice in the same
> statement, with the same mantissa and different exponents.
> If this is a template implementation (which might well be
> the case), the optimizer might try to reduce the number of
> repeated computations in the two calls, messing things up
> in the process.
>
> Rune

Rune, very helpful suggestion. Thank you.

bei, Jan 6, 2010
5. ### beiGuest

On Jan 6, 9:27 am, "io_x" <> wrote:
> "bei" <> ha scritto nel messaggionews:...
>
>
>
> > Hello,

>
> > I was bothered by the following problem for a while and could not
> > understand how it happened.

> .......
> > the output is
> > v = 2.818975726362143e+00;
> > dv = 4.394531250000000e-03;
> > N = 1146;
> > j0 = 1786;
> > a = -2.369681598890769e+00;
> > b = 1.384255444361780e+00;

>
> > nothing wrong until here, below is the problem
> > q = 1.0-2.5*pow(((v/dv+(N-0.5))-j0), 2.0)+1.5*pow(((v/dv+(N-0.5))-j0),
> > 3.0)
> >   = -1.369681598890769e+00; (wrong answer)
> > q0 = 1.0 +a + b
> >    = 1.457384547101137e-02; (correct answer)

>
> > It looked like if a formula involvs a lot of floating point
> > calculation, it is safer to save the temporary data into a variable.

>
> > Why this is happened and how could I fix this problem? Any suggestion
> > is appreciated. Thank you!

>
> there is no problem here:
> #include  <iostream>
> using namespace std;
>
> int  main(void)
> {double     v = 2.818975726362143e+00;
>  double    dv = 4.394531250000000e-03;
>  double     a = -2.369681598890769e+00;
>  double     b = 1.384255444361780e+00;
>  double    q, q0;
>  unsigned  j0 = 1786;
>  unsigned   N = 1146;
>
>  q = 1.0-2.5*pow(v/dv+N-0.5-j0, 2.0)
>         +1.5*pow(v/dv+N-0.5-j0, 3.0);
>  q0= 1.0 +a + b;
>  cout.setf(ios::scientific);
>  cout.precision(15);
>
>  cout <<"q==" << q << " ; q0=" << q0 << "\n";
>
>  return  0;}
>
> has result
> q==1.457384547101137e-02 ; q0=1.457384547101115e-02

I got the same answer. I didn't get any problem if I cut the port of
the trouble code and ran it as above. Rune might be right. The problem
could be the pow() and v/dv+N-0.5-j0, 2.0, I modified the code as he
suggested, so far it looks ok.

bei, Jan 7, 2010