Two implementations of a simple math equation yield different results

Avi

I need to implement the following calculation:
f = (a*b) + (c*d)
where a,b,c,d are given double values and f is a double variable for
the result

I found out that using two different implementations gives two
different results:
1) In the first implementation I compute the entire equation in memory
2) In the second implementation I store the intermediate results in
variables, and then load them to compute the final result.
The results differ by a tiny amount (about 1E-15). I suspect that the
difference comes from round-off in the way the compiler evaluates the
expression.
Nonetheless, I need to understand the reason for the difference, since I
need an absolute reference for optimization work.

The two ways are as follows:

1)
Calculating two double values and then summing them to a single double
value
double f1 = (a*b) + (c*d);

2)
Calculating two double values and storing them to two double variables
Summing the two double variables to a single double value
double e1 = (a*b);
double e2 = (c*d);
double f2 = e1 + e2;

The difference is calculated as follows:
double diff = f2-f1;
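
Put together, a minimal reproduction looks roughly like this (the values of
a, b, c, d below are placeholders for illustration only; whether a nonzero
diff actually shows up depends on the values and the compiler settings):

#include <cstdio>

int main()
{
    // Placeholder inputs; the real a, b, c, d are given elsewhere.
    double a = 0.1, b = 3.0, c = 0.2, d = 7.0;

    double f1 = (a*b) + (c*d);   // 1) whole expression at once

    double e1 = (a*b);           // 2) intermediates stored first
    double e2 = (c*d);
    double f2 = e1 + e2;

    double diff = f2 - f1;
    std::printf("diff = %g\n", diff);   // expected 0, sometimes ~1E-15
    return 0;
}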

I expect the difference to be exactly 0, but instead I get a tiny value.
Could someone explain the reason for the difference in the results?

I’m compiling with gcc

Thanks
Avi
 
James Kanze

I need to implement the following calculation:
f = (a*b) + (c*d)
where a,b,c,d are given double values and f is a double
variable for the result
I found out that using two different implementations gives two
different results:
1) In the first implementation I compute the entire equation in memory

You mean in CPU and registers. Variables are in memory.
2) In the second implementation I store intermediate to
variables, and then load them to compute the final result.
The results differ by a tiny amount (1E-15). I suspect that
the reason for the difference is from round off errors of the
compiler.

The C++ standard allows a compiler to use extended precision for
intermediate results, only "forcing" the result to the final
precision when it is assigned to a variable.
Nonetheless, I need to know the reason for the difference as I
need to have an absolute reference for optimization work.
The two ways are as follows:
1)
Calculating two double values and then summing them to a
single double value
double f1 = (a*b) + (c*d)

All calculations may be done in extended precision.
2)
Calculating two double values and storing them to two double variables
Summing the two double variables to a single double value
double e1 = (a*b);

This forces the result to double.
double e2 = (c*d);

As does this.
double f2 = e1 + e2;
The difference is calculated as follows:
double diff = f2-f1;
I expect the difference to be exactly 0 but instead, I get a
tiny value. Could someone explain the reason for differences
in the result?

It's very implementation dependent.
I'm compiling with gcc

More to the point, you're compiling on a machine which uses
extended precision in its floating point unit, and its floating
point registers. (An Intel, perhaps.)

The reason for this freedom is speed. Forcing the results of
each operation to be rounded to double can be done, but would
slow things down considerably (at least on Intel). And for the
most part, increased precision is not considered a defect.
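
As a rough sketch of what that "forcing" looks like in practice (example
values assumed; the volatile qualifier is one blunt way to make the compiler
actually store and re-load the temporaries, and gcc's -ffloat-store option
has a similar effect):

#include <cstdio>

int main()
{
    double a = 0.1, b = 3.0, c = 0.2, d = 7.0;   // example values only

    // May be evaluated entirely in the FPU's extended-precision registers.
    double f1 = (a*b) + (c*d);

    // The stores to e1 and e2 round the products to 64-bit doubles before
    // the final addition; volatile keeps the stores from being optimized away.
    volatile double e1 = a*b;
    volatile double e2 = c*d;
    double f2 = e1 + e2;

    std::printf("diff = %g\n", f2 - f1);
    return 0;
}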
 
Greg Herlihy

It's very implementation dependent.


More to the point, you're compiling on a machine which uses
extended precision in its floating point unit, and its floating
point registers.  (An Intel, perhaps.)

The reason for this freedom is speed.  Forcing the results of
each operation to be rounded to double can be done, but would
slow things down considerably (at least on Intel).  And for the
most part, increased precision is not considered a defect.

But the extended floating-point precision is a defect if - as in this
case - consistent numeric results are required. Fortunately, all Intel
x86 chips since the Pentium 4 support double-precision (64-bit)
floating point in their SSE2 units. So, the solution would be to have
the compiler generate SSE floating-point instructions ("-msse2
-mfpmath=sse") instead of 387 instructions. According to the gcc
documentation, this strategy has other benefits as well:

"The resulting code should be considerably faster in the majority of
cases and avoid the numerical instability problems of 387 code, but
may break some existing code that expects temporaries to be 80-bit."
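
One way to check which evaluation mode a particular build actually uses is
the FLT_EVAL_METHOD macro (a sketch; the macro comes from <cfloat> and may
not be defined by very old compilers). With "-msse2 -mfpmath=sse" gcc should
report 0 (operations done in the type's own precision), while a default x87
build typically reports 2 (everything evaluated in long double):

#include <cfloat>
#include <cstdio>

int main()
{
#ifdef FLT_EVAL_METHOD
    std::printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
#else
    std::printf("FLT_EVAL_METHOD is not defined by this compiler\n");
#endif
    return 0;
}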

Greg
 
Avi

One way to identify and ignore small differences is to set a small
threshold, say:
double epsilon = 1E-6;
and compare the difference against the threshold as follows:

if (std::fabs(diff) < epsilon)
{
// difference is considered noise
}
else
{
// difference is significant
}

I’m looking for a predefined threshold that the differences can be
measured against.
For example, the difference between 1 and the smallest value greater
than 1 that is representable for the double data type is 2.22045e-16.
I get it using: numeric_limits<double>::epsilon()

The differences that I’m seeing between the two implementations (on
the order of 1E-15) are bigger than the
numeric_limits<double>::epsilon() value (2.22045e-16).

I understand that the accumulated round off errors can be bigger than
the numeric_limits<double>::epsilon() value.

Is there a systematic threshold that can be used to define when the
differences are noise and when they are significant?
Or should I just pick a number that seems small enough (say
epsilon = 1E-6)?
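
For instance, would a relative check along these lines be the right
direction? (Just a sketch of what I have in mind; the factor of 4 is an
arbitrary number I picked, not something I found documented.)

#include <algorithm>
#include <cmath>
#include <limits>

// Treat the difference as noise if it is small relative to the magnitude
// of the values being compared, rather than smaller than a fixed 1E-6.
bool is_noise(double x, double y)
{
    double tol = 4 * std::numeric_limits<double>::epsilon();
    return std::fabs(x - y) <= tol * std::max(std::fabs(x), std::fabs(y));
}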

Thanks,
Avi
 
Avi

Thanks for your answers. They clarified the reasons for the differences
and helped me set up a mechanism to distinguish between round-off noise
and significant differences.

Avi
 
James Kanze

But the extended floating point precision is a defect if - as
in this case - consistent numeric results are required.

By defect, I presume you mean with regard to what is needed,
not according to the standard. I'm not good enough at numerics to
argue one way or the other. Intuitively, I would agree with
you, but I know that opinions about this differ among the
specialists.
 
