Floating Point Precision

michael.mcgarry · Sep 9, 2005

Hi,

I have a question about floating point precision in C.

What is the minimum distinguishable difference between 2 floating point
numbers? Does this differ for various computers?

Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
10^-5. Does this mean that the computer cannot distinguish between 2
numbers that differ by less than this epsilon?

A problem I am seeing is a difference in values from a floating point
computation for a run on a Windows machine compared to a run on a Linux
machine. The values differ by 10^-6.

Thanks for any help,

Michael

Tim Prince · Sep 9, 2005

> Hi,
>
> I have a question about floating point precision in C.
>
> What is the minimum distinguishable difference between 2 floating point
> numbers? Does this differ for various computers?

The only reason it would be the same is that most computers support IEEE
754, at least to this extent. This is already Off Topic for c.l.c.

> Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
> 10^-5. Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?

FLT_EPSILON is the positive difference between 1.0f and the next higher
representable number of float data type. I would be disappointed in any
C textbook which did not explain this.

> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.

You could expect such differences, in float data, even between two
versions of the same compiler, or between different optimization or code
generation options of the same compiler, even on the same OS. If you
want these differences to be smaller, use double data type. Check what
the FAQ says on this subject.

Michael Mair · Sep 9, 2005

> Hi,
>
> I have a question about floating point precision in C.
>
> What is the minimum distinguishable difference between 2 floating point
> numbers? Does this differ for various computers?
>
> Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
> 10^-5. Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?
>
> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.

I suggest you have a look at
<http://en.wikipedia.org/wiki/IEEE_754>
and especially the reference from there to the Goldberg paper,
<http://docs.sun.com/source/806-3568/ncg_goldberg.html>
to understand how floating point numbers work.

Some notes:

The "next bigger" or "next smaller" number from a given floating
point number p is not always at the same distance but depends on
p.

EPSILON is the smallest number eps such that 1+eps != 1, so
1+eps is the next number after 1. If the base of the floating
point types is b, then the next number after b is b*(1+eps)
and not b+eps.

Right in the same vein, there are numbers which cannot be
represented by floating point numbers (e.g. the numbers between
1 and 1+eps), so errors are introduced and propagated throughout
your computations; there are rounding errors as well, so you
basically need a little bit of numerical analysis to know that
your results are still reasonably accurate.
Equality is not a relation you should rely on. Even working
with relative errors can give you a headache when working with
sets of potentially equal values.

C does not guarantee much about floating point numbers.
The few guarantees you do have are mainly the limits given in
<float.h> -- everything else depends on your implementation
(which often is comprised of platform, operating system,
compiler).

On a related note:
The "natural" floating point type in C is double. Use float only
if you have severe memory problems or can _prove_ that the
accuracy is sufficient for your purposes. (Hardly surprising, it
often is not.)

Cheers
Michael

Walter Roberson · Sep 9, 2005

> I have a question about floating point precision in C.
>
> What is the minimum distinguishable difference between 2 floating point
> numbers?

You could probably work it out in terms of FLT_RADIX and FLT_MANT_DIG
but you have the small problem that the value you are describing is
not representable as a normalized number -- you might have the
case where A > B and yet A - B is not representable in normalized form.

> Does this differ for various computers?

Yes, definitely.

> Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
> 10^-5. Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?

No, FLT_EPSILON is such that 1.0 is distinguishable from 1.0 + FLT_EPSILON.

> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.

The absolute value doesn't tell us much -- if you are working with
values in the range 1E50 then 1E-6 is minuscule, but if you are
working with values in the range 1E-30 then 1E-6 is huge.

There are a lot of different reasons why computations come out differently
on different computers -- too many to list them all in one message.

As an example: on Pentiums, the native double precision size is 80 bits
but the C double precision size is 64 bits. If some steps of the
calculations are carried out at 80 bits, you can end up with different
results. There are sometimes compiler options that control whether
native-size register-to-register calculations are allowed, or whether
the machine must round to the storable precision at each stage.

But that's far from the only reason.

Kevin D. Quitt · Sep 9, 2005

Have a look at ftp://ftp.quitt.net/Outgoing/goldbergFollowup.pdf

Eric Sosman · Sep 9, 2005

> Hi,
>
> I have a question about floating point precision in C.
>
> What is the minimum distinguishable difference between 2 floating point
> numbers? Does this differ for various computers?

The smallest discernible difference depends on the
magnitude of the numbers: the computer can surely distinguish
1.0 from 1.1, but might not be able to tell 100000000000000.0
from 100000000000000.1 even though the two pairs of values
differ (mathematically speaking) by the same amount. You've
got to be concerned with relative error, not with absolute
error.

And yes: The relative error (loosely speaking, the precision)
will differ from one machine to another.

> Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
> 10^-5. Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?

First, I think you've misunderstood what FLT_EPSILON is.
It is the difference between 1.0f and the "next" float value,
the smallest float larger than 1.0f that is distinguishable
from 1.0f. That is, FLT_EPSILON is one way of describing the
precision of float values on the system at hand. Note that
although 1.0f is distinguishable from 1.0f+FLT_EPSILON,
1000000.0f need not be distinguishable from 1000000.0f+FLT_EPSILON.

Second, FLT_EPSILON is not necessarily 1E-5: the C Standard
requires that it be no greater than 1E-5, but permits lower
values (greater precision) for machines that support them.

> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.

Without knowing what the values are, there's no way to tell
whether a difference of 1E-6 is huge or tiny. If the values
are supposed to be Planck's Constant (~6.6E-34), 1E-6 represents
an enormous error. If they're supposed to be Avogadro's Number
(~6.0E23) the difference is completely insignificant.

For the purposes of argument, let's say the values are in
the vicinity of 1. Then a difference of 1E-6 in float arithmetic
on a machine where FLT_EPSILON is 1E-5 is nothing to worry about;
you've already done better than you had any right to expect.

Beyond that, we get into the analysis of the origins and
propagation of errors, a field known as "Numerical Analysis."
The topic is simple at first but deceptively so, because it
fairly rapidly becomes the stuff of PhD theses. A widely-
available paper called (IIRC) "What Every Computer Scientist
Should Know about Floating-Point Arithmetic" would be worth
your while to read.

Simon Biber · Sep 9, 2005

Michael said:
EPSILON is the smallest number eps such that 1+eps != 1, so
1+eps is the next number after 1. If the base of the floating
point types is b, then the next number after b is b*(1+eps)
and not b+eps.

The epsilon value should be the difference between 1 and the next
representable number after 1.

But consider the value x, defined as three quarters of the epsilon value.
float x = 0.75 * FLT_EPSILON;

Now, your condition (1 + x) != 1 is very likely to be true. The result
of the addition on the left hand side is not representable, but it
should round to the closest representable value, which is the next value
after 1, even though x is less than FLT_EPSILON.

Perhaps the following condition is better?
eps > 0 && (1 + eps) - 1 == eps

Here's what I get on my computer:

FLT_EPSILON is 0.000000119209289550781250000000
x is 0.000000089406967163085937500000
1 + x is 1.000000119209289550781250000000
(1 + x) - 1 is 0.000000119209289550781250000000

x is 3/4 of FLT_EPSILON. Adding x to 1 rounds up to 1 + FLT_EPSILON.
Taking 1 back off again leaves the true FLT_EPSILON, not the 3/4 of it
that we started with.

Michael Mair · Sep 9, 2005

Simon said:
The epsilon value should be the difference between 1 and the next
representable number after 1.

Yep, I was imprecise. Let M be the set of all floating point
numbers representable by the respective floating point type;
then EPSILON = min {eps \in M | eps > 0 and 1+eps != 1}.

> But consider the value x, defined as three quarters of the epsilon value.
> float x = 0.75 * FLT_EPSILON;
>
> Now, your condition (1 + x) != 1 is very likely to be true. The result
> of the addition on the left hand side is not representable, but it
> should round to the closest representable value, which is the next value
> after 1, even though x is less than FLT_EPSILON.

This is a question of the rounding mode.

> Perhaps the following condition is better?
> eps > 0 && (1 + eps) - 1 == eps

Yes, indeed. My mistake was that I had the classical
    eps = 1.0S;
    while ((T) (1.0S+eps/FLT_RADIX) != 1.0S)
        eps /= FLT_RADIX;
in mind (where S is the appropriate type suffix, or nothing, for type T;
the cast can be necessary to avoid excess precision, see
FLT_EVAL_METHOD. The usual caveats for gcc and FP arithmetic on x86
and similar apply, though.)

> Here's what I get on my computer:
>
> FLT_EPSILON is 0.000000119209289550781250000000
> x is 0.000000089406967163085937500000
> 1 + x is 1.000000119209289550781250000000
> (1 + x) - 1 is 0.000000119209289550781250000000
>
> x is 3/4 of FLT_EPSILON. Adding x to 1 rounds up to 1 + FLT_EPSILON.
> Taking 1 back off again leaves the true FLT_EPSILON, not the 3/4 of it
> that we started with.

See above. Round to zero is possible (->FLT_ROUNDS).

Cheers
Michael

Simon Biber · Sep 10, 2005

Michael said:
Yep, I was imprecise. Let M be the set of all floating point
numbers representable by the respective floating point type;
then EPSILON = min {eps \in M | eps > 0 and 1+eps != 1}.

That still suffers from the rounding mode issue. There are many possible
values of eps that are members of M, are greater than zero but less than
the true epsilon value, and when added to one may round up to a value
that is not equal to 1.

Unless you mean the + operator to be an abstract mathematical thing that
can return any real number, rather than the one that must operate within
the given floating-point type.

You need to make clear whether:
+: M X M -> M (+ is of a type that maps a pair of M to a single M)
or:
+: M X M -> R (+ is of a type that maps a pair of M to a real)

Emmanuel Delahaye · Sep 10, 2005

> What is the minimum distinguishable difference between 2 floating point
> numbers? Does this differ for various computers?

float : FLT_EPSILON (<float.h>)
double : DBL_EPSILON (<float.h>)

[C99]

long double : LDBL_EPSILON (<float.h>)

> Is this the EPSILON?

Yup.

> I know in float.h a FLT_EPSILON is defined to be
> 10^-5.

On /this/ implementation.

> Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?

On this computer, I dunno. On /this/ implementation of the C-language,
yes.

> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.

Could be. In general terms, floating point representation is (nearly)
always an approximation. Use 'double' for better precision. (C99
supports long double).

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"C is a sharp tool"

Barry Schwarz · Sep 10, 2005

Emmanuel said:
> (e-mail address removed) wrote on 09/09/05:
>
> > What is the minimum distinguishable difference between 2 floating point
> > numbers? Does this differ for various computers?
>
> float : FLT_EPSILON (<float.h>)
> double : DBL_EPSILON (<float.h>)
>
> [C99]
>
> long double : LDBL_EPSILON (<float.h>)
>
> > Is this the EPSILON?
>
> Yup.
>
> > I know in float.h a FLT_EPSILON is defined to be
> > 10^-5.
>
> On /this/ implementation.
>
> > Does this mean that the computer cannot distinguish between 2
> > numbers that differ by less than this epsilon?
>
> On this computer, I dunno. On /this/ implementation of the C-language,
> yes.

On this implementation and only for numbers between 1 and 1+epsilon.
It will more than likely distinguish between 10^-9 and 10^-8, two
numbers which differ by something much less than epsilon.

> Could be. In general terms, floating point representation is (nearly)
> always an approximation. Use 'double' for better precision. (C99
> supports long double).

<<Remove the del for email>>

Michael Mair · Sep 10, 2005

Simon said:
That still suffers from the rounding mode issue. There are many possible
values of eps that are members of M, are greater than zero but less than
the true epsilon value, and when added to one may round up to a value
that is not equal to 1.

Gah. I wanted 1+eps \in M -- I should post at times when I am really
awake :-/

> Unless you mean the + operator to be an abstract mathematical thing that
> can return any real number, rather than the one that must operate within
> the given floating-point type.
>
> You need to make clear whether:
> +: M X M -> M (+ is of a type that maps a pair of M to a single M)
> or:
> +: M X M -> R (+ is of a type that maps a pair of M to a real)

I meant the latter but worked as if I had the former :-(

Cheers
Michael

pete · Sep 10, 2005

Emmanuel said:
(C99 supports long double).

C89 does too.

Keyser Soze · Sep 11, 2005

> Hi,
>
> I have a question about floating point precision in C.
>
> What is the minimum distinguishable difference between 2 floating point
> numbers? Does this differ for various computers?
>
> Is this the EPSILON? I know in float.h a FLT_EPSILON is defined to be
> 10^-5. Does this mean that the computer cannot distinguish between 2
> numbers that differ by less than this epsilon?
>
> A problem I am seeing is a difference in values from a floating point
> computation for a run on a Windows machine compared to a run on a Linux
> machine. The values differ by 10^-6.
>
> Thanks for any help,
>
> Michael

This is a long post and I know there will be "comments".

People have lots of issues with the way Microsoft handles floating point
numbers on Windows systems. IMHO it sucks.

It seems that hacks left in from Intel's Pentium FPU problems may account
for some of the weirdness.

So now for the long part of this post.

Here is a program that attempts to find the number of "real" bits in the
floating point support by using only standard C functionality.

Well, you know that's a lie about "standard C" whenever Microsoft and Windows
are involved.

/*

file: flt_precision.c

Find number of significant bits in the floating point fraction.

Sample output for Microsoft VC6:

float size 4
double size 8
long double size 8
Max delta for float 16777215, bits 24
Max delta for double 9007199254740991, bits 53
Max delta for long double 9007199254740991, bits 53

Sample output for gcc 2.95.3:

float size 4
double size 8
long double size 12
Max delta for float 16777215, bits 24
Max delta for double 9007199254740991, bits 53
Max delta for long double 18446744073709551615, bits 64

Notes:

The EPSILON value for each floating point type should be
in your float.h standard header.

You should check to make sure that your implementation
matches your library.

The Microsoft compiler for Windows does not support
long double at greater resolution than double.

*/

#include <stdio.h>
#include <float.h>

static void printFloat(float * f, int bits)
{
    printf("Max delta for float %.0f, bits %d\n", *f, bits);
}

static void printDouble(double * d, int bits)
{
    printf("Max delta for double %.f, bits %d\n", *d, bits);
}

/*

This code will take some explaining.

1) printf does not deal with long double values accurately.

The solution to number one is to store the
floating point fraction as an unsigned integer.

2) Microsoft does not support long long data types.

The solution to number two is to use a Microsoft
non-portable data type.

*/

static void printLongDouble(long double * ld, int bits)
{

/*
#define MICROSOFT_STUPID_C
*/
#ifdef MICROSOFT_STUPID_C
    unsigned _int64 delta;

    delta = (unsigned _int64)(*ld);
    printf("Max delta for long double %I64u, bits %d\n", delta, bits);
#else
    /* unsigned, so that the type matches the %llu conversion */
    unsigned long long delta;

    delta = (unsigned long long)(*ld);
    printf("Max delta for long double %llu, bits %d\n", delta, bits);
#endif

}

int main(void)
{
    float f, f2, fp1, fd2;
    double d, d2, dp1, dd2;
    long double ld, ld2, ldp1, ldd2;
    int bits;

    /* sizeof yields a size_t; cast it so that it matches %lu */
    printf("float size %lu\n", (unsigned long)sizeof(f));
    printf("double size %lu\n", (unsigned long)sizeof(d));
    printf("long double size %lu\n", (unsigned long)sizeof(ld));

    /* Halve fd2 until adding it to 1 no longer changes the sum. */
    f = 1.0;
    f2 = 2.0;
    fp1 = 1.0;
    fd2 = 1.0;
    bits = 0;
    do
    {
        bits++;
        fd2 = fd2 / f2;
        fp1 = f + fd2;
    } while (f != fp1);

    /* Largest integer that still fits in the fraction. */
    f = ((f / (fd2 * f2) - f) * f2) + f;
    printFloat(&f, bits);

    /* Same search for double. */
    d = 1.0;
    d2 = 2.0;
    dp1 = 1.0;
    dd2 = 1.0;
    bits = 0;
    do
    {
        bits++;
        dd2 = dd2 / d2;
        dp1 = d + dd2;
    } while (d != dp1);

    d = ((d / (dd2 * d2) - d) * d2) + d;
    printDouble(&d, bits);

    /* Same search for long double. */
    ld = 1.0;
    ld2 = 2.0;
    ldp1 = 1.0;
    ldd2 = 1.0;
    bits = 0;
    do
    {
        bits++;
        ldd2 = ldd2 / ld2;
        ldp1 = ld + ldd2;
    } while (ld != ldp1);

    ld = ((ld / (ldd2 * ld2) - ld) * ld2) + ld;
    printLongDouble(&ld, bits);

    return 0;
}

P.J. Plauger · Sep 11, 2005

> People have lots of issues with the way Microsoft handles floating point
> numbers on Windows systems. IMHO it sucks.

Emphasis on the H, I assume. You haven't shown anything wrong with
it.

> It seems that hacks left in from Intel's Pentium FPU problems may account
> for some of the weirdness.

You haven't shown any "hacks" or "weirdness".

> So now for the long part of this post.
>
> Here is a program that attempts to find the number of "real" bits in the
> floating point support by using only standard C functionality.
>
> Well, you know that's a lie about "standard C" whenever Microsoft and
> Windows are involved.

You mean that *you* are lying about your code being Standard C,
because you don't know how to solve this particular problem
in Standard C under Windows?

I've worked with Microsoft C for nearly 20 years now, particularly
in the area of conformance. IME, it conforms very well, certainly
for the past decade. They chose a while back to give long double
the same representation as double, for better consistency across
multiple architectures as I understand it. While this sacrifices
some possible functionality on Intel architectures, it *is*
conforming.

So what's your point?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Dik T. Winter · Sep 11, 2005

> f = 1.0;
> f2 = 2.0;
> fp1 = 1.0;
> fd2 = 1.0;
> bits = 0;
> do
> {
> bits++;
> fd2 = fd2 / f2;
> fp1 = f + fd2;
> } while (f != fp1);

When the rounding mode is round to +inf, this only terminates when
fd2 has underflowed and so equals 0. And if underflow only gives
the smallest positive floating point number, it will not terminate.

> f = ((f / (fd2 * f2) - f) * f2) + f;

Divide by zero exception. As I remember from something similar I wrote
a long, long time ago, the stopping criterion should be
while (fp1 - f != fd2),
and I think with that criterion your next calculation can be simplified.
But be afraid of pre-adjusting processors that truncate during the
pre-adjust.
