The machine epsilon

jacob navia · Jun 29, 2007

Eric said:
Richard Tobin wrote On 06/29/07 08:53,:

I don't think so. The fraction of a normalized, non-
zero, finite IEEE number has a value 0.5 <= f < 1, so
unity is represented as two to the first times one-half:
2^1 * .100...000(2). The unbiased exponent value in the
representation of unity is therefore one, not zero.

I think you forget the implicit bit Eric.

jacob navia · Jun 29, 2007

Eric said:
Eric Sosman wrote On 06/29/07 07:31,:

jacob said:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Since we have 53 bits in the mantissa we have
2^(1-53)--> 2.2204460492503131E-16
as shown by my program!

BY THE WAY I added an exercise:

Exercise

jacob navia · Jun 29, 2007

Richard said:
If you put those bits into a double, you don't get the epsilon (try
it). You are writing it as a de-normalised representation, but if you
put those bits in a double they would be interpreted as a normalized
value, equal to 1+epsilon.

The IEEE representation of DBL_EPSILON is

sign bit: 0
exponent bits: 01111001011 (representing -52)
mantissa bits: 000.... (representing 1.0, because of the hidden bit)

-- Richard

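#include <stdio.h>
#include <string.h>
#include <float.h>
int main(void)
{
    double d = DBL_EPSILON;
    unsigned int p[2];  /* assumes a 64-bit double and 32-bit unsigned int */

    memcpy(p, &d, sizeof p);  /* copy the bytes instead of casting the pointer */
    printf("%a\n", DBL_EPSILON);
    printf("0x%x 0x%x\n", p[0], p[1]);
    return 0;
}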

The output is
0x1.0000000000000p-052
0x0 0x3cb00000
3cb --> 971.
971 - 1023 --> -52
The rest is zero, since there is a hidden bit.

The effective representation of DBL_EPSILON
is then:
sign: 0
exponent: 971 (-52 since 971-1023 is -52)
bias: 1023
mantissa: zero (the stored fraction bits are all zero; the hidden bit supplies the leading 1)

jacob navia · Jun 29, 2007

Army1987 said:
jacob navia said:

The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.

Where does it allow you to use an identifier which doesn't begin
with an underscore and the standard never reserves?
What happens if I write a program with the lines
#include <float.h>
int main(int QFLT_EPSILON, char *argv[])
(yes, I'm allowed to do that) and try to compile it with your
compiler?

You will have to add the -ansic flag to your compilation flags

Eric Sosman · Jun 29, 2007

jacob navia wrote On 06/29/07 10:56,:

Eric said:
Eric said:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

CBFalconer · Jun 29, 2007

jacob said:
Eric Sosman wrote:
.... snip ...

I think you forget the implicit bit Eric.

In some systems. Not necessarily C. Do try to stay on topic.

jacob navia · Jun 29, 2007

Eric said:
jacob navia wrote On 06/29/07 10:56,:

Eric said:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

jacob navia · Jun 29, 2007

CBFalconer said:
In some systems. Not necessarily C. Do try to stay on topic.

The C standard assumes IEEE 754 representation Chuck.

Walter Roberson · Jun 29, 2007

jacob navia said:
The C standard assumes IEEE 754 representation Chuck.

C89 doesn't.

Army1987 · Jun 29, 2007

jacob navia said:
Army1987 said:

jacob navia said:

Richard Heathfield wrote:
jacob navia said:

<snip>

For the different
representations we have in the standard header <float.h>:

#define FLT_EPSILON 1.19209290e-07F // float
#define DBL_EPSILON 2.2204460492503131e-16 // double
#define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
// qfloat epsilon truncated so that it fits in this page...
#define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
Conforming implementations must not define QFLT_EPSILON in <float.h>
The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.

Where does it allow you to use an identifier which doesn't begin
with an underscore and the standard never reserves?
What happens if I write a program with the lines
#include <float.h>
int main(int QFLT_EPSILON, char *argv[])
(yes, I'm allowed to do that) and try to compile it with your
compiler?

You will have to add the -ansic flag to your compilation flags

Thanks. (Not that I am ever likely to do that, but if I can
disable it, it is no worse, at least in this respect, than gcc
not letting me name a variable random unless I use -ansi,
because of the POSIX function of that name.)

Army1987 · Jun 29, 2007

jacob navia said:
The C standard assumes IEEE 754 representation Chuck.

It doesn't. It says that an implementation can define
__STDC_IEC_559__ only when it conforms to that standard. C doesn't even
require FLT_RADIX to be a power of 2.

(Hey, n1124.pdf has a blank between the two underscores...)

Army1987 · Jun 29, 2007

jacob navia said:
Eric said:

jacob navia wrote On 06/29/07 10:56,:

Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

CBFalconer · Jun 29, 2007

jacob said:
The C standard assumes IEEE 754 representation Chuck.

No it doesn't. It allows it.

jacob navia · Jun 29, 2007

Army1987 said:
jacob navia said:

Eric said:

jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

Joe Wright · Jun 29, 2007

jacob said:
Eric said:

jacob navia wrote On 06/29/07 10:56,:

Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be
1.0. For the different representations we have in the standard
header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

I'm not sure whose point I'm taking but I offer the following:
Some time ago I wrote a utility, one of whose facilities is
to display binary representations of Floating Point numbers.
The following is the representation of DBL_EPSILON.

00111100 10110000 00000000 00000000 00000000 00000000 00000000 00000000
Exp = 971 (-51)
111 11001101
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000000
2.2204460492503131e-16

Now this is DBL_EPSILON + 1.0

00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000001
Exp = 1023 (1)
000 00000001
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000001
1.0000000000000002e+00

Army1987 · Jun 29, 2007

jacob navia said:
Army1987 said:

jacob navia said:

Eric Sosman wrote:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

NO! The rounding mode needn't be to nearest!

#include <stdio.h>

int main(void)
{
    volatile double oneplus = 2, epsilon = 1;

    while (1 + epsilon/2 > 1) {
        epsilon /= 2;
        oneplus = 1 + epsilon;
    }
    epsilon = oneplus - 1;
    printf("DBL_EPSILON is %g\n", epsilon);
    return 0;
}

And I'm not very sure whether it would work if FLT_RADIX were not a power of 2.

Keith Thompson · Jun 29, 2007

jacob navia said:
The C standard assumes IEEE 754 representation Chuck.

It most certainly does not, and it never has.

jacob navia · Jun 29, 2007

Keith said:
It most certainly does not, and it never has.

In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic

This annex specifies C language support for the IEC 60559 floating-point
standard. The IEC 60559 floating-point standard is specifically Binary
floating-point arithmetic for microprocessor systems, second edition
(IEC 60559:1989), previously designated IEC 559:1989 and as IEEE Standard
for Binary Floating-Point Arithmetic (ANSI/IEEE 754-1985). IEEE Standard
for Radix-Independent Floating-Point Arithmetic (ANSI/IEEE 854-1987)
generalizes the binary standard to remove dependencies on radix and word
length. IEC 60559 generally refers to the floating-point standard, as in
IEC 60559 operation, IEC 60559 format, etc. An implementation that
defines __STDC_IEC_559__ shall conform to the specifications in this
annex. Where a binding between the C language and IEC 60559 is indicated,
the IEC 60559-specified behavior is adopted by reference, unless stated
otherwise.

So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.

Keith Thompson · Jun 29, 2007

jacob navia said:
Army1987 said:

jacob navia said:

Eric Sosman wrote:
jacob navia wrote On 06/29/07 10:56,: [...]
The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that
is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

And what if inexact results are always rounded up? I don't believe
anything in the C standard disallows that.

Eric Sosman · Jun 29, 2007

jacob navia wrote On 06/29/07 13:43,:

Eric said:
Eric said:

jacob navia wrote On 06/29/07 10:56,:

Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Wrong again, Jacob. Twice.

First, the machine I'm using does not even have an
extended precision, much less use one. Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer. 1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down. (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)

Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number. If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!

For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.
