The machine epsilon

J

jacob navia

Eric said:
Richard Tobin wrote On 06/29/07 08:53,:

I don't think so. The fraction of a normalized, non-
zero, finite IEEE number has a value 0.5 <= f < 1, so
unity is represented as two to the first times one-half:
2^1 * .100...000(2). The unbiased exponent value in the
representation of unity is therefore one, not zero.

I think you forget the implicit bit Eric.
 
J

jacob navia

Eric said:
Eric Sosman wrote On 06/29/07 07:31,:
jacob said:
[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Since we have 53 bits in the mantissa we have
2^(1-53)--> 2.2204460492503131E-16
as shown by my program!


BY THE WAY I added an exercise:

Exercise
 
J

jacob navia

Richard said:
If you put those bits into a double, you don't get the epsilon (try
it). You are writing it as a de-normalised representation, but if you
put those bits in a double they would be interpreted as a normalized
value, equal to 1+epsilon.

The IEEE representation of DBL_EPSILON is

sign bit: 0
exponent bits: 01111001011 (representing -52)
mantissa bits: 000.... (representing 1.0, because of the hidden bit)

-- Richard
#include <stdio.h>
#include <float.h>
int main(void)
{
double d = DBL_EPSILON;
int *p = (int *)&d;

printf("%a\n",DBL_EPSILON);
printf("0x%x 0x%x\n",p[0],p[1]);
}

The output is
0x1.0000000000000p-052
0x0 0x3cb00000
3cb --> 971.
971 - 1023 --> -52
The rest is zero, since there is a hidden bit.

The effective representation of DBL_EPSILON
is then:
sign:0
exponent: 971 (-52 since 971-1023 is -52)
Bias: 1023
mantissa: Zero (since we have a hidden bit of zero)
 
J

jacob navia

Army1987 said:
jacob navia said:
The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.
Where does it allow you to use an identifier which doesn't begin
with an underscore and the standard never reserves?
What happens if i write a program with the lines
#include <float.h>
int main(int QFLT_EPSILON, char *argv[])
(yes, I'm allowed to do that) and try to compile it with your
compiler?

You will have to add the -ansic flag to your compilation flags
 
E

Eric Sosman

jacob navia wrote On 06/29/07 10:56,:
Eric said:
Eric Sosman wrote On 06/29/07 07:31,:
jacob navia wrote:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)


The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.
 
J

jacob navia

Eric said:
jacob navia wrote On 06/29/07 10:56,:
Eric said:
Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1
 
A

Army1987

jacob navia said:
Army1987 said:
jacob navia said:
Richard Heathfield wrote:
jacob navia said:

<snip>

For the different
representations we have in the standard header <float.h>:

#define FLT_EPSILON 1.19209290e-07F // float
#define DBL_EPSILON 2.2204460492503131e-16 // double
#define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
// qfloat epsilon truncated so that it fits in this page...
#define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
Conforming implementations must not define QFLT_EPSILON in <float.h>
The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.
Where does it allow you to use an identifier which doesn't begin
with an underscore and the standard never reserves?
What happens if i write a program with the lines
#include <float.h>
int main(int QFLT_EPSILON, char *argv[])
(yes, I'm allowed to do that) and try to compile it with your
compiler?

You will have to add the -ansic flag to your compilation flags
Thanks. (Not that I might ever be going to do that, but if I can
disable that, it is no worse (at least under this aspect) than gcc
not allowing me to name a variable random if I don't use -ansi
because of the POSIX function named that way.)
 
A

Army1987

jacob navia said:
The C standard assumes IEEE 754 representation Chuck.

It doesn't. It says that an implementation only can define
__STDC_IEC_559__ when it comforms to that standard. C doesn't even
require FLT_RADIX to be a power of 2.

(Hey, n1124.pdf has a blank between the two underscores...)
 
A

Army1987

jacob navia said:
Eric said:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)


The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.
 
J

jacob navia

Army1987 said:
jacob navia said:
Eric said:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.
This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4
 
J

Joe Wright

jacob said:
Eric said:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be
1.0. For the different representations we have in the standard
header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)


The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

I'm not sure whose point I'm taking but I offer the following:
Some time ago I wrote a utility, one of whose facilities is
to display binary representations of Floating Point numbers.
The following is the representation of DBL_EPSILON.

00111100 10110000 00000000 00000000 00000000 00000000 00000000 00000000
Exp = 971 (-51)
111 11001101
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000000
2.2204460492503131e-16

Now this is DBL_EPSILON + 1.0

00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000001
Exp = 1023 (1)
000 00000001
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000001
1.0000000000000002e+00
 
A

Army1987

jacob navia said:
Army1987 said:
jacob navia said:
Eric Sosman wrote:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

NO! The rounding mode needn't be to nearest!

#include <stdio.h>

int main(void)

{

volatile double oneplus = 2, epsilon = 1;

while (1 + epsilon/2 > 1) {

epsilon /= 2;

oneplus = 1 + epsilon;

}

epsilon = oneplus - 1;

printf("DBL_EPSILON is %g\n", epsilon);

return 0;

}

And i'm not very sure wheter it'd work if FLT_RADIX were not a power of 2.
 
J

jacob navia

Keith said:
It most certainly does not, and it never has.

In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic

This annex specifies C language support for the IEC 60559 floating-point
standard. The
IEC 60559 floating-point standard is specifically Binary floating-point
arithmetic for
microprocessor systems, second edition (IEC 60559:1989), previously
designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point
Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove
dependencies on radix and word length. IEC 60559 generally refers to the
floating-point
standard, as in IEC 60559 operation, IEC 60559 format, etc. An
implementation that
defines _ _STDC_IEC_559_ _ shall conform to the specifications in this
annex. Where
a binding between the C language and IEC 60559 is indicated, the IEC
60559-specified
behavior is adopted by reference, unless stated otherwise.

So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.
 
K

Keith Thompson

jacob navia said:
Army1987 said:
jacob navia said:
Eric Sosman wrote:
jacob navia wrote On 06/29/07 10:56,: [...]
The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that
is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1
Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

And what if inexact results are always rounded up? I don't believe
anything in the C standard disallows that.
 
E

Eric Sosman

jacob navia wrote On 06/29/07 13:43,:
Eric said:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:


Eric Sosman wrote On 06/29/07 07:31,:


jacob navia wrote:


[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)


The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.


This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Wrong again, Jacob. Twice.

First, the machine I'm using does not even have an
extended precision, much less use one. Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer. 1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down. (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)

Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number. If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!

For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,816
Messages
2,569,711
Members
45,499
Latest member
BreannaWhi

Latest Threads

Top