The machine epsilon


Army1987

jacob navia said:
In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic

This annex specifies C language support for the IEC 60559 floating-point standard. The
IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for
microprocessor systems, second edition (IEC 60559:1989), previously designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point
Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove
dependencies on radix and word length. IEC 60559 generally refers to the floating-point
standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that
defines __STDC_IEC_559__ shall conform to the specifications in this annex. Where
a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified
behavior is adopted by reference, unless stated otherwise.

So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.
Read the first words of the last sentence you copy-pasted.
The standard explicitly allows FLT_RADIX even to be a power of 10.
Read 5.2.4.2.2 throughout.
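
For what it's worth, the values in question can be inspected directly by
printing the <float.h> macros; a minimal sketch (the output is
implementation-defined):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* FLT_RADIX and the *_EPSILON macros come from 5.2.4.2.2. */
    printf("FLT_RADIX    = %d\n", FLT_RADIX);
    printf("FLT_EPSILON  = %g\n", (double)FLT_EPSILON);
    printf("DBL_EPSILON  = %g\n", DBL_EPSILON);
    printf("LDBL_EPSILON = %Lg\n", LDBL_EPSILON);
    return 0;
}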
 

jacob navia

Eric said:
jacob navia wrote On 06/29/07 13:43,:
Eric said:
jacob navia wrote On 06/29/07 10:56,:

Eric Sosman wrote:


Eric Sosman wrote On 06/29/07 07:31,:


jacob navia wrote:


[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Wrong again, Jacob. Twice.

First, the machine I'm using does not even have an
extended precision, much less use one. Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer. 1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down. (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)

Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number. If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!

For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.

Great Eric, unnormalized numbers exist.

What's your point?

Jacob is always wrong.

Granted.

I am always wrong. Happy?
 

Walter Roberson

Eric Sosman wrote:
Great Eric, unnormalized numbers exist.
What's your point?

Eric isn't talking about unnormalized numbers.

Consider DBL_EPSILON. It is noticeably less than 1/2, so in IEEE
754 format it is a normal number: some negative exponent,
followed by a hidden 1, followed by a 52-bit binary fraction.
Now construct the largest double that is smaller than DBL_EPSILON,
i.e. decrement its representation by one unit in the last place.
Call the result, say, NEARLY_DBL_EPSILON.
Now, take 1 + NEARLY_DBL_EPSILON. What is the result?
Algebraically, it isn't quite 1 + DBL_EPSILON, but floating-point
arithmetic doesn't obey normal algebra, so the result depends upon
the IEEE rounding mode in effect. If "round to nearest" (or
perhaps some other modes) is in effect, the result is close enough to
1 + DBL_EPSILON that the processor will round the result to
1 + DBL_EPSILON. C only promises accuracy to at best 1 ULP
("Unit in the Last Place"), so this isn't "wrong" (and what
exactly is right or wrong in such a case is arguable).
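
In C99 the same construction can be written with nextafter(), which yields
the largest double below DBL_EPSILON without any bit-fiddling; a minimal
sketch (assumes IEEE double and the default round-to-nearest mode):

#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
    /* The largest double strictly smaller than DBL_EPSILON. */
    double nearly = nextafter(DBL_EPSILON, 0.0);
    volatile double sum = 1.0 + nearly;

    printf("nearly < DBL_EPSILON         : %d\n", nearly < DBL_EPSILON);
    /* Under round-to-nearest the sum still lands on 1 + DBL_EPSILON. */
    printf("1 + nearly == 1 + DBL_EPSILON: %d\n", sum == 1.0 + DBL_EPSILON);
    return 0;
}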
 

Flash Gordon

jacob navia wrote, On 29/06/07 22:07:
Great Eric, unnormalized numbers exist.

What's your point?

His point was that the information you were putting in your tutorial was
incorrect, and your refutation of the correction was incorrect.
Jacob is always wrong.

Granted.

I am always wrong. Happy?

No. People would be happy if you would gracefully accept corrections.
Why do you think that if someone finds an error in what you post they
are out to get you?
 

Mark McIntyre

Richard Tobin said:



Shurely shome mishtake?

On your part, yes.
The internal evidence of his articles suggests otherwise.

And now you're being gratuitous too. Give Jacob his due - he's trying
to write a C tutorial, he's very sensibly asking for a review here,
and is prepared to take in comments even from people who he has had
run-ins with before. Stop being so childish.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 

Eric Sosman

jacob navia wrote On 06/29/07 17:07,:
Eric said:
jacob navia wrote On 06/29/07 13:43,:
Eric Sosman wrote:


jacob navia wrote On 06/29/07 10:56,:


Eric Sosman wrote:



Eric Sosman wrote On 06/29/07 07:31,:



jacob navia wrote:



[...]
If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)


The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.


This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

Wrong again, Jacob. Twice.

First, the machine I'm using does not even have an
extended precision, much less use one. Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer. 1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down. (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)

Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number. If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!

For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.


Great Eric, unnormalized numbers exist.

Not in IEEE float or double (extended formats are
less tightly specified, and might allow unnormalized
numbers; I'm not sure). IEEE float and double have
"denormal numbers," which are a different matter.

And for what it's worth, none of the four and a
half thousand million million values I mentioned is a
denormal. All are in the approximate range 1.1e-16
through 2.2e-16, while the largest IEEE double denormal
is about 2.2e-308.
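
A small sketch of the behaviour described in this subthread (assumes IEEE
754 double and the start-up round-to-nearest mode; the variable names are
illustrative only):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* 25% smaller than DBL_EPSILON; volatile keeps the compiler from
       folding the arithmetic away at compile time. */
    volatile double x = DBL_EPSILON * 3 / 4;
    volatile double sum = 1.0 + x;   /* rounded to double on the store */

    printf("x < DBL_EPSILON          : %d\n", x < DBL_EPSILON);
    printf("1 + x > 1                : %d\n", sum > 1.0);
    printf("1 + x == 1 + DBL_EPSILON : %d\n", sum == 1.0 + DBL_EPSILON);
    /* And x is nowhere near the denormal range: */
    printf("x = %g, DBL_MIN = %g\n", x, DBL_MIN);
    return 0;
}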

Are you sure you're the right person to write a
tutorial on floating-point numbers? The mistakes you
have made in this thread are of an elementary nature
uncharacteristic of someone even moderately familiar
with the topic. You might want to consider offering
your readers a URL to someone else's writeup instead,
lest you commit self-Schildtification.
What's your point?

Jacob is always wrong.

Granted.

I am always wrong. Happy?

That is not my point; "always" overstates the case.
My point is that Jacob, right or wrong, doesn't appear
to read with much care, even when what he's reading is
a response to his own request for assistance. "If you
see any errors/ambiguities/etc please just answer in
this thread" so Jacob can feel he's being persecuted.
 

Keith Thompson

jacob navia said:
In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic

This annex specifies C language support for the IEC 60559
floating-point standard. The IEC 60559 floating-point standard is
specifically Binary floating-point arithmetic for microprocessor
systems, second edition (IEC 60559:1989), previously designated IEC
559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent
Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the
binary standard to remove dependencies on radix and word length. IEC
60559 generally refers to the floating-point standard, as in IEC
60559 operation, IEC 60559 format, etc. An implementation that
defines __STDC_IEC_559__ shall conform to the specifications in
this annex. Where a binding between the C language and IEC 60559 is
indicated, the IEC 60559-specified behavior is adopted by reference,
unless stated otherwise.

Right. The relevant sentence is:

An implementation that defines __STDC_IEC_559__ shall conform to
the specifications in this annex.

See also C99 6.10.8p2, "Predefined macro names":

The following macro names are conditionally defined by the
implementation:

__STDC_IEC_559__ The integer constant 1, intended to indicate
conformance to the specifications in annex F (IEC 60559
floating-point arithmetic).

[...]

An implementation is not required to conform to annex F. It's merely
required to do so *if* it defines __STDC_IEC_559__.
So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.

Why should they be rare? Most new systems these days do implement
IEEE floating-point, or at least use the formats, but there are still
systems that don't (VAX, some older Crays, IBM mainframes). Annex F
merely provides a framework for implementations that do support IEEE
FP, but it's explicitly optional.

I can also imagine an implementation that uses the IEEE floating-point
formats, but doesn't meet the requirements of Annex F; such an
implementation could not legally define __STDC_IEC_559__. (I have no
idea whether such implementations exist, or how common they are.)

And of course __STDC_IEC_559__ and Annex F are new in C99, so they
don't apply to the majority of implementations, which don't conform to
the C99 standard.
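
So code that actually relies on Annex F semantics should test the macro
rather than assume it; a minimal sketch:

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
    /* The implementation claims conformance to Annex F (IEC 60559). */
    printf("__STDC_IEC_559__ defined: IEEE semantics guaranteed\n");
#else
    /* No Annex F claim: the floating types may use any model that
       5.2.4.2.2 permits (FLT_RADIX need not even be 2). */
    printf("__STDC_IEC_559__ not defined\n");
#endif
    return 0;
}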
 

Keith Thompson

Here's *my* floating-point tutorial:

Read "What every computer scientist should know about floating-point
arithmetic", by David Goldberg.
 

jacob navia

Army1987 said:
jacob navia said:
Army1987 said:
"jacob navia" <[email protected]> ha scritto nel messaggio Eric Sosman wrote:
jacob navia wrote On 06/29/07 10:56,:
Eric Sosman wrote:

Eric Sosman wrote On 06/29/07 07:31,:

jacob navia wrote:

[...]
If you add a number smaller than this to 1.0, the result will be 1.0. For the different representations we have in the
standard header <float.h>:
Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). [...]
Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1-p)
Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1
Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.
That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

NO! The rounding mode needn't be to nearest!

#include <stdio.h>

int main(void)
{
    volatile double oneplus = 2, epsilon = 1;

    while (1 + epsilon/2 > 1) {
        epsilon /= 2;
        oneplus = 1 + epsilon;
    }
    epsilon = oneplus - 1;
    printf("DBL_EPSILON is %g\n", epsilon);
    return 0;
}

And I'm not very sure whether it'd work if FLT_RADIX were not a power of 2.

This program produces on my machine (x86)
DBL_EPSILON is 0
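
A likely reason for that result, assuming the compiler evaluates
intermediate expressions in x87 extended precision: the test
`1 + epsilon/2 > 1' is then performed in long double, so the loop keeps
halving well past DBL_EPSILON, and `oneplus - 1' collapses to 0 once
`1 + epsilon' rounds to 1.0 when stored as a double. Forcing each sum
through a volatile double keeps the comparison at double precision; a
sketch (still assumes the default round-to-nearest mode):

#include <stdio.h>

int main(void)
{
    double epsilon = 1.0;
    volatile double sum;       /* force each sum to be rounded to double */

    for (;;) {
        sum = 1.0 + epsilon / 2;
        if (!(sum > 1.0))
            break;
        epsilon /= 2;
    }
    /* Under round-to-nearest this prints DBL_EPSILON (about 2.22e-16);
       under other rounding modes it may not. */
    printf("computed epsilon = %g\n", epsilon);
    return 0;
}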
 

jacob navia

Army1987 said:
NO! The rounding mode needn't be to nearest!


According to the C standard:
(Annex F.7.3)

At program startup the floating-point environment is initialized as
prescribed by IEC 60559:
— All floating-point exception status flags are cleared.

— The rounding direction mode is rounding to nearest. (!!!!!)

— The dynamic rounding precision mode (if supported) is set so that
results are not shortened.
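
Even where Annex F applies and the start-up mode is round to nearest, the
program is free to change the mode at run time, which is why "1 + x > 1"
is not a reliable characterization of epsilon. A minimal sketch using the
C99 <fenv.h> interface (assumes FE_UPWARD is supported; strictly this
requires the STDC FENV_ACCESS pragma, which some compilers do not
implement):

#include <stdio.h>
#include <float.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON    /* may be ignored by some compilers */

int main(void)
{
    volatile double half_eps = DBL_EPSILON / 2;
    volatile double sum;

    sum = 1.0 + half_eps;              /* start-up mode: round to nearest */
    printf("nearest: 1 + eps/2 > 1 ? %d\n", sum > 1.0);       /* 0 */

    if (fesetround(FE_UPWARD) == 0) {  /* change the rounding mode */
        sum = 1.0 + half_eps;
        printf("upward : 1 + eps/2 > 1 ? %d\n", sum > 1.0);   /* 1 */
        fesetround(FE_TONEAREST);
    }
    return 0;
}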
 

Harald van Dĳk

jacob said:
According to the C standard:
(Annex F.7.3)

At program startup the floating-point environment is initialized as
prescribed by IEC 60559:
— All floating-point exception status flags are cleared.

— The rounding direction mode is rounding to nearest. (!!!!!)

— The dynamic rounding precision mode (if supported) is set so that
results are not shortened.

As has been pointed out to you, annex F does not apply unless
__STDC_IEC_559__ is defined by the implementation.
 

JT

Army1987 said:
NO! The rounding mode needn't be to nearest!

jacob navia said:
According to the C standard:
(Annex F.7.3)

At program startup the floating-point environment
is initialized as prescribed by IEC 60559:
- All floating-point exception status flags are cleared.
- The rounding direction mode is rounding to nearest. (!!!!!)

To Jacob: Eric Sosman already SAID on June 29
that "rounding to nearest" is the start-up default.
But the program can change the default after start-up.

So the FLAWED PARAPHRASE you keep insisting on is not
a universal fact. You should use the OFFICIAL DEFINITION
of epsilon, rather than the FLAWED paraphrase
you KEEP repeating.


Wrong again, Jacob. Twice.
First...
Second...
there are about four and a half
thousand million million distinct values
strictly less than DBL_EPSILON which
when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode,
which is the mode in effect when a C program starts.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
