Float comparison

CBFalconer · May 19, 2009

Keith said:
That's nice. Now would you care to answer the question that I
actually asked?

It's obviously NO. An fp-object-value came from a fp-object.

CBFalconer · May 19, 2009

Richard said:
CBFalconer said:

Replace fp-value with real value, and the answer is YES.

The problem being that, with vanishingly few[1] exceptions, you
can't introduce a real value into a C program.

[1] Vanishingly few in comparison to the total number of reals.

You seem unable to concentrate. There have been multiple messages
illustrating the process of specifying an exact real to a C
program.

Keith Thompson · May 19, 2009

Richard Heathfield said:
Ike Naar said:

Terry Pratchett has commented on the relationship between
exclamation mark density and sanity.

!I! !c!o!u!l!d!n!'!t! !p!o!s!s!i!b!l!y! !c!o!m!m!e!n!t!.!

CBFalconer · May 19, 2009

Ben said:
.... snip ...

I can't parse this at all. Is ymin not the largest real that
(theoretically) converts to x (y's predecessor)? Is EPSILON not
1/8 in your example system? Does "(y-epsilon)==(y*(1-EPSILON))"
use real or floating point operations (i.e. how much does the
last "these" refer to)?

I tried to complete your example but you disagree with my result.
You tell us. Given x = 1.0 what are y, xmax and ymin in your
3-bit floating point example? It sounds as if you think this
idea is a simple one, so show us how simple it is by showing us
what a few numbers represent (i.e., what ranges) in your
simple example system.

I wish you had quoted my original. I spent about 15 minutes
looking for it here, and didn't find it. ... found it, much nearer
than I was looking. (I have 343 messages in this thread).

---- requote ----
Let's see if I can leap over the confusion. Consider a system with
a 4 bit significand, and an 8 bit exponent. The exponent uses the
value 128 to signify times 2 to the 0th power. We suppress the
msbit in the significand and replace it with a sign bit. The
significand can hold 0 through 15. Thus:

Exponent  Significand  Means
128        0 = 0x0      1.0
128        1 = 0x1      1.0 + 1/8
128        2 = 0x2      1.0 + 1/4
128        4 = 0x4      1.0 + 1/2
128        7 = 0x7      1.0 + 7/8
128        8 = 0x8     -1.0 /* the sign bit appeared */
128        9 = 0x9     -1.0 - 1/8
128       10 = 0xa     -1.0 - 1/4
128       12 = 0xc     -1.0 - 1/2
128       15 = 0xf     -1.0 - 7/8

if we raise the exponent by 1, we double the value in Means. If we
lower it by one, we halve the values in Means. I hope we are
agreed so far.

Now, what is the EPSILON involved here? Obviously if we add 1/8 to
1.0, we get the next value. But that doesn't consider the rounding
done by the hardware. We only need to add 1/16 to get that
effect. What is the value 1/16 in that system?

127        0            1/2
126        0            1/4
125        0            1/8
124        0            1/16 /* Aha */

1.0 + 1/16 will round up to 1.0 + 1/8. /* assume usual rounding */

What does this result look like? See above. Only one least
significant bit is changed. So we have found EPSILON to be 1/16,
and the result from nextafter would be 1.0 + 1/8.

---- end requote ----
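
The encoding in the requote can be sketched in C as a decoder from the two fields to the value in the Means column. This is only an illustration of the table as posted: the names `pow2` and `tiny_value` are made up here, and treating the top bit of the 4-bit field as the sign and the low three bits as eighths is read off the table rather than taken from any format definition.

```c
/* Hypothetical helper: 2 raised to an integer power, computed exactly. */
static double pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r /= 2.0;
    return r;
}

/* Decode the toy format from the post. `exponent` is biased by 128 and
   `sig` is the 4-bit field: bit 3 is the sign, bits 0..2 are the fraction
   in eighths, with the leading 1 suppressed. The name is an assumption. */
double tiny_value(int exponent, unsigned sig)
{
    double magnitude = (1.0 + (sig & 7u) / 8.0) * pow2(exponent - 128);
    return (sig & 8u) ? -magnitude : magnitude;
}
```

For example, `tiny_value(128, 1)` is 1.125 and `tiny_value(124, 0)` is 0.0625, the 1/16 marked "Aha" in the post.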

Alright, this has already gone over the calculation of xmax and
then generating y. Summarized:

x = 1.0
xmax = 1.0 + 1/16 (= x + EPSILON = x*(1+EPSILON) = x*(1+1/16))
y = 1.0 + 1/8

Now we calculate ymin by using y*(1-EPSILON). Substitute the value
of y:

ymin = (1.0+1/8)*(1-1/16)

This is NOT the same as xmax. It differs by 1/8 * 1/16.

I hope this answers your question.
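
The arithmetic in this summary can be checked with ordinary doubles, because every quantity involved is a multiple of 1/128 and is therefore exact in binary floating point. A minimal sketch, with `range_high` and `range_low` as made-up names for the post's x*(1+EPSILON) and y*(1-EPSILON), and EPSILON taken as 1/16 as in the post:

```c
/* Hypothetical names for the two formulas used in the post. */
double range_high(double f, double eps) { return f * (1.0 + eps); }
double range_low(double f, double eps)  { return f * (1.0 - eps); }

/* Gap between xmax (for x = 1.0) and ymin (for y = 1.125), EPSILON = 1/16. */
double xmax_minus_ymin(void)
{
    const double eps = 1.0 / 16.0;
    double xmax = range_high(1.0, eps);    /* 1.0625    = 136/128 */
    double ymin = range_low(1.125, eps);   /* 1.0546875 = 135/128 */
    return xmax - ymin;                    /* 1/128 = 1/8 * 1/16  */
}
```

The difference is positive, so with these formulas xmax lies above ymin.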

CBFalconer · May 19, 2009

Keith said:
Please humor us anyway.

For now, a concrete example for your implementation will suffice.

See my answer to Ben Bacarisse in the past few minutes, using my
small sample fp system.

CBFalconer · May 19, 2009

Keith said:
I honestly couldn't tell that you were joking.

You've been making numerous technical claims here that are, in
my opinion, not only incorrect but nonsensical. It's difficult
to distinguish between your nonsensical claims that you actually
believe and the ones that are meant to be humorous.

(If it turns out that this whole thing was your idea of a joke,
I will not be happy.)

Relax and be happy. I am dead serious. The phrasing was
'humorous', but the subject was not. Did you never labor mightily
over an antagonistic grill to produce burnt hamburgers?

CBFalconer · May 19, 2009

Keith said:
Are you *trying* to confuse the issue?

So, are both statements true?

Now I am confused myself, so I better not answer. I think we are
getting mixed up in the 'real valued' and the 'fp-valued'
epsilons. The real one changes with each different fp-object-value
(the x*(1+EPSILON) factor, as compared to x+epsilon, where epsilon
is the real valued one). I am just mulling here, not trying to give
a direct answer.

CBFalconer · May 19, 2009

Keith said:
And you concluded from this that actual released
implementations are likely to be buggy how, exactly? Never
mind, I probably don't want to know.

No, I said I expected them to be more likely to show up. Do you
expect every generator of a library to take equal care? Do you
expect a library from Navia to be as accurate as one from
Dinkumware? Don't forget that Navia has claimed to have
implemented C99.

CBFalconer · May 19, 2009

Keith said:
.... snip ...

I think your knowledge of fp systems is flawed, or at least
inconsistent with the C standard.

I don't. I think the C standard is incomplete in this regard.

I've asked you this before but ... if your model is valid, then it
is a set of fundamental facts about floating-point numbers, facts
without which it is difficult or impossible to understand what FP
numbers really mean. Why then does the standard say so little
about these ranges of yours? If you had written C99 5.2.4.2.2, I
presume the word "range" would have appeared many times; why does
the standard use the word "range" only in reference to the full
range of a floating-point type, never to the range represented by
a single value?

When you say 'floating-point numbers' above, are you talking about
real values, or values that the fp-object can hold and spit back?
I don't think it matters too much there, but it illustrates the
things that need caution.

I have to be guessing about the why. Because it does get
complicated by the use of real real values, and those
approximations held by the fp-object, etc. For most operations it
all doesn't matter - the fp-object accuracy is more than
sufficient. I'm sure I still have some things mixed up. But there
are times when the 'range' shows up in spades - differences between
nearly equal fp-objects stand out. That sort of thing strongly
affects matrix inversion.

Keith Thompson · May 19, 2009

CBFalconer said:
Yes it does (usually). That is not a static value. The C system
has generated initialization code, that is executed on the entry to
the function, which reserves the space and initializes it. It MAY
have generated a fp-constant to be jammed in, but it more likely
generates a 1.0 and a 3.0 and tells the system to perform a divide
and store the result. When the 1 and the 3 exist, and not the
fp-value, you have the actual one-third real value indicated.
After the storage, it has been approximated.

That is absolutely wrong. I honestly can't imagine where you get
these ideas. Sorry to be harsh, but this is frustrating.

(It's likely that the expression 1.0/3.0 will be optimized to a
constant value and no division operation will actually be executed.
In fact, I'd be mildly astonished by a compiler that *didn't* perform
this simple optimization. But I'll ignore that and stick to the
semantics of the abstract machine, in which the division actually
occurs.)

We have an expression 1.0/3.0, with subexpressions 1.0 and 3.0. Each
subexpression is of type double, with a value that can be represented
exactly; there's no loss of precision in the evaluation of the
floating-point constants. At this point, after the constants are
evaluated but before the division is evaluated, we have two distinct FP
values, 1.0 and 3.0.

Then the division is evaluated. The division operation takes two
double operands, with values 1.0 and 3.0, and yields a floating-point
result; on my system, that result is exactly
0.333333333333333314829616256247390992939472198486328125 .

At this point, we went directly from two FP values, 1.0 and 3.0, to a
single FP value,
0.333333333333333314829616256247390992939472198486328125 . There may
have been some intermediate results computed during the evaluation of
the division operator, but any such intermediates are outside the
scope of C and of this discussion. (Quibble: the standard allows
intermediate results to be computed to greater precision than
specified, but not to the infinite precision that would be needed to
represent one-third exactly; we can ignore that without loss of
generality.)

At no time was the real value one-third computed, stored, or
represented. The operands 1.0 and 3.0 do not represent one-third,
they represent two distinct FP numbers. The result of the division
does not represent one-third, it represents an FP value that's a close
approximation of one-third.
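
One way to see that the stored result is not one-third is to scale it by a power of two, which is exact in binary floating point, and inspect the resulting integer. The sketch below assumes an IEEE 754 double (53-bit significand) with round-to-nearest, matching the value quoted above; `scaled_by_2_54` and `quotient_is_exactly_one_third` are made-up names.

```c
/* Multiply by 2^54 (exact for a binary double of this magnitude) and
   return the result as an integer. Assumes IEEE 754 binary64. */
long long scaled_by_2_54(double v)
{
    return (long long)(v * 0x1p54);   /* 0x1p54 is the C99 hex literal for 2^54 */
}

/* Returns 1 if three times the stored quotient is exactly 1, else 0.
   The comparison is done in integers so that nothing is rounded again. */
int quotient_is_exactly_one_third(void)
{
    volatile double one = 1.0, three = 3.0;  /* volatile: divide at run time */
    long long m = scaled_by_2_54(one / three);
    return 3 * m == (1LL << 54);
}
```

On such a system the scaled value is 6004799503160661, and three times that is 2^54 - 1, one short of 2^54, so the function returns 0.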

C (non-static) initialized objects are quirky. They really consist
of calls to some routine or other to set the value in the object,
and maybe to calculate it. That's why they can be fairly complex.
The code is still there.

The only thing that's relevant here is that the expression 1.0/3.0 is
evaluated and the result is stored in x.

Ok, here's another example; imagine a complete program surrounding
this fragment if you like:

double x = 1.0/3.0;
double y = 0.333333333333333314829616256247390992939472198486328125;

Assuming the FP result of 1.0/3.0 is as I've said above, after the
above declarations have been executed, is there any difference between
x and y? Note that (x == y) will yield 1. Note also that the real
number one-third never exists during the initialization of y; when I
write 0.333333333333333314829616256247390992939472198486328125, I
simply mean 0.333333333333333314829616256247390992939472198486328125,
a number that is not equal to one-third.

The initialization of y assigns that particular value to y. The
initialization of x does exactly the same thing.
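
The claim that x and y end up indistinguishable can be checked at the level of the object representation as well as with ==. A small sketch, assuming an implementation where 1.0/3.0 yields the value quoted above (IEEE 754 double) and decimal constants are correctly rounded; `same_double` and `x_and_y_match` are made-up names.

```c
#include <string.h>

/* Returns 1 if a and b compare equal and also have identical bytes. */
int same_double(double a, double b)
{
    return a == b && memcmp(&a, &b, sizeof a) == 0;
}

/* The two initializations from the post, compared. */
int x_and_y_match(void)
{
    double x = 1.0 / 3.0;
    double y = 0.333333333333333314829616256247390992939472198486328125;
    return same_double(x, y);
}
```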

"A difference that makes no difference *is* no difference."

Keith Thompson · May 19, 2009

CBFalconer said:
Keith said:

CBFalconer said:

Keith Thompson wrote:

... snip ...

The function has a defined input and output. The input defines
the real value 1/3. The output is the fp-value representing that.

No, the inputs are the int value 1 and the int value 3.

No. I can put them in a structure:

[snip]

But in the code you posted you didn't put them in a structure, so
what's your point?

But the only purpose of any code was to illustrate ways of
specifying EXACTLY the real 1/3.

It failed. The relevant code does not specify the real value 1/3;
presenting irrelevant code that does so doesn't change that.

And thus to illustrate the rough
point at which that specification was converted into an fp-object
approximated value. If you had worked with transcendentals
generating the specification would have been harder.

(1/3 isn't transcendental; it's just not representable in
floating-point.)

Keith Thompson · May 19, 2009

CBFalconer said:
Alright, this has already gone over the calculation of xmax and
then generating y. Summarized:

x = 1.0
xmax = 1.0 + 1/16 (= x + EPSILON = x*(1+EPSILON) = x*(1+1/16))
y = 1.0 + 1/8

Now we calculate ymin by using y*(1-EPSILON). Substitute the value
of y:

ymin = (1.0+1/8)*(1-1/16)

This is NOT the same as xmax. It differs by 1/8 * 1/16.

I hope this answers your question.

And here we have a result that I have difficulty believing is what you
intended.

Let's put all this in decimal notation, so we can compare things more
easily:

x = 1.0
xmax = 1.0625
ymin = 1.0546875
y = 1.125

x and y are adjacent FP numbers; nextafter(x, +INFINITY) == y.

So the real ranges represented by these two distinct FP values
overlap.

Does that really make sense to you?

Assume the FP type we're dealing with here is called "tinyfloat", and
consider the following:

tinyfloat x = 1.0;
tinyfloat foo = 1.05859375; // halfway between ymin and xmax
tinyfloat y = 1.125;

printf("x = %f\n", x);
printf("y = %f\n", y);
printf("foo = %f\n", foo);

I hope you'll agree that the first two lines of output will be:

x = 1.0
y = 1.125

What is the third line of output?
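
Under a model in which each FP value stands for exactly one real number, the question has a definite answer once a rounding rule is fixed. The sketch below assumes conversion to the nearest representable value with ties rounded up (an assumption; the thread only says "usual rounding"), and handles only inputs in [1, 2), where the toy format's values are spaced 1/8 apart. `to_tinyfloat` is a hypothetical name; no such type or function exists in C.

```c
/* Round a value in [1, 2) to the nearest multiple of 1/8, ties upward.
   Models conversion to the thread's toy format. Inputs outside that
   interval are returned unchanged. */
double to_tinyfloat(double v)
{
    if (v < 1.0 || v >= 2.0)
        return v;
    long eighths = (long)(v * 8.0 + 0.5);   /* truncating after adding 1/2 rounds to nearest */
    return eighths / 8.0;
}
```

With this rule 1.05859375 converts to 1.0, because it lies below the midpoint 1.0625, so it belongs to x alone and not to y.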

Ben Bacarisse · May 19, 2009

CBFalconer said:
I wish you had quoted my original. I spent about 15 minutes
looking for it here, and didn't find it. ...

I did. You snipped it in your reply to me. Now you say I should have
thought to put it back? Really!

found it, much nearer
than I was looking. (I have 343 messages in this thread).

---- end requote ----

Alright, this has already gone over the calculation of xmax and
then generating y. Summarized:

x = 1.0
xmax = 1.0 + 1/16 (= x + EPSILON = x*(1+EPSILON) = x*(1+1/16))
y = 1.0 + 1/8

Now we calculate ymin by using y*(1-EPSILON). Substitute the value
of y:

ymin = (1.0+1/8)*(1-1/16)

This is NOT the same as xmax. It differs by 1/8 * 1/16.

I hope this answers your question.

It helps. I think the problem is what you mean by EPSILON. You have
repeatedly quoted the section from the C standard that talks about
DBL_EPSILON and FLT_EPSILON and also, I think, stated that this is
what you mean by the term.

5.2.4.2.2 p11 says of the three epsilons:

"the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)"

In your example p=4 (C counts the hidden 1 bit in normalised numbers
as a digit) and b=2. EPSILON is 2^-3 or 1/8, not 1/16. This is also
clear from the wording. 1+1/8 is the least value greater than 1 that
is representable in the system.
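
The formula b^(1-p) is easy to evaluate directly, which makes it possible to compare the toy system with the real `<float.h>` constants. In this sketch `model_epsilon` is a made-up helper; `FLT_RADIX`, `FLT_MANT_DIG`, `DBL_MANT_DIG`, `FLT_EPSILON` and `DBL_EPSILON` are the standard macros.

```c
#include <float.h>

/* b^(1-p): the gap between 1 and the next representable value in a
   floating-point model with radix b and p digits (p >= 1). */
double model_epsilon(int b, int p)
{
    double eps = 1.0;
    for (int k = 1; k < p; k++)
        eps /= b;
    return eps;
}
```

`model_epsilon(2, 4)` is 0.125 for the 4-digit binary example, and `model_epsilon(FLT_RADIX, DBL_MANT_DIG)` should compare equal to `DBL_EPSILON`.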

How can you expect people to follow what you mean with this sort of
confusion? Here, in case it helps, are the ranges for a few numbers
around 1 using the formulae you have often quoted (but I did not
believe because of the overlap) using both your odd EPSILON (1/16) and
the real one (1/8):

        range of f using EPSILON=1/16    range of f using EPSILON=1/8
f       f*(1-1/16)    f*(1+1/16)         f*(1-1/8)    f*(1+1/8)
-------------------------------------------------------------------
7/8     105/128       119/128            49/64        63/64
8/8     120/128       136/128            56/64        72/64
9/8     135/128       153/128            63/64        81/64
10/8    150/128       170/128            70/64        90/64
11/8    165/128       187/128            77/64        99/64

Notice that you get overlapping ranges with both. These numbers are
the exact rationals that the formula produces.
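
Entries of this kind can be regenerated with integer arithmetic, which avoids any rounding in the check itself. The sketch works in numerators only: for f = k/8, the bounds with EPSILON = 1/16 are 15k/128 and 17k/128, and with EPSILON = 1/8 they are 7k/64 and 9k/64. The struct and function names are made up for illustration.

```c
/* Numerators of the range bounds for f = k/8. The first pair is over a
   denominator of 128 (EPSILON = 1/16), the second over 64 (EPSILON = 1/8). */
struct range_row {
    int lo16, hi16;   /* f*(1-1/16), f*(1+1/16), in 128ths */
    int lo8, hi8;     /* f*(1-1/8),  f*(1+1/8),  in 64ths  */
};

struct range_row range_row_for(int k)
{
    struct range_row r = { 15 * k, 17 * k, 7 * k, 9 * k };
    return r;
}

/* Returns 1 if the ranges of k/8 and (k+1)/8 overlap for both EPSILONs.
   Meaningful for k from 8 to 14, where the two values are adjacent in
   the toy format. */
int neighbours_overlap(int k)
{
    struct range_row a = range_row_for(k), b = range_row_for(k + 1);
    return a.hi16 > b.lo16 && a.hi8 > b.lo8;
}
```

For k = 9 this gives 135/128, 153/128, 63/64 and 81/64, and `neighbours_overlap` returns 1 for every k from 8 to 14.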

Are either of these the ranges you have in mind? What, finally, is
EPSILON and does it have any relationship to the ones in the C
standard?

Keith Thompson · May 19, 2009

CBFalconer said:
Now I am confused myself, so I better not answer. I think we are
getting mixed up in the 'real valued' and the 'fp-valued'
epsilons. The real one changes with each different fp-object-value
(the x*(1+EPSILON) factor, as compared to x+epsilon, where epsilon
is the real valued one). I am just mulling here, not trying to give
a direct answer.

Then I implore you to stop using the term "epsilon" or "EPSILON". Feel
free to use DBL_EPSILON if and only if you're referring to the
constant defined in 5.2.4.2.2p11. You have muddied the meaning of
"EPSILON" so thoroughly that I'm no longer willing to figure out what
you mean by it.

Keith Thompson · May 19, 2009

CBFalconer said:
No, I said I expected them to be more likely to show up.

[...]

You said "likely", not "more likely". I took that to mean that ...

Oh, forget it. I will take your speculation on this point seriously
if and only if you can cite a released implementation with a buggy
nextafter() function. Let's drop it.

Keith Thompson · May 19, 2009

CBFalconer said:
I don't. I think the C standard is incomplete in this regard.

When you say 'floating-point numbers' above, are you talking about
real values, or values that the fp-object can hold and spit back?
I don't think it matters too much there, but it illustrates the
things that need caution.

I mean both; they're essentially the same. The standard defines, for
each floating-point number (excluding NaNs and infinities) the real
number that is its value in the model.

And FP objects are irrelevant here; a value needn't be stored in an
object, and storing it in an object doesn't change it.

(An aside: The standard's definition of "value" in 3.17 is limited to
values of objects, but I believe that's merely an oversight that we
can ignore for our current purpose. Several fundamental definitions
in the standard are similarly flawed. Note that the definition of
"expression" in C99 6.5p1 clearly says that an expression can compute
a value, so it's clear that objects are not necessary to understand
values.)

I have to be guessing about the why. Because it does get
complicated by the use of real real values, and those
approximations held by the fp-object, etc. For most operations it
all doesn't matter - the fp-object accuracy is more than
sufficient. I'm sure I still have some things mixed up. But there
are times when the 'range' shows up in spades - differences between
nearly equal fp-objects stand out. That sort of thing strongly
affects matrix inversion.

Differences between nearly equal FP values stand out just as strongly
when you use a model in which each FP value (or fp-object if you
insist) corresponds to a single real value. Your range model is not
helpful.
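
A concrete case shows the effect without any notion of ranges. In the sketch below each double is treated as the single real number it stores; the only error comes from rounding 1.0 + 1e-15 to the nearest double. It assumes IEEE 754 doubles with round-to-nearest, and `cancellation_result` is a made-up name.

```c
#include <float.h>

/* Computes (1.0 + 1e-15) - 1.0. The volatile store forces the sum to be
   rounded to double before the subtraction. */
double cancellation_result(void)
{
    volatile double sum = 1.0 + 1e-15;   /* rounds to 1 + 5*DBL_EPSILON */
    return sum - 1.0;                    /* exact: 5*DBL_EPSILON */
}
```

On such a system the result is 5 * DBL_EPSILON, about 1.11e-15, roughly 11% away from the 1e-15 that was added. The subtraction itself is exact; it only exposes the rounding error already made by the addition.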

Section 5.2.4.2.2 goes into a great deal of fine detail about
floating-point numbers. If these ranges were such a fundamental
concept, if it weren't possible to understand floating-point numbers
without understanding that each one represents a range of real
numbers, the standard would say so. This was not an accidental
omission, nor was it something left out because it was too
complicated. These ranges of yours simply aren't part of the model.

5.2.4.2.2p1-2 describes the model in a way that is absolutely
incompatible with your claims.
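
The model in 5.2.4.2.2p2 defines a floating-point number as x = s * b^e * (sum over k = 1..p of f_k * b^(-k)), so each choice of sign, exponent and digits names exactly one real value. A sketch of that formula, limited to values a double can hold exactly; `model_value` and its parameter layout are assumptions made for illustration.

```c
/* Value of a number in the C99 floating-point model:
   s * b^e * (f[0]/b + f[1]/b^2 + ... + f[p-1]/b^p),
   where s is +1 or -1 and each f[k] is a digit in 0..b-1. */
double model_value(int s, int b, int e, const int *f, int p)
{
    double sum = 0.0, weight = 1.0;
    for (int k = 0; k < p; k++) {
        weight /= b;               /* b^-(k+1) */
        sum += f[k] * weight;
    }
    for (; e > 0; e--) sum *= b;   /* apply b^e */
    for (; e < 0; e++) sum /= b;
    return s * sum;
}
```

With b = 2 and p = 4, the digits 1,0,0,0 with e = 1 give 1.0 and the digits 1,0,0,1 with e = 1 give 1.125: two adjacent numbers, each a single point on the real line.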

Richard Tobin · May 20, 2009

Richard Heathfield said:
Fine. Please show me how to specify the (positive) square root of 2
to a C program. Not an approximation, please, but the *exact*
number, in its full, real, and indeed irrational glory.

It is of course perfectly possible to represent it, just not as a
floating point number. For example:

enum operator { SUM, DIFFERENCE, PRODUCT, QUOTIENT, POWER };
struct expression { ... }

.... you can see where this is going. And of course there are real
symbolic algebra systems that work like this.

I suspect that Chuck's view is that an expression like sqrt(2.0) is of
this kind: it exactly represents the square root of 2.0, and it's only
when it's evaluated that it gets approximated by a floating point
value.
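
To make the idea concrete, here is one way such a tree could look. The operator tags mirror the enum above (renamed `expr_op` here), but the struct layout, `approx_sqrt` and `evaluate` are hypothetical and are not the definition elided in the post. The sketch handles POWER only for exponent 1/2 and non-negative integer exponents.

```c
#include <stddef.h>

/* Node tags mirror the enum in the post; everything else is an assumption. */
enum expr_op { SUM, DIFFERENCE, PRODUCT, QUOTIENT, POWER };

struct expression {
    int is_leaf;                        /* nonzero: integer constant */
    long value;                         /* leaf value */
    enum expr_op op;                    /* operator, when not a leaf */
    const struct expression *lhs, *rhs; /* operands, when not a leaf */
};

/* Newton's iteration for a square root; good enough for a sketch. */
static double approx_sqrt(double a)
{
    double r = a > 1.0 ? a : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + a / r);
    return r;
}

/* Collapse the exact description into a double. This is the step where
   approximation enters. */
double evaluate(const struct expression *e)
{
    if (e->is_leaf)
        return (double)e->value;
    double a = evaluate(e->lhs), b = evaluate(e->rhs);
    switch (e->op) {
    case SUM:        return a + b;
    case DIFFERENCE: return a - b;
    case PRODUCT:    return a * b;
    case QUOTIENT:   return a / b;
    case POWER:
        if (b == 0.5)
            return approx_sqrt(a);
        {
            double r = 1.0;
            for (long i = 0; i < (long)b; i++)
                r *= a;
            return r;
        }
    }
    return 0.0;
}
```

The tree POWER(2, QUOTIENT(1, 2)) describes the square root of 2 exactly; only the call to `evaluate` replaces it with a nearby double.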

-- Richard

Richard Bos · May 20, 2009

CBFalconer said:
Certainly it did. It handed the integers 1 and 3 to a divide
routine.

If it handed the _integers_ 1 and 3 to a divide routine, the result
would have been the integer 0, and irrelevant to the representation
and/or values of floating point numbers.
If it handed the floating point numbers 1.0 and 3.0 to a divide routine,
that divide routine _explicitly_ computed _the nearest approximation of_
one-third in the floating point system.
Nowhere, but nowhere, is the real value one-third computed by any ISO C
division expression.
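
The distinction between the two cases is visible directly in C. A minimal sketch; the function names are made up, and the volatile operands are only there to keep the compiler from folding the divisions away:

```c
/* Integer division: 1 / 3 truncates to 0. */
int int_quotient(void)
{
    volatile int a = 1, b = 3;
    return a / b;
}

/* Floating-point division: the result is a representable value close to
   one-third, not one-third itself. */
double double_quotient(void)
{
    volatile double a = 1.0, b = 3.0;
    return a / b;
}
```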

The divide routine labored mightily, but failed to complete the
division, so it returned the nearest value it could calculate.

I'm afraid that indicates a tragic but obvious misunderstanding of how
floating point operations work. They _always_ work by approximation, and
the approximation is only exact when you're lucky, not the other way
'round.

Richard

Richard Bos · May 20, 2009

CBFalconer said:
People do not seem to appreciate my attempt to inject a touch of
humor/humour.

Consider that, if your jokes sound as dumb as your attempts at serious
argument, _you might be wrong_.

Richard

Richard Bos · May 20, 2009

Richard said:
Can anyone hazard a guess how many times Chucky has proudly mentioned
"Epsilon" as if it were somehow making him seem big and clever?

I suspect he hasn't read Brave New World.

Richard
