Float comparison

Ben Bacarisse · May 18, 2009

CBFalconer said:
No, there are not. xmax is the smallest number we can hand to this
fp-system and have it store it as y, where y == nextafter(x, x+1);

You said "xmax is a value that, when stored in a fp-object, will
produce a value larger than x when read back from that fp-object".
There are many such xmax no matter what you say. I suspected you
wanted to say the least such xmax which is why I went on:

Only if you ignore the multiplication by x or y. Since x < y then
x*(1+EPS) is not equal to y*(1-EPS). Those operations are
performed on the fp-system, so we know that 1+EPS is not 1.0, etc.
We are feeding the system actual real values, and taking what it
gives back.

No, you have omitted the step of passing the value to the fp-system
to store and looking at what it actually stored.

No I have not. If ymin < xmax what happens to those reals between
them? x and y are consecutive floats, and xmax is the smallest real
that gets stored as y so none of the numbers in this gap get stored as
y. Similarly ymin is the largest real that gets stored as x so numbers
between ymin and xmax don't get stored as x, so what happens if we
feed such a number to the fp-system? Are these numbers not
representable? Not in the "range" of any float? (From below the
answer seems to be "yes".)

xmax is the smallest such real THAT CAN BE expressed in the
fp-system. We can't avoid this leaping from real to fp-system and
back. At least I don't see how.

Oh dear. This is a whole new concept. How do we determine, in your
model, the reals that can be expressed and the reals that can't? In
the conventional view of FP this is simple, but I don't know what you
mean by the term.

Flash Gordon · May 18, 2009

CBFalconer said:
Keith Thompson wrote:
.... snip ...

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard.

The C standard does not spell out what you call ranges. If it did you
could point to a paragraph which said something like, "the range of
values represented by the floating point representation is..."

Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2.

Presumably you mean whatever nextafter(1.0,0) would produce.

Note that these two numbers are not within
the range for 1.0, but do delimit it.

If you form that value (1.0+DBL_EPSILON) and store it in a double,
you should get the same result as from nextafter(1.0, 2.0). Watch
out for the tendency of some systems to use added precision, so you
have to be sure the value has been stored and recovered.

so what is the range of the floating-point-value 1.0+DBL_EPSILON?

(The 'attention' message was directed at Flash, not you).

I've been paying more attention than you think. Enough attention to
remember a number of questions that you have not answered even though
they have been asked more than once.

Flash Gordon · May 18, 2009

CBFalconer said:
yes, correct.

OK. In that case your range model can be changed at execution time,
specifically by a call to fesetround(). So the only way to find out what
your range is for a given variable at a given point in the program is to
examine the code to find out what the last call to fesetround() was before
it was assigned.

.... snip ...

C only allows for them for values smaller than the smallest
possible normalized value.

I know what they are.

So this problem is normally out of the
way. In effect the 'ranges' for the unnormalized small numbers are
much wider (in relation to the number) than any other ranges.

They still need to be allowed for in your model.

I am
speaking of the usual fp-system which normalizes everything in
order to increase the accuracy by the one saved bit in the
significand.

Everyone else is talking about the model which applies to ALL C
implementations, not one that applies to only a sub-set of them.

But handy, because you can see what causes the 'range' directly.

I have already explained that I know about floating point
representations and have implemented some floating point arithmetic in
assembler.

Now you are bringing in the programming. Certainly you CAN do
such. I am talking about what you can deduce from the fp-object
alone.

The point is, you CANNOT know that the number is inaccurate WITHOUT
having looked at the code, and if you look at what the fesetround
function does you will see that your ranges cannot be defined without
looking at the program either. Looking at just the following definition
double d;
you have no a priori knowledge about whether it will contain an exact
value or an approximation.

Looking at this code
#include <stdio.h>
#include <math.h>

int main(void)
{
    double d;
    int i;

    for (i=0,d=0.0; i<10; i++, d=nextafter(d,1000.0)) {
        printf("%d %a\n",i,d);
    }
}

You know that d contains exact values.

Replace nextafter(d,1000.0) with d+0.1 and d contains approximations.

CBFalconer · May 18, 2009

Flash said:
CBFalconer said:

Keith said:

Keith Thompson wrote:
... snip ...
Let's try a simple concrete question. Given the declaration
double x = 1.0;
what *exactly* is the range of real numbers represented by the
stored value of x? Assume a typical FP system with FLT_RADIX==2.
If some aspects of the range are implementation-defined, please
say so.

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard. Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2. Note that these two numbers are not within
the range for 1.0, but do delimit it.
[...]

Thank you for trying to define what this "range" is. You've still
got it wrong, as far as I can tell.

Here's a graph showing five consecutive FP numbers, each of which is
exactly representable as a double; we can use nextafter() to define
their relationships. (View this in a fixed-width font.)

***************
|----|----|----------|----------|
a b c d e
1.0

a is 1.0-DBL_EPSILON
b is 1.0-DBL_EPSILON/2
c is 1.0
d is 1.0+DBL_EPSILON
e is 1.0+DBL_EPSILON*2

You're saying that the range represented by y goes all the way to both
of its neighbors, covering the range marked by asterisks. Unless your
ranges substantially overlap with each other, this doesn't make much
sense.

Forget a and e.

They are relevant to the discussion.

You have calculated b and d via DBL_EPSILON.

That is obvious.

These mark numbers that cannot be stored in the object c.

<snip>

The C standard states that 1.0+DBL_EPSILON *can* be stored in a C
object of type double. Specifically it is the smallest value greater
than 1.0 that *can* be stored.

Since your response was fundamentally wrong at this point there is no
point in going further.

I said the object c. Not a C object. THE object c, as defined by
Keith in his earlier message. I expect an apology.

user923005 · May 18, 2009

For the value 1/3 you can pass the integers 1 and 3 to a fp
division routine, and store whatever appears.

float storeratio(int num, int denom) {
return (float)num / denom;
}

You don't really need the function, but it clears up what is being
done.

Does what you did above seem somehow more clear to you than:

float ratio_approximation = 1./3.;

?
If so, in what way.

The fp-system received two integers, and converted them into
a representation of their ratio. Is there any argument about the
value of the ratio of 1 to 3?

It's not exactly representable in a float unless the numeric base is
3, if that is what you mean. A numeric base of three may sound silly,
but on a ternary computer it would be ideal.

P.S.
Elsethread, you are arguing with Dik Winter about floating point.
That's like arguing with William Kahan about the same thing or arguing
with Donald Knuth about computer programming. If you're going to tell
the Beatles how to write a popular song, you had better be pretty darn
good or you'll end up with egg on your face in the long run.

IMO-YMMV

CBFalconer · May 18, 2009

Flash said:
CBFalconer wrote:
.... snip ...

Presumably you mean whatever nextafter(1.0,0) would produce.

so what is the range of the floating-point-value 1.0+DBL_EPSILON?

That is a value that, when stored in a fp-object, will cause that
object value to be the first one above 1.0. That object has the
value y, and the range y*(1-DBL_EPSILON) through
y*(1+DBL_EPSILON). Those expressions with an EPSILON in them are
not necessarily fp values - they are real numbers and are extreme
limits for values whose storage will generate the appropriate
fp-value.

CBFalconer · May 18, 2009

Flash said:
.... snip ...

For the third time, please READ the paper. Don't ASSUME you know
what it is talking about until AFTER you have read it.

As you said (incorrectly) about me, you are not paying attention.

Give me a link to the paper, and I will _try_ to get around to
reading it. I seem to spend ALL my spare time answering questions
here lately.

CBFalconer · May 18, 2009

Keith said:
.... snip ...

Of course there's an argument about it; that's what this entire
thread has been about!

Oh? I'm not talking about what storeratio returns. I am talking
about the exact ratio between 1 and 3. That is specified in "num /
denom". The approximations are what is done by the fp system.

If I call storeratio(1, 3), I don't get the real value one-third. I
get a value of type float, a value that corresponds to a real number
close to the real number one-third. Your function performs a
floating-point division, not a mathematical real division.

But I am talking about the difference between the input and output
of storeratio. Of course they are different. Both values exist.
They are NOT the same. One is a real value. One is a
fp-object-value.

.... snip ...

In the floats, again, the real value one-third can't be represented,
so 1.0/3.0 yields a float value that's close to one-third, typically
something like 0.3333333432674407958984375.

No disagreement there.

I think we are agreeing but refusing to use the same languages.

To me, all those real values exist. Most cannot be handled by
the fp system.

CBFalconer · May 18, 2009

user923005 said:
.... snip ...

Does what you did above seem somehow more clear to you than:

float ratio_approximation = 1./3.;

? If so, in what way.

The function has a defined input and output. The input defines the
real value 1/3. The output is the fp-value representing that.

It's not exactly representable in a float unless the numeric base
is 3, if that is what you mean. A numeric base of three may sound
silly, but on a ternary computer it would be ideal.

But that has to do with the fpsystem, not with the numbers. Yes,
it is different. The difference is what this whole argument is
about.

P.S.
Elsethread, you are arguing with Dik Winter about floating point.
That's like arguing with William Kahan about the same thing or
arguing with Donald Knuth about computer programming. If you're
going to tell the Beatles how to write a popular song, you had
better be pretty darn good or you'll end up with egg on your face
in the long run.

But I am the best!! The only problem is persuading everyone else
to agree with me. I can't use Saddam's methodology.

CBFalconer · May 18, 2009

Keith said:
You keep saying "object" when you mean "value".

Am I being unclear again? The fp-object can have things stored in
it. When something is stored in it, it has an fp-object-value.
These are rigidly defined things, and any such fp-object-value can
represent a whole 'range' of real values. Knowing the
fp-object-value, we can calculate the range. We can also calculate
the adjacent fp-object-value, by the process of calculating a
number that will barely not fit into the 'range' of x, the original
fp-object. We have to store that number in another fp-object, say
y, and then examine y to see what its fp-object-value is. When we
have done all that we have duplicated nextafter.

CBFalconer · May 18, 2009

Flash said:
OK. In that case your range model can be changed at execution time,
specifically by a call to fesetround(). So the only way to find
out what your range is for a given variable at a given point in
the program is to examine the code to find out what the last call
to fesetround() was before it was assigned.

I'm not sure whether or not that affects the results, but I do
maintain that once you alter the rounding methods you have altered
the fp system.

....

CBFalconer · May 19, 2009

Ben said:
.... snip ...

No I have not. If ymin < xmax what happens to those reals between
them? x and y are consecutive floats, and xmax is the smallest real
that gets stored as y so none of the numbers in this gap get stored as
y. Similarly ymin is the largest real that gets stored as x so numbers
between ymin and xmax don't get stored as x, so what happens if we
feed such a number to the fp-system? Are these numbers not
representable? Not in the "range" of any float? (From below the
answer seems to be "yes".)

Let's see if I can leap over the confusion. Consider a system with
a 4 bit significand, and an 8 bit exponent. The exponent uses the
value 128 to signify times 2 to the 0th power. We suppress the
msbit in the significand and replace it with a sign bit. The
significand can hold 0 through 15. Thus:

Exponent   Significand   Means
  128       0 = 0x0      1.0
  128       1 = 0x1      1.0 + 1/8
  128       2 = 0x2      1.0 + 1/4
  128       4 = 0x4      1.0 + 1/2
  128       7 = 0x7      1.0 + 7/8
  128       8 = 0x8      -1.0        /* the sign bit appeared */
  128       9 = 0x9      -1.0 - 1/8
  128      10 = 0xa      -1.0 - 1/4
  128      12 = 0xc      -1.0 - 1/2
  128      15 = 0xf      -1.0 - 7/8

if we raise the exponent by 1, we double the value in Means. If we
lower it by one, we halve the values in Means. I hope we are
agreed so far.

Now, what is the EPSILON involved here? Obviously if we add 1/8 to
1.0, we get the next value. But that doesn't consider the rounding
done by the hardware. We only need to add 1/16 to get that
effect. What is the value 1/16 in that system?

  127       0            1/2
  126       0            1/4
  125       0            1/8
  124       0            1/16        /* Aha */

1.0 + 1/16 will round up to 1.0 + 1/8. /* assume usual rounding */

What does this result look like? See above. Only one least
significant bit is changed. So we have found EPSILON to be 1/16,
and the result from nextafter would be 1.0 + 1/8.

It's getting late, and I am tired, and I haven't gone through the
whole mess yet. Bah.

Keith Thompson · May 19, 2009

CBFalconer said:
The function has a defined input and output. The input defines the
real value 1/3. The output is the fp-value representing that.

No, the inputs are the int value 1 and the int value 3.

[...]

Keith Thompson · May 19, 2009

CBFalconer said:
Flash said:

CBFalconer said:

Keith Thompson wrote:
... snip ...
Let's try a simple concrete question. Given the declaration
double x = 1.0;
what *exactly* is the range of real numbers represented by the
stored value of x? Assume a typical FP system with FLT_RADIX==2.
If some aspects of the range are implementation-defined, please
say so.

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard. Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2. Note that these two numbers are not within
the range for 1.0, but do delimit it.
[...]

Thank you for trying to define what this "range" is. You've still
got it wrong, as far as I can tell.

Here's a graph showing five consecutive FP numbers, each of which is
exactly representable as a double; we can use nextafter() to define
their relationships. (View this in a fixed-width font.)

***************
|----|----|----------|----------|
a b c d e
1.0

a is 1.0-DBL_EPSILON
b is 1.0-DBL_EPSILON/2
c is 1.0
d is 1.0+DBL_EPSILON
e is 1.0+DBL_EPSILON*2

You're saying that the range represented by y goes all the way to both
of its neighbors, covering the range marked by asterisks. Unless your
ranges substantially overlap with each other, this doesn't make much
sense.

Forget a and e.

They are relevant to the discussion.

You have calculated b and d via DBL_EPSILON.

That is obvious.

These mark numbers that cannot be stored in the object c.

<snip>

The C standard states that 1.0+DBL_EPSILON *can* be stored in a C
object of type double. Specifically it is the smallest value greater
than 1.0 that *can* be stored.

Since your response was fundamentally wrong at this point there is no
point in going further.

I said the object c. Not a C object. THE object c, as defined by
Keith in his earlier message. I expect an apology.

I don't believe that Flash confused C (the language name) and c (the
name I applied to the value 1.0), though he can confirm that himself.
If c were an object, it would be a C object, and it would be possible
to store the value 1.0+DBL_EPSILON in it.

But in fact, c is merely a label I applied to a particular number,
*not* an actual object. And it's logically impossible, not just
mathematically impossible, to store a value in a number.

You seem to be having trouble keeping the distinction between objects
and values straight.

Keith Thompson · May 19, 2009

CBFalconer said:
Flash Gordon wrote: [...]

so what is the range of the floating-point-value 1.0+DBL_EPSILON?

That is a value that, when stored in a fp-object, will cause that
object value to be the first one above 1.0. That object has the
value y, and the range y*(1-DBL_EPSILON) through
y*(1+DBL_EPSILON). Those expressions with an EPSILON in them are
not necessarily fp values - they are real numbers and are extreme
limits for values whose storage will generate the appropriate
fp-value.

If I understand you correctly, the range for the FP value 1.0 (x)
extends from 1.0 up to (but perhaps not including) 1.0+DBL_EPSILON.
And the range for the FP value 1.0+DBL_EPSILON (y) extends from
1.0+DBL_EPSILON down to (but perhaps not including)
1.0+DBL_EPSILON*(1.0-DBL_EPSILON). This means that most of the
numbers between 1.0 and 1.0+DBL_EPSILON are in both ranges. It also
means that there's a very small range of numbers just above 1.0 that
are within the range of x, but not within the range of y.

Is that really what you meant?

You're insisting in extending the meaning of DBL_EPSILON far beyond
what the standard says about it, in a manner that yields inconsistent
results.

Keith Thompson · May 19, 2009

Richard Heathfield said:
*NO* real values can be handled by the floating-point system except
by the most laughable of coincidences.

Not true. 1.0 is one of many counterexamples.

Keith Thompson · May 19, 2009

CBFalconer said:
Oh? I'm not talking about what storeratio returns. I am talking
about the exact ratio between 1 and 3. That is specified in "num /
denom". The approximations are what is done by the fp system.

I presume that what you mean by "the exact ratio between 1 and 3" is
the real number one-third.

That real number is *not* specified by "num / denom". "num / denom"
is a C expression, not a real expression. We can talk about the
result of dividing
the real number corresponding to the value of num
by
the real number corresponding to the value of denom.
But we can't talk about it in C, because C has no direct way of
expressing that concept.

The "/" in your storeratio function does not specify real division; it
specifies floating-point division, which is a very different thing.

[...]

No disagreement there.

I think we are agreeing but refusing to use the same languages.
To me, all those real values exist. Most cannot be handled by
the fp system.

All those real values exist in mathematics. They do not exist in C.
If this were sci.math, we might be having a very different
conversation.

No, we are not agreeing.

Ike Naar · May 19, 2009

Let's see if I can leap over the confusion. Consider a system with
a 4 bit significand, and an 8 bit exponent. The exponent uses the
value 128 to signify times 2 to the 0th power. We suppress the
msbit in the significand and replace it with a sign bit. The
significand can hold 0 through 15. Thus:

Exponent   Significand   Means
  128       0 = 0x0      1.0
  128       1 = 0x1      1.0 + 1/8
  128       2 = 0x2      1.0 + 1/4
  128       4 = 0x4      1.0 + 1/2
  128       7 = 0x7      1.0 + 7/8
  128       8 = 0x8      -1.0        /* the sign bit appeared */
  128       9 = 0x9      -1.0 - 1/8
  128      10 = 0xa      -1.0 - 1/4
  128      12 = 0xc      -1.0 - 1/2
  128      15 = 0xf      -1.0 - 7/8

if we raise the exponent by 1, we double the value in Means. If we
lower it by one, we halve the values in Means. I hope we are
agreed so far.

Now, what is the EPSILON involved here? Obviously if we add 1/8 to
1.0, we get the next value. But that doesn't consider the rounding
done by the hardware. We only need to add 1/16 to get that
effect. What is the value 1/16 in that system?

  127       0            1/2
  126       0            1/4
  125       0            1/8
  124       0            1/16        /* Aha */

1.0 + 1/16 will round up to 1.0 + 1/8. /* assume usual rounding */

What does this result look like? See above. Only one least
significant bit is changed. So we have found EPSILON to be 1/16,
and the result from nextafter would be 1.0 + 1/8.

That definition of EPSILON is not in line with the definition in
the C standard. You yourself even mentioned the standard definition
elsethread:

CBFalconer said:
All we are told about EPSILON is:
-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

The least value >1 that is representable in your floating-point system
is 1 + 1/8 (1.125); it's the number with exponent=128 and significand=1.
Subtracting 1 from that value yields EPSILON = 1/8 (0.125).

It would be less confusing if you stuck to the standard terminology,
and used a name other than "EPSILON" for the value 1/16.

Keith Thompson · May 19, 2009

CBFalconer said:
Am I being unclear again? The fp-object can have things stored in
it. When something is stored in it, it has an fp-object-value.

Why do you use the phrase "fp-object-value" rather than just
"fp-value"?

A value in C can be the result of evaluating an expression; it needn't
be stored in or retrieved from an object. We don't need to introduce
FP objects in order to discuss FP values.

These are rigidly defined things, and any such fp-object-value can
represent a whole 'range' of real values. Knowing the
fp-object-value, we can calculate the range. We can also calculate
the adjacent fp-object-value, by the process of calculating a
number that will barely not fit into the 'range' of x, the original
fp-object. We have to store that number in another fp-object, say
y, and then examine y to see what its fp-object-value is. When we
have done all that we have duplicated nextafter.

Are you saying that, if you have an FP value, and you store that value
in an FP object and then retrieve the value of that object, the result
can be something different from the FP value you started with?

Keith Thompson · May 19, 2009

Richard Heathfield said:
Flash Gordon said: [...]

The C standard states that 1.0+DBL_EPSILON *can* be stored in a C
object of type double. Specifically it is the smallest value
greater than 1.0 that *can* be stored.

Yes, but AIUI Chuck's point (such as it is) is that the REAL NUMBER
that would be equivalent to 1.0 - DBL_EPSILON/2 if only such a
thing were possible, cannot be stored in c. He's right, since you
can't actually store /any/ real number in /any/ floating-point type
except by the astounding coincidence that it happens to be
precisely equivalent in value to a floating-point value on the
system in question.

I don't know whether that's Chuck's point or not, but I think you're
mistaken on this point. Assuming FLT_RADIX==2, the real number that's
equivalent to 1.0 - DBL_EPSILON/2 *can* be stored in an object of type
double. It happens to be one of the finitely many real values that
can be represented exactly in type double.

(c is not a floating-point object, so you can't store anything in it,
but that's a separate point.)

[...]

His entire argument is fundamentally wrong, resting as it does on a
confusion between mathematics and floating-point hardware.

Agreed, but there is a defined relationship between them. The
standard defines a model whereby any floating-point number has a real
value; see C99 5.2.4.2.2p1-2. In effect, the floating-point numbers
of a given type are a finite subset of the real numbers.
