Float comparison

CBFalconer · May 16, 2009

Flash said:
CBFalconer wrote:
.... snip ...

ymin and xmax are *real* numbers, so nextafter has absolutely
nothing to do with them. nextafter(x, y)==y, but this says nothing
about how you are defining your limits or about the relative
values of xmax and ymin.

Look back where Keith Thompson introduced nextafter(), and its use
in deducing the EPSILON in effect for any x value.

CBFalconer · May 16, 2009

Simultaneously y is specified by xmax, and x is specified by ymin.

So what represents (xmax+ymin)/2? Remember, xmax and ymin are
real numbers, so if they are different this is a third unique value.

Remember that xmax defines a fp-object-value for y, while ymin does
the same for x. There is no equality specified. They are computed
from x and y fp-object values respectively. Remember that xmax is
computed so that any conversion of xmax to a float will result in
the value y.

We are working with reals, not integers. So infinities rear their
ugly heads all over. We don't normally have to worry about them,
nor about the various EPSILONs etc. unless we are worrying about
the precise range of real values represented by an fp-object.
Dealing with all these as numbers makes things more confusing.
Using an exact representation of the fp-object in terms of
exponent, significand, etc. is much clearer.

....

CBFalconer · May 16, 2009

Keith said:
Right.

So you agree that it isn't stored, and that it cannot be stored.
But it was stored. How is that not a contradiction?

No, the fp-object didn't change the value. The C expression 1.0/3.0
produced a floating-point number. That floating-point number is
close, but not equal, to the real value one-third. The real value
one-third *never existed in the program*. The rational number that
was computed by the division operator is *the only thing* that is
stored in x.

Now you are injecting the programming. So we wrote 1.0/3.0 up
there. Maybe we wrote 1:3 and meant a rational object consisting
of two integers. The 1.0/3.0 was just a convenient way of
specifying the value. Rational values with sufficiently large
integers can represent any real value to any precision we desire.
The question is what happens when that value is stuffed into a
float.

CBFalconer · May 16, 2009

Keith said:
.... snip ...

You are trying to redefine EPSILON.

Above, I asked you a specific question:

When you say EPSILON, do you mean DBL_EPSILON, or do you mean the
difference between two arbitrary consecutive double values?

So which is it? Or are you suggesting that the value of DBL_EPSILON
changes depending on what floating-point number you're looking at?

The EPSILONS defined in float.h are those discovered by
nextafter(1.0, 2.0). Nothing more. The extensions are the result
of knowledge of how fp systems are implemented, and I have
generally said that they are system dependent.

I know that the value of nextafter(x, x + 1.0) will vary depending on
the value of x. But that's not what DBL_EPSILON means, and if you're
using EPSILON as an abbreviation of DBL_EPSILON (which is what you've
indicated in the past), then it's not what EPSILON means either.

But nextafter is simply a method of exposing the hex (or binary)
construction of fp objects, and making sense to people who have no
idea how the fp object is constructed. I would much rather deal
with the hex and add (or subtract) 1 from the portion representing
the significand. Then you have to renormalize, watch out for range
changes, etc.

....

Ben Bacarisse · May 16, 2009

CBFalconer said:
Simultaneously y is specified by xmax, and x is specified by ymin.

Remember that xmax defines a fp-object-value for y, while ymin does
the same for x. There is no equality specified. They are computed
from x and y fp-object values respectively. Remember that xmax is
computed so that any conversion of xmax to a float will result in
the value y.

We are working with reals, not integers. So infinities rear their
ugly heads all over. We don't normally have to worry about them,
nor about the various EPSILONs etc. unless we are worrying about
the precise range of real values represented by an fp-object.
Dealing with all these as numbers makes things more confusing.

I'd prefer to see an example. Originally you talked about
x*(1-DBL_EPSILON) to x*(1+DBL_EPSILON) as the range represented by a
floating point number x, but that can't be right and I think you've
backed off from this view.

What is the range represented by a float x? Is it defined in terms of
consecutive representable values? If so, what is the range
represented by zero (i.e. do you include subnormals)? Does -0.0
represent the same range? Do + and - infinity represent ranges (+ and
- infinity each have a "previous" representable float but not a
following one).

I, for one, would like to know what this range is in more precise
terms.

CBFalconer · May 16, 2009

Ben said:
.... snip ...

I'd prefer to see an example. Originally you talked about
x*(1-DBL_EPSILON) to x*(1+DBL_EPSILON) as the range represented by
a floating point number x, but that can't be right and I think
you've backed off from this view.

What is the range represented by a float x? Is it defined in
terms of consecutive representable values? If so, what is the
range represented by zero (i.e. do you include subnormals)? Does
-0.0 represent the same range? Do + and - infinity represent
ranges (+ and - infinity each have a "previous" representable
float but not a following one).

For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.
Those values are what I call xmax and xmin. They are NOT included
in the range, and thus the range is actually:

xmax > number > xmin

as a condition where number is described by x IN THAT FP
IMPLEMENTATION. The function nextafter() allows the y (where xmax
is a member of the range of y) to be defined. This y is the next
fp implemented value to x; there are none in between. Note that
xmax, number, xmin above are all real values. The magic thing
about xmax is that it CANNOT be specified by x, yet it CAN be
specified by y.

About 60 years ago I had a professor who hammered at all us
'students' with these concepts involving numbers, reals, integers,
limits, etc. and insisted we learn methods that handled them all.
I have forgotten a good deal of it. He was better at it than I am.

And no, I am not including subnormals, NaNs, INFs, etc. Zero is a
unique thing in floating implementations, necessary because
multiplication (and division) by zero needs to be recognized. We
can't just use the smallest representable normalized real.

Keith Thompson · May 16, 2009

I see you didn't answer that. You agree that the real value isn't
stored, and cannot be stored, then you immediately say that it *was*
stored. I cannot think of any reasonable interpretation in which that
makes any sense.

Now you are injecting the programming. So we wrote 1.0/3.0 up
there. Maybe we wrote 1:3 and meant a rational object consisting
of two integers.

No, because C has no such syntax. Remember, this is comp.lang.c;
we're talking about C floating-point numbers.

The 1.0/3.0 was just a convenient way of
specifying the value.

No, 1.0/3.0 is a C expression. It yields a floating-point value
that's close to the real value one-third, but cannot be equal to it
(unless FLT_RADIX is a multiple of 3).

Rational values with sufficiently large
integers can represent any real value to any precision we desire.

How is that relevant?

The question is what happens when that value is stuffed into a
float.

No such value was ever stuffed into a float. We're talking about
the behavior of the C code:

double x = 1.0/3.0;

The value stored is a floating-point value. On typical systems, this
is a rational number whose denominator is a power of 2. It is not
one-third.

Keith Thompson · May 16, 2009

CBFalconer said:
The EPSILONS defined in float.h are those discovered by
nextafter(1.0, 2.0). Nothing more. The extensions are the result
of knowledge of how fp systems are implemented, and I have
generally said that they are system dependent.

I note a complete failure to answer my question, which I thought was
reasonably straightforward.

One more time, what exactly do you mean by EPSILON?

But nextafter is simply a method of exposing the hex (or binary)
construction of fp objects, and making sense to people who have no
idea how the fp object is constructed.

No, nextafter is a function that returns an FP number adjacent to a
given one, where "adjacent" means that it's the next representable
value in the specified direction. (I'm not sure why the direction is
specified via a second FP number rather than more directly; perhaps
the function as specified is more useful for certain calculations.)

I would much rather deal
with the hex and add (or subtract) 1 from the portion representing
the significand. Then you have to renormalize, watch out for range
changes, etc.

Deal with it however you like, as long as you do so coherently.

Keith Thompson · May 16, 2009

CBFalconer said:
You haven't been paying attention. There is adequate support.

I've been paying an extraordinary amount of attention, and you have
yet to demonstrate this. You haven't even presented a consistent
description of your model.

For a while I was hoping that this discussion might yield some useful
insights, if not actual agreement, but that hope is rapidly fading.

Let's try a simple concrete question. Given the declaration
double x = 1.0;
what *exactly* is the range of real numbers represented by the stored
value of x? Assume a typical FP system with FLT_RADIX==2. If some
aspects of the range are implementation-defined, please say so.

CBFalconer · May 16, 2009

Keith said:
I see you didn't answer that. You agree that the real value isn't
stored, and cannot be stored, then you immediately say that it
*was* stored. I cannot think of any reasonable interpretation in
which that makes any sense.

But I did. I don't care about the 1.0/3.0 expression. I care
about the fact that the system attempted to store the real number
1/3 in the fp object. I am not specifying where it came from.

No, because C has no such syntax. Remember, this is comp.lang.c;
we're talking about C floating-point numbers.

It doesn't need such a syntax. I can easily write a system that
processes rationals, which are a combination of two integers. I
can pick various ways of transmitting and storing such rationals.
All will have some sort of limitations, but the accurate
representation of 1/3 is not one of them.

.... snip ...

No such value was ever stuffed into a float. We're talking about
the behavior of the C code:

double x = 1.0/3.0;

The value stored is a floating-point value. On typical systems, this
is a rational number whose denominator is a power of 2. It is not
one-third.

No. We are talking about stuffing the value 1/3 into a float. Not
a C expression, but a real number. A rational number, if you like.

Keith Thompson · May 16, 2009

CBFalconer said:
Ben Bacarisse wrote: [...]

What is the range represented by a float x? Is it defined in
terms of consecutive representable values? If so, what is the
range represented by zero (i.e. do you include subnormals)? Does
-0.0 represent the same range? Do + and - infinity represent
ranges (+ and - infinity each have a "previous" representable
float but not a following one).

For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.

So you're assuming that FLT_RADIX==2.

May I presume that, for type double, EPSILON means DBL_EPSILON?

If the intent is that each real number within some range is
represented by exactly one FP number, then your formula is wrong.

Assuming a typical binary FP implementation, here's a table of
exactly representable numbers and the next representable number
after each one, in type double (the first column is x, the second
is nextafter(x, 10.0)).

1.0     1.0 + DBL_EPSILON
1.25    1.25 + DBL_EPSILON
1.5     1.5 + DBL_EPSILON
1.75    1.75 + DBL_EPSILON
2.0     2.0 + 2*DBL_EPSILON

Note that the difference between one FP number and the next is
constant over a substantial range.

Your formula, "x*(1-EPSILON) to x*(1+EPSILON)", seems to assume that
the difference is proportional to the value of x. It means that, for
example, the range represented by 1.75 in your model covers two other
numbers that are exactly representable, and extends considerably past
them.

I suggest you re-think your model.

[...]

And no, I am not including subnormals, NaNs, INFs, etc. Zero is a
unique thing in floating implementations, necessary because
multiplication (and division) by zero needs to be recognized. We
can't just use the smallest representable normalized real.

Yes, zero is special, but there's no reason it can't be
included in your model like any other FP number. It's entirely
possible to determine the representable numbers adjacent to 0.0
(nextafter(0.0, -1.0) and nextafter(0.0, 1.0)).

CBFalconer · May 17, 2009

Keith said:
.... snip ...

I note a complete failure to answer my question, which I thought was
reasonably straightforward.

One more time, what exactly do you mean by EPSILON?

EPSILON defines the minimum increment to x which requires a
different fp-object value. I.e. any smaller increment to x will be
ignored by the hardware. If it is used as "-EPSILON" it MAY be a
different value, however that only happens when x is a power of 2
in most implementations. C specifies it when x is 1.0.

.... snip ...

No, nextafter is a function that returns an FP number adjacent to a
given one, where "adjacent" means that it's the next representable
value in the specified direction. (I'm not sure why the direction is
specified via a second FP number rather than more directly; perhaps
the function as specified is more useful for certain calculations.)

Because of the difference in EPSILON I mentioned above. The use of
EPSILON or nextafter are simply two different ways of referring to
the same general phenomenon.

....

CBFalconer · May 17, 2009

Keith said:
.... snip ...

Let's try a simple concrete question. Given the declaration
double x = 1.0;
what *exactly* is the range of real numbers represented by the
stored value of x? Assume a typical FP system with FLT_RADIX==2.
If some aspects of the range are implementation-defined, please
say so.

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard. Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2. Note that these two numbers are not within
the range for 1.0, but do delimit it.

If you form that value (1.0+DBL_EPSILON) and store it in a double,
you should get the same result as from nextafter(1.0, 2.0). Watch
out for the tendency of some systems to use added precision, so you
have to be sure the value has been stored and recovered.

(The 'attention' message was directed at Flash, not you).

CBFalconer · May 17, 2009

Keith said:
CBFalconer said:

Ben Bacarisse wrote: [...]

What is the range represented by a float x? Is it defined in
terms of consecutive representable values? If so, what is the
range represented by zero (i.e. do you include subnormals)? Does
-0.0 represent the same range? Do + and - infinity represent
ranges (+ and - infinity each have a "previous" representable
float but not a following one).

For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.

So you're assuming that FLT_RADIX==2.

May I presume that, for type double, EPSILON means DBL_EPSILON?
Yes.

If the intent is that each real number within some range is
represented by exactly one FP number, then your formula is wrong.

Assuming a typical binary FP implementation, here's a table of
exactly representable numbers and the next representable number
after each one, in type double (the first column is x, the second
is nextafter(x, 10.0)).

1.0     1.0 + DBL_EPSILON
1.25    1.25 + DBL_EPSILON
1.5     1.5 + DBL_EPSILON
1.75    1.75 + DBL_EPSILON
2.0     2.0 + 2*DBL_EPSILON

Note that the difference between one FP number and the next is
constant over a substantial range.

Your formula, "x*(1-EPSILON) to x*(1+EPSILON)", seems to assume that
the difference is proportional to the value of x. It means that, for
example, the range represented by 1.75 in your model covers two other
numbers that are exactly representable, and extends considerably past
them.

You haven't allowed for the fact that DBL_EPSILON represents the
least significant bit in the significand, and that the exponent
doesn't change for values from 1.0 to <2.0. The net effect of the
multiplication by x, storing in a double, and extracting is to
eliminate the portion of the EPSILON above DBL_EPSILON. The
multiplication is a handy way of forming the right EPSILON for any
x value.

Try dumping the output of nextafter (and the x value) above in hex
format. I think you will immediately see what I am talking about.

Keith Thompson · May 17, 2009

CBFalconer said:
Keith Thompson wrote: [...]

No such value was ever stuffed into a float. We're talking about
the behavior of the C code:

double x = 1.0/3.0;

The value stored is a floating-point value. On typical systems, this
is a rational number whose denominator is a power of 2. It is not
one-third.

No. We are talking about stuffing the value 1/3 into a float. Not
a C expression, but a real number. A rational number, if you like.

No, that's not what I'm talking about. You *can't* stuff the value
1/3 into a float (unless FLT_RADIX is a multiple of 3).

So let's talk about the behavior of

double x = 1.0/3.0;

Do you believe that the real value one-third occurs in a C program
that contains that declaration? (Hint: the correct answer is No.)

Keith Thompson · May 17, 2009

CBFalconer said:
EPSILON defines the minimum increment to x which requires a
different fp-object value. I.e. any smaller increment to x will be
ignored by the hardware. If it is used as "-EPSILON" it MAY be a
different value, however that only happens when x is a power of 2
in most implementations. C specifies it when x is 1.0.

You have explicitly stated that EPSILON is merely an abbreviation for
DBL_EPSILON. Now you're saying it's something else, something much
more general. And I think you're saying that -EPSILON isn't
necessarily the negative of EPSILON.

I think I have an idea what you're trying to say, but you need to say
it coherently. I can't have this discussion with you if you won't
make some effort to communicate consistently.

Please stop using the term EPSILON unless you're willing to define it
and stick to that definition. Use nextafter() if you like.

[snip]

Keith Thompson · May 17, 2009

CBFalconer said:
Keith Thompson wrote:
... snip ...

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard. Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2. Note that these two numbers are not within
the range for 1.0, but do delimit it.

[...]

Thank you for trying to define what this "range" is. You've still got
it wrong, as far as I can tell.

Here's a graph showing five consecutive FP numbers, each of which is
exactly representable as a double; we can use nextafter() to define
their relationships. (View this in a fixed-width font.)

     *****************
|----|----|----------|----------|
a    b    c          d          e
          1.0

a is 1.0-DBL_EPSILON
b is 1.0-DBL_EPSILON/2
c is 1.0
d is 1.0+DBL_EPSILON
e is 1.0+DBL_EPSILON*2

You're saying that the range represented by c goes all the way to both
of its neighbors, covering the range marked by asterisks. Unless your
ranges substantially overlap with each other, this doesn't make much
sense.

I've assumed that the intent of your model is that each real number is
within the range of exactly one FP number. I would have thought that
the range represented by 1.0 would extend only halfway to each of its
neighbors. Why would c's range be so wide while b's and d's ranges
are not?

Or do the ranges actually overlap? If I have a real number that's
halfway between c and d, is it within both ranges?

Keith Thompson · May 17, 2009

CBFalconer said:
Keith said:

CBFalconer said:

Ben Bacarisse wrote: [...]
What is the range represented by a float x? Is it defined in
terms of consecutive representable values? If so, what is the
range represented by zero (i.e. do you include subnormals)? Does
-0.0 represent the same range? Do + and - infinity represent
ranges (+ and - infinity each have a "previous" representable
float but not a following one).

For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.

So you're assuming that FLT_RADIX==2.

May I presume that, for type double, EPSILON means DBL_EPSILON?
Yes.

If the intent is that each real number within some range is
represented by exactly one FP number, then your formula is wrong.

Assuming a typical binary FP implementation, here's a table of
exactly representable numbers and the next representable number
after each one, in type double (the first column is x, the second
is nextafter(x, 10.0)).

1.0     1.0 + DBL_EPSILON
1.25    1.25 + DBL_EPSILON
1.5     1.5 + DBL_EPSILON
1.75    1.75 + DBL_EPSILON
2.0     2.0 + 2*DBL_EPSILON

Note that the difference between one FP number and the next is
constant over a substantial range.

Your formula, "x*(1-EPSILON) to x*(1+EPSILON)", seems to assume that
the difference is proportional to the value of x. It means that, for
example, the range represented by 1.75 in your model covers two other
numbers that are exactly representable, and extends considerably past
them.

You haven't allowed for the fact that DBL_EPSILON represents the
least significant bit in the significand, and that the exponent
doesn't change for values from 1.0 to <2.0.

I certainly have allowed for that.

The net effect of the
multiplication by x, storing in a double, and extracting is to
eliminate the portion of the EPSILON above DBL_EPSILON.

What does that mean? EPSILON *is* DBL_EPSILON; you said so yourself.
There is no "portion of the EPSILON above DBL_EPSILON".

The
multiplication is a handy way of forming the right EPSILON for any
x value.

It seems like you're saying that the expression x*(1+EPSILON) is to be
considered a C floating-point expression, not a real number. But the
whole point is to define the real range represented by a given
floating-point number, which includes numbers that cannot be exactly
represented as FP numbers.

[snip]

In all this time, you have yet to present a coherent definition for
these ranges of yours.

Flash Gordon · May 17, 2009

CBFalconer said:
You haven't been paying attention. There is adequate support.

Like Keith, I *have* been paying attention. You have yet to point at
anything in the standard that really supports your claim. The reasons
why the few points in the standard you have pointed at do not support
your claim have been explained (but generally ignored by you). Several
sections in the standard explicitly contradict what you say and have
been pointed out to you with explanations of *why* they explicitly
contradict your claim. You seem to be the one not paying attention.

Flash Gordon · May 17, 2009

CBFalconer said:
Simultaneously y is specified by xmax, and x is specified by ymin.

Remember that *I* am the one who introduced x, y, xmin, ymin, xmax and
ymax and specified EXACTLY what they are. So here, again, is what they
are, and if you want to start talking about something else then please
actually explicitly STATE that you are ignoring the question being asked
and talking about something else instead.

----------
Given two numbers x and y whose values are "touching ranges" with y
being the next greater range than x then in your model we have:

x represents xmin to xmax
y represents ymin to ymax
x < y
xmax equal to ymin (there can be no intervening real numbers since if
there were any there would be an infinite number)
-----------

Now, what I meant was that x and y are consecutive floating point
numbers; for example, they could be doubles 1.0 and 1.0+DBL_EPSILON, as I
was assuming that your ranges did not overlap. Another post by you now
suggests they do, which makes them of even less use for anything
serious, since then your model does not tell you what value will be stored.

Remember that xmax defines a fp-object-value for y, while ymin does
the same for x.

Well, you have definitely changed the terminology from what I introduced
when I asked the question.

There is no equality specified. They are computed
from x and y fp-object values respectively. Remember that xmax is
computed so that any conversion of xmax to a float will result in
the value y.

That is EXPLICITLY not the terminology I used to ask my question, since
xmax was related to x and ymin to y. Again, if you are going to ignore
the terminology someone uses to define their question, you are not
answering the question.

We are working with reals, not integers.

Again, why are you trying to tell me what question I was asking rather
than trying to answer the question?

So infinities rear their
ugly heads all over. We don't normally have to worry about them,
nor about the various EPSILONs etc. unless we are worrying about
the precise range of real values represented by an fp-object.

I was ASKING about what your ranges are! I was TRYING to get an accurate
definition of them!

Dealing with all these as numbers makes things more confusing.
Using an exact representation of the fp-object in terms of
exponent, significand, etc. is much clearer.

OK, if that is all you can cope with, then I will ask my question given
EXPLICIT EXACT VALUES. First I need values you will accept. So..

Here are two lines of C code.

#include <math.h>

int main(void)
{
    double valone = 1.0;
    double valoneandabit = nextafter(valone, 100.0);
    /* rest of program irrelevant to discussion */
    return 0;
}

Now, I know you have access to a C compiler. If on YOUR SPECIFIC
compiler you were to compile and run the above, what are the EXACT
numerical ranges for valone and valoneandabit. I.e. I am asking for a
concrete numerical answer.

Note that 1.0-EPSILON is NOT an exact numerical value.
0.91836678766234234786324762347653245897924375348795324 is, but it is
not the number I want obviously. So give me exact numbers you are happy
with for us to then discuss.
