Float comparison

Keith Thompson · May 14, 2009

CBFalconer said:
Keith said:

CBFalconer said:

Keith Thompson wrote:

... snip ...

If an int object i contains the value 42, it doesn't matter
whether it was generated via
int i = 42;
or
int i = 6 * 7;
or
int i = 429 / 10;
The meaning of the stored value 42 is independent of how it was
generated.

If a double object x contains the value 42.0, it doesn't matter
how it was generated. The meaning of 42.0 is independent of how
it was generated.

No, it isn't. To quote the standard again:

5.2.4.2.2:

... snip ...

[#10] The values given in the following list shall be
replaced by implementation-defined constant expressions with
(positive) values that are less than or equal to those
shown:

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

Yes, yes, we all know what the *_EPSILON constants mean -- and
it's not what you seem to think they mean. FLT_EPSILON has a
very specific meaning, and you've just quoted it.
FLT_EPSILON == nextafterf(1.0f, 2.0f) - 1.0f
(See C99 7.12.11.3 for the nextafter functions.)

No, you misunderstand the fundamentals. With the normal fp
implementation, the value 1+EPSILON will not be represented as
1.0. nextafter(1.0, 0.5) will be the first value less than 1.0 not
represented as 1.0. (I was not aware the nextafter function
existed - it makes explanations easier.) That 'range' that is
represented by 1.0 is the fundamental characteristic of the fp
value 1.0. A proper implementation of nextafter will handle the
funny results due to rounding policies, etc.

No, the fundamental characteristic of the floating-point value 1.0 is
the real value 1.0. I'm not missing your point; I just disagree with
it.

If the *_EPSILON constants are so fundamental, why does the standard
place so little emphasis on them?

But you miss the fact that all values greater than 1.0 and less
than 1.0+EPSILON (I am dropping the DBL - it is just a typing
nuisance) are _represented_ by the value 1.0. If you go through
the same process to find the first value less than 1.0 that has a
separate value, you again have a portion of the 'range', i.e. that
portion that is less than 1.0. No matter what you do, using that
fp system, you cannot represent any real value in that 'range' as
anything other than 1.0. The necessary consequence is that when
you read 1.0 you don't know what it represents, other than
something in the 'range'.

No, *you* miss the fact that values greater than 1.0 and less than
1.0+EPSILON *are not represented*. You cannot represent any real
value in the range (1.0, 1.0+EPSILON) *at all*.

Floating-point types are not continuous, they're discrete. The FP
value 1.0 is a close approximation of the real value 1.0+VERY_TINY,
but it isn't 1.0+VERY_TINY, it's just 1.0.

And we can, easily. Look at the hex representation of 1.0. Find
the significand portion. Increment that by one least significant
bit. You have just formed nextafter(1.0, 2.0).

Yes, but that doesn't support your claim about ranges, which is what I
asked.

[...]

There is continuous confusion here from confusing the value
represented by the fp object, and the number stored in the fp
object. They are probably different.

They're *probably* different?

Yes, because there are an infinite number of reals in the 'range',
and only one object value. Ignoring the programming that set the
value, why should you prefer one result to another?

Because only one value in that range is the value of the fp object.

No, you are imposing knowledge of the programming. I am looking
_solely_ at what is stored in the fp object. It can equally well
represent any value in the 'range'. Since there are an infinity of
such values, and only one so-called object value, the probability
that they are identical is extremely small.

No, I am not "imposing knowledge of the programming". I am looking
solely at what is stored in the fp object. I'm just seeing what's
really there, as defined by the C standard.

[...]

Keith Thompson · May 14, 2009

CBFalconer said:
However x represents >xmin to <xmax
and y represents >ymin to <ymax

eliminating the equal condition, and thus the annoying missing
reals.

The assumption is that the real numbers xmax and ymin are equal. So
you're saying that the real number xmax (a.k.a. ymin) cannot be
represented by any floating-point number. (Of course I agree, but I
don't really think that's what you meant.)

Keith Thompson · May 14, 2009

CBFalconer said:
Flash said:

I agree completely with what Keith has stated in this sub-thread.

[snip]

I am now completely confused about what this particular argument is
about. I see nothing objectionable in the quoted portion
above.

You don't object either to your claim that the use of 'floating point
number' was confused, or to my statement that there was nothing
confused about it?

Now I'm confused.

Keith Thompson · May 14, 2009

CBFalconer said:
Ike Naar wrote: [...]

For me this whole "range" idea makes little sense. Suppose you
manage to provide a proper definition for the range, what can you
do with it that you can't do without?

You can keep track of the accuracy of your answers, and when the fp
system is or is not adequate for the job.

You can do that anyway. Assume that each floating-point number
represents a single real number. It's still entirely possible to
reason about the accuracy of calculations and stored FP numbers (and
in fact people do so all the time).

I don't dispute that these ranges exist. I deny that a floating-point
number represents a range rather than a single value. And I honestly
don't understand how you can continue to claim otherwise after reading
5.2.4.2.2.

CBFalconer · May 14, 2009

Keith said:
.... snip ...

No, I am not "imposing knowledge of the programming". I am looking
solely at what is stored in the fp object. I'm just seeing what's
really there, as defined by the C standard.

Without infinite memory, you can't represent all real values
uniquely. I also am looking at the values as defined by the
standard. See the discussion on EPSILON and nextafter (which I am
glad you mentioned, it makes things easier).

CBFalconer · May 14, 2009

Keith said:
The assumption is that the real numbers xmax and ymin are equal. So
you're saying that the real number xmax (a.k.a. ymin) cannot be
represented by any floating-point number. (Of course I agree, but I
don't really think that's what you meant.)

No, they are not equal. x represents values less than xmax. y
represents values greater than ymin. Since x < y (original
definition) that implies xmax > ymin. Remember that y is used to
represent xmax, although xmax is calculated from x. Similarly x is
used to represent ymin, which is calculated from y.

Flash Gordon · May 15, 2009

CBFalconer said:
However x represents >xmin to <xmax
and y represents >ymin to <ymax

eliminating the equal condition, and thus the annoying missing
reals.

In which case there is according to you no floating point number that
represents the real values xmin or xmax. What is the point of your
ranges if they do not cover all real values?

Keith Thompson · May 15, 2009

CBFalconer said:
No, they are not equal. x represents values less than xmax. y
represents values greater than ymin. Since x < y (original
definition) that implies xmax > ymin. Remember that y is used to
represent xmax, although xmax is calculated from x. Similarly x is
used to represent ymin, which is calculated from y.

I've read that paragraph several times, and I just can't make any
sense of it.

The assumption is that x and y are consecutive floating-point numbers,
and that their ranges just touch each other at the midpoint.

If xmax != ymin (both are real numbers, not FP numbers), which
is greater, and by how much do they differ?

Let's assume the oversimplified floating-point system I mentioned
earlier, where a floating-point number has exactly one decimal digit
before and after the decimal point. So nextafter(1.0, 2.0) == 1.1
(remember, 1.1 is exactly representable.) In this model, what exactly
are the ranges represented by the FP numbers 1.0 and by 1.1? Does the
real number 1.05 belong to the range of the FP number 1.0, or to the
range of the FP number 1.1, or to both, or to neither?

Does each real number in the range -DBL_MAX .. +DBL_MAX belong to the
range of exactly one floating-point number? I would have assumed that
the answer is yes, and that that's the whole point of your model.

Flash Gordon · May 15, 2009

CBFalconer said:
Keith Thompson wrote:
.... snip ...

I don't dispute that these ranges exist. I deny that a floating-
point number represents a range rather than a single value. And
I honestly don't understand how you can continue to claim
otherwise after reading 5.2.4.2.2.

Here (again) is paragraph 10 from 5.2.4.2:

[#10] The values given in the following list shall be
replaced by implementation-defined constant expressions with
(positive) values that are less than or equal to those
shown:

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

OK, so that says that values between 1.0 and 1.0 + ???_EPSILON are not
representable, thus contradicting your claim that they are representable
because they are part of your range.

5.2.4.2.2 defines the translation to a real of the value specified
by the fp object.

No, it defines a floating point number. It explicitly does so because
"floating point number" is in italics, and that is the mechanism used in
the standard to indicate a definition. Since it is the definition of a
floating point number, anything which does not meet it (such as the real
value one third on common systems) is NOT a floating point number.

It does NOT define what that value represents.

Well, you can obviously use it to represent anything you want, but what
it represents beyond the value it has is entirely up to the programmer.

The above does.

Which above? If you mean the EPSILON values you are mistaken, they are
just values derived from the definition of a floating point number.

This is the multiple use of the word value for
different things.

You can hardly claim that Keith and I are using the word to mean
multiple things when we are the ones explicitly saying it has only one
meaning, i.e. the single unique value which is actually stored. It is
YOU who are trying to give it two meanings.

Also note the specific mention of other values in paragraph 3.

[#3] Floating types may include values that are not
normalized floating-point numbers,
...

If you quote the rest of that sentence, it provides absolutely NO support
for your claim, since it goes on to state what it means: specific values
which fit the model, infinities (which are not reals), and NaNs (which
are not reals). The sentence in full is:

"In addition to normalized floating-point numbers (f1>0 if x≠0), floating
types may be able to contain other kinds of floating-point numbers, such
as subnormal floating-point numbers (x≠0, e=emin, f1=0) and unnormalized
floating-point numbers (x≠0, e>emin, f1=0), and values that are not
floating-point numbers, such as infinities and NaNs."

Nowhere in that list does it include anything which allows the real
value of one third to be represented.

Flash Gordon · May 15, 2009

CBFalconer said:
Keith Thompson wrote:
.... snip ...

Without infinite memory, you can't represent all real values
uniquely.

So? No one has claimed you can.

I also am looking at the values as defined by the
standard. See the discussion on EPSILON and nextafter (which I am
glad you mentioned, it makes things easier).

Those do not support your position, they tell you all the values which
cannot be represented at all.

Keith Thompson · May 15, 2009

CBFalconer said:
Keith said:

... snip ...

I don't dispute that these ranges exist. I deny that a floating-
point number represents a range rather than a single value. And
I honestly don't understand how you can continue to claim
otherwise after reading 5.2.4.2.2.

Here (again) is paragraph 10 from 5.2.4.2:

[#10] The values given in the following list shall be
replaced by implementation-defined constant expressions with
(positive) values that are less than or equal to those
shown:

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

Why do you keep quoting that? I know what it says, and it doesn't
support your claims. Do you see the word "range" in there somewhere?

5.2.4.2.2 defines the translation to a real of the value specified
by the fp object. It does NOT define what that value represents.
The above does. This is the multiple use of the word value for
different things.

The standard doesn't use the word "value" for different things. The
standard uses it for one thing, and you use it for another.

Paragraph 10 isn't about floating-point values in general. It's about
1.0, DBL_EPSILON, and 1.0+DBL_EPSILON (and the corresponding values
for the other types). If it's supposed to be a statement about
floating-point values in general, it's extraordinarily badly worded.

Also note the specific mention of other values in paragraph 3.

[#3] Floating types may include values that are not
normalized floating-point numbers,
...

Again, that's talking about denormalized numbers, NaNs, and
infinities, not about these ranges of yours. Note carefully the word
"may". A conforming implementation could have floating-point types
that don't have any of these things, so that paragraph needn't apply.
If your range model were valid, it would apply equally to such a
system. You can't reasonably use paragraph 3 to support your model.

Phil Carmody · May 15, 2009

Ben Bacarisse said:
See later...

It does. Using maths rather than C notation, if a + b = b we can
deduce that a + b + -b = b + -b i.e. that a = 0. This is what Phil
Carmody was saying above. Because addition is associative, and x + -x

Yes. Mathematical addition is associative. C FP addition is not
associative. This is another reason why FP addition isn't a group.
It practically fails every condition, in fact!

Ike Naar · May 15, 2009

If you look at things carefully you will see that the EPSILON
involved does not change for fp values from greater than 1.0 to
less than 2.0.

The EPSILON involved does not change for any fp values.
In one of your earlier posts, you introduced EPSILON as
a shorthand for FLT_EPSILON, DBL_EPSILON or LDBL_EPSILON.
For a given implementation, these are fixed constants.
They're defined in <float.h> .

crisgoogle · May 15, 2009

No, you are imposing knowledge of the programming. I am looking
_solely_ at what is stored in the fp object. It can equally well
represent any value in the 'range'. Since there are an infinity of
such values, and only one so-called object value, the probability
that they are identical is extremely small.

<more snip>

Everyone understands that for the vast majority of mathematical
results that one "tries" to store in a float, that result is not
exactly representable.

You seem to believe that the float therefore represents all those
possible values.

By your reasoning, an unsigned int also represents an infinite number
of values, namely: x + n * (UINT_MAX + 1), where n is in Z, and x of
course is the single value, on [0, UINT_MAX], that most people would
say is stored in that unsigned int.

Is that your claim, that, without looking at the programming, unsigned
ints have an infinite number of values? If not, why not? How is this
situation different than that for floats?

CBFalconer · May 15, 2009

Flash said:
So? No one has claimed you can.

Those do not support your position, they tell you all the values
which cannot be represented at all.

Instead of yammering at each other with fixed positions, consider
this. My view of a 'range' works for everything your 'fixed value'
version does. It is just more detailed. Note that the reverse
does NOT apply.

CBFalconer · May 15, 2009

No, you are imposing knowledge of the programming. I am looking
_solely_ at what is stored in the fp object. It can equally well
represent any value in the 'range'. Since there are an infinity of
such values, and only one so-called object value, the probability
that they are identical is extremely small.

<more snip>

Everyone understands that for the vast majority of mathematical
results that one "tries" to store in a float, that result is not
exactly representable.

You seem to believe that the float therefore represents all those
possible values.

By your reasoning, an unsigned int also represents an infinite
number of values, namely: x + n * (UINT_MAX + 1), where n is in
Z, and x of course is the single value, on [0, UINT_MAX], that
most people would say is stored in that unsigned int.

Is that your claim, that, without looking at the programming,
unsigned ints have an infinite number of values? If not, why not?
How is this situation different than that for floats?

No, because the arithmetic system on unsigned ints is closed.
Apart from division by zero, you can't generate a value outside the
set 0 .. UINT_MAX. Those things are not integers. They follow
defined rules. They are not intended to represent integers.
Similarly floats are not reals, and also follow defined rules.
However we can always deposit a real in a float, and the question
is 'what is that real'.

CBFalconer · May 15, 2009

Keith said:
.... snip ...

I've read that paragraph several times, and I just can't make any
sense of it.

It did when I wrote it!

The assumption is that x and y are consecutive floating-point numbers,
and that their ranges just touch each other at the midpoint.

No, ranges don't 'touch at the midpoint'. The midpoint of the
range is usually the value of the fp object. For most systems the
exceptions come when the fp object value is an integral power of
two.

If xmax != ymin (both are real numbers, not FP numbers), which
is greater, and by how much do they differ?

With x < y I think part of the confusion is that xmax is NOT
represented by anything in the x range, but by something in the y
range. Similarly ymin is represented by something in the x range,
not in the y range. This arises from the use of < and >, rather
than <= or >=.

As a result there is NO value larger than xmax and smaller than
ymin. Yet xmax > ymin.

|----------y----------|---------x--------|-------z-----
xmax^ ^ymin zmax^ ^xmin

CBFalconer · May 15, 2009

Flash said:
CBFalconer wrote:
.... snip ...

In which case there is according to you no floating point number
that represents the real values xmin or xmax. What is the point
of your ranges if they do not cover all real values?

Look again. Remember y > x, and y is adjacent to x. xmax is
represented by y. ymin is represented by x.

CBFalconer · May 15, 2009

Keith said:
CBFalconer said:

Keith said:

... snip ...

I don't dispute that these ranges exist. I deny that a floating-
point number represents a range rather than a single value. And
I honestly don't understand how you can continue to claim
otherwise after reading 5.2.4.2.2.

Here (again) is paragraph 10 from 5.2.4.2:

[#10] The values given in the following list shall be
replaced by implementation-defined constant expressions with
(positive) values that are less than or equal to those
shown:

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

Why do you keep quoting that? I know what it says, and it doesn't
support your claims. Do you see the word "range" in there somewhere?

I have said that the 'range' is my contribution to the verbiage.
It is the range of real numbers represented by an fp object's value.

The standard doesn't use the word "value" for different things. The
standard uses it for one thing, and you use it for another.

That is an (understandable) omission by the standard. There are
two quantities of interest. One is the value of the fp-object.
The second is the real number that was stored therein.

Paragraph 10 isn't about floating-point values in general. It's about
1.0, DBL_EPSILON, and 1.0+DBL_EPSILON (and the corresponding values
for the other types). If it's supposed to be a statement about
floating-point values in general, it's extraordinarily badly worded.

Also note the specific mention of other values in paragraph 3.

[#3] Floating types may include values that are not
normalized floating-point numbers,
...

Again, that's talking about denormalized numbers, NaNs, and
infinities, not about these ranges of yours. Note carefully the word
"may". A conforming implementation could have floating-point types
that don't have any of these things, so that paragraph needn't apply.
If your range model were valid, it would apply equally to such a
system. You can't reasonably use paragraph 3 to support your model.

That simply extends the number of examples given. It does not
restrict them.

CBFalconer · May 15, 2009

Ike said:
The EPSILON involved does not change for any fp values.

Not so. It depends on the implementation, but for most
implementations it changes whenever the fp-object value is an
integral power of 2.

In one of your earlier posts, you introduced EPSILON as
a shorthand for FLT_EPSILON, DBL_EPSILON or LDBL_EPSILON.
For a given implementation, these are fixed constants.
They're defined in <float.h> .

No argument. I am not discussing all the fp formats at once, one
will do.
