Float comparison


crisgoogle

<big snip>
<more snip>
Everyone understands that for the vast majority of mathematical
results that one "tries" to store in a float, that result is not
exactly representable.
You seem to believe that the float therefore represents all those
possible values.
By your reasoning, an unsigned int also represents an infinite
number of values, namely:  x + n * (UINT_MAX + 1), where n is in
Z, and x of course is the single value, on [0, UINT_MAX], that
most people would say is stored in that unsigned int.
Is that your claim, that, without looking at the program,
unsigned ints have an infinite number of values? If not, why not?
How is this situation different than that for floats?

No, because the arithmetic system on unsigned ints is closed.
Apart from division by zero, you can't generate a value outside the
set 0 .. UINT_MAX.  Those things are not integers.  They follow
defined rules.  They are not intended to represent integers.
Similarly floats are not reals, and also follow defined rules.
However we can always deposit a real in a float, and the question
is 'what is that real'.

Right, the floats follow defined rules, and under those rules,
they _are_ closed. Operations on floats result in floats. As
someone else (Keith T?) mentioned elsethread, any real value that
might, theoretically, be the mathematical result of any float
operation _never_ exists in the computer. The operands are floats,
the FPU stores them and the results in floating point registers,
and the value stored back to memory is a float.

I still don't see what the fundamental difference in approaches is.
Sure, unsigned ints are not integers. But neither are floats reals.
Values stored back into either one are not necessarily the
mathematical result of the expression that led to that value.
Either one may be the result of literally an infinite number
of mathematical values.
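
A minimal C sketch of the unsigned int half of this comparison. The helper names alias_of and store_in_unsigned are invented for illustration, and the code assumes unsigned long long is wider than unsigned int.

```c
#include <limits.h>

/* Hypothetical helper: build x + n * (UINT_MAX + 1) in a wider type.
   Assumes unsigned long long is wider than unsigned int and that n is
   small enough for the result to fit. */
unsigned long long alias_of(unsigned int x, unsigned int n)
{
    unsigned long long modulus = (unsigned long long)UINT_MAX + 1ULL;
    return (unsigned long long)x + (unsigned long long)n * modulus;
}

/* Hypothetical helper: converting to unsigned int reduces the value
   modulo UINT_MAX + 1, as the C standard requires. */
unsigned int store_in_unsigned(unsigned long long value)
{
    return (unsigned int)value;
}
```

store_in_unsigned(alias_of(42u, n)) gives 42 for every n that fits, so the stored 42 alone cannot say which of those mathematical values produced it.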
 

Keith Thompson

CBFalconer said:
It did when I wrote it! :)


No, ranges don't 'touch at the midpoint'. The midpoint of the
range is usually the value of the fp object. For most systems the
exceptions come when the fp object value is an integral power of
two.

By "midpoint", I didn't mean the midpoint of a range, I meant the
point *between* two consecutive ranges.

With x < y I think part of the confusion is that xmax is NOT
represented by anything in the x range, but by something in the y
range. Similarly ymin is represented by something in the x range,
not in the y range. This arises from the use of < and >, rather
than <= or >=.

As a result there is NO value larger than xmax and smaller than
ymin. Yet xmax > ymin.

|----------y----------|---------x--------|-------z-----
                 xmax^ ^ymin        zmax^ ^xmin

When drawing a line segment representing a range of real numbers, the
usual convention is to show smaller numbers to the left, larger
numbers to the right. Why did you draw your picture backwards? Is
the ordering y, x, z supposed to mean something?

Let me try this again. In your model, the FP number x represents a
range of real numbers, from xmin to xmax, and the FP number y
represents a range of real numbers from ymin to ymax. (We'll leave
aside for the moment the question of whether the ranges include the
endpoints.) We assume that x < y, and that y == nextafter(x, x+1.0)
-- i.e., x and y are consecutive representable FP numbers. Let's also
assume that x and y are, say, somewhere near 1.5, so the difference
between consecutive FP numbers is uniform. x and y are FP numbers,
xmin, xmax, ymin, and ymax are real numbers. Remember, smaller
numbers are to the *left*, larger numbers to the *right*.


|----------x----------|----------y----------|
xmin                  xmax
                      ymin                  ymax

Any real number that is greater than xmin and less than xmax is in the
range of the FP number x. Similarly, any real number that is greater
than ymin and less than ymax is in the range of the FP number y.

Are you with me so far?

You've said that x < y, but xmax > ymin. Is that really what you
meant? If so, the diagram looks something like this:

|----------x----------|----------y----------|
xmin              ymin xmax              ymax

You've also said that there is no value (real value?) larger than xmax
and smaller than ymin. Given that xmax > ymin, that's an unremarkable
statement, but I suspect what you really meant is that there's no
value smaller than xmax and larger than ymin. In other words, xmax
and ymin (both of which are real numbers) are unequal, yet there are
no other real numbers between them.

This is quite simply mathematically impossible. For any two unequal
real numbers, there are infinitely many real numbers between them.
For example, for any real numbers a and b where a < b:

a < (a+b)/2 < b

Can you clarify this point?

What exactly is the relationship between xmax and ymin? If they're
unequal, by how much do they differ? Is each real number in the
range -DBL_MAX .. +DBL_MAX a member of the range of some double FP
value? Which FP value's range does xmax belong to? What about ymin?

You *could* have an internally consistent model where xmax==ymin,
and each FP number's range includes, say, its upper bound but
not its lower bound. Then the range corresponding to x would be
the set of reals rx such that xmin < rx <= xmax, and likewise
for y. Or the membership of xmax could be left unspecified or
implementation-defined. But until you at least come up with an
internally consistent model, it's going to be very difficult to
discuss this.
 

Keith Thompson

CBFalconer said:
Keith said:
CBFalconer said:
Keith Thompson wrote:

... snip ...

I don't dispute that these ranges exist. I deny that a floating-
point number represents a range rather than a single value. And
I honestly don't understand how you can continue to claim
otherwise after reading 5.2.4.2.2.

Here (again) is paragraph 10 from 5.2.4.2.2:

[#10] The values given in the following list shall be
replaced by implementation-defined constant expressions with
(positive) values that are less than or equal to those
shown:

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b^(1-p)

FLT_EPSILON     1E-5
DBL_EPSILON     1E-9
LDBL_EPSILON    1E-9

Why do you keep quoting that? I know what it says, and it doesn't
support your claims. Do you see the word "range" in there somewhere?

I have said that the 'range' is my contribution to the verbiage.
It is the range of real numbers represented by an fp object's value.

It's not the verbiage I object to. It's your assertion that the
concept behind the verbiage is fundamental to the nature of
floating-point numbers.

That is an (understandable) omission by the standard. There are
two quantities of interest. One is the value of the fp-object.
The second is the real number that was stored therein.

And what real value would that be?

Given:

double x = 1.0/3.0;

are you still asserting that the real value one-third is somehow
stored in x?

It isn't. It cannot be.
Paragraph 10 isn't about floating-point values in general. It's about
1.0, DBL_EPSILON, and 1.0+DBL_EPSILON (and the corresponding values
for the other types). If it's supposed to be a statement about
floating-point values in general, it's extraordinarily badly worded.
Also note the specific mention of other values in paragraph 3.

[#3] Floating types may include values that are not
normalized floating-point numbers,
...

Again, that's talking about denormalized numbers, NaNs, and
infinities, not about these ranges of yours. Note carefully the word
"may". A conforming implementation could have floating-point types
that don't have any of these things, so that paragraph needn't apply.
If your range model were valid, it would apply equally to such a
system. You can't reasonably use paragraph 3 to support your model.

That simply extends the number of examples given. It does not
restrict them.

Note again the use of the word "may". Imagine a conforming C
implementation in which the floating types do not include any values
that are not normalized floating-point numbers. Does your model still
apply to such an implementation?
 

Keith Thompson

CBFalconer said:
Not so. It depends on the implementation, but for most
implementations it changes whenever the fp-object value is an
integral power of 2.
[...]

No. DBL_EPSILON is "the difference between 1 and the least value
greater than 1 that is representable in the given floating point type"
(in this case, type double). On my system, for example, DBL_EPSILON
is exactly 2.0**-52 (assume the obvious meaning for "**").

The difference between two consecutive double values does change over
the range of type double. But DBL_EPSILON is not the difference
between two arbitrary consecutive double values. It's *only* the
difference between 1.0 and nextafter(1.0, 2.0), nothing else.

When you say EPSILON, do you mean DBL_EPSILON, or do you mean the
difference between two arbitrary consecutive double values? If the
latter, I suggest finding a different term.
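
A small C sketch of the definition quoted above, under the assumption of the default round-to-nearest mode. The function names are made up for this example; the first evaluates b^(1-p) from the standard's model, the second searches for the gap just above 1.0 without using nextafter.

```c
#include <float.h>

/* b^(1-p) from the floating-point model in the standard, for double. */
double epsilon_from_model(void)
{
    double e = 1.0;
    int i;
    for (i = 1; i < DBL_MANT_DIG; ++i)
        e /= FLT_RADIX;
    return e;
}

/* Smallest power of FLT_RADIX that still changes 1.0 when added to it.
   volatile keeps the sum from being held in a wider register. */
double epsilon_measured(void)
{
    double e = 1.0;
    volatile double sum = 1.0 + e / FLT_RADIX;
    while (sum > 1.0) {
        e /= FLT_RADIX;
        sum = 1.0 + e / FLT_RADIX;
    }
    return e;
}
```

Both functions only ever look at the neighbourhood of 1.0, which is the point being made: DBL_EPSILON describes that one gap, not the gap at an arbitrary value.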
 

Keith Thompson

CBFalconer said:
Instead of yammering at each other with fixed positions, consider
this. My view of a 'range' works for everything your 'fixed value'
version does. It is just more detailed. Note that the reverse
does NOT apply.

First off, being more detailed isn't necessarily a virtue. You are
adding details that are not supported by the standard.

Second, I disagree that your model adds more detail. It doesn't
distinguish among the infinitely many numbers in the range for a given
floating-point value. The single-value model defines a single
unambiguous real value for each floating-point value.
 

CBFalconer

.... snip ...


Right, the floats follow defined rules, and under those rules,
they _are_ closed. Operations on floats result in floats. As
someone else (Keith T?) mentioned elsethread, any real value that
might, theoretically, be the mathematical result of any float
operation _never_ exists in the computer. The operands are floats,
the FPU stores them and the results in floating point registers,
and the value stored back to memory is a float.

This is a quickie, but floats are NOT closed. Reals are closed.
For most implementations floats have specific changes each time the
value is doubled (or halved). If you form:

c = a + b;
d = c - a;

and examine b and d, they will normally be different.
Precondition: a is greater than 2 * b, or b is greater than 2 * a.
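
A runnable version of the two assignments, as a sketch. The function names and the input pair are chosen for illustration, and the comment about the result assumes IEEE 754 doubles with round-to-nearest.

```c
#include <float.h>

/* Returns d = (a + b) - a.  volatile forces each intermediate result
   to be rounded to double before it is used again. */
double add_then_subtract(double a, double b)
{
    volatile double c = a + b;
    volatile double d = c - a;
    return d;
}

/* Example input: b is a quarter of the gap above 1.0, so a + b rounds
   back to 1.0 and d no longer equals b. */
double lost_addend(void)
{
    return add_then_subtract(1.0, DBL_EPSILON / 4);
}
```

When b is exactly representable at the scale of a, as in add_then_subtract(1.0, 0.5), d and b agree; the difference appears only when a + b has to be rounded.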
 

CBFalconer

Keith said:
.... snip ...


First off, being more detailed isn't necessarily a virtue. You are
adding details that are not supported by the standard.

Second, I disagree that your model adds more detail. It doesn't
distinguish among the infinitely many numbers in the range for a
given floating-point value. The single-value model defines a
single unambiguous real value for each floating-point value.

You can derive your view from mine, by abandoning knowledge. I
can't derive my view from yours. Thus my view is more detailed.
It is NOT wrong.

And we can never distinguish between various fp values in the
'range'. We just need to be aware that the object value specifies
something in that range.

Consider how you can place limits on the accuracy of matrix
inversion.
 

Keith Thompson

CBFalconer said:
This is a quickie, but floats are NOT closed. Reals are closed.
For most implementations floats have specific changes each time the
value is doubled (or halved). If you form:

c = a + b;
d = c - a;

and examine b and d, they will normally be different.
Precondition: a is greater than 2 * b, or b is greater than 2 * a.

That's not what "closed" means. A set is closed under an operation if
applying that operation to members of the set always yields a member
of the set. It's not about reversibility.

Whether the set of floats is closed under addition is
implementation-specific. The behavior on overflow (or division by
zero) is undefined. If addition of two floats always yields either a
numeric float value, a NaN, or an infinity, then the set of floats is
closed under addition. If some additions can cause the
program to crash rather than yielding a result, then it isn't.
 

CBFalconer

Keith said:
By "midpoint", I didn't mean the midpoint of a range, I meant the
point *between* two consecutive ranges.


When drawing a line segment representing a range of real numbers, the
usual convention is to show smaller numbers to the left, larger
numbers to the right. Why did you draw your picture backwards? Is
the ordering y, x, z supposed to mean something?

Let me try this again. In your model, the FP number x represents a
range of real numbers, from xmin to xmax, and the FP number y
represents a range of real numbers from ymin to ymax. (We'll leave
aside for the moment the question of whether the ranges include the
endpoints.) We assume that x < y, and that y == nextafter(x, x+1.0)
-- i.e., x and y are consecutive representable FP numbers. Let's also
assume that x and y are, say, somewhere near 1.5, so the difference
between consecutive FP numbers is uniform. x and y are FP numbers,
xmin, xmax, ymin, and ymax are real numbers. Remember, smaller
numbers are to the *left*, larger numbers to the *right*.

|----------x----------|----------y----------|
xmin                  xmax
                      ymin                  ymax

Any real number that is greater than xmin and less than xmax is in the
range of the FP number x. Similarly, any real number that is greater
than ymin and less than ymax is in the range of the FP number y.

Are you with me so far?

You've said that x < y, but xmax > ymin. Is that really what you
meant? If so, the diagram looks something like this:

|----------x----------|----------y----------|
xmin              ymin xmax              ymax

Precisely. ymin is a real value specified by x. xmax is a real
value specified by y. y > x. Therefore xmax > ymin. This, with
the aid of nextafter(), resolves the end point confusion.

You've also said that there is no value (real value?) larger than xmax
and smaller than ymin. Given that xmax > ymin, that's an unremarkable
statement, but I suspect what you really meant is that there's no
value smaller than xmax and larger than ymin. In other words, xmax
and ymin (both of which are real numbers) are unequal, yet there are
no other real numbers between them.

No. xmax is specifiable by y, and ymin by x. They are real
values. We know xmax > ymin. They are NOT the object values.
Anything smaller than xmax is specified by x. Anything larger than
ymin is specified by y. There is no impossibility. You can make
'smaller' any small value you wish. Similarly you can make larger
any small value you wish. You are dealing with reals here, not
floats.

This is quite simply mathematically impossible. For any two unequal
real numbers, there are infinitely many real numbers between them.
For example, for any real numbers a and b where a < b:

a < (a+b)/2 < b

Can you clarify this point?

For this you have to pick the exact values first. In the above I
am dealing with limits. For example, if I pick zero as one value,
and q as anything larger than zero, there are infinite reals
between zero and q, once I pick q. This brings up the old saw from
50+ years (for me :) ago: "For any e you can pick a so that b-a <
e".

What exactly is the relationship between xmax and ymin? If they're
unequal, by how much do they differ?

I don't know. I know how to calculate an xmax that will produce
the y value. I know how to calculate a ymin that will produce an x
value. I know that x < xmax and y > ymin, also x < y and ymin <
xmax. Here again we have two actual values - the one calculated to
fix the end points, and the real values in the ranges. The latter
are the things subject to limits.

....
 

CBFalconer

Keith said:
.... snip ...


And what real value would that be?

Given:

double x = 1.0/3.0;

are you still asserting that the real value one-third is somehow
stored in x?

It isn't. It cannot be.

True. It isn't. However, it was stored. The fp-object didn't
accept it unchanged. It converted it into a fp-object-value.
That's all that is left. Now the question is how can we use that,
and how big are the errors involved in so using it.

....
 

CBFalconer

Keith said:
CBFalconer said:
Not so. It depends on the implementation, but for most
implementations it changes whenever the fp-object value is an
integral power of 2.
[...]

No. DBL_EPSILON is "the difference between 1 and the least value
greater than 1 that is representable in the given floating point type"
(in this case, type double). On my system, for example, DBL_EPSILON
is exactly 2.0**-52 (assume the obvious meaning for "**").

The difference between two consecutive double values does change over
the range of type double. But DBL_EPSILON is not the difference
between two arbitrary consecutive double values. It's *only* the
difference between 1.0 and nextafter(1.0, 2.0), nothing else.

When you say EPSILON, do you mean DBL_EPSILON, or do you mean the
difference between two arbitrary consecutive double values? If the
latter, I suggest finding a different term.

Apply your nextafter to values in the range 1.0 to something less
than 2.0. As I said, it is implementation specific, but I expect
you will find that the EPSILON returned is constant. If you apply
it to values in the 2.0 to less than 4.0 you will find the EPSILON
returned is doubled. This is a prediction.

Use something like "nextafter(x, x + 1.0)" to compute the EPSILONs.
 

Keith Thompson

CBFalconer said:
True. It isn't.
Right.

However, it was stored.

So you agree that it isn't stored, and that it cannot be stored. But
it was stored. How is that not a contradiction?

The fp-object didn't
accept it unchanged. It converted it into a fp-object-value.

No, the fp-object didn't change the value. The C expression 1.0/3.0
produced a floating-point number. That floating-point number is
close, but not equal, to the real value one-third. The real value
one-third *never existed in the program*. The rational number that
was computed by the division operator is *the only thing* that is
stored in x.

That's all that is left. Now the question is how can we use that,
and how big are the errors involved in so using it.

Given enough information about the floating-point implementation, we
can compute *exactly* the error resulting from the computation, with
"error" being defined as the difference between the real number
one-third and the real number corresponding to the nearby rational
value that was actually computed. Mathematically, it's going to be
something like A/2**Y - 1/3, or 1/3 - A/2**Y.

Computing the possible error for more elaborate computations is more
difficult (and it's not made any easier by pretending that the stored
values are ranges).
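
A sketch of that exact computation for the one-third case. The helper names are invented for this example, and the code assumes FLT_RADIX == 2 and a significand narrow enough for A to fit in int64_t.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Decompose a positive finite double so that x == A / 2^Y exactly.
   Assumes FLT_RADIX == 2 and DBL_MANT_DIG <= 63. */
void exact_fraction(double x, int64_t *A, int *Y)
{
    int exp;
    double frac = frexp(x, &exp);             /* x == frac * 2^exp, 0.5 <= frac < 1 */

    *A = (int64_t)ldexp(frac, DBL_MANT_DIG);  /* frac * 2^p is an exact integer */
    *Y = DBL_MANT_DIG - exp;
}

/* For x == A / 2^Y, 1/3 - x == (2^Y - 3A) / (3 * 2^Y).  Returns the
   numerator 2^Y - 3A; valid only while Y < 62 and 3A fits in int64_t. */
int64_t third_error_numerator(int64_t A, int Y)
{
    return ((int64_t)1 << Y) - 3 * A;
}
```

On an IEEE 754 binary64 implementation with round-to-nearest, applying exact_fraction to 1.0/3.0 should give A = 6004799503160661 and Y = 54, and the numerator comes out as 1, so the stored value is below one-third by exactly 1/(3 * 2**54).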
 

Flash Gordon

CBFalconer said:
Precisely. ymin is a real value specified by x. xmax is a real
value specified by y. y > x. Therefore xmax > ymin. This, with
the aid of nextafter(), resolves the end point confusion.

ymin and xmax are *real* numbers, so nextafter has absolutely nothing to
do with them. nextafter(x)==y, but this says nothing about how you are
defining your limits or about the relative values of xmax and ymin.

No. xmax is specifiable by y, and ymin by x. They are real
values. We know xmax > ymin. They are NOT the object values.

No one claimed they were floating point numbers, they are limits on two
of your ranges. As they are real numbers or rationals, then as you have
already been told (and was proved many years ago) it is PROVABLE that
there are an infinite number of real numbers between them. This is
fairly basic number theory. So if xmax is greater than ymin there are an
INFINITE number of real values that fall into both of your ranges.

Anything smaller than xmax is specified by x. Anything larger than
ymin is specified by y. There is no impossibility. You can make
'smaller' any small value you wish. Similarly you can make larger
any small value you wish. You are dealing with reals here, not
floats.

WE KNOW THEY ARE REALS! They were introduced in to the discussion
explicitly as real values!

Now do you claim that xmax is mapped to x and ymin to y? This is a
simple question with a yes/no answer.

For this you have to pick the exact values first. In the above I
am dealing with limits. For example, if I pick zero as one value,
and q as anything larger than zero, there are infinite reals
between zero and q, once I pick q. This brings up the old saw from
50+ years (for me :) ago: "For any e you can pick a so that b-a <
e".

Yes, that is correct. Now instead of reiterating what everyone else
knows deal with the issue. Mind you, that is a related point, not the
actual one.

I don't know. I know how to calculate an xmax that will produce
the y value.

Then pick two values and calculate them. For simplicity I suggest you
let x be 1.0 and y be nextafter(x, 2.0).

I know how to calculate a ymin that will produce an x
value. I know that x < xmax and y > ymin, also x < y and ymin <
xmax.

If xmax!=ymin then the mathematical expression (xmax+ymin)/2 is a real
number which is different to both and between them. Is that value in the
range of x, y or both, and why?

Here again we have two actual values - the one calculated to
fix the end points, and the real values in the ranges. The latter
are the things subject to limits.

What are you talking about? x and y are consecutive representable values
in the floating point system used by a C implementation, the min/max
values are the limits of the ranges that you say they represent. Are you
saying that your ranges are not exact now?
 

Keith Thompson

CBFalconer said:
Keith said:
CBFalconer said:
Ike Naar wrote:

If you look at things carefully you will see that the EPSILON
involved does not change for fp values from greater than 1.0 to
less than 2.0.

The EPSILON involved does not change for any fp values.

Not so. It depends on the implementation, but for most
implementations it changes whenever the fp-object value is an
integral power of 2.
[...]

No. DBL_EPSILON is "the difference between 1 and the least value
greater than 1 that is representable in the given floating point type"
(in this case, type double). On my system, for example, DBL_EPSILON
is exactly 2.0**-52 (assume the obvious meaning for "**").

The difference between two consecutive double values does change over
the range of type double. But DBL_EPSILON is not the difference
between two arbitrary consecutive double values. It's *only* the
difference between 1.0 and nextafter(1.0, 2.0), nothing else.

When you say EPSILON, do you mean DBL_EPSILON, or do you mean the
difference between two arbitrary consecutive double values? If the
latter, I suggest finding a different term.

Apply your nextafter to values in the range 1.0 to something less
than 2.0. As I said, it is implementation specific, but I expect
you will find that the EPSILON returned is constant. If you apply
it to values in the 2.0 to less than 4.0 you will find the EPSILON
returned is doubled. This is a prediction.

Use something like "nextafter(x, x + 1.0)" to compute the EPSILONs.

You are trying to redefine EPSILON.

Above, I asked you a specific question:

When you say EPSILON, do you mean DBL_EPSILON, or do you mean the
difference between two arbitrary consecutive double values?

So which is it? Or are you suggesting that the value of DBL_EPSILON
changes depending on what floating-point number you're looking at?

I know that the value of nextafter(x, x + 1.0) will vary depending on
the value of x. But that's not what DBL_EPSILON means, and if you're
using EPSILON as an abbreviation of DBL_EPSILON (which is what you've
indicated in the past), then it's not what EPSILON means either.
 

Flash Gordon

CBFalconer said:
Instead of yammering at each other with fixed positions, consider
this. My view of a 'range' works for everything your 'fixed value'
version does.

No, your definition does not work for reverse error analysis. This is
where you assume your result is exact and work back from the result to
work out the range of inputs that could produce it. It does not work for
this by *definition*.

It is just more detailed. Note that the reverse
does NOT apply.

No, yours is NOT more detailed. Your model does not allow for any
reasoning that cannot be done starting from the model as everyone else
is describing it, you know, the one the C standard actually defines. In
fact, your model is so ill-defined you have not managed to answer basic
questions about it.

Both models tell us there are an infinite number of real values between
x and nextafter(x) which, if you want to map them to a floating point
number you will have to map to either x or nextafter(x) (which value
depends on the rounding mode in effect, which is implementation defined
and can be *changed* whilst the program is running).
 

Flash Gordon

CBFalconer said:
Look again. Remember y > x, and y is adjacent to x. xmax is
represented by y. ymin is represented by x.

So what represents (xmax+ymin)/2 ?
remember, xmax and ymin are real numbers, so if they are different this
is a third unique value.
 

Flash Gordon

CBFalconer said:
True. It isn't.

If it isn't, then why claim it is?

However, it was stored.

This directly contradicts your previous statement. Either it is stored
or it is not.

The fp-object didn't
accept it unchanged. It converted it into a fp-object-value.

We are talking about C, not C++, objects do not perform operations they
are operated on by code.

That's all that is left. Now the question is how can we use that,
and how big are the errors involved in so using it.

That can be analysed easily enough if the implementation documents the
accuracy of division (it is allowed to define it as unknown). It does
not make the range part of the object though.
 

Flash Gordon

CBFalconer said:
You can derive your view from mine, by abandoning knowledge.

You have yet to even properly *define* your view. Our view is clearly
and unambiguously defined.

I can't derive my view from yours.

You might not be able to, but someone who understands set theory a
little can easily do so. All they have to do is define a many-to-one
mapping from the set of reals to the set of floating-point-numbers.
However, this mapping can be changed in C99 during program execution
because the program can call functions to change the rounding mode (it
is implementation defined which modes are available and which mode it
starts in).

Thus my view is more detailed.
It is NOT wrong.

It is not supported by the standard.

In any case, a model of int with INT_MAX defined as 32767 is more
detailed than one which does not specify its value. However, the C
standard does not define the value.

And we can never distinguish between various fp values in the
'range'. We just need to be aware that the object value specifies
something in that range.

The model having only specific exact values does not stop people
reasoning about what errors will be introduced when the mathematically
correct value is not representable.

Consider how you can place limits on the accuracy of matrix
inversion.

Quite easily. As someone else posted, as the problem gets more complex
it is easier to reason about the range of starting positions that could
give you a specific answer than the other way around.

Anyway, if you decide that 1.0 represents an orange the C standard does
not stop you from using that in your program, equally it does not stop
you from using 1.0 to represent the range of values
[1.0-deltal,1.0+deltam) or any other range you choose.
 

CBFalconer

Flash said:
CBFalconer wrote:
.... snip ...


No, your definition does not work for reverse error analysis. This
is where you assume your result is exact and work back from the
result to work out the range of inputs that could produce it. It
does not work for this by *definition*.

No, I never assume the fp value is exact. The fp object value IS
exact, and that simply serves to identify the available 'range' for
that value. If that shows that the errors resulting from those
approximations are negligible to your end result, you are free to
ignore them.
 
