Float comparison

CBFalconer · May 22, 2009

Flash said:
CBFalconer wrote:
.... snip ...

I thought you were not considering how the floating point objects
got their values? Certainly when I pointed out real uses of double
which use it precisely with no error you said it did not count
because it depended on the code. You can't have it both ways.
Well, you can try, but it is likely that someone will remember
and point it out.

But I'm not. I just gave an example of a way to feed a rational
into the fp-system. I didn't say it succeeded.

I am only worrying about what you can get back from a stored
fp-value.

Now, since you are allowing analysis of code to impact on value...
double x = 1.0;
x now contains exactly 1.0 with absolutely no error.

You are assuming it succeeded.

Flash Gordon · May 22, 2009

CBFalconer said:
I would have the same results except for the last entry. It
certainly eases some calculations, but fouls the statistics.

Hmm. So you think you know better than the IEEE committees responsible
for writing their floating point standards? I somehow think that they
might know just a little bit more than you about floating point arithmetic.

Do you at least accept that your rounding mode is NOT what is usually
done in this day and age?

Flash Gordon · May 22, 2009

CBFalconer said:
I am worrying about the bit that controls the rounding, not the lsb
of the significand. That control bit is about to be dropped, and
the rounding makes the significand as close as possible to the
desired result.

You are also failing to allow for the fact that C allows for a number of
*different* rounding models, and even allows the program to select
between them at run-time (possibly even based on user input).

I never went through any formal training in fp-systems.

Yet you are arguing with Dik T. Winter, who probably knows more about
mathematics than you and I between us have ever known.

I built my
first one in 1965, and that was in hardware, which made it harder
to modify. Later I built at least two different ones, the last was
used in embedded systems for about 20 years and continuously
improved.

<snip>

So? If you are going to start with claims of authority then you should
probably start with Dik having a lot more authority than you, and so by
believing everything he says unless you can find papers by someone more
knowledgeable than him that contradict what he says.

Keith Thompson · May 22, 2009

CBFalconer said:
Keith Thompson wrote: [...]

He's said several times in response to direct questions that EPSILON
means either FLT_EPSILON, DBL_EPSILON, or LDBL_EPSILON. (More
recently, he's "clarified" that EPSILON and epsilon are two different
things; I don't recall him saying this when he introduced the terms.)

I am only talking about one floating point system. Not the three
defined by the C standard. The results apply to any of those.
Thus the word EPSILON is general to all fp-systems.

So in the context of C, does the word EPSILON refer to any of
FLT_EPSILON, DBL_EPSILON, or LDBL_EPSILON? Yes or no?

Upthread, you wrote:

Think about WHY EPSILON changes at the power of two. It has to do
with the rounding in computing the real value x*(1+EPSILON) when
that expression is handed to the fp-system.

Would you care to explain that remarkable statement?

Keith Thompson · May 22, 2009

CBFalconer said:
The following is the C standard definition. I am interpreting
"least value greater than 1" as the real value that is to be
inserted into the fp-system to form the successor value.

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b1-p

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

Why complicate it like that? The "least value greater than 1" *is*
the successor value.

When you talk about inserting real values into the FP system, you need
to consider rounding errors. Rounding errors are not relevant to the
definition of the *_EPSILON constants.

Keith Thompson · May 22, 2009

CBFalconer said:
Well, I understand it!

I seriously doubth that. :-|

Keith Thompson · May 22, 2009

CBFalconer said:
Since 1.0 is an fp-object value, then z is an fp-object value of
1.0, and has the same range as x and y. This is only considering
the intrinsic losses involved in fp storage.

So what happened to the ranges of the operands?

In your model, each of x and y represents a range of real values.
For concreteness, let's say the ranges are 0.99 .. 1.01 (that's
obviously oversimplified, but it should suffice to make the point).
Then the possible range of the result of the multiplication should
be 0.9801 .. 1.0201, shouldn't it? This is twice the
range of the operands.

But you say that z represents the same range as x and y. Why?

(I know why: x, y, and z all represent exact real values, not ranges.
But I'm curious how you explain it within your model.)

Keith Thompson · May 22, 2009

CBFalconer said:
Keith Thompson wrote:
... snip ...

I think that, if you follow the details of what is going on in your
machine, you will find that your 'long number' is never used. Not
having your machine, and if I did I wouldn't volunteer to do that
work, I can't be sure.

Let me clarify something.

As a floating constant of type double, appearing in a C program,
there's no effective difference between
0.333333333333333314829616256247390992939472198486328125
and
0.3333333333333333
Both evaluate (on my system) to exactly the same double value.

As real numbers, though, they are quite different. They're close, but
they differ by a known amount.

I showed the stored value in decimal because that's the most commonly
understood human-readable format for representing numbers. When it's
stored as a double, it's obviously not stored as 55 decimal digits;
it's stored in 64 bits. The point is that the represented value is
exactly 0.333333333333333314829616256247390992939472198486328125, no
more, no less. This value can also be written as 0x1.5555555555555p-2 .

It's very likely that your own system uses the same FP format as mine
(IEC 60559 double format), and that your system stores that value in
the same way as mine.

Does this program:

#include <stdio.h>
int main(void) {
double x = 1.0 / 3.0;
printf("x = %.54f\n", x);
printf("x = %a\n", x);
return 0;
}

produce this output:

x = 0.333333333333333314829616256247390992939472198486328125
x = 0x1.5555555555555p-2

on your system?

Keith Thompson · May 22, 2009

CBFalconer said:
They are bounds on numbers that can be inserted into the fp-system
and produce fp-object-values identical to x. The insertion process
takes some real, does whatever is needed to produce the optimum fit
in the fp-system, and stores that value. That xmin value is only
valid for that x and that fp-system.

I think it's the "take some real" part that's the problem. There is
no general way to take an arbitrary real number and insert it into a
floating-point system.

Question 1: Does a floating-point number (call it an "fp-object-value"
if you like) represent a contiguous range of real numbers?

Question 2: What are the (real) bounds of that range?

Flash Gordon · May 22, 2009

CBFalconer said:
No, look at the values used. foo is smaller than xmax, not
greater. That's why it generates 1.0, which has a range that
includes the value of foo.

You are correct that I misread which number was which.

OK, so since the mathematical value 1.05859375 is greater than ymin
(which is by your answer quoted above 1.0546875) why was 1.125 (or 1 +
1/8 if you prefer) not stored in foo? After all, it is within that range
(being greater than ymin)!

The exact value of foo CANNOT be stored
in the tinyfloat system.

No one expects it to.

.... rest based on error

Yes, I made a mistake in which error I thought you had made.

So how can there be an infinite number of real numbers in the overlap?
What defines which range is selected for real numbers that fall into
both ranges?
Remember that the rounding model can be changed by the C program at run
time.

Keith Thompson · May 22, 2009

CBFalconer said:
Yes, I used sloppy verbiage. epsilon changes for every x, and is a
real value. The practical EPSILON, which causes a minimum visible
difference in fp-object values, changes when x becomes an exact
power of two. It should have yet another name.

I thought we had established that you were using EPSILON as a general
term for FLT_EPSILON, DBL_EPSILON, and LDBL_EPSILON.

Now it appears that "epsilon" (lower case), "EPSILON" (upper case),
and "DBL_EPSILON" (the constant defined by the C standard) are *three*
different things.

I have directly asked you about this, and you have yet to give a
coherent answer; it just becomes more complicated every time you try
to define it.

Flash Gordon · May 22, 2009

CBFalconer said:
I didn't say it didn't exist. I said it couldn't be used in the
tinyfloat system. There is no place to put those less significant
bits.

Well, you have answered else-thread which value is stored.

However, if that value has no place in the tinyfloat system why do you
think one third has a place in double?

No. I said that a value LESS than xmax (for 1.0) gets converted to
1.0. That xmax was (the binary for the significand) 10001, where
the final 1 was used only for rounding purposes.

I got your error backwards. You are saying a value greater than ymin (as
per the numbers you provided) is converted to x instead of y.

Oh, and the C standard does not mandate rounding as you specify and,
indeed, the IEEE standard (which can optionally be followed by C
implementations) explicitly uses a different rounding model.

Flash Gordon · May 22, 2009

CBFalconer said:
But I'm not. I just gave an example of a way to feed a rational
into the fp-system. I didn't say it succeeded.

No, you gave an example of how to do a floating point calculation which
produces an approximation to one third.

You still have not explained why your instance of, "well if you consider
this C code it gives a representation of a rational" is valid but my
real world "this C code is storing an exact value in a double and using
exact arithmetic with no rounding errors" is not valid.

I am only worrying about what you can get back from a stored
fp-value.

No, you are also arguing that the mathematical value of one third can be
represented in C whilst the exact value of 1.0 cannot.

You are assuming it succeeded.

It was a SERIOUS point. Making a joke about it does not stop it being a
serious point nor does it answer the point.

Keith Thompson · May 22, 2009

Keith Thompson said:
I seriously doubth that. :-|

(Typo: I meant "doubt", of course. I noticed that just as I was
sending the message.)

Flash Gordon · May 22, 2009

CBFalconer said:
Somewhere back when, yes, I did tighten some definitions.

No, you completely changed them in your first post after I introduced them.

This
happened as it became clear to me what was bothering people. Now
the whole system is a mess with everybody pushing their own
definitions.

I defined EXACTLY four numbers. Keith has agreed with my definition of
those four numbers. I have seen no one other than YOU try to change the
definition of them.

I just want to stick with mine, as they have evolved.

Everyone who has expressed an opinion other than you seems to find your
definition confusing. Everyone who has expressed an opinion other than
you seems to find the original definition simple and easy to understand.

Now, for that simplified floating point system why not tell us some
specific ranges.

What is the range (your range) about the real value 1.0?
What is the range (your range) about the real value 1.125 (or 1 + 1/8) ?

Do these two ranges overlap?

There, I have asked the question I originally asked, except with exact
numbers for a hypothetical tinyfloat, and the only term I have
introduced is "your range" by which I mean whatever it is you mean by
the range. This is a separate question to the one I've asked else-thread.

Phil Carmody · May 22, 2009

Flash Gordon said:
There is no reason why a cross compiler cannot do this optimisation if
a "normal" compiler can. After all, the person writing it has to be
aware of how the target implements floating point anyway.

He doesn't.

What rounding mode is my FPU in currently on this linux/x86 box?
And on my OSX/POWER box? And on my BSD/C7 box? On my linux/Arm box?
My linux/Alpha box?

The opcodes to perform a dynamically-rounded division do not
change, yet the results do. So a cross-compiler can generate
them, yet not know what they will yield.

You cannot perform the 1.0/3.0 constant-folding optimisation
if you wish to honour the not-known-until-runtime settings of
the processor. It's commonly an optimisation setting to permit
or forbid the compiler from performing such optimisations.
GCC's is -frounding-math. However, the behaviour of individual
compilers is best discussed in newsgroups specific to that
compiler. The C standard gives implementations rather a lot of
flexibility in what they may do regarding rounding.

Phil

Ben Bacarisse · May 22, 2009

CBFalconer said:
I was thinking about that last night, and I think I have a strong
argument for using the simpler (is the bit 1) criterion for
rounding. Consider the value:

1110.1000

Now we say it is exactly halfway between the two possible rounded
values, because there are no more 1 bits in the value. However,
that value represents a real, and the reals outnumber the rationals
by an order of infinities. Therefore there almost MUST be another
1 bit somewhere to the right, and therefore we should round up.

This argument explains very clearly why this thread has been going on
for so long. No computation is ever done on anything other than a
rational, and a limited subset of these to boot. This means that the
exact result is (or can be) known. There is nothing to be gained by
trying to guess what that exact result is really supposed to be.

There is much more to be gained by the stability offered by the round
to even rule.

<snip note that you have read the Goldberg paper>

Ben Bacarisse · May 22, 2009

CBFalconer said:
The following is the C standard definition. I am interpreting
"least value greater than 1" as the real value that is to be
inserted into the fp-system to form the successor value.

-- the difference between 1 and the least value greater
than 1 that is representable in the given floating
point type, b1-p

How can b1-p (badly formatted but presumably you know what it means)
be interpreted to mean anything but one thing? For your example system
it is 1/8.

Flash Gordon · May 22, 2009

Phil said:
He doesn't.

Well, I suppose FLT_ROUNDS could be coded to always be -1, and
fegetround and fesetround could always fail. In any case, by the above I
did not mean what the last call to fesetround was, I meant how the
processor works (which includes whether it is possible to change the
rounding mode, how many bits in a double etc).

What rounding mode is my FPU in currently on this linux/x86 box?
And on my OSX/POWER box? And on my BSD/C7 box? On my linux/Arm box?
My linux/Alpha box?

Irrelevant. Or at least, not as relevant as you think.

The opcodes to perform a dynamically-rounded division do not
change, yet the results do. So a cross-compiler can generate
them, yet not know what they will yield.

Unless FLT_ROUNDS returns -1 and fesetround and fegetround always fail
the author of the implementation needs to know how to set (if it can be
changed) and determine the rounding mode. However, in this case as it is
indeterminable it can do the rounding as it likes.

There is no difference in this for a cross compiler compared to a normal
compiler.

You cannot perform the 1.0/3.0 constant-folding optimisation
if you wish to honour the not-known-until-runtime settings of
the processor.

You (the writer of the implementation) can set the initial mode (if it
is settable, and if not you know what it is) during program
initialisation. Then if during compilation/linking you determine that
fesetround is never called you know exactly what rounding mode is in effect.

It's commonly an optimisation setting to permit
or forbid the compiler from performing such optimisations.
GCC's is -frounding-math. However, the behaviour of individual
compilers is best discussed in newsgroups specific to that
compiler.

I'm discussing what is allowed and possible, not the details of a
specific implementation.

The C standard gives implementations rather a lot of
flexibility in what they may do regarding rounding.

Yes, which also gives a lot of opportunities for doing optimisations.
Specifically it can also default to "#pragma STDC FENV_ACCESS OFF" and then
unless it sees it being set to on it can assume that the default mode
(as defined by the implementation and as can be set by the
implementation during program initialisation) is in effect thus allowing
it to perform the optimisation.

So no, the compiler is not always allowed to do this optimisation.
However, the standard *does* allow it providing certain criteria are
met, and whether they are met can be determined by the author of the
implementation knowing how the target works together with some analysis
of the source code presented.

Dik T. Winter · May 22, 2009

> "Dik T. Winter" wrote: ....
>
> I am worrying about the bit that controls the rounding, not the lsb
> of the significand.

But there is not a single bit that controls the rounding in current
systems. Moreover, xxx_EPSILON as defined by the C standard is 1 "ulp"
of 1.0.

But that you are a designer of floating-point hardware does not make you
knowledgeable in the handling of floating-point in numerical analysis.
Moreover, it does not show that you did make the best possible. One of
the most atrocious examples of badly designed floating-point hardware
can be found on the Cray-1 (it could get the multiplication wrong by
4 "ulp"s, and it did not do full division).
