Float comparison

Dik T. Winter · May 20, 2009

>
> Of course. But there are infinitely many other reals that are not
> equal to y, and will be expressed as y. I used 'represented by' to
> express this fact.

I thought I did read "xmax is the smallest such real that can be
*expressed* in the fp-system". Did you write something different?

Dik T. Winter · May 20, 2009

>
> Somewhere, sometime, I specified 'the usual rounding', or similar.

Currently the 'usual rounding' is the IEEE arithmetic round to even rule.
Or do you have something different in mind?

Dik T. Winter · May 20, 2009

> "Dik T. Winter" wrote: ....
>
> Are you claiming that, in your expression, (1.0 / 3.0) is never
> stored anywhere, including in registers?

At what point does (in your opinion) the value one third (if it ever did
exist on the computer) change to an approximation of that?

gwowen · May 20, 2009

When you say 'floating-point numbers' above, are you talking about
real values, or values that the fp-object can hold and spit back?
I don't think it matters too much there, but it illustrates the
things that need caution.

For an FP in which 1.0 is exactly representable, can you answer the
following questions?

float x = 1.0; // what value does x hold? What range does it represent?
float y = 1.0; // what value does y hold? What range does it represent?

float z1 = x * y; // what value does z1 hold? What range does it represent?
float z2 = x * x * x * x * x * x * x * x * x; // what value does z2 hold?
                                              // What range does it represent?

float w = 0.0; // what value does w hold? What range does it represent?
float z3 = (x-y) * (x-y); // what value does z3 hold? What range does it represent?
float z4 = (z3-w) * (x-y) * z3; // ibid

// Are the ranges represented by the z variables
// in any way related to the ranges of the variables from which they
// are computed?
//
// If not, what purpose can they possibly serve?

Flash Gordon · May 20, 2009

I did. Twice. In messages you responded to. Here it is for a third time:
http://www.physics.arizona.edu/~restrepo/475A/Notes/sourcea/node13.html

You don't have to read it. Just don't dismiss it as being the wrong way
to do things and something that leads to madness without even having the
courtesy to read the information you are given.

I keep seeing some responses to my messages from Richard noname.
He is PLONKED as a troll, and thus the actual messages are not
transmitted to me.

What has that got to do with anything? It was me who posted the links.

Flash Gordon · May 20, 2009

CBFalconer said:
Flash Gordon wrote:
.... snip ...

I don't see the point in that.

The point is that you are failing to understand when things are
explained in the abstract, so I want to explain using a concrete example
that *you* believe in, not one that I have invented.

I hope to write a demonstration
program sometime when this has calmed down. Right now I have no
time for it. I have given the values expressed in terms of the
appropriate EPSILON values in float.h, which are portable TO
SIMILAR fp-systems. Most are similar.

As already explained to you, I and others are talking about what applies
to EVERY C implementation, not some undefined subset of similar
implementations. If your model does not apply to every C implementation
then that in itself should be enough to show you that it is not an
accurate reflection of the standard.

Actually, I can't even do it because my library is not a C99
library, and it doesn't include nextafter.

OK, replace the second line with whatever you need to get the next
representable value above 1.0. Or even do the arithmetic based on what
is in your float.h file to derive the numbers (since you say they can be
derived from them, that should be easy).

I don't have any c99
library, and I am not going to bother to write a nextafter. I have
shown how you can get the same result from the appropriate EPSILON.

If you know how to get the numbers then you know how to get them. I need
the numbers from YOU so that you cannot claim that I have misunderstood
how to generate them.

If you want, just invent some numbers! As long as they are numbers you
are happy with, which show something reasonable and which you consider
valid for discussion of what could occur on an implementation, I don't
care that much. Ideally I want what you believe are real numbers so you
can't claim that what I point out is an artifact of the chosen example,
but I'll work with whatever you provide.

Flash Gordon · May 20, 2009

CBFalconer said:
People do not seem to appreciate my attempt to inject a touch of
humor/humour.

It looked like you were trying to make a serious point. Certainly what I
wrote was a serious point which I'm not sure you have fully understood
based on other recent posts you have made.

Flash Gordon · May 20, 2009

Keith Thompson wrote:

(It's likely that the expression 1.0/3.0 will be optimized to a
constant value and no division operation will actually be executed.
In fact, I'd be mildly astonished by a compiler that *didn't* perform
this simple optimization. But I'll ignore that and stick to the
semantics of the abstract machine, in which the division actually
occurs.)

<snip>

I've come across a Pascal compiler that did not do this kind of simple
optimisation, so there is precedent for it not happening.

Ike Naar · May 20, 2009

I've come across a Pascal compiler that did not do this kind of simple
optimisation, so there is precedent for it not happening.

Or think of a cross compiler.

Flash Gordon · May 20, 2009

Keith said:
And here we have a result that I have difficulty believing is what you
intended.

Personally I thought it was 50-50 that this is what he believed, so I do
believe it.

Let's put all this in decimal notation, so we can compare things more
easily:

x = 1.0
xmax = 1.0625
ymin = 1.0546875
y = 1.125

x and y are adjacent FP numbers; nextafter(x, +INFINITY) == y.

So the real ranges represented by these two distinct FP values
overlap.

Does that really make sense to you?

Assume the FP type we're dealing with here is called "tinyfloat", and
consider the following:

double x = 1.0;
double foo = 1.05859375; // halfway between ymin and xmax
double y = 1.125;

printf("x = %f\n", x);
printf("y = %f\n", y);
printf("foo = %f\n", foo);

I hope you'll agree that the first two lines of output will be:

x = 1.0
y = 1.125

What is the third line of output?

Of equal importance is the simple question "why?"

This, by the way, is why I wanted actual numbers. It makes it far easier
to show certain problems.

Phil Carmody · May 20, 2009

Flash Gordon said:
Keith Thompson wrote:

<snip>

I've come across a Pascal compiler that did not do this kind of simple
optimisation, so there is precedent for it not happening.

And given that the rounding mode can be changed at run-time, there
are even supportable reasons for modern C compilers to not perform
such short-cuts if they think that by not so doing they're doing
you a favour.

Phil

CBFalconer · May 20, 2009

Keith said:
.... snip ...

I mean both; they're essentially the same. The standard defines,
for each floating-point number (excluding NaNs and infinities) the
real number that is its value in the model.

And FP objects are irrelevant here; a value needn't be stored in
an object, and storing it in an object doesn't change it.

Not so. As an elementary demo, I am using double to prepare a
'real value', and float to demonstrate.

#include <stdio.h>
#include <float.h>

/* Show that a value is altered by storing in a float */
void demo(void) {
    volatile double d; /* volatile to ensure store/loads happen */
    volatile float f;

    d = 1.0/3;
    f = 0; /* just to ensure d is actually stored */
    f = d;
    printf("DBL_DIG=%d\t d=%.*e\n", DBL_DIG, DBL_DIG+2, d);
    printf("FLT_DIG=%d\t f=%.*e\n", FLT_DIG, FLT_DIG+2, (double)f);
}

/* ----------------- */

int main(void) {
    demo();
    return 0;
}

I am too lazy just now to figure out how to dump the float and
double in hex. But this should show that the value gets altered by
the storage. I am just using double to have something
understandable to C that can get altered. The output on my
machinery is:

[1] c:\c\junk>a
DBL_DIG=15 d=3.33333333333333315e-01
FLT_DIG=6 f=3.33333343e-01

CBFalconer · May 20, 2009

For an FP in which 1.0 is exactly representable, can you answer
the following questions?

float x = 1.0; // what value does x hold? What range does it represent?
float y = 1.0; // what value does y hold? What range does it represent?

You have omitted all attribution. Please don't do that for
anything you quote, and do quote enough to make your query clear.
I shall answer just the one question.

The fp-object-value of both x and y is 1.0. The 'range' is usually
going to be:

1.0*(1+FLT_EPSILON) > x;
1.0*(1-FLT_EPSILON/2) < x;

where the anomaly for the -ve swing is because x is an exact power
of 2. This applies in most fp-systems using normal rounding.

When you subtract equal fp-object-values you get zero. The range
of that is large, compared to its value. You can figure out the
least possible error from the ranges on the input values to the
subtraction. For actual error you need the actual error on the
input values, which may be much larger.

CBFalconer · May 20, 2009

Joe said:
Does your knowledge of the FP system on your machine include:

(32-bit float)

 3         2         1
10987654321098765432109876543210
-                                 1-bit sign (1 == negative)
 --------                         8-bit exponent (unsigned)
        ------------------------  24-bit mantissa

^_ here you omitted a blank. Added.

Note b23, the putative msb of the mantissa, is actually the lsb of the
exponent. The sign bit is b31. The exponent bias in the following
examples is 126. The imaginary binary point is to the left of the msb of
the mantissa.

That looks fine to me. I prefer to think of the significand as
being spread over 24 bits, and after normalization the MSbit (known
to be 1) is replaced by the sign bit. Nothing to do with the
exponent bits.

FLT_EPSILON is an FP value precisely:
00110100 00000000 00000000 00000000
Exp = 104 (-22)
11101010
Man = .10000000 00000000 00000000
1.19209290e-07

The value 1.0
00111111 10000000 00000000 00000000
Exp = 127 (1)
00000001
Man = .10000000 00000000 00000000
1.00000000e+00

The value (1.0 + FLT_EPSILON)
00111111 10000000 00000000 00000001
Exp = 127 (1)
00000001
Man = .10000000 00000000 00000001
1.00000012e+00

The value FLT_EPSILON is constant and is defined in terms of the value
1.0. It doesn't vary in any way under any circumstance.

The function nextafter*() has nothing to do with *EPSILON.

Yes it does, because the value of EPSILON is used to compute it.
Remember that epsilon (lower case), which I have been using lately
as a real value, is the VARIABLE epsilon that, for any x
(=fp-object-value), can form the successor fp-object value, i.e.:

x + epsilon (=xmax)

will, when stored in an fp-object, result in the next fp-object
value y. This variation is easily handled by simply using:

xmax = x*(1+EPSILON)

to form xmax. This uses the rounding of the fp-system.

CBFalconer · May 20, 2009

Keith said:
That is absolutely wrong. I honestly can't imagine where you get
these ideas. Sorry to be harsh, but this is frustrating.

(It's likely that the expression 1.0/3.0 will be optimized to a
constant value and no division operation will actually be executed.
In fact, I'd be mildly astonished by a compiler that *didn't* perform
this simple optimization. But I'll ignore that and stick to the
semantics of the abstract machine, in which the division actually
occurs.)

We have an expression 1.0/3.0, with subexpressions 1.0 and 3.0. Each
subexpression is of type double, with a value that can be represented
exactly; there's no loss of precision in the evaluation of the
floating-point constants. At this point, after the constants are
evaluated but before the division is evaluated, we have two distinct FP
values, 1.0 and 3.0.

Then the division is evaluated. The division operation takes two
double operands, with values 1.0 and 3.0, and yields a floating-point
result; on my system, that result is exactly
0.333333333333333314829616256247390992939472198486328125 .

Which is NOT 1.0/3.0. You can supply the two integers as integers
in a structure, for example. You can use that to build a whole
rational arithmetic system, but we won't bother. The point is that
the exact value has been specified, and the fp-processing has
altered it. My 'range' process on the fp-object-value ties it
down.

BTW, there is no point in printing digits from your double value
past the length specified by DBL_DIG (see float.h). See my earlier
answer to you and the example showing how the fp-system alters
values.

CBFalconer · May 20, 2009

Keith said:
.... snip ...

(1/3 isn't transcendental; it's just not representable in
floating-point.)

I didn't claim it was. Either.

CBFalconer · May 20, 2009

Richard said:
CBFalconer said:
.... snip ...

Fine. Please show me how to specify the (positive) square root
of 2 to a C program. Not an approximation, please, ...

If you bothered to read my message closely, you should have been
able to detect the use of "an exact real". The phrase was NOT
"every exact real". There is a significant difference. I thought
(erroneously) that you were capable of that distinction.

CBFalconer · May 20, 2009

Flash said:
.... snip ...

If you want, just invent some numbers! As long as they are numbers
you are happy with, which show something reasonable and which you
consider valid for discussion of what could occur on an
implementation, I don't care that much. Ideally I want what you
believe are real numbers so you can't claim that what I point out is
an artifact of the chosen example, but I'll work with whatever you
provide.

I did that in another message in the last 2 days to Keith.

CBFalconer · May 20, 2009

Keith said:
And here we have a result that I have difficulty believing is
what you intended.

Let's put all this in decimal notation, so we can compare
things more easily:

x = 1.0
xmax = 1.0625
ymin = 1.0546875
y = 1.125

I don't find the decimals easier.

x and y are adjacent FP numbers; nextafter(x, +INFINITY) == y.

So the real ranges represented by these two distinct FP
values overlap.

Does that really make sense to you?

YES. Remember that xmax and ymin are values TO BE INSERTED into an
fp-object in order to generate the nextafter (or nextbefore)
fp-object-value. That is their only purpose. They are not
fp-object-values. If we average them, for example, considering
that xmax > ymin, we get a value < xmax and > ymin.

But the only purpose of xmax was to generate something that forced
a y. The only purpose of ymin was to generate something that
forced an x. These numbers were sufficiently different from x (and
y) to force a one bit difference in the significand. They were NOT
the exact fp-object-values formed.
