Float comparison

CBFalconer · May 10, 2009

Flash said:
Beej Jorgensen wrote:
.... snip ...

Ah, but what he said I disagree with. He claims you can deduce a range
from it, and I disagree because it could represent an exact number, a
value with a possible error of +/-12.7 or any other range.

I would agree with that. A number of people have said that you need to
perform error analysis to know what the value means (and sometimes that
you need to work backwards in your analysis fro the result instead of
forwards from the inputs).

What I have said is that, taken in isolation, you can compute the
'range' covered by any FP value, and that you cannot tell any one
value from any other in that range. I have NOT said that the FP
value cannot have errors and uncertainties greater than that
'range'. The 'range' is a minimum uncertainty. To compute the
actual error, or uncertainty, you have to closely examinge all the
programming and physics involved.

Spiros Bousbouras · May 10, 2009

I'd like to know how you intend on disseminating it if it isn't?

The medium on which the C standard is written is subject to the
laws of physics but that doesn't mean that the standard itself
is. In other words the medium is distinct from the message,
Marshall McLuhan notwithstanding.

Of course if you mean it is a work of fiction ...

You can look at it as a collection of mathematical axioms.

Keith Thompson · May 11, 2009

CBFalconer said:
Keith said:

CBFalconer said:

Since I can't easily render the formula in 5.2.4.4.2 in
plain text, I've grabbed a screenshot and made it available at
<http://www.mib.org/~kst/5.2.4.2.2.gif>. Please read it. It very
clearly says that a floating-point number is defined by a formula
that specifies a single unique real value, *not* a range.

The formula is perfectly accurate. It reflects the value specified
by the FP content. It DOESN'T reflect the meaning of that value.

Click to expand...

[...]

What is the difference between "the value" and "the meaning of that
value"? Does the standard define this "meaning"?

Click to expand...

Not in so many words, but in effect. It specifies how to compute
the uncertainty in the value at some points, using EPSILONs. It
can't be especially precise, because it doesn't specify the FP
specification. All it takes is some understanding of the actual
systems to generalize it. The 'range' word that I have been using
expresses almost all of it.

Chapter and verse, please. Show me where the standard talks about
uncertainties in stored floating-point values (*not* in floating-point
calculations).

FLT_EPSILON, DBL_EPSILON, or LDBL_EPSILON is

"the difference between 1 and the least value greater than 1
that is representable in the given floating point type, b**(1-p)"

(The standard uses a superscript. b is the radix, typically 2;
p is the precision, the number of base-b digits in the significand.)

The definition says nothing about an "uncertainty in the value".

I think that when you use the phrase "the meaning of that
value", you're just referring to what *you* think it should mean.
Your interpretation is not supported by the standard. And as you
know, the standard defines the language, which is what we discuss
in this newsgroup.

What I've been talking about all along is what you call "the value
specified by the FP content".

You have your own preconceived notions about what you think a
floating-point number *should* mean, and you've been reading the
standard through that filter.

Let's try another approach. Different programming languages
support different kinds of numeric entities: signed and unsigned
integers, complex numbers, imaginary numbers, fixed-point numbers,
floating-point numbers, rational numbers, arbitrary-precision
numbers, and so forth.

Suppose we want to define a new kind of number-like entity.
Call it a "pseudo-real number", or PRN. We define it to have the
following properties:

A PRN can be stored in a relatively small fixed number of bits,
typically 32, 64, or something similar. A stored PRN corresponds to
a single unique real number; we can define a mathematical formula
for the correspondence. This corresponding real number is always
rational, with a denominator that's a power of 2 (other bases
are possible, but we'll leave that aside for now). Each PRN value
corresponds to some real value, but the vast majority of real values
do not have a corresponding PRN value. Note in particular that a
PRN does not specify a range. It specifies a unique real value.
The set of PRNs corresponds to a sparse discrete subset of the
real numbers. If you object to that idea, remember that I'm not
talking about floating-point numbers, I'm talking about PRNs, an
entity of my own invention. As long as my definition is consistent,
I can define it any way I like.

Operations on PRNs are defined as follows. Given "x op y" (where
op can be any of +, -, *, /), we determine the mathemetical real
number corresponding to each PRN operand. We apply the specified
operator to these two real values, yielding a mathematical real
result; call it Z. (I'm ignoring division by zero.) (Z will
always be a rational number; it may or may not have a corresponding
PRN value.) Now we choose, as the result of "x op y", a PRN value
whose corresponding real value is *close* to Z. How close it must
be is implementation-defined. (Ideally, we'd like to choose the
PRN value whose real value is as close as possible to Z, but we
don't require this.)

Of course we can't actually perform calculations with mathematical
real values, but the calculations must be done in a way that yields
the same result as what's described by the above algorithm.

Are you with me so far? Do you agree that this is an internally
consistent description? I'm not asking you to agree that it's
useful or that it corresponds to any physical reality, just that
it's consistent.

Clearly my description of PRNs is incompatible with your description
of floating-point numbers. Each PRN corresponds to a unique real
value; there are no ranges anywhere in the description.

Now here's the big question. How is this description of PRNs
incompatible with the C standard's description of floating-point
numbers? Not with your notions about what floating-point numbers
*should* be, but with what the C standard actually says about them.
If it is incompatible, please quote the exact wording from the
standard that demonstrates the incompatibility.

Another question: Would PRNs as I've described them be useful?
We can't represent all real numbers in a finite amount of storage;
are PRNs an acceptable compromise that allows us to do useful
calculations?

Given the recent history of this discussion, it seems likely that you
will not attempt to answer these questions. I'm prepared to be
pleasantly surprised.

Guest · May 11, 2009

Richard Heathfield said:
Richard Heathfield said:

Phil Carmody said:

I earnestly apologise, Richard. He was teetering for quite a while,
too long as you say, but this issue seems to have been a polarising
one.

Or even better - interesting and on-topic discussion! This could
have been such a thread if it hadn't headed down the loony path.

Click to expand...

parts of it were excellent!

Dik T. Winter · May 11, 2009

> "Dik T. Winter" wrote: ....
>
> I think you are mixing up errors due to the representation, and
> errors due to the processing. The representation errors are
> known. They may or may not be significant. The processing you
> control.

But you are wrong. The representation errors (if any) are small, it is
the errors due to calculations that grow and can become significant.
Take a course on numerical analysis.

Flash Gordon · May 11, 2009

CBFalconer said:
Flash said:

CBFalconer said:

Flash Gordon wrote:
Keith Thompson wrote:

... snip ...

So what? doubles are used to store reals, more or less. ints
are used to store integers. 1.0+DBL_EPSILON/2.0 is a real[1].
1.5 is NOT an integer.

[1] but not a storable real, in a double.
It's that "more or less" that bites you, isn't it? Your model
makes some sense if you ignore those pesky details where it
falls apart.
<snip>

In addition doubles *are* used to store exact integral values
and integer types *are* used to store approximations. Any claim
that you have stored a range when you have in fact stored the
exact value that you intended to store is clearly wrong and would
make error analysis impossible.
So what? You are referring to the programming that is using that
object. I am talking about what you can deduce from the object in
isolation.

Click to expand...

That value is. The point is, that taken in isolation you do NOT know
that it is a range. Taken in isolation you only know what the actual
stored value is. is a single exact value specified by a well-defined
model which may represent either an exact quantity or a range which
could be (and ofen is) far larger than the ranges you are talking about.

The standard actually talks about values that "can be represented
exactly" in section 6.3.1.5 (paragraph 2).

Click to expand...

Yes, but that is simply a number.

Yes, which is all that is actually stored.

You do know what the 'range'
is. The number does not express the range of numbers that will be
stored with that identical value, and simply cannot be told apart.
In isolation what you know is the range of values that can have
been represented thusly.

In isolation you know exactly what value *has* been stored, you just
can't tell whether the programmer intended to store a value that cannot
be represented.

... snip ...

Go forth and implement a FP system.

I've implemented FP arithmetic (sufficient to solve my problem, not a
full system) in assembler already thanks. The data fed in to it was
exact values, the data extracted from it and finally converted to
integers was exact values. The intervening calculations were
approximations which gave a known acceptable limit on the error of the
settings given to the hardware when compared to the theoretical
mathematical values the algorithm I had to implement an approximation of.

Now address the point that your claim is in direct contradiction to the
wording of the standard.

You have yet to address how float can be a subset of double if the items
in the two sets are fundamentally different (i.e. if the range about 1.0
for the closest approximation to 1.0 is different in the two sets).

Then examine when and how it
fails.

It did not fail because I understood exactly what was needed and
carefully worked through it to ensure that the errors and approximations
stayed within acceptable limits.

You will soon appreciate the fundamental truths about such
a system. You will soon see how all values (apart from zero, NaNs,
Infs) are approximations, and how they cannot overlap.

If you think that zero is exact then you are being inconsistent.
Calculations (including the conversion of a floating point number in the
source code) can yield 0 when the "correct" value is non-zero but
smaller than the smallest number representable in the floating point system.

One point
is that taking a value and writing it down does NOT convert it from
an approximation to something exact.

No, nor does it convert something exact in to an approximation.

Note that no one has claimed that any type in C can store exactly all
rationals within any given range.

Flash Gordon · May 11, 2009

CBFalconer said:
Flash said:

CBFalconer said:

Flash Gordon wrote:
Keith Thompson wrote:

... snip ...

So what? doubles are used to store reals, more or less. ints
are used to store integers. 1.0+DBL_EPSILON/2.0 is a real[1].
1.5 is NOT an integer.

[1] but not a storable real, in a double.
It's that "more or less" that bites you, isn't it? Your model
makes some sense if you ignore those pesky details where it
falls apart.
<snip>

In addition doubles *are* used to store exact integral values
and integer types *are* used to store approximations. Any claim
that you have stored a range when you have in fact stored the
exact value that you intended to store is clearly wrong and would
make error analysis impossible.
So what? You are referring to the programming that is using that
object. I am talking about what you can deduce from the object in
isolation.

Click to expand...

That value is. The point is, that taken in isolation you do NOT know
that it is a range. Taken in isolation you only know what the actual
stored value is. is a single exact value specified by a well-defined
model which may represent either an exact quantity or a range which
could be (and ofen is) far larger than the ranges you are talking about.

The standard actually talks about values that "can be represented
exactly" in section 6.3.1.5 (paragraph 2).

Click to expand...

Yes, but that is simply a number.

Yes, which is all that is actually stored.

You do know what the 'range'
is. The number does not express the range of numbers that will be
stored with that identical value, and simply cannot be told apart.
In isolation what you know is the range of values that can have
been represented thusly.

In isolation you know exactly what value *has* been stored, you just
can't tell whether the programmer intended to store a value that cannot
be represented.

... snip ...

Go forth and implement a FP system.

I've implemented FP arithmetic (sufficient to solve my problem, not a
full system) in assembler already thanks. The data fed in to it was
exact values, the data extracted from it and finally converted to
integers was exact values. The intervening calculations were
approximations which gave a known acceptable limit on the error of the
settings given to the hardware when compared to the theoretical
mathematical values the algorithm I had to implement an approximation of.

Now address the point that your claim is in direct contradiction to the
wording of the standard.

You have yet to address how float can be a subset of double if the items
in the two sets are fundamentally different (i.e. if the range about 1.0
for the closest approximation to 1.0 is different in the two sets).

Then examine when and how it
fails.

It did not fail because I understood exactly what was needed and
carefully worked through it to ensure that the errors and approximations
stayed within acceptable limits.

You will soon appreciate the fundamental truths about such
a system. You will soon see how all values (apart from zero, NaNs,
Infs) are approximations, and how they cannot overlap.

If you think that zero is exact then you are being inconsistent.
Calculations (including the conversion of a floating point number in the
source code) can yield 0 when the "correct" value is non-zero but
smaller than the smallest number representable in the floating point system.

One point
is that taking a value and writing it down does NOT convert it from
an approximation to something exact.

No, nor does it convert something exact in to an approximation.

Note that no one has claimed that any type in C can store exactly all
rationals within any given range.

CBFalconer · May 11, 2009

Keith said:
Chapter and verse, please. Show me where the standard talks
about uncertainties in stored floating-point values (*not* in
floating-point calculations).

FLT_EPSILON, DBL_EPSILON, or LDBL_EPSILON is

"the difference between 1 and the least value greater than 1
that is representable in the given floating point type, b**(1-p)"

(The standard uses a superscript. b is the radix, typically 2;
p is the precision, the number of base-b digits in the significand.)

The definition says nothing about an "uncertainty in the value".

Come on now. "the difference between 1 and the least value greater
than 1" spells out an area not representable in that fp system.
The uncertainty in the value includes at least that, and we haven't
even worried about values less than 1.

.... snip ...

You have your own preconceived notions about what you think a
floating-point number *should* mean, and you've been reading the
standard through that filter.

I see no way of storing a value within the 'range' in a FP value,
and being able to read out that exact value. NOT the FP value.

Let's try another approach. Different programming languages
support different kinds of numeric entities: signed and unsigned
integers, complex numbers, imaginary numbers, fixed-point numbers,
floating-point numbers, rational numbers, arbitrary-precision
numbers, and so forth.

Suppose we want to define a new kind of number-like entity.
Call it a "pseudo-real number", or PRN. We define it to have the
following properties:

A PRN can be stored in a relatively small fixed number of bits,
typically 32, 64, or something similar. A stored PRN corresponds to
a single unique real number; we can define a mathematical formula
for the correspondence. This corresponding real number is always
rational, with a denominator that's a power of 2 (other bases
are possible, but we'll leave that aside for now). Each PRN value
corresponds to some real value, but the vast majority of real values
do not have a corresponding PRN value. Note in particular that a
PRN does not specify a range. It specifies a unique real value.
The set of PRNs corresponds to a sparse discrete subset of the
real numbers. If you object to that idea, remember that I'm not
talking about floating-point numbers, I'm talking about PRNs, an
entity of my own invention. As long as my definition is consistent,
I can define it any way I like.

Alright so far.

Operations on PRNs are defined as follows. Given "x op y" (where
op can be any of +, -, *, /), we determine the mathemetical real
number corresponding to each PRN operand. We apply the specified
operator to these two real values, yielding a mathematical real
result; call it Z. (I'm ignoring division by zero.) (Z will
always be a rational number; it may or may not have a corresponding
PRN value.) Now we choose, as the result of "x op y", a PRN value
whose corresponding real value is *close* to Z. How close it must
be is implementation-defined. (Ideally, we'd like to choose the
PRN value whose real value is as close as possible to Z, but we
don't require this.)

You have left out a condition that applies to floats. The ranges
are all separate. There is no real number that belongs to more
than one range, which range corresponds to an exact value
recorded. This makes it possible to pick a unique value to
represent ANY real.

Of course we can't actually perform calculations with mathematical
real values, but the calculations must be done in a way that yields
the same result as what's described by the above algorithm.

Are you with me so far? Do you agree that this is an internally
consistent description? I'm not asking you to agree that it's
useful or that it corresponds to any physical reality, just that
it's consistent.

Clearly my description of PRNs is incompatible with your description
of floating-point numbers. Each PRN corresponds to a unique real
value; there are no ranges anywhere in the description.

Now here's the big question. How is this description of PRNs
incompatible with the C standard's description of floating-point
numbers? Not with your notions about what floating-point numbers
*should* be, but with what the C standard actually says about them.
If it is incompatible, please quote the exact wording from the
standard that demonstrates the incompatibility.

Yes. See the paragraphs above. By leaving out uniqueness you
abandon the < and > logical operations. I think that is enough
difference. You don't have anything that says that PRN represents
all values between (PRN + deltaH) and (PRN - deltaL). The C
standards description of EPSILON covers that, and forces the
existance of a 'range'.

Another question: Would PRNs as I've described them be useful?
We can't represent all real numbers in a finite amount of storage;
are PRNs an acceptable compromise that allows us to do useful
calculations?

Some only.

CBFalconer · May 11, 2009

Dik T. Winter said:
But you are wrong. The representation errors (if any) are small,
it is the errors due to calculations that grow and can become
significant. Take a course on numerical analysis.

Has nothing to do with it. The 'range' is an error level
fundamental to the representatin. You can make things worse, but
not better. That 'making worse' is subject to analysis.

Beej Jorgensen · May 11, 2009

Keith Thompson said:
Yes.

Ok, I think that's going to have to be good enough for me--I'm running
out of time to keep up. I think it's a step closer to Chuck's
position--maybe not enough for him, though.

It's been a rather instructive thread for me, though!

-Beej

Keith Thompson · May 11, 2009

Beej Jorgensen said:
Ok, I think that's going to have to be good enough for me--I'm running
out of time to keep up. I think it's a step closer to Chuck's
position--maybe not enough for him, though.

Hmm. I'm sorry, but I don't see how that's a step close to Chuck's
position. It's consistent with what I've been saying all along.

It's been a rather instructive thread for me, though!

Me too.

CBFalconer · May 11, 2009

Flash said:
Yes, which is all that is actually stored.

In isolation you know exactly what value *has* been stored, you
just can't tell whether the programmer intended to store a
value that cannot be represented.

No, you don't, not without looking at the programming that did the
store. I think there is no argument that the precise value stored
in a FP object is a real. So is the value stored in the adjacent
object (i.e. the closest one numerically which exhibits a
difference via the '<' and '>' operators). It is also a well known
fact that between any two different reals you can find an infinity
of other real values. So I claim it is highly unlikely that you
can tell the exact value stored in that object. Remember, any
value in that infinity of values will result in the identical
construction of the FP object.

Beej Jorgensen · May 11, 2009

Keith Thompson said:
It's consistent with what I've been saying all along.

Well, at least that's good to hear. I was trying to find something to
placate him that was consistent with what you were saying.

-Beej

Dik T. Winter · May 11, 2009

>
> Ok, I think that's going to have to be good enough for me--I'm running
> out of time to keep up. I think it's a step closer to Chuck's
> position--maybe not enough for him, though.

Not enough at all. Consider the following piece of code (where for
simplicity I assume a > b > 0.0, otherwise it is a bit more cumbersome):
void add(double a, double b, double *c, double *cc) {
c = a + b;
cc = a - c + b;
}
(and assume also that the value of c used in the calculation of cc is the
value actually stored, for some compilers you may need some flags for that.)

Assuming something about the arithmetic on the machine (which IEEE machines
do satisfy), after that operation, a + b == c + cc exactly, while cc is much
less than c.

This is a basic building block to (approximately) double the maxmimum precision
on a processor using build-in arithmetic only. Assuming that the 'c' stored
is an approximation and not assuming that the 'c' used in the calculation of
'cc' is not the value actually stored in 'c' voids the whole argument.

I may note that this technique (due to T.J. Dekker in the sixties) was used
when IBM implemented double precision on the PC/RT.

For this technique to work it is absolutely *not* necessary to know what the
value stored in 'c' actually is, it is only necessary to know that what did
get in also gets out, and some bounds on what did get in (that is: the
properly rounded result of the addition, called "faithful" in the original
article).

(The other basic block is a routine of the format:
void mul(double a, double b, double *c, double *cc)
where after the operation a * b == c + cc exactly. It uses similar basic
operations but is a bit concoluted.)

Dik T. Winter · May 11, 2009

>
> What nonsense is this last? Do you have any knowledge of mathematics?
> What is (in the rationals) the adjacent rational to 1/2?
>
> Yes, they are countable, but that does *not* mean that there is an
> adjacent one in the standard order.

I have been thinking about this and still do not know how CBF came to that
nonsense. I can only think that CBF does not know what 'countable' means.
It only means that in *some* order there is a first, a second and so on, and
that for every natural number there is an element in that order, and that for
every element in that order there is a natural number that denotes it.

It is easy to prove that if a set is countable there is an ordering of the
element such that each element has a predecessor and a successor. But it
is just as easy to prove that there is an ordering of the real numbers such
that each has a predecessor and a successor (and so two adjacent numbers).
And you do not need the axiom of choice for that.

Adjacency is about a specific ordering, countability is not about a specific
ordering.

Dik T. Winter · May 11, 2009

> "Dik T. Winter" wrote: ....
>
> Has nothing to do with it. The 'range' is an error level
> fundamental to the representatin. You can make things worse, but
> not better. That 'making worse' is subject to analysis.

But using the representation makes things better if you want to double the
actual precision. See my other article on how to add two floating-point
numbers to a second pair where the outcome is exact.

The representation error is also subject to analysis. Look at any book by
J.H. Wilkinson.

Spiros Bousbouras · May 11, 2009

Consider the following piece of code (where for
simplicity I assume a > b > 0.0, otherwise it is a bit more cumbersome):
void add(double a, double b, double *c, double *cc) {
c = a + b;
cc = a - c + b;
}

You must mean *c = a + b and *cc = a - *c + b

Nate Eldredge · May 11, 2009

Dik T. Winter said:
But it
is just as easy to prove that there is an ordering of the real numbers such
that each has a predecessor and a successor (and so two adjacent numbers).
And you do not need the axiom of choice for that.

What do you have in mind here? I confess I can't think offhand of such
an ordering.

Spiros Bousbouras · May 11, 2009

What do you have in mind here? I confess I can't think offhand of such
an ordering.

For everything that follows you don't need the axiom of choice.

Let Z be the set of integers. There is a bijection f: R -> ZxR
where x denotes cartesian product. We define an ordering <=' on
R as follows:
For x,y in R , x <=' y iff
( f(x) = (i , x1) f(y) = (j , y1) and
( x1 < y1 or ( x1 == y1 and i <= j ) )
)

For every real x with f(x) = (i , x1) the immediate predecessor
is y where f(y) = (i-1 , x1) and the immediate successor is z
where f(z) = (i+1 , x1)

Spiros Bousbouras · May 11, 2009

I have been thinking about this and still do not know how CBF came to that
nonsense. I can only think that CBF does not know what 'countable' means.
It only means that in *some* order there is a first, a second and so on, and
that for every natural number there is an element in that order, and that for
every element in that order there is a natural number that denotes it.

Or more simply that there is a bijection (i.e. an 1-1 and onto
function) from the naturals to the set. Anyway, from your message
it's not clear whether you've seen it or not but CBF acknowledged
that he made a mistake several days ago. See
< http://groups.google.co.uk/group/comp.lang.c/msg/c9aa8d95feab47fd >

Need Helping adding Square root code to an existing calculator. (Absolute begginer?)	0	Jan 12, 2025
How to alter the program so that when user types z or Z or 0, the program sets both a and b to zero?	0	Oct 10, 2022
Where is my mistake? Why is s equal to minus infinity at some loop iterations?	0	Oct 9, 2022
Comparison of Integer and Pointer (that's supposed to be an Integer). Where did I go wrong?	0	Nov 19, 2022
Structures and chained lists questions :	1	Feb 12, 2011
Rich Text Format (RTF) Document Builder in C++: Code and Features	0	Sep 28, 2025
Runtime Error with __gcd? (floating point exception)	1	Nov 27, 2024
Secure Keyboard v2.0 Modern C++ Virtual Keyboard for Windows (Glassmorphism UI, Clipboard Auto-Clear)	0	Mar 26, 2026

Float comparison

CBFalconer

Spiros Bousbouras

Keith Thompson

Guest

Dik T. Winter

Flash Gordon

Flash Gordon

CBFalconer

CBFalconer

Beej Jorgensen

Keith Thompson

CBFalconer

Beej Jorgensen

Dik T. Winter

Dik T. Winter

Dik T. Winter

Spiros Bousbouras

Nate Eldredge

Spiros Bousbouras

Spiros Bousbouras

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads