Float comparison

F

Flash Gordon

CBFalconer said:
Look back where Keith Thompson introduced nextafter(), and its use
in deducing the EPSILON in effect for any x value.

nextafter does not defined what you mean by your ranges. I am trying to
get an exact definition of them. Saying nextafter() resolves the
confusion is of no help because the standard does not say:
CBF's range is .... nextafter() ....
More to the point, it is important whether limit values are part of the
range or not, and nextafter does not answer questions about what you
mean, only you can do that be actually answering them in a clear and
consistent manner.
 
F

Flash Gordon

CBFalconer said:
No, I never assume the fp value is exact.

I did not say YOU assumed that. Try reading what I wrote.
The fp object value IS
exact, and that simply serves to identify the available 'range' for
that value. If that shows that the errors resulting from those
approximations are negligible to your end result, you are free to
ignore them.

Now actually READ what is written.

Actually, I got my term slightly wrong. The correct term is "backward
error analysis".

This is a technique where one starts with the exact result of a
calculation and works backwards to find the equation which would produce
that exact result.

It is a standard mathematical technique.

If one was dealing with decimal floating point, it could be
To get the result 0.333333333333 (which we find stored in x) what would
we have had to divide 1.0 by? Note that this is analysis of some C
program which happened to produce that result in order to work out how
good or bad that program is. If you replace 0.333333333333 by a range
(which is all you claim you have) then you have to do range arithmetic
(obviously) which does NOT give you the answer to the question.

If yo are going to start rewriting all the maths, numerical analysis and
error analysis text books then please tell us. If not, try assuming that
the experts in the field (not me, but those who devised these
techniques) actually knew what they were doing, and that all the experts
who peer-reviewed their papers knew as well.

Here is one of many papers available on Backward Error Analysis
http://www.physics.arizona.edu/~restrepo/475A/Notes/sourcea/node13.html
 
K

Keith Thompson

Flash Gordon said:
OK, if that is all you can cope with, then I will ask my question
given EXPLICIT EXACT VALUES. First I need values you will accept. So..

Here are two lines of C code.

int main(void)
{
double valone = 1.0;
double valoneandabit = (valone, 100.0);

Was that supposed to be
double valoneandabit = nextafter(valone, 100.0);
 
F

Flash Gordon

William said:
I think that CBF is trying to say that each floating point value x
represents the open interval of reals (x1, x2) where:
x1 = ( x + y1 ) / 2
x2 = ( x + y2 ) / 2
y1 = nextafter( x, x - 1 )
y2 = nextafter( x, x + 1 )

(eg, x1 is Flash's xmin, x2, is Flash's xmax, and y1, x, y2 are 3
consecutive
representable values. x1 and x2 are non-representable reals that lie
halfways between the representable values. This is the only possible
coherent description I can parse from what he's been saying,

That was what I thought. However, I'm not sure.
but
nearly
all of his calculations are wrong.)

I've been ignoring them to a degree because I've seen posts by others
pointing out the errors and trying to get at his meaning rather than
specific values.
However, this means that x1 and x2 are not representable, so he
would be wise to instead use the half-open interval [x1, x2), so
that x1 would be represented by x, and x2 would be represented
by y2.

That itself has issues with C, but I won't worry too much about them
(although I have mentioned them else-thread) until I have the definition
of what he means from the man himself.
However, it seems clear to me that CBF doesn't understand
the fundamental maths involved.

That does seem to be the case.
He keeps saying things like
xmax != ymin but they are 'adjacent' real values, and that any real
number can be represented as a fraction, indicating that he
believes all Reals are Rational.

Indeed. Hence me trying to nail him down.
You're not going to get
a coherent description of his model from him, because he
doesn't understand it himself.

Which is why I've asked for a specific numerical example which is
correct for his specific implementation. I believe (some of) the
problems can be shown from a specific example.
He is probably trolling, and
is being wildly successful.

Failure to understand is not necessarily trolling.
 
B

Ben Bacarisse

[OK, the thread has moved on, but since I asked the question I feel I
should acknowledge the answer and comment on it, but feel free to
leave this unanswered if you think it will be neater to follow only
the other subthread.]
For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.

You need to say what EPSILON is and what special considerations are
needed. Without knowing these things no one can say if your model is
simple or complex, helpful or unhelpful.
Those values are what I call xmax and xmin. They are NOT included
in the range, and thus the range is actually:

xmax > number > xmin

as a condition where number is described by x IN THAT FP
IMPLEMENTATION. The function nextafter() allows the y (where xmax
is a member of the range of y) to be defined. This y is the next
fp implemented value to x, there are none in between. Note that
xmax, number, xmin above are all real values.

Yes, all the ?max ?min numbers are real and x and y are floating point
numbers. Because you are excluding subnormals, I have to use a
modified nextafter which I'll call nextafter'. Given y ==
nextafter'(x, INFINITY) we have: (for positive x)

xmin < x < xmax and ymin < y < ymax

You are saying, I think, that xmax == ymin because if xmax > ymin
there would be overlap and if xmax < ymin there would be
unrepresentable reals. What actually is xmax (AKA ymin)? Is there a
formula for it? Is it just (x+y)/2?

If so, this model is not that different the one in the C standard.
Constants get represented by the closest floating point number.
Simple calculations like 1.0/3.0 are not required to be that accurate
(though they will be if the implementation uses IEEE FP) so I think
your model is rather strict.

I am still not sure why you also want to turn that simple model round
and say that FP values the represent ranges of reals, but that is
another argument and will maybe be clearer when we know what these
ranges actually are.

Presumably the largest and smallest representable floats represent
either no range or some special range defined just for them.

Alos, I think your model is at odds with the rounding behaviour
specified in the C standard, though to be sure you need to say whether
xmax really is (x+y)/2. We really need to know that.
The magic thing
about xmax is that it CANNOT be specified by x, yet it CAN be
specified by y.

I don't know what you are getting at here. x and y specify each other
so how could xmax not be specified by x since y is? Tell us what xmax
actually is and maybe it will be clear why it can't be specified by x.
About 60 years ago I had a professor who hammered at all us
'students' with these concepts involving numbers, reals, integers,
limits, etc. and insisted we learn methods that handled them all.
I have forgotten a good deal of it. He was better at it than I am.

In 1949 the art of floating point arithmetic was in it's infancy. It
is not immediately obvious that a 60 year old model will still be a
good one. I can't yet see any advantage over the simpler model.
And no, I am not including subnormals, NaNs, INFs, etc.

This is a problem because the model in the C standard does include
then, so you have to say something about them if only to say they are
all special cases like DBL_MAX and -DBL_MAX seem to be.
Zero is a
unique thing in floating implementations, necessary because
multiplication (and division) by zero needs to be recognized. We
can't just use the smallest representable normalized real.

What is a normalised real?

Are you saying there is a set of reals close to zero that can't be
represented? That seems to be a significant flaw. If you are not
saying that, why does zero *not* represent the range of reals that get
stored as zero (most of which are non zero) just like the floating
point value 1.0 represents, in your view, a range of values most of
which are not 1?

(And 1 is also often special. IEC 60559 mandates that 1*x == x/1 ==
x and this is often true even on less well specified floating point
implementations.)
 
C

CBFalconer

Keith said:
CBFalconer said:
Keith Thompson wrote:
... snip ...

The upper point (which I have been calling xmax) is exactly at
1.0+DBL_EPSILON. This is spelled out by the C standard. Everything
else is implementation defined. The lower point (xmin) is probably
at 1.0-DBL_EPSILON/2. Note that these two numbers are not within
the range for 1.0, but do delimit it.
[...]

Thank you for trying to define what this "range" is. You've still
got it wrong, as far as I can tell.

Here's a graph showing five consecutive FP numbers, each of which is
exactly representable as a double; we can use nextafter() to define
their relationships. (View this in a fixed-width font.)

***************
|----|----|----------|----------|
a b c d e
1.0

a is 1.0-DBL_EPSILON
b is 1.0-DBL_EPSILON/2
c is 1.0
d is 1.0+DBL_EPSILON
e is 1.0+DBL_EPSILON*2

You're saying that the range represented by y goes all the way to both
of its neighbors, covering the range marked by asterisks. Unless your
ranges substantially overlap with each other, this doesn't make much
sense.

Forget a and e. You have calculated b and d via DBL_EPSILON.
These mark numbers that cannot be stored in the object c. They ARE
NOT the objects below and above c. You get those objects by
storing b (or d) and then regaining a value from THAT storage.
Call that result y. Then calculate ymin and ymax by using the
multipliers (1-DBL_EPSILON) and (1+DBL_EPSILON). Store ymin and
examine the stored value. It will be 1.0. The picture looks as
below.

range for 1.0 range for y
*************** .................
|----|---------|--------|--------|
b c e d y
1.0

Using your identifiers, e is ymin, in the range for 1.0, and
outside the range for y. It can equally well be found by
calculating y-DBL_EPSILON, or by y*(1-DBL_EPSILON), because of
rounding and truncation. Note that the '|'s mark a position that
does not have any number (apart from those for 1.0 and y). Those
positions are limits.

You should also get
y == 1.0 * (1.0 + DBL_EPSILON)
BUT ONLY AFTER storing the computed value and regaining it from the
storage. Ensure the compiler optimizer doesn't eliminate that last
step.

Note that y-DBL_EPSILON forms ymin, a real value storable in c.
You can't get there by forming "c+EPSILON-EPSILON" because you need
the storage/regain process to form y.
 
C

CBFalconer

Flash said:
Like Keith, I *have* been paying attention. You have yet to point at
anything in the standard that really supports your claim. The reasons
why the few points in the standard you have pointed at do not support
your claim have been explained (but generally ignored by you). Several
sections in the standard explicitly contradict what you say and have
been pointed out to you with explanations of *why* they explicitly
contradict your claim. You seem to be the one not paying attention.

This whole thing is a tricky area. The standard does not keep up
with the trickery. This normally doesn't matter, until you are
worrying about exactly what the fp system can do for you. My
argument is primarily based on maths and physics, and the standard
when it agrees. It doesn't disagree, but it is failing to separate
various views.

I have straightened out various ideas I had at the opening of
this. For example, when we started I didn't know (or care) whether
those values that defined the edges of ranges belonged to the
range. I now KNOW that they do not belong.

I wasn't being snide with you. I simply assumed you hadn't been
reading everything. I have the (probably erroneous) idea that
everything I write is crystal clear and now obvious to all. As I
have said before, I make a lousy teacher.
 
C

CBFalconer

Flash said:
.... snip ...

If one was dealing with decimal floating point, it could be
To get the result 0.333333333333 (which we find stored in x) what
would we have had to divide 1.0 by? Note that this is analysis of
some C program which happened to produce that result in order to
work out how good or bad that program is. If you replace
0.333333333333 by a range (which is all you claim you have) then
you have to do range arithmetic (obviously) which does NOT give
you the answer to the question.

No. You have the value 0.333333333333. You assume this is the
result of dividing 1.0 by something, and you use the limits of the
'range' of the stored value to find the range of possible
divisors. Try it, and you will find that is a very narrow range.
Then store those computed possibilities and extract them again,
thus finding out the fp-object-values that will do those things for
you.

If you wish set the divisor to 3, and mess with the numerator to
find a range of possibilities. Don't play with both - that leads
to madness :).
 
C

CBFalconer

Flash said:
.... snip ...

nextafter does not defined what you mean by your ranges. I am
trying to get an exact definition of them. Saying nextafter()
resolves the confusion is of no help because the standard does
not say:

No, but it provides another means of discovering the 'range'. What
I mean by that 'range' of x is those real number values that, when
stored in a fp-object, will result in a stored value exactly equal
to x. I am only using one fp-system. The existence of EPSILON is
also ephemeral. The critical thing is the variation in the
significand of the fp value, provided the exponent remains
unchanged (normalized). You can see this easily in hex dumps of
the fp-object.

The point of the 'range' is that it defines the inherent error
levels due to the storage format.
 
C

CBFalconer

William said:
.... snip ...

However, this means that x1 and x2 are not representable, so he
would be wise to instead use the half-open interval [x1, x2), so
that x1 would be represented by x, and x2 would be represented
by y2. However, it seems clear to me that CBF doesn't understand
the fundamental maths involved. He keeps saying things like
xmax != ymin but they are 'adjacent' real values, and that any real
number can be represented as a fraction, indicating that he
believes all Reals are Rational. You're not going to get
a coherent description of his model from him, because he
doesn't understand it himself. He is probably trolling, and
is being wildly successful.

I didn't say that, as far as I can remember. But trying to
simplify one area to make a point in another is not trivial, and I
am not good at it.

For example, while xmax != ymin, if x < y then xmax > ymin. This
is a consequence of the fact that storing xmax in a fpobject yields
y, and storing ymin in an object yields x. xmax and ymin are not
fixed numbers, they are numbers that force the storage to yield y
and x respectively.
 
C

CBFalconer

Keith said:
.... snip ...


What does that mean? EPSILON *is* DBL_EPSILON; you said so
yourself. There is no "portion of the EPSILON above DBL_EPSILON".


It seems like you're saying that the expression x*(1+EPSILON) is
to be considered a C floating-point expression, not a real number.
But the whole point is to define the real range represented by a
given floating-point number, which includes numbers that cannot be
exactly represented as FP numbers.

It is a real number. But the moment we feed it into the fp
processor, rounding effects take place, and the effect is to make
EPSILON constant until x reaches a new power of 2.
 
C

CBFalconer

Ben said:
.... snip ...


What is a normalised real?

Are you saying there is a set of reals close to zero that can't be
represented? That seems to be a significant flaw. If you are not
saying that, why does zero *not* represent the range of reals that get
stored as zero (most of which are non zero) just like the floating
point value 1.0 represents, in your view, a range of values most of
which are not 1?

(And 1 is also often special. IEC 60559 mandates that 1*x == x/1 ==
x and this is often true even on less well specified floating point
implementations.)

This is one section of your message. Things are too complex to
handle in a single message.

fp values are normally normalized, so that the left hand bit of the
significand is a 1. It can then be discarded and replaced by the
sign bit. This allows for the maximum number of significant bits
possible in the remainder of the significand.

However if the number is too small it can't be normalized, because
that would require a more negative exponent than the exponent space
can handle. If we accept denormalized values we can intepret them,
but that means a method of denoting the denormalization. Even
then, once the remaining bits move off to the right, they will
eventually disappear off the right of the significand. These are
definitely unrepresentable near zeroes.

I hope that is clear.
 
C

CBFalconer

Ben said:
.... snip ...


In 1949 the art of floating point arithmetic was in it's infancy.
It is not immediately obvious that a 60 year old model will still be
a good one. I can't yet see any advantage over the simpler model.

Another sub-section of your original message. Back then nobody was
studying fp-systems. However, we were studying mathematics.
fp-systems are an attempt to make a usable system out of minimum
supplies.

The mathematics prevail. fp-systems are good in that they are
normally consistent with reality. This whole argument/discussion
is about the edges of that consistency.
 
C

CBFalconer

Ben said:
.... snip ...

[OK, the thread has moved on, but since I asked the question I
feel I should acknowledge the answer and comment on it, but feel
free to leave this unanswered if you think it will be neater to
follow only the other subthread.]
For most normal implementations, the range is x*(1-EPSILON) to
x*(1+EPSILON), with special considerations when x is a power of 2.

You need to say what EPSILON is and what special considerations
are needed. Without knowing these things no one can say if your
model is simple or complex, helpful or unhelpful.

I have said before that EPSILON is the one appropriate to the fp
system. I.e. DBL_EPSILON for doubles, FLT_EPSILON for floats,
LDBL_EPSILON for long doubles.
Yes, all the ?max ?min numbers are real and x and y are floating
point numbers. Because you are excluding subnormals, I have to use
a modified nextafter which I'll call nextafter'. Given y ==
nextafter'(x, INFINITY) we have: (for positive x)

nonsense. Subnormals don't appear until you reach extremely small
values.
xmin < x < xmax and ymin < y < ymax

You are saying, I think, that xmax == ymin because if xmax > ymin
there would be overlap and if xmax < ymin there would be
unrepresentable reals. What actually is xmax (AKA ymin)? Is there a
formula for it? Is it just (x+y)/2?

No, I didn't say that. xmax is a value that, when stored in a
fp-object, will produce a value larger than x when read back from
that fp-object. If that value is y, it can produce a ymin which,
when stored in a fp-object, will produce a value smaller than y.
That will be x. Note that ymin is NOT equal to xmax. In fact,
ymin < xmax (using real arithmetic).
If so, this model is not that different the one in the C standard.
Constants get represented by the closest floating point number.
Simple calculations like 1.0/3.0 are not required to be that accurate
(though they will be if the implementation uses IEEE FP) so I think
your model is rather strict.

They can't be exact in a binary system.
I am still not sure why you also want to turn that simple model
round and say that FP values the represent ranges of reals, but
that is another argument and will maybe be clearer when we know
what these ranges actually are.

Presumably the largest and smallest representable floats represent
either no range or some special range defined just for them.

Everything falls apart when you get outside the range the system
can handle.
Alos, I think your model is at odds with the rounding behaviour
specified in the C standard, though to be sure you need to say
whether xmax really is (x+y)/2. We really need to know that.

It isn't. xmax is a real value which, when stored, will appear as
a y.
I don't know what you are getting at here. x and y specify each other
so how could xmax not be specified by x since y is? Tell us what xmax
actually is and maybe it will be clear why it can't be specified by x.

Above I used specified loosely, to mean part of the 'range' of x
(or y). y can be computed from x, by calculating and storing
xmax. x can be computed from y, by calculating and storing ymin.
 
C

CBFalconer

Flash said:
Remember that *I* am the one who introduced x, y, xmin, ymin, xmax and
ymax and specified EXACTLY what they are. So here, again, is what they
are, and if you want to start talking about something else then please
actually explicitly STATE that you are ignoring the question being asked
ans talking about something else instead.

----------
Given two numbers x and y whose values are "touching ranges" with y
being the next greater range than x then in your model we have:

x represents xmin to xmax
y represents ymin to ymax
x < y
xmax equal to ymin (there can be no intervening real numbers since if
there were any there would be an infinite number)
-----------

Not quite. x represents >xmin to <xmax
y represents >ymin to <ymax
x < y
xmax > ymin

and the 'represents' means what happens when you store the value
and then read it back from the fp-object. x can't represent xmax,
because storing xmax produces y, which is > x.

xmax and xmin have the sole purposes of defining the 'range' for x,
when manipulated in the fp-system under discussion.

I admit I have diddled about with the specifications as it became
clearer what I was talking about.
Now, what I meant was that x and y are consecutive floating point
numbers, for example they could be doubles 1.0 and 1.0+DBL_EPSILON
as I was assuming that your ranges did not overlap. Another post
be you now suggests they do, which makes them of even less use for
anything serious, since then your model does not tell you what
value will be stored.

No, they don't overlap. Remember that the store/retrieve cycle is
necessary to make the system exhibit the characteristics. In fact,
to use the fp-system.
No, they are calculated values that will force representation of
the 'adjacent' fp-object when stored. Nothing more.
Well, you have definitely changed the terminology from what I
introduced when I asked the question.

I haven't kept track. It is quite possible.
That is EXPLICITLY not the terminology I used to ask my question,
since xmax was related to x and ymin to y. Again, if you are
going to ignore the terminology someone uses to define there
question you are not answering the question.

xmax is calculated from x to generate y. ymin is calculated from y
to generate x.
Again, why are you trying to tell me what question I was asking
rather than trying to answer the question.

Because we have to keep track of what we are talking about. Are
you claiming there is nothing confusing about this discussion?
i was ASKING about what your ranges are! I was TRYING to get an
accurate definition of them!

I thought I was answering that.
OK, if that is all you can cope with, then I will ask my question given
EXPLICIT EXACT VALUES. First I need values you will accept. So..

Here are two lines of C code.

int main(void)
{
double valone = 1.0;
double valoneandabit = (valone, 100.0);
/* rest of program irrelevant to discussion */
return 0;
}

Now, I know you have access to a C compiler. If on YOUR SPECIFIC
compiler you were to compile and run the above, what are the EXACT
numerical ranges for valone and valoneandabit. I.e. I am asking for a
concrete numerical answer.

I don't know what you are getting at. The answer is 100.0. If by
ranges you mean what I defined as ranges, for valone the range is

1.0+DBL_EPSILON .. 1.0-DBL_EPSILON/2 (not including ends)

for valoneandabit

100.0*(1+DBL_EPSILON) .. 100.0*(1-DBL_EPSILON/2)

again not including the end points. These are running into the
anomaly (for most systems) where the basic value is exactly a power
of 2 and the epsilons change.
 
C

CBFalconer

Keith said:
.... snip ...

No, that's not what I'm talking about. You *can't* stuff the value
1/3 into a float (unless FLT_RADIX is a multiple of 3).

I disagree. Example:

double x = 1.0;
double y = 3.0;
double z;

z := x/y;

We handed the fp system a description of the real 1/3. Now we are
worrying about what we get back.
So let's talk about the behavior of

double x = 1.0/3.0;

Do you believe that the real value one-third occurs in a C program
that contains that declaration? (Hint: the correct answer is No.)

Yes. At the point where something holds 1.0, something else holds
3.0, and the system has been told to divide. I can see that point
in your statement. If you had written:

double x = 0.333333333333333;
or
double x = 0.333333333333334;

then I wouldn't see that point. But the results (in x) would be
identical.
 
C

CBFalconer

Keith said:
.... snip ...

Please stop using the term EPSILON unless you're willing to define
it and stick to that definition. Use nextafter() if you like.

I am always using it as the value supplied by limits.h (or is it
float) for the float variety in use. The only variation is when
crossing the magical power of 2 in the x value, when the negative
value (not specified by the std) is halved. Negative means
subtracted from x, rather than added.

I don't think I have ever changed this usage. It originates in the
meaning of the least significant bit in the significand.

As you showed it can be computed from nextafter. But it is NOT the
same thing.
 
B

Ben Bacarisse

CBFalconer said:
This is one section of your message. Things are too complex to
handle in a single message.

fp values are normally normalized, so that the left hand bit of the
significand is a 1. It can then be discarded and replaced by the
sign bit. This allows for the maximum number of significant bits
possible in the remainder of the significand.

However if the number is too small it can't be normalized, because
that would require a more negative exponent than the exponent space
can handle. If we accept denormalized values we can intepret them,
but that means a method of denoting the denormalization. Even
then, once the remaining bits move off to the right, they will
eventually disappear off the right of the significand. These are
definitely unrepresentable near zeroes.

I hope that is clear.

Yes, but mainly because I knew that already. I was hoping you'd
clarify your range model in regard to zero rather than tell be about
normalisation. I just wanted to know if zero also represented a
range and, if so, what range.
 
K

Keith Thompson

CBFalconer said:
No, but it provides another means of discovering the 'range'. What
I mean by that 'range' of x is those real number values that, when
stored in a fp-object, will result in a stored value exactly equal
to x.

And how exactly does one store a real number value in a floating-point
object? Can you demonstrate with some C code? I assert that only a
floating-point value can be stored in a floating-point object (where
floating-point values are a subset of real values).
I am only using one fp-system. The existence of EPSILON is
also ephemeral.

That last sentence makes no sense.

[snip]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,434
Messages
2,571,685
Members
48,796
Latest member
Greg L.

Latest Threads

Top