Switch for floating !!

Dik T. Winter

> Bingo!

Indeed! I have a good example where comparison for equality (not near
equality) in floating-point makes sense: the calculation, using
Newton's method, of the square root of a floating-point number. For that
case I analyzed the situation where arithmetic truncates in the direction
of 0. In the range [1/4, 1] the algorithm would in general converge to a
single value (i.e. further iterates would give exactly the same value), so
a good stopping criterion is exact equality of two successive results.
In that range there were only *two* exceptions, and they can
easily be singled out before the calculation (again using exact equality).
Although C does not guarantee that this works, the underlying floating-point
arithmetic does guarantee it.
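
To make the idea concrete, here is a minimal sketch (my own illustration,
not Winter's actual routine; the function name my_sqrt is made up, and
IEEE-style binary double arithmetic is assumed): Newton's method stopped
by exact equality, with a guard against the iterates alternating between
two adjacent representable values.

#include <stdio.h>

/* Sketch only: Newton's method for sqrt(a), a > 0, using *exact*
 * floating-point equality as the stopping test. Once an iterate
 * repeats, further iterations reproduce it exactly, so "==" is a
 * valid termination criterion; prev2 guards against the iterates
 * alternating between two adjacent representable values. */
static double my_sqrt(double a)
{
    double x = a, prev = 0.0, prev2 = 0.0;
    while (x != prev && x != prev2) {
        prev2 = prev;
        prev = x;
        x = 0.5 * (x + a / x);  /* Newton step */
    }
    return x;
}

int main(void)
{
    printf("sqrt(2) ~= %.17g\n", my_sqrt(2.0));
    return 0;
}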

Long ago I wrote a routine to compute the arcsine of an argument.
The routine relied heavily on exact comparisons; had they not been exact,
the result would have been totally wrong for some inputs.

The conclusion is that you can do exact comparison on floating-point, but
you should know what you are doing. And that is in the realm of numerical
mathematics.
 

CBFalconer

Kaz said:
Yes. It's an approximation of a range of real numbers near 2.0.

I believe that for EPS = (pick one of) FLT_EPSILON, DBL_EPSILON,
LDBL_EPSILON, the range will be:

2.0 + EPS > value > 2.0 - EPS
 

Keith Thompson

Golden California Girls said:
And you would be wrong -- AGAIN -- because it is base two math not
base ten math.

Can you explain what base ten math has to do with this? (Hint: As far
as I can tell, nothing.)

DBL_EPSILON, for example, is the difference between 1.0 and the least
value greater than 1.0 that's representable as a double. The next
representable number above 2.0 is very likely to be 2.0 + 2*DBL_EPSILON.
So for any mathematical real number between 2.0 and 2.0+DBL_EPSILON,
the nearest representable double value is 2.0.

On the other hand, the next representable number below 2.0 is likely
to be 2.0 - DBL_EPSILON, not 2.0 - 2*DBL_EPSILON. So the double value
2.0 can be seen as an approximation of any number in the range
2.0-DBL_EPSILON/2 .. 2.0+DBL_EPSILON.

Of course, that's only if you assume that a double value represents
any real value whose closest representation is that double value, and
as I've argued that's really a matter of what the code is intended to
do.

BTW, I've thought of another example where floating-point values can
be considered to be exact. If I recall correctly, stock prices used
to be quoted in eighths of a dollar, so a given stock might have a
current price of, say, $13.875. Any multiple of 1.0/8.0 within a
wide range can be represented exactly in double, assuming a typical
binary floating-point implementation. (The same is true for decimal
floating-point as well.) So if a stock program has a stored value of
13.875, that represents exactly $13.875, not an approximation of some
unspecified value between 13.875-8*DBL_EPSILON and
13.875+8*DBL_EPSILON. (IEEE 64-bit floating-point, which is typically
used to implement double, gives exact values for multiples of 1/8 up to
about one quadrillion or so.)
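
A quick demonstration of that point (my own example, assuming a binary
IEEE-style double; the identifiers are made up):

#include <stdio.h>

/* Sketch only (assumes binary floating point): 1/8 is a power of two,
 * so eighth-of-a-dollar prices are exact and == is reliable. */
int main(void)
{
    double price = 13.875;       /* 13 + 7/8, exactly representable */
    double eighth = 1.0 / 8.0;   /* 2^-3, exact */

    if (price == 111.0 * eighth) /* 111/8 == 13.875, computed exactly */
        puts("13.875 is exactly 111 eighths");
    return 0;
}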
 

Phil Carmody

Bartc said:
So an integer 2 would be an approximation of the real numbers between
1.5 and 2.5?

And 2*2 would be an approximation of real numbers between 2.25 and 6.25?
Yikes! 2<<2 might be an approximation of some numbers that have half-set
bits! Where will this madness end?

Phil
 

Phil Carmody

Golden California Girls said:
And you would be wrong -- AGAIN -- because it is base two math not base ten math.

You're just as wrong. The issue is completely orthogonal to the choice
of radix. Go read some Goldberg or Kahan (or Knuth, or ...).

Phil
 

Keith Thompson

Golden California Girls said:
As you aren't CBF you aren't making the mistake I see in this NG all
the time with the epsilon value. That is scaling it base 10 and not
base 2.

Whatever mistake you're talking about, neither I nor CBF made it.
You're the only one talking about base ten; the range CBF specified
(2.0 + EPS > value > 2.0 - EPS) has nothing to do with base ten.

DBL_EPSILON, for example, is the difference between 1.0 and the least
value greater than 1.0 that's representable as a double. The next
representable number above 2.0 is very likely to be 2.0 + 2*DBL_EPSILON.
So for any mathematical real number between 2.0 and 2.0+DBL_EPSILON,
the nearest representable double value is 2.0.

You are correct the next greater representable value is 2.0 + 2
times epsilon. [you knew CBF's one times epsilon was wrong]
(dropping double because it doesn't matter if it is float, double or
long double as long as you use the corresponding epsilon)

I merely used double as a concrete example.

It isn't likely, it is required.

Required by what? The IEC 60559 floating-point standard (commonly
called IEEE) probably does require it, but the C standard doesn't.
At least one IBM mainframe floating-point format uses exponents
representing powers of 16, so the granularity would jump by a factor
of 16 rather than 2; DBL_EPSILON would be the difference between
any number and the next larger representable number over the range
1.0 <= X < 16.0 (I think).

You are incorrect in your rounding
statement about numbers between 2.0 and 2.0 + 2*epsilon. Rounding
down isn't right, round to nearest is.

Read what I wrote again; I assumed round-to-nearest; I just didn't
mention values less than 2.0. But again, the C standard doesn't
require round-to-nearest; for an implementation that doesn't claim IEC
60559 conformance, C places very few constraints on floating-point
behavior.

This is also true, except you can not take the value 2.0 and
subtract epsilon and arrive at the next smaller representable value.
2.0 - epsilon will equal 2.0. This is so because of the base 2
exponent used in floating point values. In changing epsilon so that
it has the same base two exponent so the mantissas can be
subtracted, you will make the mantissa of epsilon underflow and be
zero.

Oh? Then why does this program:

#include <stdio.h>
#include <float.h>

int main(void)
{
    double below = 2.0 - DBL_EPSILON;
    double two = 2.0;
    double above = 2.0 + DBL_EPSILON * 2;

    printf("below = %.30f\n", below);
    printf("two = %.30f\n", two);
    printf("above = %.30f\n", above);
    return 0;
}

produce this output?

below = 1.999999999999999777955395074969
two = 2.000000000000000000000000000000
above = 2.000000000000000444089209850063

Extra credit: how do you arrive at the next lower representable
value below 2.0, knowing epsilon?

If you don't know what floating-point model is being used, you can't;
knowing epsilon is insufficient given what the C standard guarantees.
If you happen to know the system uses IEC 60559 floating-point, just
subtract epsilon.

DBL_EPSILON is the difference between 1.0 and the next higher
representable number. It's the granularity of the representation for
the value 1.0 and a range of values above 1.0. Assuming a binary
exponent, the exponent, and therefore the granularity of the mantissa,
will be the same for 1.0 <= X < 2.0. When we reach 2.0, the exponent
increases by 1 and the granularity becomes coarser by a factor of 2.
The same thing happens when we reach 4.0, and each power of 2.0 after
that. So the consecutive representable numbers are:

...
1.0-EPS
1.0-EPS/2
1.0
1.0+EPS
1.0+EPS*2
...
2.0-EPS*2
2.0-EPS
2.0
2.0+EPS*2
2.0+EPS*4
...
4.0-EPS*4
4.0-EPS*2
4.0
4.0+EPS*4
4.0+EPS*8
...
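
This ladder is easy to verify on a real machine. A small demo (my own,
assuming IEEE 754 double and the standard C99 nextafter function):

#include <stdio.h>
#include <math.h>
#include <float.h>

/* Demo only: print the gaps around 1.0, 2.0 and 4.0 in units of
 * DBL_EPSILON, using nextafter() to find the neighbouring values.
 * On IEEE 754 doubles this prints 0.5/1, 1/2 and 2/4, showing the
 * granularity doubling at each power of two. */
int main(void)
{
    const double points[] = { 1.0, 2.0, 4.0 };
    for (int i = 0; i < 3; i++) {
        double x = points[i];
        double lo = nextafter(x, 0.0);   /* next representable below x */
        double hi = nextafter(x, 8.0);   /* next representable above x */
        printf("%g: gap below = %g*EPS, gap above = %g*EPS\n",
               x, (x - lo) / DBL_EPSILON, (hi - x) / DBL_EPSILON);
    }
    return 0;
}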
 

Ike Naar

This is also true, except you can not take the value 2.0 and subtract epsilon
and arrive at the next smaller representable value. 2.0 - epsilon will equal
2.0. This is so because of the base 2 exponent used in floating point values.
In changing epsilon so that it has the same base two exponent so the mantissas
can be subtracted, you will make the mantissa of epsilon underflow and be zero.

On this old x86, with gcc 2.95.3, 2.0f - FLT_EPSILON _does_ yield
the next smaller representable float value below 2.0f.

expression           value          representation

1.0f-FLT_EPSILON     0.9999998808   0x3f7ffffe
1.0f                 1.0000000000   0x3f800000
1.0f+FLT_EPSILON     1.0000001192   0x3f800001

2.0f-2*FLT_EPSILON   1.9999997616   0x3ffffffe
2.0f-FLT_EPSILON     1.9999998808   0x3fffffff
2.0f                 2.0000000000   0x40000000
2.0f+FLT_EPSILON     2.0000000000   0x40000000
2.0f+2*FLT_EPSILON   2.0000002384   0x40000001

Exactly the same results appear on a Sun SPARC, with Sun CC.

Extra credit: how do you arrive at the next lower representable value below
2.0, knowing epsilon?

Thanks for the extra credit.
 

Keith Thompson

Golden California Girls said:
I do see that C doesn't require much of anything for floating math.
It does seem to permit useless floating point.

It permits a wide variety of floating-point implementations, because
there are a wide variety of floating-point implementations in the real
world, and declaring most such implementations non-conforming likely
would have killed the C language.

Now C *could* have defined a set of conditions looser than what's in
IEC 60559 while still guaranteeing that, for example, 1.0+1.0==2.0 and
being consistent with all the real-world implementations. (The Ada
standard does something like this.) But the authors chose instead
just to rely on the implementations to do the right thing -- which, in
most cases, they do.

What is the value of FLT_EVAL_METHOD?

2, which means to "evaluate all operations and constants to the range
and precision of the long double type", but I don't think that's
relevant.

Or put another way, have you
been lied to about the value of DBL_EPSILON for computations?

No.

I just modified the above program to use long double rather than
double and got similar results, so the results don't depend on using a
wider type for calculations.

I computed the value of 2.0 - DBL_EPSILON. Both operands are exactly
representable values of type double. I stored the result, whose
mathematical value is also exactly representable in type double, in an
object of type double. Assuming a sane floating-point implementation,
or more specifically an IEC 60559 floating-point representation, I can
expect the exact mathematical result to be stored in the object.

[...]

Let's take a 2-bit mantissa and count up. Our value of epsilon is then 1/2.

decimal   mantissa   exponent
0.5       10         -1
0.75      11         -1
1.0       10         0
1.5       11         0
2.0       10         1
3.0       11         1
4.0       10         10
6.0       11         10

Now let's take 2.0 and subtract our epsilon of 0.5:

2.0   10    1
0.5   10   -1

but we can't subtract the mantissas until the exponents match.
We shift and get

2.0   10   1
0.5   00   1

now we perform the subtraction and get

2.0   10   1

2.0, 0.5, and 1.5 are all exactly representable in this floating-point
representation, so computing 2.0 - 0.5 should give me 1.5 in any sane
implementation. If you make assumptions about how the subtraction is
implemented (shifting mantissas and tweaking exponents, keeping
everything within your representation at all times), then you might
conclude that the calculation yields 2.0. Conclusion: your
assumptions are incorrect.

FLT_EVAL_METHOD refers to results of C floating-point operations such
as ordinary subtraction. What you're talking about is intermediate
results that occur during the execution of that subtraction operation,
probably within the machine's floating-point unit. As I understand
it, FPUs do have to use extra bits internally to get correct results.
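
To see the guard-bit effect concretely, here is a toy simulation (my own
illustration, nothing from the thread or any standard; the names toy,
toy_sub and toy_val are made up): the 2-bit-mantissa subtraction above,
done once with no extra bits and once keeping a single guard bit during
alignment.

#include <stdio.h>

/* Toy model only: a 2-bit mantissa (mant is 2 or 3, value =
 * (mant/2) * 2^exp, matching the table above). toy_sub subtracts
 * b from a (assuming a >= b > 0 and a.exp >= b.exp), keeping
 * `guard` extra low bits while aligning b's mantissa. */
struct toy { int mant, exp; };

static struct toy toy_sub(struct toy a, struct toy b, int guard)
{
    int shift = a.exp - b.exp;            /* align b to a's exponent   */
    int ma = a.mant << guard;             /* widen both mantissas      */
    int mb = (b.mant << guard) >> shift;  /* bits shifted out are lost */
    int m = ma - mb;
    struct toy r = { 0, a.exp };
    while (m != 0 && (m >> guard) < 2) {  /* renormalize to 1.x form   */
        m <<= 1;
        r.exp--;
    }
    r.mant = m >> guard;                  /* drop the guard bits       */
    return r;
}

static double toy_val(struct toy t)
{
    double v = t.mant / 2.0;
    for (int e = t.exp; e > 0; e--) v *= 2.0;
    for (int e = t.exp; e < 0; e++) v /= 2.0;
    return v;
}

int main(void)
{
    struct toy two  = { 2,  1 };  /* 1.0 * 2^1  = 2.0 */
    struct toy half = { 2, -1 };  /* 1.0 * 2^-1 = 0.5 */
    printf("no guard bit:  2.0 - 0.5 = %g\n", toy_val(toy_sub(two, half, 0)));
    printf("one guard bit: 2.0 - 0.5 = %g\n", toy_val(toy_sub(two, half, 1)));
    return 0;
}

Without the guard bit the aligned 0.5 underflows to zero, exactly as
claimed above; a single guard bit is enough to recover the correct 1.5.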

The only thing that's visible to a C program is that if you compute
x = y - z;
where y, z, and the mathematical result of the subtraction are all
exactly representable in the given type, you'll get the right result
stored in x. (Again, the C standard doesn't require this, but IEC
60559 does.) What happens *within* the subtraction operation,
whatever magic the FPU performs to get the right answer, is outside
the scope of the language and, I suggest, of this discussion; all
that's relevant is that subtraction *works*.

I thought you said you could take 2.0 and subtract epsilon and get the next
lowest number? The C standard does not.

Right, I should have stated more clearly that I was assuming an IEC
60559 implementation.

[...]
AH, the fun of the C un-standard!

I don't know what that's supposed to mean. The C standard leaves a
lot of floating-point issues unspecified because it has to, because
there are so many different floating-point implementations in the real
world. But C allows an implementation to claim IEC 60559 conformance,
and for such an implementation the results are reasonably well
defined.

BTW to get the next lower representable value below 2.0 on an
implementation with FLT_EVAL_METHOD 0, subtract 2.0 * epsilon from
2.0 and then add epsilon, assuming binary mantissa and exponent.

Or just subtract epsilon from 2.0 and let the FPU (or whatever the
system uses for floating-point computations) worry about using however
many extra bits it needs to get the right answer. I challenge you to
show me a system where that doesn't work, for any C floating-point
type you choose. (C doesn't forbid such a system, but I doubt that
you can find any in existence.)
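
For what it's worth, here is a direct check (my own demo, assuming an
IEEE 754 double) that the detour through 2.0*epsilon buys nothing over
plain subtraction:

#include <stdio.h>
#include <float.h>

/* Demo only: on IEEE 754 doubles, 2.0 - DBL_EPSILON is itself exactly
 * representable, so a correctly rounded subtraction returns it and the
 * "subtract 2*eps, then add eps" detour gives the same value. */
int main(void)
{
    double direct = 2.0 - DBL_EPSILON;
    double detour = (2.0 - 2.0 * DBL_EPSILON) + DBL_EPSILON;

    printf("direct == detour: %s\n", direct == detour ? "yes" : "no");
    printf("direct <  2.0:    %s\n", direct < 2.0 ? "yes" : "no");
    return 0;
}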
 

Keith Thompson

Golden California Girls said:
Keith said:
Golden California Girls said:
Keith Thompson wrote:
What is the value of FLT_EVAL_METHOD?

2, which means to "evaluate all operations and constants to the range
and precision of the long double type", but I don't think that's
relevant.

[snip]
FLT_EVAL_METHOD refers to results of C floating-point operations such
as ordinary subtraction. What you're talking about is intermediate
results that occur during the execution of that subtraction operation,
probably within the machine's floating-point unit. As I understand
it, FPUs do have to use extra bits internally to get correct results.

I believe that is why the C standard has a FLT_EVAL_METHOD "-1
indeterminable". Unless of course there is some confusion over the
meaning of "all" and "operations" in "all operations".

I think there is some confusion there, and I don't think it's mine.
If an operation such as the subtraction in (2.0 - DBL_EPSILON) needs
to use extra bits *within the FPU* to produce a correct double result,
that's just part of the way the double operation is performed; it's
not performing the operation in type long double.

In other words, whatever goes on inside the implementation of "-"
isn't an "operation" as far as the standard is concerned; only the
entire "-" is an "operation".

But even if FLT_EVAL_METHOD == 2, I think you'll find that
2.0L - LDBL_EPSILON < 2.0L
though the C standard doesn't require this.

Here's another program:

#include <stdio.h>
#include <float.h>

int main(void)
{
    long double below = 2.0L - LDBL_EPSILON;
    long double two = 2.0L;
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    if (below < two && two - below == LDBL_EPSILON) {
        puts("Yes");
    }
    else {
        puts("No");
    }
    return 0;
}

All calculations are done in long double, so the value of
FLT_EVAL_METHOD is irrelevant (unless perhaps it's negative).

Why don't you do the opposite and prove there aren't any that do?
To make your search faster, start with chips that don't have FPUs,
where the floating point is done in software.

I'm not sure that I have access to any such systems, and if I do, I'm
not going to spend the time trying to prove a negative.

You're claiming that the trick of subtracting 2.0*epsilon from 2.0 and
then adding epsilon is necessary. I claim that it isn't, at least on
typical implementations with sane floating-point support. (The C
standard doesn't require sanity in the sense that I'm using the word;
IEC 60559 requires sanity and more.) There may well be systems with,
for example, software floating-point where this fails -- but it's
quite likely that the authors of software FP implementations have gone
to the trouble of getting this right. And I believe (though I haven't
checked) that IEC 60559 *requires* the program to print "Yes".

So here's my new challenge. Find a C implementation where the above
program compiles and prints "No". You might have to specify C99
compliance to make FLT_EVAL_METHOD visible; feel free to comment out
that line if you need to.
 

Richard Bos

Golden California Girls said:
Yes, we read the IEEE standard for floating point, understand base 2 math, took
a class in numeric analysis and understand about rational and irrational numbers.

IIRC the question being posited was if it was possible to have a case construct
that used a floating point switch.

No, it wasn't. The underlying question was not whether it was possible -
it's _possible_ to have a case statement which uses strings - but
whether it is worth the trouble.

No one is saying that using equality comparisons with floating point numbers
that are results of calculations is a good idea.

Actually, yes. By asking for a case statement using FPNs, some people
are doing exactly that.

Richard
 

Ike Naar

What is the value of FLT_EVAL_METHOD on your system?
It doesn't work to do long double math with FLT_EPSILON.

On the x86 (OS is OpenBSD 3.3), FLT_EVAL_METHOD is not defined.
On the Sun, FLT_EVAL_METHOD is 0.
 
