floating point over-/under-flow

Mark L Pappin · Dec 2, 2004

<puts on Compiler Vendor hat>

I've recently discovered that [at least two of] our compilers don't
make any attempt to handle floating point overflow in add/subtract/
multiply/divide, with the result that programmers who don't range-
check their data can e.g. multiply two very tiny values and end up
with a very large one. This is (AFAICT) quite fine according to the
Standard but is certainly a QoI issue and I'd like to raise ours.

I believe that it's the programmer's job to know what possible ranges
his values could take, and to check they're sane before performing
operations upon them. If this is done, then overflow and underflow
can never happen. Rarely is this done, of course, so the compromise
suggested by TPTB is to offer a safety net in the form of adding
inexpensive checks to the generated code which can propagate status
back to the user.

The approach I'm taking is to detect overflow or underflow and set a
flag (in the implementation namespace) as appropriate, but leave the
[invalid] value in the result. This way, if it really is important to
the user to eke that last bit (pun not intended) of information out of
the operation then the way is clear for them to do so. For example, a
multiplication which overflows will end up with a value pow(2,256)
smaller than the correct [unrepresentable] value, thanks to wraparound
of our 8-bit signed binary exponent, but the mantissa will still have
full precision - some recovery may be possible.

An alternative approach (since we use an IEEE-754-compatible
representation already) is to go the whole hog and implement
Infinities, NaNs, and Denormals. At this time we don't want to do this
because it appears to be a significant cost (see below) for not much
return. In any case, how to deal with those types of value in C is
still not defined, so any solution of this type would still be Not C.

I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
underflow and overflow - in this case, coercing values to 0 or Inf,
vs. leaving them as is and setting a user-checkable flag.
('Very-noisy' would be throwing an exception of some kind, also
undefined by the Standards.) What practices to you use to avoid
hitting overflow and underflow? Is our plan to let the dodgy value
continue to exist of any conceivable value?

For those who've read this far: we make cross-compilers for [often
tiny; <256 bytes of RAM is common] CPUs commonly used in embedded
systems, and we do our damnedest to make them meet C90 and are inching
toward C99 in places. Floating point has become popular, some
deficiencies have been found, and Muggins put his hand up to fix them.

mlp

Tim Prince · Dec 2, 2004

Mark L Pappin said:
<puts on Compiler Vendor hat>

I've recently discovered that [at least two of] our compilers don't
make any attempt to handle floating point overflow in add/subtract/
multiply/divide, with the result that programmers who don't range-
check their data can e.g. multiply two very tiny values and end up
with a very large one. .....

The approach I'm taking is to detect overflow or underflow and set a
flag (in the implementation namespace) as appropriate, but leave the
[invalid] value in the result. This way, if it really is important to
the user to eke that last bit (pun not intended) of information out of
the operation then the way is clear for them to do so. For example, a
multiplication which overflows will end up with a value pow(2,256)
smaller than the correct [unrepresentable] value, thanks to wraparound
of our 8-bit signed binary exponent, but the mantissa will still have
full precision - some recovery may be possible.
....
I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
underflow and overflow - in this case, coercing values to 0 or Inf,
vs. leaving them as is and setting a user-checkable flag.
('Very-noisy' would be throwing an exception of some kind, also
undefined by the Standards.) What practices to you use to avoid
hitting overflow and underflow? Is our plan to let the dodgy value
continue to exist of any conceivable value?

Long, long ago, before C, the usual practice for such hardware was to throw
an exception. If the application wished to continue, it had to provide an
exception handler. Thus, the application would make the choice whether to
set 0 or Inf and continue silently, or issue a diagnostic, or take a branch
to process the out of range. Continuing without fixing it up seems unlikely
to be useful. When IEEE-754 came in, the argument was this exception
handling is an unnecessary burden.

Kevin Bracey · Dec 2, 2004

In message <[email protected]>

Mark L Pappin said:
An alternative approach (since we use an IEEE-754-compatible
representation already) is to go the whole hog and implement
Infinities, NaNs, and Denormals. At this time we don't want to do this
because it appears to be a significant cost (see below) for not much
return. In any case, how to deal with those types of value in C is
still not defined, so any solution of this type would still be Not C.

Actually, C99 does fully define how to deal with all those values for
IEEE754, in Annex F. Only "signalling NaNs" aren't covered.

The main problem with implementation of NaNs etc is the extra cost
of handling them as INPUTs to operations. All your operations will need
to be able to detect them and propagate them accordingly, otherwise there's
no point generating them in the first place. This can be expensive.

I'd be tempted to say that the basic operations should signal SIGFPE in the
event of an error (ie enable Invalid Operation, Divide By Zero and Overflow
traps, in IEEE754 terminology), and kill the program. This means NaNs and
Infinities can't get into the system. I've used such systems quite
extensively.

If you intend to just set a flag and continue execution, then I feel it's
important that some sort of indicator value be left in the result, rather
than just leaving it wrapped. If you can't manage "NaN" or "Inf", then at the
very least it should be "HUGE_VAL", as per the <math.h> functions.

As for denormals, flushing all tiny results to zero is a reasonable thing to
do if you can't implement denormals properly. Kahan wouldn't approve, as it
leaves a massive hole in the number line, but it's far better than your
current "tiny * tiny -> huge".

By the way, the overflow flags you're describing are defined in <fenv.h> in
C99 - there's no need for you to invent your own, I believe.

You may want to consider the FENV #pragmas. I'm not sure that FENV_ACCESS
will give you any leeway on shrinking your code size, but you should have a
look at that area. Otherwise some sort of compiler option or private #pragma
to suppress the extra code bloat of proper range checking will probably be in
order in your situation.

Floating point has become popular, some deficiencies have been found, and
Muggins put his hand up to fix them.

Well, this will be an important life lesson for you.

Gordon Burditt · Dec 2, 2004

I've recently discovered that [at least two of] our compilers don't

make any attempt to handle floating point overflow in add/subtract/
multiply/divide, with the result that programmers who don't range-
check their data can e.g. multiply two very tiny values and end up
with a very large one. This is (AFAICT) quite fine according to the
Standard but is certainly a QoI issue and I'd like to raise ours.

What floating point hardware / software allows you to multiply two
small floating-point numbers and end up with a huge one? The
implementations I know of either end up with zero or a small number.
IEEE-754 certainly requires that. You also can't multiply two large
numbers and end up with a small one on any hardware I know of.
Overflow sticks at either Inf or the largest possible value, or
traps. Now you can still end up with an answer that is a large
number of orders of magnitude wrong mathematically, but it won't
appear to be a small, reasonable result when it overflows.

I consider your hardware / software emulation broken, although ANSI
C doesn't.

I believe that it's the programmer's job to know what possible ranges
his values could take, and to check they're sane before performing
operations upon them.

The problem here is that individually reasonable values may produce
a collectively unreasonable result. For example, linear interpolation
using two points that are nearly identical. Roundoff error in
decimal conversion can also change a reasonable situation to an
unreasonable one (e.g. change division by zero, which you've handled,
to division by what was supposed to be zero but for roundoff error,
which you didn't because in a corner case it was more than expected).

If this is done, then overflow and underflow
can never happen. Rarely is this done, of course, so the compromise
suggested by TPTB is to offer a safety net in the form of adding
inexpensive checks to the generated code which can propagate status
back to the user.

The approach I'm taking is to detect overflow or underflow and set a
flag (in the implementation namespace) as appropriate, but leave the
[invalid] value in the result.

I believe the appropriate action is to substitute +Inf, -Inf,
+DBL_MAX, or -DBL_MAX for overflow, or cause a trap.

Underflow to zero is problematical as it provides no clear way to
check for an error, but then it isn't always obvious when an underflow
*IS* an error. Is 0.10 - 0.10 underflowing to zero an error?
Or is it a case of "I had a dime and I spent it"?

This way, if it really is important to
the user to eke that last bit (pun not intended) of information out of
the operation then the way is clear for them to do so. For example, a
multiplication which overflows will end up with a value pow(2,256)
smaller than the correct [unrepresentable] value, thanks to wraparound
of our 8-bit signed binary exponent, but the mantissa will still have
full precision - some recovery may be possible.

Do you know if anyone is actually trying to do that kind of recovery?
It's very, very system-specific. How often do you deal in numbers
that are within 10 orders of magnitude of DBL_MAX? I consider it
dangerous.

An alternative approach (since we use an IEEE-754-compatible
representation already) is to go the whole hog and implement
Infinities, NaNs, and Denormals.

Overflow to +/- DBL_MAX might cover most of the issues if the
whole IEEE implementation is too expensive.

At this time we don't want to do this
because it appears to be a significant cost (see below) for not much
return. In any case, how to deal with those types of value in C is
still not defined, so any solution of this type would still be Not C.

How costly is it to check for the binary exponent overflow
and overflow to +/- DBL_MAX and underflow to 0?

I'm interested to hear the c.l.c set of viewpoints on silent vs. noisy
underflow and overflow - in this case, coercing values to 0 or Inf,
vs. leaving them as is and setting a user-checkable flag.
('Very-noisy' would be throwing an exception of some kind, also
undefined by the Standards.) What practices to you use to avoid
hitting overflow and underflow?

In most cases, range-checking the result to within sane values is
sufficient, given overflow to +/- Inf or +/- DBL_MAX. Legitimate
values are usually smaller in absolute value than the 10th root of
DBL_MAX.

It would be really nasty having 1e+200 * 1e+200 end up being 1e+4
(two outrageous values ending up with a sane-looking product).

Is our plan to let the dodgy value
continue to exist of any conceivable value?

I think it is of negative value, unless someone really needs to
use the full range. How often do you deal with numbers like 1e+300?

If someone really wants to check this hidden flag, could they not
get the dodgy value from some other hidden place, where the
normal result is DBL_MAX?

For those who've read this far: we make cross-compilers for [often
tiny; <256 bytes of RAM is common] CPUs commonly used in embedded
systems, and we do our damnedest to make them meet C90 and are inching
toward C99 in places. Floating point has become popular, some
deficiencies have been found, and Muggins put his hand up to fix them.

Gordon L. Burditt

C++ SSE and SSE2 compiler settings, and their Floating Point effects.	0	May 31, 2022
Java OpenJDK Floating Point Dare	3	Jan 17, 2023
Avoiding NaN and Inf on floating point division	14	Jan 4, 2014
Floating point library RC	0	May 27, 2016
under/over flow problem?	3	Apr 7, 2006
How to use Flow-guided video completion (FGVC)?	0	Jan 25, 2021
floating point in c99	7	Aug 17, 2010
analysis of floating point values	15	Feb 18, 2011

floating point over-/under-flow

Mark L Pappin

Tim Prince

Kevin Bracey

Gordon Burditt

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads