Dark-corners floating-point behavior in C++

Dave Rahardja · Sep 7, 2005

Does the C++ standard specify the behavior of floating point numbers during
"exceptional" (exceptional with respect to floating point numbers, not
exceptions) conditions?

For example:

double a = 1.0 / 0.0; // What is the value of a? Infinity?
double b = 0.0 / 0.0; // What is the value of b? NaN?

What about overflow/underflow conditions in the library? Is HUGE_VAL always a
defined constant?

#include <cmath>
using namespace std;

double c = pow(1e100, 1e100); // What is the value of c?
double d = pow(0.5, 1e100); // What is the value of d?

Jack Klein · Sep 7, 2005

Does the C++ standard specify the behavior of floating point numbers during
"exceptional" (exceptional with respect to floating point numbers, not
exceptions) conditions?

For example:

double a = 1.0 / 0.0; // What is the value of a? Infinity?
double b = 0.0 / 0.0; // What is the value of b? NaN?

The behavior of both of the above is just plain undefined. The standard
neither requires nor specifies any value.

What about overflow/underflow conditions in the library? Is HUGE_VAL always a
defined constant?

HUGE_VAL is a required macro in <math.h>/<cmath>. It expands to a
positive double constant expression not necessarily representable in a
float. The intent is that it be the largest possible value that can
be held in a double, or a representation of positive infinity, if the
floating point type has such a representation.

#include <cmath>
using namespace std;

double c = pow(1e100, 1e100); // What is the value of c?

Assuming that the result is too large to be represented in a double,
this one should result in HUGE_VAL, and errno is set to ERANGE.

double d = pow(0.5, 1e100); // What is the value of d?

Again assuming the range of a double is limited so that this
underflows to 0, the return value will be 0.0, and errno might or might
not be set to ERANGE.

But don't confuse library functions, which have fully defined
behavior, with overflow or underflow when using the built-in
arithmetic operators on any arithmetic type, be it floating point or
signed integer types. The latter has just plain undefined behavior.
Division by 0, even with unsigned integer types, is also undefined
behavior.

None of the <math.h>/<cmath> functions are allowed to produce
undefined behavior when passed arguments representable in the argument
type. If the arguments are out of range for the function, such as an
argument outside the range of [-1,+1] to acos(), an
implementation-defined value is returned and errno is set to EDOM.

If the result of the function is not representable in the return type,
the functions mostly return HUGE_VAL or -HUGE_VAL and set errno to
ERANGE on magnitude overflow, and return 0 with errno possibly set to
ERANGE on underflow.
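
As a rough illustration of the library behavior described above, here is a minimal C++ sketch (it is not code from the thread). The names MathResult, math_uses_errno, checked_pow and checked_acos are invented for this example. It assumes a C++11 or later library, where <cmath> provides math_errhandling and MATH_ERRNO. Some implementations report math errors only through floating-point exception flags and leave errno untouched, so the err field is meaningful only when math_uses_errno() returns true.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper type for this sketch: the outcome of one library call.
struct MathResult {
    double value;  // what the function returned
    int err;       // errno as observed right after the call (0 if untouched)
};

// True when this implementation reports math errors through errno at all.
bool math_uses_errno() {
    return (math_errhandling & MATH_ERRNO) != 0;
}

// Calls std::pow with errno cleared first, so a stale error code from an
// earlier call cannot be mistaken for a new one.
MathResult checked_pow(double base, double exponent) {
    // volatile forces a real run-time call instead of compile-time folding.
    volatile double b = base;
    volatile double e = exponent;
    errno = 0;
    const double r = std::pow(b, e);
    const int err = errno;
    // The math functions are only expected to report these two codes.
    assert(err == 0 || err == EDOM || err == ERANGE);
    return MathResult{r, err};
}

// Same pattern for a domain error: acos is only defined on [-1, +1].
MathResult checked_acos(double x) {
    volatile double v = x;
    errno = 0;
    const double r = std::acos(v);
    const int err = errno;
    assert(err == 0 || err == EDOM || err == ERANGE);
    return MathResult{r, err};
}
```

On a typical IEEE 754 platform, checked_pow(1e100, 1e100).value compares equal to HUGE_VAL, and checked_pow(0.5, 1e100).value is 0.0. Whether err is set on underflow is left open, as noted above.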

velthuijsen · Sep 7, 2005

First: apologies for sending the first version to you instead of the
NG.

If your compiler follows the floating-point standard (IEC 559), which you
can check by including <limits> and testing
numeric_limits<float>::is_iec559
then the following should result:

double a = 1.0 / 0.0; // What is the value of a? Infinity?

An error since the compiler should discover a division by 0 when it
attempts to turn 1.0/0.0 into a constant.
Ignoring that: +Infinity (infinity keeps sign)

double b = 0.0 / 0.0; // What is the value of b? NaN?

See comment above.
The result would be: Indeterminate

What about overflow/underflow conditions in the library? Is HUGE_VAL always a
defined constant?

This kind of error generally gets trapped at the processor.
There are OS specific solutions to get at these errors.

double c = pow(1e100, 1e100); // What is the value of c? +Infinity

double d = pow(0.5, 1e100); // What is the value of d?

0

And depending on how the OS works, all 4 calculations will result in
various errors that can be recovered using OS-specific functions.

Dave Rahardja · Sep 7, 2005

Assuming that the result is too large to be represented in a double,
this one should result in HUGE_VAL, and errno is set to ERANGE.

Thanks for your reply. Is there a portable way to detect these conditions at
run-time, even after they have happened?

My trouble with errno is that it is usable only in a single-threaded
application. My applications are almost always multithreaded at some point.

There are hooks in some libraries like _matherr() that allow me to trap for
these errors, but I suspect those hooks are not part of the standard library.
Am I right?

-dr

red floyd · Sep 7, 2005

Dave said:
My trouble with errno is that it is usable only in a single-threaded
application. My applications are almost always multithreaded at some point.

[DISCLAIMER: NON STANDARD SINCE THE STANDARD DOESN'T ADDRESS THREADS]

The implementations that I've seen (MSVC and G++) define errno in a
multithreaded environment as a macro which gives thread-specific error info.

i.e. [SAMPLE ONLY, NOT ACTUAL IMPLEMENTATION]

int *thread_specific_errno();
#define errno (*thread_specific_errno())

Pete Becker · Sep 7, 2005

First: apologies for sending the first version to you instead of the
NG.

If your compiler follows the floating-point standard (IEC 559), which you
can check by including <limits> and testing
numeric_limits<float>::is_iec559
then the following should result:

An error since the compiler should discover a division by 0 when it
attempts to turn 1.0/0.0 into a constant.
Ignoring that: +Infinity (infinity keeps sign)

It's not an error. The result, as you say, is +infinity.

See comment above.
The result would be: Indeterminate

No. Again, it's not an error. The result is NaN (not a number).

This kind of error generally gets trapped at the processor.
There are OS specific solutions to get at these errors.

Under IEC 60559 (the successor to IEC 559, which is the basis for the
name used in numeric_limits) the result of an overflow is a suitably
signed infinity. The result of an underflow is a denormal value if the
compiler supports them, or a suitably signed zero.
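
To make those overflow and underflow results concrete, here is a small illustrative sketch (not part of the original post). It assumes that double is an IEC 60559 type, that the default round-to-nearest mode is in effect, and that no flush-to-zero setting is active. The function names are invented for the example, and std::signbit needs C++11.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Multiplies at run time; the volatile objects stop compile-time folding.
double runtime_multiply(double x, double y) {
    // The outcomes discussed here are only promised for IEC 60559 doubles.
    assert(std::numeric_limits<double>::is_iec559);
    volatile double a = x;
    volatile double b = y;
    volatile double product = a * b;
    return product;
}

// Largest finite double times two: overflows to a suitably signed infinity.
double overflow_example(bool negative) {
    const double big = std::numeric_limits<double>::max();
    return runtime_multiply(negative ? -big : big, 2.0);
}

// Smallest normal double halved: a denormal value where denormals are
// supported, otherwise zero.
double gradual_underflow_example() {
    return runtime_multiply(std::numeric_limits<double>::min(), 0.5);
}

// Smallest denormal halved: nothing smaller is representable, so the
// result rounds to a suitably signed zero.
double total_underflow_example(bool negative) {
    const double tiny = std::numeric_limits<double>::denorm_min();
    return runtime_multiply(negative ? -tiny : tiny, 0.5);
}
```

std::signbit tells the two zeros apart: total_underflow_example(true) compares equal to 0.0 but has its sign bit set.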

+Infinity

If the value is too large for double to handle, which is usually the
case. But it isn't required.

0

If the value is too small for double to handle, which is usually the
case. But it isn't required.

And depending on how the OS works, all 4 calculations will result in
various errors that can be recovered using OS-specific functions.

No, under IEC 60559 these are not errors. They lead to floating-point
exceptions, which are nothing like C++ exceptions. The default behavior
is to continue the computation, with (mostly) well-defined rules for
what happens when code manipulates NaNs and infinities. You then test at
the end to see whether you got a NaN or an infinity, in which case you
conclude that you screwed up, or whether one or more of the exception
sticky bits is set, in which case you conclude that you might have
screwed up.

C99 incorporates most of this in the language and library. The C++ TR1
incorporates the library portions of C99, and C++0x will almost
certainly have all that C99 has in this area.
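
The exception sticky bits mentioned above can be read through <cfenv>, the C99 floating-point environment header that C++11 later adopted. The following is only a sketch under assumptions: DivisionReport and checked_divide are hypothetical names, the platform is assumed to define FE_DIVBYZERO and FE_INVALID, and strictly conforming code would also need #pragma STDC FENV_ACCESS ON, which not every compiler honors. The volatile variables are a workaround to keep the division from being folded at compile time.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper type: a quotient plus the flags the division raised.
struct DivisionReport {
    double quotient;
    bool div_by_zero;  // FE_DIVBYZERO was raised
    bool invalid;      // FE_INVALID was raised
};

// Divides at run time and reports which floating-point exception flags the
// division set. These flags are status bits, not C++ exceptions.
DivisionReport checked_divide(double num, double den) {
    volatile double n = num;
    volatile double d = den;

    // Clear the sticky flags so only this division is observed.
    const int rc = std::feclearexcept(FE_ALL_EXCEPT);
    assert(rc == 0);
    (void)rc;  // avoids an unused-variable warning when NDEBUG is defined

    volatile double q = n / d;

    const int raised = std::fetestexcept(FE_DIVBYZERO | FE_INVALID);
    return DivisionReport{q,
                          (raised & FE_DIVBYZERO) != 0,
                          (raised & FE_INVALID) != 0};
}
```

With this helper, checked_divide(1.0, 0.0) is expected to report div_by_zero with an infinite quotient, and checked_divide(0.0, 0.0) is expected to report invalid with a NaN quotient, while the program keeps running in both cases.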

Dave Rahardja · Sep 7, 2005

Under IEC 60559 (the successor to IEC 559, which is the basis for the
name used in numeric_limits) the result of an overflow is a suitably
signed infinity. The result of an underflow is a denormal value if the
compiler supports them, or a suitably signed zero.

So can we assume the behaviors you described if
std::numeric_limits<double>::is_iec559 == true?

-dr

Pete Becker · Sep 8, 2005

Dave said:
So can we assume the behaviors you described if
std::numeric_limits<double>::is_iec559 == true?

is_iec559 asserts that the implementation conforms to the IEC 559
standard. IEC 60559 is the same thing.
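
A minimal sketch of how code might guard on that trait before relying on infinity and NaN results. The function name describe_quotient is made up for this example, and the division is done on volatile operands only so that it happens at run time. Where is_iec559 is false, the sketch makes no claim about the result.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Describes num / den, but only where double claims IEC 559 conformance.
std::string describe_quotient(double num, double den) {
    if (!std::numeric_limits<double>::is_iec559) {
        return "unknown (double is not IEC 559 here)";
    }
    volatile double n = num;
    volatile double d = den;
    const double q = n / d;

    if (std::isnan(q)) return "NaN";
    if (std::isinf(q)) return q > 0 ? "+infinity" : "-infinity";
    assert(std::isfinite(q));  // nothing else is left for an IEC 559 double
    return "finite";
}
```

For the two expressions from the original question, describe_quotient(1.0, 0.0) should give "+infinity" and describe_quotient(0.0, 0.0) should give "NaN" on a conforming implementation.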

Dave Rahardja · Sep 8, 2005

is_iec559 asserts that the implementation conforms to the IEC 559
standard. IEC 60559 is the same thing.

Thanks Pete!

Just got my hands on a copy of IEEE 754. How is IEC 559/60559 different from
the IEEE standards?

-dr

velthuijsen · Sep 9, 2005

Just got my hands on a copy of IEEE 754. How is IEC 559/60559 different from
the IEEE standards?

They are the same standard, only classified by a different organisation.

velthuijsen · Sep 9, 2005

Pete said:
It's not an error. The result, as you say, is +infinity.

Please read again. I point out that the compiler would probably have a
fit as it would try to replace 1.0/0.0 with a constant and generate a
compile time error for division by 0.

No. Again, it's not an error. The result is NaN (not a number).

The result is indeterminate, not NaN, seeing that it can be any number.
This is one of the few situations where you do not get a NaN returned
(although a lot of material on the net assumes you get one); the other
situation I can come up with would be trying to divide infinity by
infinity.

Dave Rahardja · Sep 9, 2005

The result is indeterminate, not NaN, seeing that it can be any number.
This is one of the few situations where you do not get a NaN returned
(although a lot of material on the net assumes you get one); the other
situation I can come up with would be trying to divide infinity by
infinity.

Actually, not if your compiler (and library) are IEEE-754 compliant. From the
standard:

7.1 Invalid Operation

The invalid operation exception is signaled if an operand is invalid for the
operation to be performed. The result, when the exception occurs without a
trap, shall be a quiet NaN (6.2) provided the destination has a floating-point
format. The invalid operations are

1) Any operation on a signaling NaN (6.2)

2) Addition or subtraction: magnitude subtraction of infinities, such as
(+inf) + (-inf)

3) Multiplication: 0 * inf

4) Division: 0/0 or inf/inf

5) Remainder: x REM y, where y is zero or x is infinite

6) Square root if the operand is less than zero

7) Conversion of a binary floating-point number to an integer or decimal
format when overflow, infinity, or NaN precludes a faithful representation in
that format and this cannot otherwise be signaled

8) Comparison by way of predicates involving < or >, without ?, when the
operands are unordered (5.7, Table 4)

Item 4 specifies that 0/0 results in a NaN if not otherwise trapped.
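
For anyone who wants to try items 2 through 6 on their own machine, here is a short sketch. It is an illustration only: the struct and function names are invented, it assumes std::numeric_limits<double>::is_iec559 is true, and it uses std::remainder (C++11) as the closest library match for REM. A plain 1/0 is included for contrast, since that is a division-by-zero exception rather than an invalid operation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical record for this sketch: one result per operation tried.
struct InvalidOps {
    double inf_plus_neg_inf;  // item 2: (+inf) + (-inf)
    double zero_times_inf;    // item 3: 0 * inf
    double zero_over_zero;    // item 4: 0/0
    double inf_over_inf;      // item 4: inf/inf
    double rem_by_zero;       // item 5: x REM y with y equal to zero
    double sqrt_negative;     // item 6: square root of a negative operand
    double one_over_zero;     // contrast: division by zero, not invalid
};

InvalidOps run_invalid_operations() {
    // The results below are only promised for IEC 559 / IEEE 754 doubles.
    assert(std::numeric_limits<double>::is_iec559);

    // Reading the operands through volatile objects keeps the compiler
    // from evaluating the expressions at compile time.
    volatile double v_zero = 0.0;
    volatile double v_one = 1.0;
    volatile double v_inf = std::numeric_limits<double>::infinity();
    const double zero = v_zero;
    const double one = v_one;
    const double inf = v_inf;

    InvalidOps r;
    r.inf_plus_neg_inf = inf + (-inf);
    r.zero_times_inf = zero * inf;
    r.zero_over_zero = zero / zero;
    r.inf_over_inf = inf / inf;
    r.rem_by_zero = std::remainder(one, zero);
    r.sqrt_negative = std::sqrt(-one);
    r.one_over_zero = one / zero;
    return r;
}

// True if every invalid operation in the record produced a NaN.
bool all_invalid_results_are_nan(const InvalidOps& r) {
    return std::isnan(r.inf_plus_neg_inf) && std::isnan(r.zero_times_inf) &&
           std::isnan(r.zero_over_zero) && std::isnan(r.inf_over_inf) &&
           std::isnan(r.rem_by_zero) && std::isnan(r.sqrt_negative);
}
```

On a conforming implementation with traps disabled, all_invalid_results_are_nan(run_invalid_operations()) should be true, while one_over_zero should be +infinity.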

-dr

Pete Becker · Sep 11, 2005

Please read again. I point out that the compiler would probably have a
fit as it would try to replace 1.0/0.0 with a constant and generate a
compile time error for division by 0.

It should not generate a compile time error under IEC 60559. It should
use the value +infinity.

The result is indeterminate, not NaN, seeing that it can be any number.

No, that's the classic case where NaN is appropriate. Look it up.

This is one of the few situations where you do not get a NaN returned
(although a lot of material on the net assumes you get one); the other
situation I can come up with would be trying to divide infinity by
infinity.

Which also produces a NaN.
