IEEE floating point

Ingo Nolden · Jun 13, 2004

Hi,

I am not really sure whether there is a better group for this. I already
tried a VC group, but they could not tell me how other standard libraries
handle it.

I have tested inf and NaN values extensively and read some docs from
docs.sun.com about the standard. They appear to be an annotated
interpretation of the IEEE standard.

So it seems that either my VC.net standard C++ library does not really
conform to the standard, or the standard is not what I think it is.

I found out that:

numeric_limits<double> knows only one infinity, which is the standard's
+inf.

It knows an sNaN that differs from the one in the standard; its bit
pattern is actually the same as the standard's -inf.
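For comparison, this is what the standard prescribes for a 64-bit double: 1 sign bit, 11 exponent bits and 52 mantissa bits. Both infinities have an all-ones exponent and a zero mantissa and differ only in the sign bit; every NaN, quiet or signaling, has an all-ones exponent and a non-zero mantissa, so no NaN can share -inf's bit pattern. A minimal C++11 sketch that splits a double into these fields, assuming numeric_limits<double>::is_iec559 holds (DoubleFields, raw_bits and decode are hypothetical names for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper type: the three fields of an IEEE 754 binary64 value.
struct DoubleFields {
    unsigned sign;           // 1 bit
    unsigned exponent;       // 11 bits; 0x7ff for both infinities and NaNs
    std::uint64_t mantissa;  // 52 bits; zero for inf, non-zero for NaN
};

// Copy the object representation instead of casting pointers,
// which would violate the strict-aliasing rules.
inline std::uint64_t raw_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

inline DoubleFields decode(double d) {
    const std::uint64_t u = raw_bits(d);
    return { static_cast<unsigned>(u >> 63),
             static_cast<unsigned>((u >> 52) & 0x7ff),
             u & 0x000fffffffffffffULL };
}
```

With a conforming library, decode(numeric_limits<double>::signaling_NaN()) reports a non-zero mantissa, while decode(-numeric_limits<double>::infinity()) reports sign 1 and a zero mantissa.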

If I calculate d = d / 0, I get what numeric_limits says is the sNaN.
That would be fine even though the bit pattern differs from the
standard; I could use it as is.
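Under IEEE 754 (default environment, no traps), x / 0 for non-zero finite x is an infinity whose sign is the exclusive-or of the operand signs, and an infinity divided by zero stays an infinity; only 0 / 0 (and similar invalid operations) produces a NaN. So d / 0 with d = -inf should give -inf, not any kind of NaN. A small sketch using the standard <cmath> predicates, assuming the code is not built with flags such as -ffast-math (describe_quotient is a hypothetical helper):

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper: classify the IEEE 754 result of num / den.
// Assumes the default floating-point environment, where division
// by zero returns a value instead of trapping.
std::string describe_quotient(double num, double den) {
    const double q = num / den;
    if (std::isnan(q)) return "NaN";
    if (std::isinf(q)) return std::signbit(q) ? "-inf" : "+inf";
    return "finite";
}
```

For example, describe_quotient(-1.0, 0.0) and describe_quotient(1.0, -0.0) both give "-inf", and describe_quotient(0.0, 0.0) gives "NaN".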

But d = -d * d returns the same bit pattern. In this case it is the
standard's -inf, as it should be, but there is no -inf in the library's
numeric_limits.
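numeric_limits has no separate member for -inf because none is needed: negating infinity() flips only the sign bit and yields the standard's -inf, the same value that -max() * max() overflows to. A sketch assuming an IEEE 754 double (negative_infinity and bit_pattern are hypothetical helpers):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// numeric_limits<double> provides only infinity() (+inf); -inf is its
// negation, which changes the sign bit and nothing else.
inline double negative_infinity() {
    return -std::numeric_limits<double>::infinity();
}

// Object representation of a double, copied rather than pointer-cast.
inline std::uint64_t bit_pattern(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

On a conforming implementation bit_pattern(negative_infinity()) is 0xfff0000000000000, which differs from the pattern of signaling_NaN().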

All in all the behaviour is weird, but the only real problem is that I
cannot distinguish between -inf and sNaN. In some matrix calculations,
that is the difference between division by zero and division by a very
small (negative) number.
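Note that under IEEE 754 both cases can legitimately end in -inf (for example -1.0 / 0.0 and -1e10 / 1e-300), so the result value alone cannot separate them. The standard's sticky exception flags can: a finite non-zero value divided by zero raises FE_DIVBYZERO, whereas a quotient too large to represent raises FE_OVERFLOW. A minimal sketch with C++11 <cfenv>, assuming the platform defines these FE_* macros and the code is not built with flags such as -ffast-math (DivOutcome and checked_divide are hypothetical names):

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical result type: which IEEE 754 exception a division raised.
enum class DivOutcome { Ok, DivByZero, Invalid, Overflow };

// Clear the status flags, divide, then inspect the flags.
// The volatile operands and result keep the optimizer from moving the
// division across the <cfenv> calls.
DivOutcome checked_divide(double num, double den, double& result) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double n = num, d = den;
    volatile double q = n / d;
    result = q;
    if (std::fetestexcept(FE_DIVBYZERO)) return DivOutcome::DivByZero;  // x / 0, x finite and non-zero
    if (std::fetestexcept(FE_INVALID))   return DivOutcome::Invalid;    // e.g. 0 / 0 gives NaN
    if (std::fetestexcept(FE_OVERFLOW))  return DivOutcome::Overflow;   // e.g. huge / tiny gives inf
    return DivOutcome::Ok;
}
```

The volatile qualifiers are a pragmatic guard: GCC, for example, does not implement #pragma STDC FENV_ACCESS, so without them the division could be reordered relative to the flag calls.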

Is anyone familiar with this topic? Can someone tell me how their version
of the standard library or compiler behaves, or point me to another
interpretation of the IEEE standard?

Here is the code that I used to examine the problem:

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <limits>

using namespace std;

// Raw 64-bit pattern of a double. memcpy replaces the MSVC-only __int64
// and the reinterpret_cast, which violated the strict-aliasing rules.
uint64_t bits_of( const double d )
{
    uint64_t u;
    memcpy( &u, &d, sizeof u );
    return u;
}

const uint64_t EXP_BITS      = 0x7ff0000000000000ULL;
const uint64_t MANTISSA_BITS = 0x000fffffffffffffULL;

// NaN = all-ones exponent AND non-zero mantissa. The old test
// ("exponent all ones, but not exactly +inf") also flagged -inf as NaN,
// because -inf differs from +inf only in the sign bit.
bool IsNAN( const double d )
{
    const uint64_t u = bits_of( d );
    return ( u & EXP_BITS ) == EXP_BITS && ( u & MANTISSA_BITS ) != 0;
}

// Print the pattern as one 16-digit hex number; independent of byte
// order and of sizeof( long ), unlike splitting it into two longs.
void hexout( const double d )
{
    cout << "\t" << hex << setw( 16 ) << setfill( '0' )
         << bits_of( d ) << dec << endl;
}

template< typename T >
void report( )
{
    cout << "numeric_limits<double>::\n" << endl;
    cout << boolalpha;

    double d;
    d = numeric_limits<T>::denorm_min( );    cout << "denorm_min =";    hexout( d );
    // (numeric_limits has no denorm_max member)
    d = numeric_limits<T>::epsilon( );       cout << "epsilon =";       hexout( d );
    d = numeric_limits<T>::infinity( );      cout << "infinity =";      hexout( d );

    // is_bounded, is_exact and is_iec559 are static constants, not functions
    cout << "is_bounded = " << numeric_limits<T>::is_bounded << endl;
    cout << "is_exact = "   << numeric_limits<T>::is_exact   << endl;
    cout << "is_iec559 = "  << numeric_limits<T>::is_iec559  << endl;

    d = numeric_limits<T>::max( );           cout << "max =";           hexout( d );
    d = numeric_limits<T>::min( );           cout << "min =";           hexout( d );
    d = numeric_limits<T>::quiet_NaN( );     cout << "quiet_NaN =";     hexout( d );
    d = numeric_limits<T>::signaling_NaN( ); cout << "signaling_NaN ="; hexout( d );

    // produce an infinite number by overflow:
    d = numeric_limits<T>::max( ); d = d * d;  cout << "pos inf =";  hexout( d );
    cout << "Is positive infinity NaN ? " << IsNAN( d ) << endl;

    d = numeric_limits<T>::max( ); d = -d * d; cout << "neg inf =";  hexout( d );
    cout << "Is negative infinity NaN ? " << IsNAN( d ) << endl;

    d = d / 0;   // -inf / 0 is still -inf under IEEE 754, not a NaN
    cout << "Div by zero returns:"; hexout( d );
    cout << "Is div by zero NaN ? " << IsNAN( d ) << endl;
}

int main( int argc, char* argv[] )
{
    report< double >( );

    cout << endl << endl;
    cout << "sizeof( float ) = "       << sizeof( float )       << endl;
    cout << "sizeof( double ) = "      << sizeof( double )      << endl;
    cout << "sizeof( long double ) = " << sizeof( long double ) << endl;

    getchar( );
    return 0;
}
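The hand-written mask test can also be replaced by the classification functions that C99 and C++11 provide in <cmath>: std::fpclassify separates infinities from NaNs, and std::signbit tells +inf from -inf. A sketch with a hypothetical FpKind enum, assuming an IEEE 754 double and no -ffast-math style flags:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative categories; the names are hypothetical, not from any library.
enum class FpKind { NegInf, PosInf, NaN, Zero, Subnormal, Normal };

// Classify with std::fpclassify and std::signbit instead of bit masks.
// The sign bit of a NaN is deliberately ignored.
inline FpKind classify(double d) {
    switch (std::fpclassify(d)) {
        case FP_NAN:       return FpKind::NaN;
        case FP_INFINITE:  return std::signbit(d) ? FpKind::NegInf : FpKind::PosInf;
        case FP_ZERO:      return FpKind::Zero;
        case FP_SUBNORMAL: return FpKind::Subnormal;
        default:           return FpKind::Normal;
    }
}
```

With this, -inf and both kinds of NaN fall into different categories regardless of how the library lays out its signaling_NaN() bits.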

P.J. Plauger · Jun 13, 2004

> numeric_limits<double> knows only one infinity, which is the standard's
> +inf.
>
> It knows an sNaN that differs from the one in the standard; its bit
> pattern is actually the same as the standard's -inf.

Yep, it's a bug. Fixed in later releases.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Ingo Nolden · Jun 13, 2004

> Yep, it's a bug. Fixed in later releases.

I should have checked with the newest release I have. Indeed, in
VC.net 2003 it is correct. And what I used before was not VS.net, as I
said, but VS6.0. Sorry, and thanks.

Ingo
