Mixed type math... What is the correct answer?

Daniel T. · Feb 9, 2009

The expression is:

unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

With normal math, where I don't have to worry about overflow, the
answer is x == 3333702565, but when I look at the value of 'x' it
reads: 3333702656.

If I change any of the values to type double, then I get the correct
answer. So am I losing precision, or is there a compiler bug?

Juha Nieminen · Feb 9, 2009

Daniel said:
The expression is:

unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

First "120u * 1000" is computed as unsigned int and the result is
converted to a value of type float (not double), after which it's
divided by 500.0f. After that the 3333702325u is converted to float,
overflows, and the result is something depending on how the hardware
handles float overflow.

Why are you dividing by 500.0f? Since 120000 / 500 is exactly 240,
it wouldn't make any difference if you divided by 500u.

Daniel T. · Feb 9, 2009

By "overflows" you mean "loses precision", I guess. Since the range of
a 'float' is guaranteed to be at least 1E37, there can be no overflow
from converting what is ~3.3337E9 to a 'float'.

But am I losing precision when dealing with numbers that big? I'm
guessing so...

> and the result is something depending

There is no overflow to handle. There is loss of precision.

The formula I posted uses specific numbers, but my code uses
variables. That "500.0f" represents a floating point type that could
equal some real value.

pasa · Feb 9, 2009

You get what you asked for. By the language rules your integral
operands are converted to floating point, in this case to float, and
there you lose precision, which shows in the result. In general, type
float is something you should forget about (unless you work with SIMD
or some other specialized stuff). Even with double you can lose
precision on current machines, as it often has 64 bits overall
(mantissa + exponent), so it will lose some significant bits of a
64-bit long.

Juha Nieminen · Feb 9, 2009

Daniel said:
But am I losing precision when dealing with numbers that big? I'm
guessing so...

In most architectures a variable of type float is an IEEE 32-bit
floating point number. It stores 23 bits for the mantissa. You are
giving it 32 bits of data. The lowest 9 bits are going to be dropped
because they simply can't fit into the 23 bits.

(Ok, technically speaking only the 8 least significant bits are lost,
because the mantissa also has an implicit leading 1 bit, but anyways.)

The formula I posted uses specific numbers, but my code uses
variables. That "500.0f" represents a floating point type that could
equal some real value.

By using the 'float' type you are accepting that you will have only 24
bits of precision (23 stored mantissa bits plus the implicit leading 1,
then the exponent). If you want more, use 'double' instead.

James Kanze · Feb 10, 2009

You might be interested in the output of the following simple
program:

#include <iostream>

int
main()
{
    std::cout.precision( 20 ) ;
    for ( unsigned int i = 3333702565u - 4 ;
            i <= 3333702565u + 4 ;
            ++ i ) {
        float f = i ;
        double d = i ;
        std::cout
            << i
            << ", as float " << f
            << ", as double " << d << std::endl ;
    }
    return 0 ;
}

Depending on the compiler and the hardware you are using (and
the degree of optimization you've demanded), all of the
operations except the 120u * 1000 in your expression may be done
in float. (They may also be done in a floating point format
with more precision than float; this is implementation
defined.) If your machine uses IEEE floating point (like mine,
and most other desktops), 3333702565 is not representable as a
float value; the closest you can come is 3333702656. So there's
no way any arithmetic in float can ever result in 3333702565.
(If your compiler does use a larger precision, you might get
3333702565 with your expression.)

In general, with IEEE float, you can't count on more than about
six decimal digits; with IEEE double, 15.

SG · Feb 10, 2009

You already got useful answers. Let me add the following two links:

http://letmegooglethatforyou.com/?q=machine+epsilon&l=1

http://www.cplusplus.com/reference/std/limits/numeric_limits.html

Cheers!
SG

James Kanze · Feb 11, 2009

SG said:
You already got useful answers. Let me add the following two links:

http://www.cplusplus.com/reference/std/limits/numeric_limits.html

The most important link is surely:
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Similar Threads

Pointer-to-Object type error	0	Mar 26, 2022
What is the most astounding C++ syntax construct?	0	Dec 22, 2022
Why is each iteration accumulating the values here?	0	Aug 10, 2023
Mixed signed/unsigned	2	Apr 5, 2012
what is the default constructor for the POD type	5	Feb 19, 2012
What is the base type?	3	Jan 20, 2013
What is the purpose of type() and the types module and what is a type?	1	Jun 27, 2013
What is the type of back_inserter(container)?	2	Aug 24, 2011
