Mixed type math... What is the correct answer?

Daniel T.

The expression is:

unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

With normal math, where I don't have to worry about overflow, the
answer is x == 3333702565, but when I look at the value of 'x' it
reads: 3333702656.

If I change any of the values to type double, then I get the correct
answer. So am I losing precision, or is there a compiler bug?
 
Juha Nieminen

Daniel said:
The expression is:

unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

First "120u * 1000" is converted to a value of type float (not
double), after which it's divided by 500.0f. After that the 3333702325u
is converted to float, overflows, and the result is something depending
on how the hardware handles float overflow.

Why are you dividing by 500.0f? In the end it doesn't make any
difference if you divided by 500u.
 
Daniel T.

By "overflows" you mean "loses precision", I guess.  Since the range of
a 'float' is guaranteed to be at least 1E37, there can be no overflow
from converting what is ~3.3337E9 to a 'float'.

But am I losing precision when dealing with numbers that big? I'm
guessing so...

Juha said:
and the result is something depending on how the hardware handles
float overflow.

There is no overflow to handle.  There is loss of precision.

The formula I posted uses specific numbers, but my code uses
variables. That "500.0f" represents a floating point type that could
equal some real value.
 
pasa

Daniel said:
unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

With normal math, where I don't have to worry about overflow, the
answer is x == 3333702565, but when I look at the value of 'x' it
reads: 3333702656.

If I change any of the values to type double, then I get the correct
answer. So am I losing precision, or is there a compiler bug?

You get what you asked for. By the language rules your integral
operands are converted to floating point, in this case to float, and
that is where you lose precision, which shows in the result. In
general, float is a type you should avoid (unless you work with SIMD
or some other specialized code). Even with double you can lose
precision on current machines: it usually has 64 bits overall
(mantissa + exponent), so it cannot hold all the significant bits of
a 64-bit long.
 
Juha Nieminen

Daniel said:
But am I losing precision when dealing with numbers that big? I'm
guessing so...

In most architectures a variable of type float is an IEEE 32-bit
floating point number. It stores 23 bits of mantissa. You are giving it
32 bits of data. The lowest 9 bits are going to be dropped because they
simply can't fit into the 23 bits.

(Ok, technically speaking only the 8 least significant bits are lost,
because the leading 1 bit is implicit and not stored, but anyways.)

Daniel said:
The formula I posted uses specific numbers, but my code uses
variables. That "500.0f" represents a floating point type that could
equal some real value.

By using the 'float' type you are accepting that you will have only 24
bits of accuracy (23 stored bits plus the implicit leading one). If you
want more, use 'double' instead.
 
James Kanze

Daniel said:
The expression is:

unsigned int x = 3333702325u + 120u * 1000 / 500.0f;

With normal math, where I don't have to worry about overflow, the
answer is x == 3333702565, but when I look at the value of 'x' it
reads: 3333702656.

If I change any of the values to type double, then I get the correct
answer. So am I losing precision, or is there a compiler bug?

You might be interested in the output of the following simple
program:

#include <iostream>

int
main()
{
    std::cout.precision( 20 ) ;
    // Print each integer near the expected result next to its
    // float and double conversions.
    for ( unsigned int i = 3333702565u - 4 ;
            i <= 3333702565u + 4 ;
            ++ i ) {
        float f = i ;
        double d = i ;
        std::cout
            << i
            << ", as float " << f
            << ", as double " << d << std::endl ;
    }
    return 0 ;
}

Depending on the compiler and the hardware you are using (and
the degree of optimization you've demanded), all of the
operations except the 120u * 1000 in your expression may be done
in float. (They may also be done in a floating point format
with more precision than float---this is implementation
defined.) If your machine uses IEEE floating point (like mine,
and most other desktops), 3333702565 is not representable as a
float value; the closest you can come is 3333702656. So there's
no way any arithmetic in float can ever result in 3333702565.
(If your compiler does use a larger precision, you might get
3333702565 with your expression.)

In general, with IEEE float, you can't count on more than about
six decimal digits, with IEEE double, 15.
 
