Unwanted rounding

Joe Wright

Gordon said:
I have :
float f = 36.09999999;

There is no exact representation of 36.09999999 in binary floating point.

36.0999999999 as long double:
Before: 36.09999999989999999727707802321674535050988197326660156250000000000000000000000000
Value: 36.09999999990000000074652497517035953933373093605041503906250000000000000000000000
After: 36.09999999990000000421597192712397372815757989883422851562500000000000000000000000

36.0999999999 as double:
Before: 36.099999999899992531027237419039011001586914062500000000000000
Value: 36.099999999899999636454595020040869712829589843750000000000000
After: 36.099999999900006741881952621042728424072265625000000000000000

36.0999999999 as float:
Before: 36.099994659423828125000000000000000000000000000000000000000000
Value: 36.099998474121093750000000000000000000000000000000000000000000
After: 36.100002288818359375000000000000000000000000000000000000000000
When I do :
char cf[25];
sprintf(cf,"%0.03lf", f);

I get : 36.100

There is no exact representation of 36.100 in binary floating point.

36.100 as long double:
Before: 36.09999999999999999514277426726494013564661145210266113281250000000000000000000000
Value: 36.09999999999999999861222121921855432447046041488647460937500000000000000000000000
After: 36.10000000000000000208166817117216851329430937767028808593750000000000000000000000

36.100 as double:
Before: 36.099999999999994315658113919198513031005859375000000000000000
Value: 36.100000000000001421085471520200371742248535156250000000000000
After: 36.100000000000008526512829121202230453491210937500000000000000

36.100 as float:
Before: 36.099994659423828125000000000000000000000000000000000000000000
Value: 36.099998474121093750000000000000000000000000000000000000000000
After: 36.100002288818359375000000000000000000000000000000000000000000
How could I get 36.099 ?

There is no exact representation of 36.099 in binary floating point.

36.099 as long double:
Before: 36.09899999999999999772404279951842909213155508041381835937500000000000000000000000
Value: 36.09900000000000000119348975147204328095540404319763183593750000000000000000000000
After: 36.09900000000000000466293670342565746977925300598144531250000000000000000000000000

36.099 as double:
Before: 36.098999999999989540810929611325263977050781250000000000000000
Value: 36.098999999999996646238287212327122688293457031250000000000000
After: 36.099000000000003751665644813328981399536132812500000000000000

36.099 as float:
Before: 36.098995208740234375000000000000000000000000000000000000000000
Value: 36.098999023437500000000000000000000000000000000000000000000000
After: 36.099002838134765625000000000000000000000000000000000000000000
Thanks in advance.
Given our ubiquitous 64-bit IEEE double (53 mantissa bits)
36.099 as double has no precision beyond
3.6098999999999997e+01
That printf("%.60f", 36.099) can give you something like
36.098999999999996646238287212327122688293000000000000000000000
might tease you to believe you have precision to 40+ digits. You don't.
 
Gordon Burditt

Since the value given above doesn't end in 5 followed by trailing
zeroes, and it's not an exact integer, your example won't happen
unless printf() is introducing unwanted rounding.

The point of the output is that you have three consecutive floating
point numbers (with no intermediate values in between) so rounding
decimal numbers to put them in floating-point variables is inevitable
and will result in errors.

A floating-point variable contains a number (except when it's NaN
or Inf or some such thing) and it is perfectly possible and reasonable
to print out *EXACTLY* what that value is, to infinite precision,
particularly when investigating problems of unwanted precision loss
or comparing what you got with what you should have gotten if
everything was done in infinite-precision math.

Now, if that number 36.099 represents the weight in kilograms of
something, you are correct that it is highly unlikely to have
anywhere near 17 digits of precision in the result.
 
Joe Wright

Gordon Burditt wrote:
[ snip ]
> Since the value given above doesn't end in 5 followed by trailing
> zeroes, and it's not an exact integer, your example won't happen
> unless printf() is introducing unwanted rounding.

Where is (at what position) printf introducing this rounding?

> The point of the output is that you have three consecutive floating
> point numbers (with no intermediate values in between) so rounding
> decimal numbers to put them in floating-point variables is inevitable
> and will result in errors.

I'm at a loss here. I have no idea what you mean.

> A floating-point variable contains a number (except when it's NaN
> or Inf or some such thing) and it is perfectly possible and reasonable
> to print out *EXACTLY* what that value is, to infinite precision,
> particularly when investigating problems of unwanted precision loss
> or comparing what you got with what you should have gotten if
> everything was done in infinite-precision math.

A floating point variable (double, let's say) can hold a value precise
to approximately 17 decimal digits. Nothing infinite about it.

> Now, if that number 36.099 represents the weight in kilograms of
> something, you are correct that it is highly unlikely to have
> anywhere near 17 digits of precision in the result.

Why kilograms? The double has 53 bits and about 17 digits of precision
no matter whether its value is kilos, nanos or light years. Using printf
and friends to show decimal digits beyond 17 or so is misleading.
 
Gordon Burditt

> Given our ubiquitous 64-bit IEEE double (53 mantissa bits)
> Where is (at what position) printf introducing this rounding?


36.098999999999997 as double:
Before: 36.098999999999989540810929611325263977050781250000000000000000
Value: 36.098999999999996646238287212327122688293457031250000000000000
After: 36.099000000000003751665644813328981399536132812500000000000000

Assuming that this is stored in an IEEE 64-bit floating point number,
the rounding is 4 places in the 16th digit to the right of the decimal point.
The actual number needs to be one of the three listed above, or something
even farther away, since there aren't any numbers between the Before: and
Value: numbers or between the Value: and After: numbers.

Value: 36.098999999999996646238287212327122688293457031250000000000000
Input: 36.098999999999997000000000000000000000000000000000000000000000
                        ^^

> I'm at a loss here. I have no idea what you mean.

The program prints three consecutive floating point numbers.
There are no numbers in between them at the specified precision.
If you want to represent something close to the one in the middle,
you've got these three choices. Anything else is further away.

> A floating point variable (double, let's say) can hold a value precise
> to approximately 17 decimal digits. Nothing infinite about it.

When you convert a floating point number (say, double) to decimal,
it may take many more digits than 17 to represent EXACTLY the value
it represents.

> Why kilograms? The double has 53 bits and about 17 digits of precision
> no matter whether its value is kilos, nanos or light years. Using printf
> and friends to show decimal digits beyond 17 or so is misleading.

It's not misleading to represent the exact value of a floating-point
number in decimal when discussing rounding error and the limits of
precision of various types.
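
The two views in this exchange can be checked side by side: 17 significant digits are enough to identify a double uniquely, even though its exact decimal value has many more. The following is a rough sketch under stated assumptions; `round_trips` is a hypothetical helper, and it relies on printf() and strtod() being correctly rounded, which holds for common C libraries but is not guaranteed by the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if printing x with `digits` significant
   decimal digits and parsing the text back yields the same double. */
int round_trips(double x, int digits)
{
    char buf[64];
    if (digits < 1 || digits > 40)
        return 0;
    /* "%.*e" prints one digit before the point and digits-1 after it. */
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    return strtod(buf, NULL) == x;
}
```

With 17 digits the round trip should succeed for any finite IEEE double. With 15 digits it can fail: the double just above 36.099 prints as 3.60990000000000e+01, which reads back as its neighbour.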
 
Dik T. Winter

> Gordon Burditt wrote:
> [ snip ]
> >
> > Since the value given above doesn't end in 5 followed by trailing
> > zeroes, and it's not an exact integer, your example won't happen
> > unless printf() is introducing unwanted rounding.
> >
> Where is (at what position) printf introducing this rounding?

A floating point number is (by definition) a number of the form
m * base^exp
where m and exp are integers (the possibility that m is a fraction
can be ignored because it can be made integer by suitable change
of the exponent) and base is the base of the representation,
which is 2 in IEEE. So a floating point number is in essence a
rational number. If the base contains only prime factors 2 and/or
5, the denominator of that number is a divisor of a power of 10,
and so the number has an exact representation in finite decimal
notation. So if printf is giving the above representation it is
doing some rounding, because that is not the exact representation
of an IEEE floating point number.
> I'm at a loss here. I have no idea what you mean.

Given some number in decimal notation there is either a single
floating point number that it matches, or there are two floating
point numbers, one of them larger and one of them smaller than
the number given. 36.099 does not have an exact representation,
so there are two numbers, one larger and one smaller.
> A floating point variable (double, let's say) can hold a value precise
> to approximately 17 decimal digits. Nothing infinite about it.

Each floating point number is exactly representable in decimal notation.
Again, nothing infinite in it.
> Why kilograms? The double has 53 bits and about 17 digits of precision
> no matter whether its value is kilos, nanos or light years. Using printf
> and friends to show decimal digits beyond 17 or so is misleading.

That may be quite something else. But showing exact representations
to show that 36.099 is not representable as a floating point number is
not misleading at all.
 
CBFalconer

Dik T. Winter said:
.... snip ...

> A floating point number is (by definition) a number of the form
> m * base^exp
> where m and exp are integers (the possibility that m is a fraction
> can be ignored because it can be made integer by suitable change
> of the exponent) and base is the base of the representation,
> which is 2 in IEEE. So a floating point number is in essence a
> rational number. If the base contains only prime factors 2 and/or
> 5, the denominator of that number is a divisor of a power of 10,
> and so the number has an exact representation in finite decimal
> notation. So if printf is giving the above representation it is
> doing some rounding, because that is not the exact representation
> of an IEEE floating point number.

You neglect that the usual base is 2, and that 10 is only used for
input/output translation to/from text format.

> Given some number in decimal notation there is either a single
> floating point number that it matches, or there are two floating
> point numbers, one of them larger and one of them smaller than
> the number given. 36.099 does not have an exact representation,
> so there are two numbers, one larger and one smaller.

Since in general we cannot make an exact equivalent between
representation as (2 ** binexp) and (10 ** decexp) the two
representations cannot be exact equivalents (outside of a few
specific values).
 
langwadt

Richard Heathfield wrote:
> Marco said:
> > Hello,
> >
> > I have :
> > float f = 36.09999999;
>
> I recommend double rather than float.
>
> > When I do :
> > char cf[25];
> > sprintf(cf,"%0.03lf", f);
>
> Not lf - just f.
>
> > I get : 36.100
> >
> > How could I get 36.099 ?
>
> Remove the precision specification (03), and then search the string for the
> decimal point, using strchr. Check that you have at least three valid
> characters (non-'\0') after the decimal point, and make the fourth one '\0'
> to truncate the string at the point you want.

how about..

sprintf(cf,"%0.03lf", floor(f*1000)/1000);

-Lasse
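
For comparison, the string-truncation approach Richard Heathfield describes in the quoted text could look roughly like this. It is a sketch only: `format_truncated` is a made-up name, and the code assumes the default "C" locale, where the decimal point is '.'. Note that "%f" itself rounds to six decimals before the cut, so a value such as 0.9999999 still comes out as 1.000.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format x with "%f" (six decimals), then cut the
   string so that only `places` decimals remain. Returns 0 on success. */
int format_truncated(char *buf, size_t size, double x, int places)
{
    int n = snprintf(buf, size, "%f", x);
    if (n < 0 || (size_t)n >= size)
        return -1;                       /* output did not fit */
    char *dot = strchr(buf, '.');
    if (dot == NULL)
        return -1;                       /* e.g. "inf" or "nan" */
    if (places < 0 || strlen(dot + 1) < (size_t)places)
        return -1;                       /* not enough decimals to keep */
    if (places == 0)
        *dot = '\0';                     /* drop the point as well */
    else
        dot[places + 1] = '\0';          /* keep `places` decimals */
    return 0;
}
```

For the original poster's value, format_truncated(cf, sizeof cf, f, 3) leaves "36.099" in cf, because "%f" first produces "36.099998".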
 
Ernie Wright

CBFalconer said:
> Since in general we cannot make an exact equivalent between
> representation as (2 ** binexp) and (10 ** decexp) the two
> representations cannot be exact equivalents (outside of a few
> specific values).

You might want to give the phrasing of this some more thought.

We aren't comparing logarithms, which is what binexp and decexp are.
The question would be whether a number represented as

(1 + m / 2^b) * 2^e

with e, m, b integers, can be written exactly (with a finite number of
digits) as a number in base 10, and as it happens, *all* of them can.

Every one of the numbers that can be represented in IEEE 754, by far the
most common floating-point encoding, can also be represented exactly in
base 10, because 10 contains 2 as a factor. It's only in the other
direction that we have an issue.

You can convince yourself of this by looking at the decimal expansion
for various values of binexp.

-1 0.5
-2 0.25
-3 0.125
-4 0.0625
-5 0.03125
...
-23 0.00000011920928955078125

To create the decimal expansion of a binary fraction with b binary
digits, you just need to add up the numbers on the right for each
corresponding 1 bit in the binary fraction. This'll produce a decimal
expansion with at most b decimal digits.

For floats, b = 23, and for doubles, b = 52 (the 53-bit precision of a
double counts the implicit leading 1).
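
The claim that the expansion terminates can be checked with integer arithmetic alone. Here is a small sketch; `fraction_digits` is a hypothetical helper, limited to b <= 60 so the multiplication fits in 64 bits. It does decimal long division of m / 2^b: each step multiplies the remainder by 10, emits the integer part as a digit, and keeps the rest. Every step removes one factor of 2 from the denominator, so the loop ends after at most b digits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the exact decimal digits of m / 2^b (the
   part after "0.") into out and returns how many digits there are, or -1
   on bad arguments. Requires m < 2^b and b <= 60 so that r * 10 cannot
   overflow 64 bits. */
int fraction_digits(uint64_t m, unsigned b, char *out, size_t size)
{
    if (b == 0 || b > 60 || (m >> b) != 0 || size == 0)
        return -1;
    uint64_t mask = (UINT64_C(1) << b) - 1;
    uint64_t r = m;                      /* remainder, always < 2^b */
    size_t n = 0;
    while (r != 0) {
        if (n + 1 >= size)
            return -1;                   /* buffer too small */
        r *= 10;
        out[n++] = (char)('0' + (r >> b)); /* next decimal digit */
        r &= mask;                       /* keep the fractional part */
    }
    out[n] = '\0';
    return (int)n;
}
```

For m = 1 and b = 23 this yields the 23 digits of the last table row above.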

The problems arise in the other direction.

The OP had asked how to use printf() to display

float f = 36.09999999;

as "36.099" rather than "36.100". More generally, he wants rounding
toward zero, rather than rounding to nearest.

Someone suggested using

printf( "%.3f\n", f - fmod( f, 1e-3 ));

This looks like a good approach, but it can only touch the rounding done
in printf(). The compiler must still round the decimal value of f in
order to binary-encode it as a float, and it'll always do this using
nearest-value rounding, so there are some values that will be rounded up
rather than down.

An example is 31.0999999f. This actually has one fewer 9 than the OP's
example value, but the compiler will round it up to 31.1 before it ever
reaches printf().

31.0999999f can't be represented exactly, so the compiler has to choose
between the two nearest values that it *can* represent exactly,

(1 + 7916748 / 8388608) * 16 = 31.09999847412109375
(1 + 7916749 / 8388608) * 16 = 31.1000003814697265625

and it turns out the second one is closer.
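
The integers in the two expressions above can be recovered from a float with frexpf(). This is a sketch, assuming IEEE 754 single precision and a positive, normal argument; `decompose` is a name invented for the example.

```c
#include <math.h>

/* Hypothetical helper: for a positive, normal float x, find integers
   m and e such that x == (1 + m / 2^23) * 2^e. */
void decompose(float x, long *m, int *e)
{
    int exp;
    float frac = frexpf(x, &exp);        /* x == frac * 2^exp, 0.5 <= frac < 1 */
    *e = exp - 1;                        /* rescale so the significand is in [1, 2) */
    /* (2*frac - 1) is the fraction m / 2^23; scaling by 2^23 is exact. */
    *m = (long)ldexpf(2.0f * frac - 1.0f, 23);
}
```

On a compiler that rounds literals to nearest, decompose(31.0999999f, &m, &e) should give m = 7916749 and e = 4, the second of the two candidates.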

If you think this is just a problem of having too many digits, consider
what happens to 0.9. It can't be represented exactly either, and

float f = 0.9f;
printf( "%.3f\n", f - fmod( f, 1e-3 ));

prints "0.899", probably not what the OP wanted.

The only surefire way to handle this is not to allow any rounding that
you don't control. In particular, you can't store numbers in base 2.
You have to maintain them as a string of decimal digits, and you have to
perform all of the arithmetic in decimal.

- Ernie http://home.comcast.net/~erniew
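
As a very small illustration of keeping the number as decimal text, truncation can be done on the digits themselves without any binary conversion. This is a sketch under assumptions: `truncate_decimal` is a made-up helper, it expects a plain numeral such as "36.09999999" (optional sign, digits, optional '.'), and it does not validate its input. Real decimal arithmetic needs far more than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncate the decimal numeral `in` to `places`
   decimals, padding with zeros if it has fewer. Works on the text only.
   Returns 0 on success, -1 if places < 1 or the result does not fit. */
int truncate_decimal(const char *in, int places, char *out, size_t size)
{
    const char *dot = strchr(in, '.');
    size_t int_len = dot ? (size_t)(dot - in) : strlen(in);
    size_t frac_len = dot ? strlen(dot + 1) : 0;
    if (places < 1 || int_len + 1 + (size_t)places + 1 > size)
        return -1;
    memcpy(out, in, int_len);            /* integer part, unchanged */
    out[int_len] = '.';
    for (int i = 0; i < places; i++)     /* copy or zero-pad the decimals */
        out[int_len + 1 + i] = ((size_t)i < frac_len) ? dot[1 + i] : '0';
    out[int_len + 1 + places] = '\0';
    return 0;
}
```

Here "36.09999999" becomes "36.099", "31.0999999" becomes "31.099", and "0.9" becomes "0.900", because no step ever rounds to a binary value.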
 
