operations with double integers

Alexander Stoyakin · Mar 13, 2007

Hello,
please advise on the following issue. I need to check that the difference
between two double values is not higher than a defined limit.

#include <stdio.h>

int main(void)
{
    double limit = 0.3;
    double val1 = 0.5, val2 = 0.2;

    if ( (val1 - val2) > limit )
        puts("Difference higher than limit!");

    return 0;
}

This program displays the message because of the way double is
represented in binary, so the result is unexpected. I see that this
behavior is allowed by the standard. But what can I do to make this code
work as expected, i.e. treat 0.3 as 0.3, not as (for example)
0.3000000000000000012? Are there any standard ways (macros, functions)
to fix the precision for double integers? Everything I found is just
formatting for input-output functions, but it doesn't affect the binary
representation, just the displayed value.

Thank you in advance.

santosh · Mar 13, 2007

Floating-point maths is one place C99 has made improvements. If your
implementation is at least partially C99 compliant, you might want to
see if you can use the is* macros in math.h. In this particular case
the isgreater macro can be used like so:

if(isgreater(diff, 0.3)) /* ... */

Otherwise, you should allow for a small delta around the value you're
comparing to.

user923005 · Mar 13, 2007

The type double is not an integer type, even though it can hold
integer values. It is a floating-point type (which internally is
really a base 2 fractional representation).

Typically, numbers like 0.3 and 1.7 cannot be stored as exact values
in floating point representation.
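
A minimal sketch of what this means in practice, assuming the usual IEEE 754 64-bit double. The helper names sum_is_exact and show_stored are made up for illustration. The first reports whether a sum, once rounded to double, compares equal to the value you expected; the second prints more digits than the default so you can see what is actually stored.

```c
#include <stdio.h>

/* Returns 1 if a + b, rounded to double, compares equal to expected. */
int sum_is_exact(double a, double b, double expected)
{
    volatile double sum = a + b;   /* volatile forces the sum into a real double */
    return sum == expected;
}

/* Prints the value a double actually holds, to 20 decimal places. */
void show_stored(const char *label, double x)
{
    printf("%s is stored as %.20f\n", label, x);
}
```

On such an implementation sum_is_exact(0.1, 0.2, 0.3) is expected to be false, while sum_is_exact(0.5, 0.25, 0.75) is true because all three values are exact binary fractions.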

Here is a very good document that explains floating point in html
format:
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Here is the same thing as a PDF file:
http://cch.loria.fr/documentation/IEEE754/ACM/goldberg.pdf

Related information from the C-FAQ:
14.1: When I set a float variable to, say, 3.1, why is printf printing
it as 3.0999999?

A: Most computers use base 2 for floating-point numbers as well as
for integers. In base 2, one divided by ten is an infinitely-
repeating fraction (0.0001100110011...), so fractions such as
3.1 (which look like they can be exactly represented in decimal)
cannot be represented exactly in binary. Depending on how
carefully your compiler's binary/decimal conversion routines
(such as those used by printf) have been written, you may see
discrepancies when numbers (especially low-precision floats) not
exactly representable in base 2 are assigned or read in and then
printed (i.e. converted from base 10 to base 2 and back again).
See also question 14.6.

14.4: My floating-point calculations are acting strangely and giving
me different answers on different machines.

A: First, see question 14.2 above.

If the problem isn't that simple, recall that digital computers
usually use floating-point formats which provide a close but by
no means exact simulation of real number arithmetic. Underflow,
cumulative precision loss, and other anomalies are often
troublesome.

Don't assume that floating-point results will be exact, and
especially don't assume that floating-point values can be
compared for equality. (Don't throw haphazard "fuzz factors"
in, either; see question 14.5.)

These problems are no worse for C than they are for any other
computer language. Certain aspects of floating-point are
usually defined as "however the processor does them" (see also
question 11.34), otherwise a compiler for a machine without the
"right" model would have to do prohibitively expensive
emulations.

This article cannot begin to list the pitfalls associated with,
and workarounds appropriate for, floating-point work. A good
numerical programming text should cover the basics; see also the
references below.

References: Kernighan and Plauger, _The Elements of Programming
Style_ Sec. 6 pp. 115-8; Knuth, Volume 2 chapter 4; David
Goldberg, "What Every Computer Scientist Should Know about
Floating-Point Arithmetic".

14.5: What's a good way to check for "close enough" floating-point
equality?

A: Since the absolute accuracy of floating point values varies, by
definition, with their magnitude, the best way of comparing two
floating point values is to use an accuracy threshold which is
relative to the magnitude of the numbers being compared. Rather
than

double a, b;
...
if(a == b) /* WRONG */

use something like

#include <math.h>

if(fabs(a - b) <= epsilon * fabs(a))

for some suitably-chosen degree of closeness epsilon (as long as
a is nonzero!).

References: Knuth Sec. 4.2.2 pp. 217-8.
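
One possible way to package the relative test above as a function. This is a sketch, not the FAQ's own code: the name nearly_equal and the choice to scale by the larger of the two magnitudes (rather than by fabs(a) alone) are assumptions made here so that the test is symmetric in a and b.

```c
#include <math.h>

/* Relative comparison: the allowed difference is rel_eps times the
   larger magnitude of the two values being compared. */
int nearly_equal(double a, double b, double rel_eps)
{
    double fa = fabs(a), fb = fabs(b);
    double scale = fa > fb ? fa : fb;

    return fabs(a - b) <= rel_eps * scale;
}
```

Note that with a purely relative test a tiny nonzero value never compares "close" to exactly zero; if that case matters, an additional absolute threshold is needed.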

14.6: How do I round numbers?

A: The simplest and most straightforward way is with code like

(int)(x + 0.5)

This technique won't work properly for negative numbers,
though (for which you could use something like
(int)(x < 0 ? x - 0.5 : x + 0.5)).
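
The expression above wrapped in a function, as a small sketch. The name round_to_int is hypothetical, and the code assumes the rounded value fits in an int (converting an out-of-range double to int is undefined).

```c
/* Rounds to the nearest int, with halves going away from zero.
   Assumes the result is representable as an int. */
int round_to_int(double x)
{
    return (int)(x < 0 ? x - 0.5 : x + 0.5);
}
```

C99 also provides round() and lround() in math.h, which may be preferable where they are available.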

Keith Thompson · Mar 13, 2007

santosh said:
Floating-point maths is one place C99 has made improvements. If your
implementation is at least partially C99 compliant, you might want to
see if you can use the is* macros in math.h. In this particular case
the isgreater macro can be used like so:

if(isgreater(diff, 0.3)) /* ... */

The only difference between the isgreater() macro and the built-in ">"
operator is its behavior in the presence of NaNs, which doesn't help at
all in this case.
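
A small sketch that checks this, assuming a C99 math.h. The helper name greater_agrees is made up for illustration. It reports whether isgreater() and ">" produce the same truth value for a given pair; the difference lies only in whether the "invalid" floating-point exception may be raised when an operand is a NaN.

```c
#include <math.h>

/* Returns 1 when isgreater(a, b) and (a > b) give the same truth value. */
int greater_agrees(double a, double b)
{
    return (isgreater(a, b) != 0) == (a > b);
}
```

For ordinary values such as 0.5 and 0.3 the two agree, so switching to isgreater() does not change the outcome of the original test.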

Otherwise, you should allow for a small delta around the value you're
comparing to.

See user923005's followup for more information.

The phrase "double integer" is incorrect; double is a floating-point
type, not an integer type.

If you want to represent values like 0.3 exactly, you might consider
using scaled integers. For example, you might use the integer 3 to
represent the real value 0.3. All the code that works with these
values will then have to allow for the scaling factor. One example of
this kind of thing is treating a quantity of money as an integer
number of cents rather than a floating-point number of dollars (or
whatever your local currency is).
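
A minimal sketch of the scaled-integer idea applied to the original test, assuming every quantity is kept as a whole number of tenths (so 5 stands for 0.5 and 3 for 0.3). The function name and the choice of scale are illustrative only.

```c
/* All arguments are whole numbers of tenths: 5 means 0.5, 3 means 0.3.
   Integer subtraction and comparison are exact, so no rounding occurs. */
int exceeds_limit_scaled(long val1_tenths, long val2_tenths, long limit_tenths)
{
    return (val1_tenths - val2_tenths) > limit_tenths;
}
```

With these units exceeds_limit_scaled(5, 2, 3) is false, because 5 - 2 is exactly 3.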

santosh · Mar 13, 2007

[ ... ]

The only difference between the isgreater() macro and the built-in ">"
operator is its behavior in the presence of NaNs, which doesn't help at
all in this case.

Thanks for the correction.

<snip>

christian.bau · Mar 13, 2007

You first have to figure out exactly what you want. If val1 were 0.5 +
1e-12, would you want the answer "greater" or not?

It seems that "is val1 minus val2 greater than limit?" is not the
question you want to answer, so you must have a different question.

Maybe your question is "If I printed val1 with two decimal digits, and
printed val2 with two decimal digits, would the printed result look as
if val1 - val2 is greater than 0.3?" If that is your question, you
would check whether floor (100 * val1 + 0.5) - floor (100 * val2 +
0.5) > 30.0.

Maybe your question is "Is val1 - val2 so large that I am sure the
difference would be greater than 0.3, even if I had used more
precision?" In that case you have to check how val1 and val2 were
calculated, what rounding errors you could have made, and check whether
the difference is greater than 0.3 plus all possible rounding errors
added up.
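
A sketch of the "compare what would be printed with two decimals" reading. The names hundredths and exceeds_at_two_decimals are hypothetical. To stay free of library calls it uses a cast instead of floor(); the cast truncates toward zero, which matches floor() only for non-negative values, so non-negative inputs that fit in a long are assumed.

```c
/* Value rounded to a whole number of hundredths.
   Assumes x >= 0 and that the result fits in a long. */
long hundredths(double x)
{
    return (long)(100.0 * x + 0.5);
}

/* The limit is given in hundredths as well, e.g. 30 for 0.30. */
int exceeds_at_two_decimals(double val1, double val2, long limit_hundredths)
{
    return hundredths(val1) - hundredths(val2) > limit_hundredths;
}
```

With this test a val1 of 0.5 + 1e-12 behaves exactly like 0.5, which is the point of asking the question in terms of printed digits.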

If you choose a small constant k, maybe k = 10, and check whether val1
- val2 > limit + k * DBL_EPSILON * max (|val1|, |val2|, |limit|), then
it is relatively likely that you get the right result. It depends on how
val1, val2, and limit have been calculated.
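
A sketch of this margin test, reading the margin as k times DBL_EPSILON multiplied by the largest of the three magnitudes (that reading is an assumption about the intended formula). The names max3 and exceeds_with_margin are invented for the example.

```c
#include <float.h>
#include <math.h>

/* Largest of three values. */
static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

/* True only if the difference exceeds the limit by more than a margin
   of k * DBL_EPSILON, scaled by the largest magnitude involved. */
int exceeds_with_margin(double val1, double val2, double limit, double k)
{
    double margin = k * DBL_EPSILON * max3(fabs(val1), fabs(val2), fabs(limit));

    return (val1 - val2) > limit + margin;
}
```

A suitable k depends on how many rounding steps went into val1, val2 and limit; 10 is only a starting guess.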

Ernie Wright · Mar 14, 2007

Alexander said:
please advise on the following issue. I need to check that the difference
between two double values is not higher than a defined limit.

#include <stdio.h>

int main(void)
{
    double limit = 0.3;
    double val1 = 0.5, val2 = 0.2;

    if ( (val1 - val2) > limit )
        puts("Difference higher than limit!");

    return 0;
}

This program displays the message because of the way double is
represented in binary, so the result is unexpected.

It's more fundamental than that.

In base 2, both 0.2 and 0.3 are repeating fractions.

0.2 decimal = 0.001100 1100 1100 1100 ... binary
0.3 decimal = 0.010011 0011 0011 0011 ... binary

This is true, in fact, for *every* decimal fraction that doesn't end in
the digit 5 (and for most that do, such as 0.15). No finite number of
binary digits is enough to represent them exactly.
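
The expansions above can be reproduced with exact integer arithmetic by repeatedly doubling the remainder of a fraction num/den. The following is a sketch; the function name binary_fraction is hypothetical, and it assumes 0 <= num < den and that 2 * den fits in an unsigned long.

```c
#include <stddef.h>

/* Writes the first ndigits binary digits of num/den to out, followed by
   a terminating '\0' (out must hold ndigits + 1 chars). Returns 1 if the
   expansion terminated exactly within ndigits digits, 0 otherwise. */
int binary_fraction(unsigned long num, unsigned long den,
                    char *out, size_t ndigits)
{
    unsigned long rem = num;
    size_t i;

    for (i = 0; i < ndigits; i++) {
        rem *= 2;                 /* shift one binary place to the left */
        if (rem >= den) {
            out[i] = '1';
            rem -= den;
        } else {
            out[i] = '0';
        }
    }
    out[ndigits] = '\0';
    return rem == 0;
}
```

Calling it with 3 and 10 yields the repeating 0.3 pattern and never terminates, while 1 and 4 terminates after two digits.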

I see that this behavior is allowed by the standard. But what can I do
to make this code work as expected, i.e. treat 0.3 as 0.3, not as (for
example) 0.3000000000000000012?

Using the information supplied in other answers, you have to either
design a test more sophisticated than

(val1 - val2) > limit

or represent val1, val2, and limit as something other than a double
with a fractional part.

- Ernie http://home.comcast.net/~erniew
