inaccurate floating point and DBL_MAX

Siemel Naran

About inaccurate floating point and DBL_MAX.

double x = 2;
double y = 1 + 1;
assert(x == y);

Because of inaccurate floating point representation, x and y may not be
equal (i.e., they may differ by 0.0000000001 or some small number) on some
implementations.

But how about DBL_MAX?

double x = DBL_MAX;
f(x);
assert(x != DBL_MAX);

Is the comparison above exact, or should we write assert(x < DBL_MAX)?

Thanks.
 
Ron Natalie

Siemel said:
About inaccurate floating point and DBL_MAX.

double x = 2;
double y = 1 + 1;
assert(x == y);

Because of inaccurate floating point representation, x and y may not be
equal (i.e., they may differ by 0.0000000001 or some small number) on some
implementations.

Actually, I've never seen such an implementation. The standard says that
if a value cannot be represented exactly in a floating-point variable,
one of the two adjacent representable values is chosen. THIS IS WHERE
THE IMPRECISION COMES FROM, along with the mistaken assumption that
terminating decimal fractions are terminating binary fractions as well.

I've never seen an implementation that can't precisely represent 1 and 2.
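
To make the distinction concrete, here is a minimal sketch, assuming an IEEE 754 binary64 double (common, but not required by the standard). The function names are made up for illustration. Small integers are stored exactly, while 0.1, 0.2 and 0.3 each get rounded to a neighbouring double, so the sum no longer matches.

```cpp
#include <cassert>

// 1 and 2 are exactly representable, so this comparison is exact.
bool small_integers_compare_equal() {
    double x = 2;
    double y = 1 + 1;
    return x == y;
}

// 0.1, 0.2 and 0.3 have no finite binary expansion. Each literal is
// rounded to an adjacent double before any arithmetic happens.
bool decimal_fractions_compare_equal() {
    volatile double a = 0.1;      // volatile: keep the values as stored doubles
    volatile double b = 0.2;
    volatile double sum = a + b;
    return sum == 0.3;
}

// Same style as the snippets in this thread.
void check_small_integers() {
    assert(small_integers_compare_equal());
}
```

Calling check_small_integers() should never trip the assertion. On a binary64 implementation, decimal_fractions_compare_equal() is expected to return false.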

But how about DBL_MAX?

double x = DBL_MAX;
f(x);
assert(x != DBL_MAX);

Since DBL_MAX is of type double, it's already got a double representation.
There's no imprecision. You can test it for equality (provided you haven't
converted it to some other type or done calculations with it).
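
A small sketch of that point. The f() here is a hypothetical stand-in that leaves its argument alone, since the thread never shows the real one. If x is untouched, the equality test is exact. Once a calculation has been done on it, the value is simply a different double.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>

// Hypothetical stand-in for the thread's f(); it does not modify x.
void f(double& x) { (void)x; }

// x was assigned DBL_MAX and never changed, so == is exact.
bool still_dbl_max_after_f() {
    double x = DBL_MAX;
    f(x);
    return x == DBL_MAX;
}

// After a calculation the value is a different double.
bool differs_after_halving() {
    double x = DBL_MAX;
    x = x / 2;
    return x != DBL_MAX;
}

void check_dbl_max() {
    // DBL_MAX and numeric_limits<double>::max() name the same value.
    assert(std::numeric_limits<double>::max() == DBL_MAX);
    assert(still_dbl_max_after_f());
}
```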
 
Jerry Coffin

Siemel said:
About inaccurate floating point and DBL_MAX.

double x = 2;
double y = 1 + 1;
assert(x == y);

Because of inaccurate floating point representation, x and y may not
be equal (i.e., they may differ by 0.0000000001 or some small number)
on some implementations.

This is incorrect for a couple of reasons. First of all, '1+1' is an
integer expression, so it is evaluated as an integer computation and the
result is then converted to a double. In other words, in both cases
you're creating 2 (as an integer) and then converting that integer to a
double. The result is clearly the same in both cases.
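
The types involved can be checked at compile time. This is only an illustrative sketch (the function name is invented here), using decltype to show that 1 + 1 is an int expression while 1.0 + 1.0 is a double expression.

```cpp
#include <cassert>
#include <type_traits>

// The addition is done in int; only the result is converted to double.
static_assert(std::is_same<decltype(1 + 1), int>::value,
              "1 + 1 is an int expression");
static_assert(std::is_same<decltype(1.0 + 1.0), double>::value,
              "1.0 + 1.0 is a double expression");

void check_integer_initializers() {
    double x = 2;      // int 2 converted to double
    double y = 1 + 1;  // int addition gives 2, then converted to double
    assert(x == y);
}
```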

Even if you changed it to something like:

double x = 2.0;
double y = 1.0 + 1.0;
assert (x==y);

the assertion still can't fail on any conforming implementation of C++.
Even for a floating point type, there is a range of integers that must
be represented exactly, and 1 and 2 fall (well) inside of that range.
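
As a rough illustration of that exact range, here is a sketch that assumes a radix-2 double with 53 significand bits (IEEE 754 binary64, which the standard does not mandate). The helper names are hypothetical. Under that assumption every integer up to 2^53 is stored exactly, and 2^53 + 1 is the first one that is not.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if n survives a round trip through double unchanged.
bool round_trips_through_double(long long n) {
    double d = static_cast<double>(n);
    return static_cast<long long>(d) == n;
}

// With radix 2 and p significand digits, integers up to 2^p are exact.
// Assumes numeric_limits<double>::radix == 2.
double exact_integer_limit() {
    return std::ldexp(1.0, std::numeric_limits<double>::digits);
}

void check_one_and_two() {
    assert(round_trips_through_double(1));
    assert(round_trips_through_double(2));
}
```

On a binary64 implementation exact_integer_limit() comes out as 9007199254740992, far beyond 1 and 2.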

But how about DBL_MAX?

double x = DBL_MAX;
f(x);
assert(x != DBL_MAX);

Is the comparison above exact, or should we write assert(x < DBL_MAX)?

DBL_MAX is (by definition) the largest finite double. That means
x!=DBL_MAX and x<DBL_MAX mean exactly the same thing unless you want to
deal with infinities, NaNs, etc. As far as precision goes, however,
DBL_MAX is already a double value, so it's not rounded during
assignment to a double.
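
The infinity and NaN caveat can be seen in a short sketch. It assumes the implementation provides an infinity and a quiet NaN for double (true for IEEE 754, and only then are those cases meaningful). The two helper names are made up for the example.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>

bool differs_from_max(double x) { return x != DBL_MAX; }
bool below_max(double x)        { return x <  DBL_MAX; }

void check_finite_values() {
    // For finite doubles the two tests agree.
    assert(differs_from_max(1.0) && below_max(1.0));
    assert(!differs_from_max(DBL_MAX) && !below_max(DBL_MAX));
}

// For +infinity and NaN the tests disagree: both values differ from
// DBL_MAX, but neither compares less than it.
bool tests_disagree(double x) {
    return differs_from_max(x) != below_max(x);
}
```

With std::numeric_limits<double>::infinity() or quiet_NaN() as the argument, tests_disagree() should return true, so which assert is right depends on whether f() can produce such values.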
 
