Floating point subtraction with FLT_MAX error

spooler123 · Oct 26, 2007

Just a small program. I can't figure out what I am doing wrong.

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main()
{

double max = FLT_MAX;
double sub = 16703.627681;

double result = max - sub;

printf("%f - %f = %f\n", max, sub, result);

return 0;
}

Output:
340282346638528859811704183484516925440.000000 - 16703.627681 =
340282346638528859811704183484516925440.000000

Any help would be highly appreciated.

Thanks

husterk · Oct 26, 2007

It appears that you are trying to print a double (%Lf) by using a
float (%f), hence the error you are seeing. Check out the link below
for more info on the printf() function and its modifiers.

http://www.cplusplus.com/reference/clibrary/cstdio/printf.html

Keith
http://www.doubleblackdesign.com
http://www.doubleblackdesign.com/forums

spooler123 · Oct 26, 2007

husterk said:
It appears that you are trying to print a double (%Lf) by using a
float (%f), hence the error you are seeing. Check out the link below
for more info on the printf() function and its modifiers.

http://www.cplusplus.com/reference/clibrary/cstdio/printf.html

Tried that already. Same results.

%lf
340282346638528859811704183484516925440.000000 - 16703.627681 =
340282346638528859811704183484516925440.000000

%LF (Isn't it for long double?)
340282346638528859811704183484516925440.000000 - 16703.627681 =
0.000000

Thanks for the reply.

Also, as far as I know, printf does not care whether it is %f or %lf,
since everything is passed as double; it is scanf that complains.
Correct me if I am wrong.

santosh · Oct 26, 2007

What happens if you do:
double max = DBL_MAX;

Walter Roberson · Oct 26, 2007

spooler123 said:
Just a small program. I can't figure out what I am doing wrong.

double max = FLT_MAX;
double sub = 16703.627681;

double result = max - sub;

Floating point does not have unlimited precision. What you
have discovered is that near FLT_MAX, adjacent numbers that your
floating-point system is able to represent are more than 16703 apart.

Different systems use different schemes for floating point. One
of the most common schemes is IEEE 754,
http://en.wikipedia.org/wiki/IEEE_floating-point_standard

Joe Wright · Oct 26, 2007

I've changed your code slightly for format considerations.

#include <stdio.h>
#include <float.h>

int main(void) {
double max = FLT_MAX;
double sub = 16703.627681;
double result = max - sub;
printf("%f - %f =\n%f\n", max, sub, result);
return 0;
}

Output:
340282346638528859820000000000000000000.000000 - 16703.627681 =
340282346638528859820000000000000000000.000000

FLT_MAX has a magnitude of about 1e+38, while double has a precision of
only 16 digits or so. Your subtrahend is simply too small to make a difference.

Jack Klein · Oct 26, 2007

husterk said:
It appears that you are trying to print a double (%Lf) by using a
float (%f), hence the error you are seeing. Check out the link below
for more info on the printf() function and its modifiers.

Did you write this in an absent-minded moment, or are you actually
this mistaken about printf() conversion specifiers?

http://www.cplusplus.com/reference/clibrary/cstdio/printf.html

Even the page you reference states:

"L The argument is interpreted as a long double (only applies to
floating point specifiers: e, E, f, g and G)."

It is undefined behavior to pass a double to printf() with a "%Lf"
conversion specifier, although it will probably work on
implementations, like Microsoft's compilers for 32-bit Windows, where
double and long double have the same size and representation.

"%f" is, has always been, and always will be the correct conversion
specifier for double.

Keith

Also, please set your posting software to add a proper signature
delimiter, namely "-- \n".

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html

Martin Ambuhl · Oct 26, 2007

/* The values FLT_EPSILON, DBL_EPSILON, and LDBL_EPSILON are defined
to be, for each type, the difference x between 1.0 and the least
value greater than 1.0 that is representable in that type. This
suggests (but does not guarantee) that for a value y > 0.0,
y*(1.0+x) is the smallest value larger than y. These values are
closely linked to the number of significant bits in the
representation of the type. The limits FLT_DIG, DBL_DIG, and
LDBL_DIG give the guaranteed number of (decimal) digits of
precision. Check the following program. */
#include <stdio.h>
#include <float.h>
#include <math.h>

int main()
{
double max = FLT_MAX;
double sub = 16703.627681;
double diff;
printf
("The following values are all dependent on the implementation.\n"
"They may or may not be similar to the values for your "
"implementation.\n\n");

printf("FLT_DIG = %d\n", FLT_DIG);
printf("FLT_MAX = %.*g, FLT_EPSILON = %.*g,\n"
"FLT_MAX - FLT_MAX/(1.0+FLT_EPSILON) = %.*g,\n"
"log10(1.0/FLT_EPSILON - 1.0) = %.*g\n\n",
FLT_DIG, FLT_MAX,
FLT_DIG, FLT_EPSILON,
FLT_DIG, FLT_MAX - FLT_MAX / (1.0 + FLT_EPSILON),
FLT_DIG, log10(1.0 / FLT_EPSILON - 1.0));

printf("DBL_DIG = %d\n", DBL_DIG);
printf("DBL_MAX = %.*g, DBL_EPSILON = %.*g,\n"
"DBL_MAX - DBL_MAX/(1.0+DBL_EPSILON) = %.*g,\n"
"log10(1.0/DBL_EPSILON - 1.0) = %.*g\n\n",
DBL_DIG, DBL_MAX,
DBL_DIG, DBL_EPSILON,
DBL_DIG, DBL_MAX - DBL_MAX / (1.0 + DBL_EPSILON),
DBL_DIG, log10(1.0 / DBL_EPSILON - 1.0));

printf("LDBL_DIG = %d\n", LDBL_DIG);
printf("LDBL_MAX = %.*Lg, LDBL_EPSILON = %.*Lg,\n"
"LDBL_MAX - LDBL_MAX/(1.0+LDBL_EPSILON) = %.*Lg,\n"
"log10(1.0/LDBL_EPSILON - 1.0) = %.*g\n\n",
LDBL_DIG, LDBL_MAX,
LDBL_DIG, LDBL_EPSILON,
LDBL_DIG, LDBL_MAX - LDBL_MAX / (1.0 + LDBL_EPSILON),
DBL_DIG, log10(1.0 / LDBL_EPSILON - 1.0));

diff = max * (1. - 1. / (1.0 + DBL_EPSILON));
printf("max is a double with value %.*g.\n"
"We expect the next lower distinguishable double to\n"
" differ from max by about %.*g.\n"
"The original poster wanted to distinguish a value %.*g\n"
"less than max, but this value is only"
" %.*g * the (likely) smallest significant difference.\n",
DBL_DIG, max, DBL_DIG, diff, DBL_DIG, sub,
DBL_DIG, sub / diff);

return 0;
}

[Output]

The following values are all dependent on the implementation.
They may or may not be similar to the values for your implementation.

FLT_DIG = 6
FLT_MAX = 3.40282e+38, FLT_EPSILON = 1.19209e-07,
FLT_MAX - FLT_MAX/(1.0+FLT_EPSILON) = 4.05648e+31,
log10(1.0/FLT_EPSILON - 1.0) = 6.92369

DBL_DIG = 15
DBL_MAX = 1.79769313486232e+308, DBL_EPSILON = 2.22044604925031e-16,
DBL_MAX - DBL_MAX/(1.0+DBL_EPSILON) = 3.99168061906944e+292,
log10(1.0/DBL_EPSILON - 1.0) = 15.653559774527

LDBL_DIG = 18
LDBL_MAX = 1.18973149535723177e+4932, LDBL_EPSILON = 1.08420217248550443e-19,
LDBL_MAX - LDBL_MAX/(1.0+LDBL_EPSILON) = 1.28990947194073851e+4913,
log10(1.0/LDBL_EPSILON - 1.0) = 18.9648897268308

max is a double with value 3.40282346638529e+38.
We expect the next lower distinguishable double to
differ from max by about 7.55578592223147e+22.
The original poster wanted to distinguish a value 16703.627681
less than max, but this value is only 2.21070684809276e-19 * the (likely) smallest significant difference.

CBFalconer · Oct 26, 2007

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main() {
double max = FLT_MAX;
double sub = 16703.627681;
double result = max - sub; /* illegal */

printf("%f - %f = %f\n", max, sub, result); /* illegal */
return 0;
}

I reformatted your code for sanity. See the added comments. A
float is not a double. An object is not a constant.

Charlie Gordon · Oct 26, 2007

CBFalconer said:
I reformatted your code for sanity. See the added comments. A
float is not a double. An object is not a constant.

What is illegal on the lines you commented?

James Kuyper · Oct 26, 2007

Try this:
#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
double max = FLT_MAX;
double next = nextafter(max, DBL_MAX);
double min_diff = next - max;
printf("%f - %f = %f\n", next, max, min_diff);
return 0;
}

On my desktop, I got:
340282346638528897590636046441678635008.000000 -
340282346638528859811704183484516925440.000000 =
37778931862957161709568.000000

nextafter(x,y) is the next representable number after x, in the
direction of y. Therefore, adding 16703.627681 to FLT_MAX (or, as in
your program, subtracting it) gives a number that is not sufficiently
different from FLT_MAX to have a different representation from FLT_MAX.
Note: nextafter() was added in the 1999 version of the C standard; if
you're compiling in C90 mode, it might not be available.

CBFalconer · Oct 26, 2007

Charlie said:
What is illegal on the lines you commented ?

An object is not a constant. A float is not a double.

Walter Roberson · Oct 26, 2007

CBFalconer said:
I reformatted your code for sanity. See the added comments. A
float is not a double. An object is not a constant.

Your first comment, "a float is not a double":

1A) The comment could potentially make a difference in
double max = FLT_MAX;
but only if FLT_MAX could exceed DBL_MAX; otherwise the usual
promotions would take care of converting the float to double for
storage in max. {In C89 I don't immediately see a prohibition
against FLT_MAX being bigger than DBL_MAX; it is implied by the usual
promotions, though.}

1B) the comment could potentially make a difference in
double sub = 16703.627681;
but 16703.627681 is an unsuffixed floating constant, which already
has type double, and it is well within the representable range of
a double.

1C) the comment could potentially make a difference in
printf("%f - %f = %f\n", max, sub, result);
but according to C89 4.9.6.1 "The fprintf Function",

f the double argument is converted to decimal notation
in the style [-]ddd.ddd where the number of digits after
the decimal-point character is equal to the precision
specification.

Therefore it is completely legal (and required!) to pass a double
in at a position to be printed with a %f format.

Your second comment, "An object is not a constant": yes, but so what?
C89 3.5.7 Initialization

All the expressions in an initializer for an object that has
static storage duration or in an initializer list for an object
that has aggregate or union type shall be constant expressions.

If the declaration of an identifier has block scope, and the identifier
has external or internal linkage, the declaration shall have no
initializer for the identifier.

Neither of these apply: the object "result" is automatic storage
duration, not static storage duration, and the object "result" has
block scope but does not have external or internal linkage. Therefore
it is legal to initialize "result", and it is legal to initialize
it with a non-constant expression.

If you have different interpretations of the standards, you
are invited to cite the appropriate sections and clauses.

spooler123 · Oct 26, 2007

Thank you everyone for your replies. They cleared up a lot of my
confusion about floating point.

Charlie Gordon · Oct 26, 2007

CBFalconer said:
An object is not a constant. A float is not a double.

What object?
As for floats, I can see FLT_MAX being converted to double to be stored
into max.
The %f format in printf takes a double, which is what is provided.
result is initialized with a non-constant expression, but what problem
does that pose for a local variable with automatic storage?

I fail to see any problem with this code.

CBFalconer · Oct 26, 2007

Walter said:
Your first comment, "a float is not a double":
.... snip ...

Therefore it is completely legal (and required!) to pass a double
in at a position to be printed with a %f format.

Your second comment, "An object is not a constant": yes, but so what?

The float illegal comment was wrong. I was erroneously referring
to the printf statement. The point of the second is that the
"double result = ...;" line requires a constant to perform the
initialization, or so I thought. I guess the fact that it is an
automatic object makes a difference. This leaves very little
useful in my post. I shall hang my head in abject shame, and
accept wet noodle flogging.

Keith Thompson · Oct 26, 2007

CBFalconer said:
An object is not a constant. A float is not a double.

Both of those repeated statements are correct. Neither is responsive
to the question.

The correct answer is that *nothing* is illegal on either line.
Neither line uses an object in a context that requires a constant,
or vice versa. Neither line uses a float in a context that requires
a double, or vice versa. The program is legal and portable
(and fails to be strictly conforming only because its output is
implementation-defined).

If you disagree, please explain without repeating your previous
statements.

The problem that I presume the OP was worried about is that
``max - sub'' yielded the same value as ``max''. In fact, this is
to be expected, given the finite precision of floating-point
operations.

All floating-point expressions in the program, other than FLT_MAX,
are of type double. "%f" is a correct format for printing a value
of type double. ("%f" can also be used for type float, since float
arguments are promoted to type double. "%Lf" is for type long double,
which is not used here.)

Initializing an object of type double with the value of FLT_MAX is odd,
and probably not what the OP really intended, but it's perfectly legal.
Using DBL_MAX would make more sense here.

CBFalconer · Oct 27, 2007

Keith said:
Both of those repeated statements are correct. Neither is responsive
to the question.

The correct answer is that *nothing* is illegal on either line.
Neither line uses an object in a context that requires a constant,
or vice versa. Neither line uses a float in a context that requires
a double, or vice versa. The program is legal and portable
(and fails to be strictly conforming only because its output is
implementation-defined).

If you disagree, please explain without repeating your previous
statements.

I don't disagree, and I made an abject apology about 6 hours ago.

Java OpenJDK Floating Point Dare	3	Jan 17, 2023
question : floating point precision.	41	Jun 4, 2009
Using type prefixes with floating point constants	4	Mar 26, 2009
[C language] Issue in the Lotka-Volterra model.	0	Jun 28, 2023
Addition and substraction of polynomials is working fine but the multiplication isn't; what's wrong with my code	1	Nov 22, 2022
integer and floating point casts queries	14	Dec 8, 2008
Trouble with integer floating point conversion	49	Dec 12, 2007
floating point arithmetic	34	Jul 17, 2009
