Are floating-point zeros required to stay exact?


army1987

Is this guaranteed to always work in all conforming implementations?

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double zero = 0.;
    if (zero == 0.) {
        puts("Okay!");
        return 0;
    } else {
        puts("Uh-oh!");
        return EXIT_FAILURE;
    }
}
 

Shao Miller

Is this guaranteed to always work in all conforming implementations?

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double zero = 0.;
    if (zero == 0.) {
        puts("Okay!");
        return 0;

Might as well have used 'EXIT_SUCCESS' there.

    } else {
        puts("Uh-oh!");
        return EXIT_FAILURE;
    }
}

I'd say so, but I think you meant to ask if it would output "Okay!" on
all conforming implementations, which I'd also say "yes" to.
 

James Kuyper

Is this guaranteed to always work in all conforming implementations?

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double zero = 0.;
    if (zero == 0.) {
        puts("Okay!");
        return 0;
    } else {
        puts("Uh-oh!");
        return EXIT_FAILURE;
    }
}

No.
"The accuracy of the floating-point operations (+, -, *, /) and of the
library functions in <math.h> and <complex.h> that return floating-point
results is implementation defined, as is the accuracy of the conversion
between floating-point internal representations and string
representations performed by the library functions in <stdio.h>,
<stdlib.h>, and <wchar.h>. The implementation may state that the
accuracy is unknown." (5.2.4.2.2p6)

In principle, that clause allows LDBL_MIN==LDBL_MAX to be true; only by
checking your implementation's documentation can you be sure that the
accuracy of your implementation is good enough for LDBL_MIN==LDBL_MAX to
be guaranteed false. In practice, floating point roundoff errors tend to
be proportional to the magnitude of the relevant values, so zero == 0.
should be exact on almost all implementations.
 

Shao Miller

No.
"The accuracy of the floating-point operations (+, -, *, /)

Which do not appear in the code.
and of the
library functions in <math.h> and <complex.h> that return floating-point
results

Which do not appear in the code.
is implementation defined, as is the accuracy of the conversion
between floating-point internal representations

I do not see conversions happening in the code.
and string
representations performed by the library functions in <stdio.h>,
<stdlib.h>, and <wchar.h>.

No library functions are using floating types in the code and there are
no string representations of a floating value in the code.
The implementation may state that the
accuracy is unknown." (5.2.4.2.2p6)

[...Depends on above...]
 

Eric Sosman

No.
"The accuracy of the floating-point operations (+, -, *, /) and of the
library functions in <math.h> and <complex.h> that return floating-point
results is implementation defined, as is the accuracy of the conversion
between floating-point internal representations and string
representations performed by the library functions in <stdio.h>,
<stdlib.h>, and <wchar.h>. The implementation may state that the
accuracy is unknown." (5.2.4.2.2p6)

I don't think this paragraph applies. None of (+, -, *, /)
is performed, no library function from <math.h> or <complex.h> is
used, and there are no conversions from strings to floating-point.

Instead, I think we have to rely on 6.4.4.2p5:

"Floating constants are converted to internal format as
if at translation-time. [...] All floating constants of
the same source form shall convert to the same internal
format with the same value."

Thus, both appearances of `0.' produce the same value, and the
same value compares equal to itself. (In p7 we learn that the
value "should" be the same that the library would produce for a
conversion, but that's a "should" and not a "shall," and is in
a subsection titled "Recommended practice." I think it would be
hard to argue that p7's "should" brings 5.2.4.2.2p6 into play.)

I suppose that although both appearances of `0.' must produce
the same value, that value might not be "zero." If instead of
`if (zero == 0.)' the program had used `if (!zero)', perhaps it
is possible that "Uh-oh!" could be printed. On a really perverse
implementation, I imagine `0.' and `0.0' and `.0' and `0e0' and so
on could produce different values; it might be a QOI issue.
In principle, that clause allows LDBL_MIN==LDBL_MAX to be true; only by
checking your implementation's documentation can you be sure that the
accuracy of your implementation is good enough for LDBL_MIN==LDBL_MAX to
be guaranteed false. [...]

Could you explain your reasoning here? 5.2.4.2.2p{12,13}
require that LDBL_{MAX,MIN} be representable long double values,
hence, no round-off or approximation is involved. Also, `==' is
not in the list of operators whose accuracy is unspecified (and
indeed, floating-point `==' can be evaluated without any rounding
at all).
 

James Kuyper

On 2/27/2013 3:29 PM, James Kuyper wrote: ....

I don't think this paragraph applies. None of (+, -, *, /)
is performed,

I've always assumed that "floating point operations" was the key phrase,
and that "(+, -, *, /)" should be taken only as examples, implying, in
particular, that the relational and equality operators were also
intended to be covered by that clause.

On the other hand, you might be right. If so, does that mean that the
unary +, unary -, !, ?:, ++, --, and compound assignment operators
acting on floating point values are also not covered, and must therefore
always return exact values? That's trivial to achieve for the first four
of those operators, but I don't think it's possible for the others - but
perhaps the others are covered by their definitions in terms of the four
explicitly-listed operations.

Still,

LDBL_MIN + LDBL_EPSILON == LDBL_MAX - LDBL_EPSILON

is unambiguously covered by that clause, and the same is true of

nextafterl(LDBL_MIN, 0.0) == nextafterl(LDBL_MAX,0.0)

and having either of those evaluate as true is an equally disconcerting
possibility.
 

Shao Miller

I've always assumed that "floating point operations" was the key phrase,
and that "(+, -, *, /)" should be taken only as examples, implying, in
particular, that the relational and equality operators were also
intended to be covered by that clause.

On the other hand, you might be right. If so, does that mean that the
unary +, unary -, !, ?:, ++, --, and compound assignment operators
acting on floating point values are also not covered, and must therefore
always return exact values? That's trivial to achieve for the first four
of those operators, but I don't think it's possible for the others - but
perhaps the others are covered by their definitions in terms of the four
explicitly-listed operations.

Still,

LDBL_MIN + LDBL_EPSILON == LDBL_MAX - LDBL_EPSILON

is unambiguously covered by that clause, and the same is true of

nextafterl(LDBL_MIN, 0.0) == nextafterl(LDBL_MAX,0.0)

and having either of those evaluate as true is an equally disconcerting
possibility.

I seem to recall that 0 is special. It can be positive or negative, but
it is zero. It's not "imprecise zero," it's zero. The result of a
computation might or might not be zero, but there is no computation, here.

Consider 5.2.4.2.2p4's mention of "zero" alongside infinities and NaNs.
These three are all special. Also consider the alternative...
6.5.9p3 says that it'll be either true or false for a given pair. This
seems to suggest that the values matter, and the values are the same,
and having them compare as unequal would be nonsense.
 

Les Cargill

James said:
No.
"The accuracy of the floating-point operations (+, -, *, /) and of the
library functions in <math.h> and <complex.h> that return floating-point
results is implementation defined, as is the accuracy of the conversion
between floating-point internal representations and string
representations performed by the library functions in <stdio.h>,
<stdlib.h>, and <wchar.h>. The implementation may state that the
accuracy is unknown." (5.2.4.2.2p6)

In principle, that clause allows LDBL_MIN==LDBL_MAX to be true; only by
checking your implementation's documentation can you be sure that the
accuracy of your implementation is good enough for LDBL_MIN==LDBL_MAX to
be guaranteed false. In practice, floating point roundoff errors tend to
be proportional to the magnitude of the relevant values, so zero == 0.
should be exact on almost all implementations.


If, for an implementation, it does not hold that "Okay!" is printed,
I would consider it strong evidence that no floating point should be
used for that implementation. If it must be used, then something
close to an exhaustive test suite would be required - which is another
way of saying "don't use floating point."
 

Fred J. Tydeman

On a really perverse
implementation, I imagine `0.' and `0.0' and `.0' and `0e0' and so
on could produce different values; it might be a QOI issue.

You have listed four representations of zero.
At least two of them must have the same value.

6.4.4.2 Floating constants, paragraph 3 has:

For decimal floating constants,
and also for hexadecimal floating constants when FLT_RADIX is not a power of 2,
the result is either the nearest representable value,
or the larger or smaller representable value
immediately adjacent to the nearest representable value,
chosen in an implementation-defined manner.

So, any given source FP value can convert into one of three
representable values.

Annex F (if in effect) requires that all the various forms of zero
all convert to the same value.
---
Fred J. Tydeman Tydeman Consulting
(e-mail address removed) Testing, numerics, programming
+1 (775) 287-5904 Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.
 

Shao Miller

If, for an implementation, it does not hold that "Okay!" is printed,
I would consider it strong evidence that no floating point should be
used for that implementation. If it must be used, then something
close to an exhaustive test suite would be required - which is another
way of saying "don't use floating point."

Is there any floating point representation where precise zero cannot be
represented? Strong evidence, indeed. :)
 

glen herrmannsfeldt

Eric Sosman said:
I don't think this paragraph applies. None of (+, -, *, /)
is performed, no library function from <math.h> or <complex.h> is
used, and there are no conversions from strings to floating-point.
Instead, I think we have to rely on 6.4.4.2p5:
"Floating constants are converted to internal format as
if at translation-time. [...] All floating constants of
the same source form shall convert to the same internal
format with the same value."

I believe that C is stricter on floating-point values
than Fortran. As I understand it, a valid Fortran implementation
could give 46 as the value of all floating-point operations,
though that would be poor QoI.

As mentioned, some implementations have different routines
for compile time conversion and run-time (library routine)
conversions. That is especially true for cross compilers.
(And, even more, when the host and target have different
floating point representation.)
Thus, both appearances of `0.' produce the same value, and the
same value compares equal to itself. (In p7 we learn that the
value "should" be the same that the library would produce for a
conversion, but that's a "should" and not a "shall," and is in
a subsection titled "Recommended practice." I think it would be
hard to argue that p7's "should" brings 5.2.4.2.2p6 into play.)

Not likely, but I suppose someone might design a system with
no representation for zero, but instead only allow for the
smallest positive value. Though in that case I would hope that
the equality would still be true. It might be, though, that
the value with all bits zero isn't 0.0.
I suppose that although both appearances of `0.' must produce
the same value, that value might not be "zero." If instead of
`if (zero == 0.)' the program had used `if (!zero)', perhaps it
is possible that "Uh-oh!" could be printed. On a really perverse
implementation, I imagine `0.' and `0.0' and `.0' and `0e0' and so
on could produce different values; it might be a QOI issue.
In principle, that clause allows LDBL_MIN==LDBL_MAX to be true; only by
checking your implementation's documentation can you be sure that the
accuracy of your implementation is good enough for
LDBL_MIN==LDBL_MAX to be guaranteed false. [...]
Could you explain your reasoning here? 5.2.4.2.2p{12,13}
require that LDBL_{MAX,MIN} be representable long double values,
hence, no round-off or approximation is involved. Also, `==' is
not in the list of operators whose accuracy is unspecified (and
indeed, floating-point `==' can be evaluated without any rounding
at all).

A low QoI implementation could have only one value, and it might
not be zero. As far as I know, that is true in Fortran, but
maybe not in C.

-- glen
 

glen herrmannsfeldt

Shao Miller said:
(snip)
(snip)
I seem to recall that 0 is special. It can be positive or negative,
but it is zero. It's not "imprecise zero," it's zero. The result
of a computation might or might not be zero, but there is no
computation, here.

Yes, I believe that zero is special in that -0.0 has to compare the
same as +0.0, but that wasn't the question.

The biggest cause of floating point comparison problems is the
x87 keeping temporary values with extra precision. That shouldn't
cause problems with zero, but can easily cause problems when
subtracting values that are supposed to be equal, and then
comparing to zero.

Then there is the Cray machine with non-commutative multiplication.
I don't remember that zero was affected, but it might be that
the expression 0.0*x==0.0 could be false on that one.
Consider 5.2.4.2.2p4's mention of "zero" alongside infinities and NaNs.
These three are all special. Also consider the alternative...
6.5.9p3 says that it'll be either true or false for a given pair. This
seems to suggest that the values matter, and the values are the same,
and having them compare as unequal would be nonsense.

Well, pretty much you get no more than the machine gives you.
If you run on a machine with strange properties, then expect
strange results.

-- glen
 

glen herrmannsfeldt

(snip)
I'm not sure if we're discussing only IEEE-style FP or not, but there
have been FP formats* where the mantissa was stored twos-complement.
On such a machine negation could well overflow.
*The CDC Cyber 205, for example. Whether or not there have been C
implementations for such is a different question.

I believe many DSPs also used two's-complement significands.

The PDP-10 two's-complements the whole word.

-- glen
 

James Kuyper

On 02/27/2013 09:30 PM, Robert Wessel wrote:
....
I'm not sure if we're discussing only IEEE-style FP or not, but there

I'm discussing the fact that the C standard allows arbitrarily low
accuracy for floating point operations. It even allows accuracy so low
that LDBL_EPSILON - LDBL_MAX == LDBL_MAX - LDBL_EPSILON. That's
definitively not IEEE compliant. IEEE requires the maximum feasible
accuracy (or very close to it) for every operation.
 

Fred J. Tydeman

Is there any floating point representation where precise zero cannot be
represented? Strong evidence, indeed. :)

Systems that use a logarithmic number system:
http://en.wikipedia.org/wiki/Logarithmic_number_system
 

Ben Bacarisse

army1987 said:
Is this guaranteed to always work in all conforming implementations?

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double zero = 0.;
    if (zero == 0.) {
        puts("Okay!");
        return 0;
    } else {
        puts("Uh-oh!");
        return EXIT_FAILURE;
    }
}

The thread seems to have gone off on some tangent about operations that
are not involved here so I'll just stick these marks "out there" in case
they are useful.

The model that C uses for floating point means that all conforming
implementations must have at least one exact representation for zero.

For decimal floating constants like '0.', C does not guarantee that it
will be converted to this value! The conversion may produce either of
the two adjoining values, but the implementation must specify what
happens. If it says that the conversion is always done the same way,
the two constants will convert to the same value and the test will
evaluate to true, even if the value used is not exactly zero.

From a QoI point of view, I can't conceive of an implementation that
does not convert '0.' to an exact representation of zero much less one
where the two '0.'s in the code above convert to different values.

Curiously, using 0 instead of '0.' is safer because the conversion
from an integer constant to a floating point value must be exact if the
value can be represented in the floating type.

If you know that you will be using a floating point system that has a
radix that is a power of two, you could also use a hexadecimal floating
point constant because that, too, will be "exactly rounded", but there
is no advantage over using an integer constant.
 

James Kuyper

...but not permitted in C implementations. The C standard lays out a
model for how floating point types must be represented, and zero can
always be represented exactly.

Footnote 21: "The floating-point model is intended to clarify the
description of each floating-point characteristic and does not require
the floating-point arithmetic of the implementation to be identical."
 

glen herrmannsfeldt

Footnote 21: "The floating-point model is intended to clarify the
description of each floating-point characteristic and does not require
the floating-point arithmetic of the implementation to be identical."

In systems that don't use a hidden '1', there is a natural zero.
With a hidden '1', zero is a special case. If you leave out the
special case, the result is similar to logarithmic, though addition
is much easier. (Note the reason why floating point does not
have a 'mantissa'.)

-- glen
 

Tim Rentsch

Ben Bacarisse said:
...but not permitted in C implementations. The C standard lays out a
model for how floating point types must be represented, and zero can
always be represented exactly.

For those who don't know, Fred Tydeman specializes in concerns
related to floating-point arithmetic, and also is a long-standing
member of WG14 (I believe since its inception, but I'm not sure
about that). He may be the world's foremost authority on
floating point in ISO C; he certainly is among the group who
are candidates for that distinction.
 
