x/x == 1.0 in double arithmetic


Tobias

Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

With best regards,
Tobias
 

Victor Bazarov

Tobias said:
Let:

double x,y;

Uninitialised. They contain indeterminate values, possibly such that
merely reading them can cause a hardware exception. And, of course,
there is no guarantee that 'x' and 'y' contain the same indeterminate
value.
double z = max(x,y);

Passing indeterminate values to 'max' probably leads to undefined
behaviour.
Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

No guarantees, because indeterminate values are used.

V
 

Erik Wikström

Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

No, the first one requires that one of x and y is zero and the other is
negative (or both are zero), which the standard cannot guarantee.

The second and third require exact arithmetic to always be true; due to
the way floating-point operations work this might not always be the
case, but the result will be close.
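
For concreteness, here is a minimal sketch of the z == 0.0 branch (a
complete little program; the values are just illustrative):

#include <algorithm>
#include <iostream>

int main()
{
    double x = -3.5, y = 0.0;
    double z = std::max(x, y);   // z == 0.0, so the first test short-circuits
    std::cout << std::boolalpha
              << ((z == 0.0) || (x/z == 1.0) || (y/z == 1.0)) << '\n'; // true
}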
 

Tobias

Passing indeterminate values to 'max' probably leads to undefined
behaviour.

I'm sorry, I should have written:

bool foo(double x, double y)
{
    double z = max(x, y);
    return (z == 0.0) || (x/z == 1.0) || (y/z == 1.0);
}

That would make the real question more obvious.

Thanks for the comment anyway.

With best regards,
Tobias
 

Tobias

AFAIK this is wrong if x or y is positive infinity; there may well be
other such problems with "special values".

Thanks for the comment.
In my special application infinity will not occur.

With best regards,
Tobias
 

Tobias

The second and third require exact arithmetic to always be true; due to
the way floating-point operations work this might not always be the
case, but the result will be close.

That is the real question. I'm especially interested in the case when
x and y become very small.

Somebody assured me that dividing a double by the same double (in the
same representation) always gives the exact double 1.0.

But might there be a problem with normalization or with representation
of the double numbers in the registers of the numerical core?

Best regards,
Tobias
 

Alf P. Steinbach

* Tobias:
I'm sorry, I should have written:

bool foo(double x, double y)
{
    double z = max(x, y);
    return (z == 0.0) || (x/z == 1.0) || (y/z == 1.0);
}

That would make the real question more obvious.

Unless x and y are special values, with modern floating point
implementations this is guaranteed. Floating point division, with a
modern floating point implementation, isn't inherently inexact: it just
rounds the result to some specific number of binary digits. Hence x/x,
with normal non-zero numbers, produces exactly 1.

Effectively, therefore, the foo function above checks whether x or y has
a special value where the relationship doesn't hold.

For IEEE floating point numbers this would be NaN and Infinity values.

According to the IEEE standard, operations on infinities are defined as
if computing limits; however, Inf/Inf is singled out as a special case
producing NaN.

std::numeric_limits refers to the IEEE 754 standard as IEC 559, so in
order to check whether you're using IEEE standard floating point you'll
have to test the member named is_iec559.
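
A minimal sketch of that check (is_iec559 is a standard member of
std::numeric_limits):

#include <iostream>
#include <limits>

int main()
{
    if (std::numeric_limits<double>::is_iec559)
        std::cout << "double is IEEE 754 (IEC 559)\n";
    else
        std::cout << "no IEEE 754 guarantees for double\n";
}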

Cheers, and hth.,

- Alf
 

Daniel Kraft

Tobias said:
Let:

double x,y;
double z = max(x,y);

Does the standard ensure that

( z == 0.0 ) || ( x/z == 1.0 ) || ( y/z == 1.0 )

always gives true?

I'm not that proficient with IEEE 754 floating point arithmetic, but
AFAIK this is wrong if x or y is positive infinity; there may well be
other such problems with "special values".
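
Assuming IEEE doubles, a quick demonstration of the infinity case: with
x = y = +infinity, z is infinity, x/z is NaN, and NaN compares unequal
to everything, so the whole expression is false.

#include <algorithm>
#include <iostream>
#include <limits>

int main()
{
    double x = std::numeric_limits<double>::infinity();
    double y = x;
    double z = std::max(x, y);   // z is +infinity
    std::cout << std::boolalpha  // inf/inf is NaN, and NaN == 1.0 is false
              << ((z == 0.0) || (x/z == 1.0) || (y/z == 1.0)) << '\n'; // false
}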

Cheers,
Daniel
 

Markus Schoder

That is the real question. I'm especially interested in the case when x
and y become very small.

Somebody assured me that dividing a double by the same double (in the
same representation) always gives the exact double 1.0.

That is true for IEEE 754 conforming arithmetic. Basic arithmetic
operations need to be done with exact rounding, i.e., conceptually the
operation is carried out with infinite precision and then rounded
(using the current rounding mode) to the nearest representable number.

But might there be a problem with normalization or with representation
of the double numbers in the registers of the numerical core?

Since on the Intel x87 architecture double and extended precision are
mixed, and the actual CPU instructions make it very hard to avoid this,
most IEEE 754 guarantees are moot (thank you, Intel).

For stable numerical behaviour you need to avoid the Intel architecture
or force your compiler to use the newer SSE2 instruction set exclusively
for floating point arithmetic if possible.
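
As a sanity check on an IEEE system (i.e. where is_iec559 holds and no
extended-precision mixing interferes), even the smallest subnormal
double divides by itself to exactly 1.0:

#include <iostream>
#include <limits>

int main()
{
    double x = std::numeric_limits<double>::denorm_min(); // smallest subnormal
    std::cout << std::boolalpha << (x / x == 1.0) << '\n'; // true on IEEE systems
}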
 

pan

* Alf P. Steinbach:
Unless x and y are special values, with modern floating point
implementations this is guaranteed. Floating point division, with a
modern floating point implementation, isn't inherently inexact: it
just rounds the result to some specific number of binary digits.
Hence x/x, with normal non-zero numbers, produces exactly 1.

I had a funny experience that allows me to say that the number of
binary digits is not so specific, but can vary unexpectedly =)

I had this compare predicate which gave me funny assertions when used
in combination with particular optimizations:

(unchecked code, just to give the idea)

#include <cmath>

struct Point {
    double x, y;
};

bool anglesorter(Point a, Point b) {
    // compare points by polar angle
    return std::atan2(a.y, a.x) < std::atan2(b.y, b.x);
}

which, passed to a sort function, led to assertions when a point in
the sequence was replicated. The assertion (inserted by Microsoft as a
check for the predicates) has this condition:

pred(a, b) && !pred(b, a)

which is quite funny. After disassembling I found out that one of the
two atan2 results was kept in an x87 register (80-bit), while the other
was stored in memory (64-bit precision), so the result of

anglesorter(a, a)

would give true for some values of a.

Now, this doesn't look the same as the OP's example, but I wouldn't
trust the floating point optimizations again: what happens if foo gets
inlined, its parameters are kept in extended-precision registers, but
the result of max, for some odd reason, does get trimmed to double
precision?

I must admit I don't like the alternative version either:

x/z == 1.0

becomes

fabs(x/z-1.0) < numeric_limits<double>::epsilon()

or, avoiding divisions:

fabs(x-z) < numeric_limits<double>::epsilon()*z

because they are both unreadable.
I'd try to avoid the need of such a compare in the first
place.
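
If one does want the tolerance form, wrapping it in a named helper at
least hides the noise; a rough sketch (the name approx_ratio_one is
made up for illustration):

#include <cmath>
#include <limits>

// Hypothetical helper for the division-free variant: true when x and z
// differ by at most one epsilon relative to z.
bool approx_ratio_one(double x, double z)
{
    return std::fabs(x - z) <= std::numeric_limits<double>::epsilon() * std::fabs(z);
}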
 

Alf P. Steinbach

* pan:
which is quite funny. After disassembling I found out that one of the
two atan2 results was kept in an x87 register (80-bit), while the other
was stored in memory (64-bit precision), so the result of

anglesorter(a, a)

would give true for some values of a.

Now, this doesn't look the same as the OP's example, but I wouldn't
trust the floating point optimizations again: what happens if foo gets
inlined, its parameters are kept in extended-precision registers, but
the result of max, for some odd reason, does get trimmed to double
precision?

This sounds like a case of not really having established the reason;
e.g., with a clearly established reason you'd be able to say that under
such and such conditions, x == x would yield false (which it does for
IEEE NaN). That said, Visual C++ 6.0 or thereabouts had notoriously
unreliable floating point optimization. If I had to guess, I'd guess
that ten-year-old compiler, but simply encountering a NaN or two
without understanding it could also be the Real Problem.

If x == x evaluates to false for a normal value, then you have a
serious bug in, e.g., the compiler.

It would be like a car that in some rare cases exploded like a nuclear
bomb: if such a car could exist it should be taken off the market
immediately. It's so serious that we don't really expect such a car to
ever be on the market in the first place. So when people report having
encountered such exploding cars, we don't really believe them.
 

Pete Becker

* Alf P. Steinbach:

This sounds like a case of not really having established the reason;
e.g., with a clearly established reason you'd be able to say that under
such and such conditions, x == x would yield false (which it does for
IEEE NaN). That said, Visual C++ 6.0 or thereabouts had notoriously
unreliable floating point optimization. If I had to guess, I'd guess
that ten-year-old compiler, but simply encountering a NaN or two
without understanding it could also be the Real Problem.

But this example isn't x == x, it's anglesorter(x, x), where
anglesorter applies the same computation to both of its arguments. C
and C++ both allow the implementation to do computations involving
floating point values at higher precision than that of the actual type.
For x87, this means that computations involving doubles (64 bits) can
be done at full width for the processor (80 bits). When you compare the
result of atan2 computed at 80 bits with the result of atan2 rounded to
64 bits, the results will almost certainly be different. And that's
allowed by the language definition (in C99 you can check how the
implementation handles this by looking at the value of the macro
FLT_EVAL_METHOD).
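
A quick way to inspect that, on implementations where <cfloat> exposes
the C99 macro:

#include <cfloat>
#include <iostream>

int main()
{
    // 0: evaluate in each type's own precision; 1: float promoted to
    // double; 2: everything in long double; -1: indeterminate.
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
}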

What isn't allowed is hanging on to the extra precision across a store:

double x1 = atan2(a.y, a.x);
double x2 = atan2(b.y, b.x);
return x1 < x2;

In this case, the values of both x1 and x2 must be rounded to double,
even if the compiler elides the actual store and holds them in
registers. Many compilers skip this step, because it's an impediment to
optimization, unless you set a command line switch to tell the compiler
that you really mean it.
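
One way to force the narrowing by hand, if the compiler won't
cooperate, is to round through a volatile double, which the compiler
cannot elide; a sketch, not a recommendation (the helper name is mine,
and GCC's -ffloat-store switch achieves something similar globally):

#include <cmath>

bool anglesorter_stored(double ay, double ax, double by, double bx)
{
    volatile double x1 = std::atan2(ay, ax); // store forces rounding to 64-bit double
    volatile double x2 = std::atan2(by, bx);
    return x1 < x2;
}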
 
