Are multiplication and division of double exactly defined under C99with __STDC_IEC_559__ ?

F

Francois Grieu

[reposted with tweaks, and code that compiles]

Assume a conforming C99 implementation that defines __STDC_IEC_559__.
"double" is a floating point type capable of exactly representing all
integers in the range [0..0x1FFFFFFFFFFFFF] (that's 53 bit). Also assume
FE_TONEAREST is in effect.

Can it be said that multiplication and division of double are exactly
defined? Or/and that when x and y are in range [0..0x1FFFFFFFFFFFFF],
the result of the following functions mym() and myd() is fully specified ?

Note: that would seem to imply over 105 bits of internal precision on
intermediary results before the rounding occurs.

TIA,

Francois Grieu


#include <stdlib.h>
#include <math.h>
#include <fenv.h>
#include <stdio.h>

#ifndef __STDC_IEC_559__
// #error "No claim IEC 60559 FP is available"
#endif

#ifndef FE_TONEAREST
#error "error: FE_TONEAREST is not available"
#endif

typedef unsigned long long tu53;

// question: is that portable?
// it would imply implies a 106-bit wide multiplier output
tu53 mym(tu53 x, tu53 y)
{
return llrint( ( ((double)1/(1ull<<53)) * x ) * y );
}

// question: is that portable?
// it would imply implies a 106-bit wide multiplier output
tu53 myd(tu53 x, tu53 y)
{
return y > x ? llrint( ( (double)(1ull<<53) * x ) / y ) : 0;
}

tu53 myatou53(const char *inStr)
{
long long x;
return inStr!=NULL && (x = atoll(inStr))>=0 && x<1ull<<53 ? x : 0;
}

// for each pair of arguments x y, sghow x y mym(x,y) myd(x,y)
int main(int argc, char *argv[])
{
int j;
if (fesetround(FE_TONEAREST)==0)
for( j=2; j<argc; j+= 2 )
{
tu53 x,y;
x = myatou53(argv[j-1]);
y = myatou53(argv[j]);
// <OT> if that does not work under Win32, change ll to I64 </OT>
printf("#\t%llu\t%llu\t%llu\t%llu\n",
x, y, mym(x,y), myd(x,y) );
}
return EXIT_SUCCESS;
}
 
T

Thad Smith

Francois said:
[reposted with tweaks, and code that compiles]

Assume a conforming C99 implementation that defines __STDC_IEC_559__.
"double" is a floating point type capable of exactly representing all
integers in the range [0..0x1FFFFFFFFFFFFF] (that's 53 bit). Also assume
FE_TONEAREST is in effect.

Can it be said that multiplication and division of double are exactly
defined?

After a single multiply or divide, followed by a cast to double or storage to a
double variable, the resulting expression should be determined by the IEC 559
specification.
Or/and that when x and y are in range [0..0x1FFFFFFFFFFFFF],
the result of the following functions mym() and myd() is fully specified ?

If the input parameters are constrained to 53 bits or less and the arithmetic
results are an integer of 64 or fewer bits, I believe so. I think calling the
llrint() function, even if internal inline, forces a rounding to strict double
precision prior to integer conversion.
Note: that would seem to imply over 105 bits of internal precision on
intermediary results before the rounding occurs.

No. Using a shift and conditional add algorithm on normalized values, unused
low-order bits can be safely discarded. I think two guard bits, updated with
each iteration, are sufficient (with single bit multiply cycles) to distinguish
0, <0.5, 0.5, and >0.5 LSB for correct rounding.
 
F

Francois Grieu

Francois said:
[reposted with tweaks, and code that compiles]

Assume a conforming C99 implementation that defines __STDC_IEC_559__.
"double" is a floating point type capable of exactly representing all
integers in the range [0..0x1FFFFFFFFFFFFF] (that's 53 bit). Also assume
FE_TONEAREST is in effect.

Can it be said that multiplication and division of double are exactly
defined?

After a single multiply or divide, followed by a cast to double or
storage to a double variable, the resulting expression should be
determined by the IEC 559 specification.

That's also my understanding of the rehash of the standard that I read
here and there.
Or/and that when x and y are in range [0..0x1FFFFFFFFFFFFF],
the result of the following functions mym() and myd() is fully
specified ?

If the input parameters are constrained to 53 bits or less and the
arithmetic results are an integer of 64 or fewer bits,
[that's hopefully the case in my test code]
I believe so. I
think calling the llrint() function, even if internal inline, forces a
rounding to strict double precision prior to integer conversion.


No. Using a shift and conditional add algorithm on normalized values,
unused low-order bits can be safely discarded. I think two guard bits,
updated with each iteration, are sufficient (with single bit multiply
cycles) to distinguish 0, <0.5, 0.5, and >0.5 LSB for correct rounding.

Indeed, I missed that. More generally if the multiplication is performed
in n successive multiply/add/round steps, we can live with ceil(53/n)+53
bits of intermediary result, or something on that tune.
I guess something similar is possible with division.

Thanks!

Francois Grieu
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top