P
Phil Carmody
Dik T. Winter said:Ok, I think that's going to have to be good enough for me--I'm running
out of time to keep up. I think it's a step closer to Chuck's
position--maybe not enough for him, though.
Not enough at all. Consider the following piece of code (where for
simplicity I assume a > b > 0.0, otherwise it is a bit more
cumbersome) [detypoed]
void add(double a, double b, double *c, double *cc) {
*c = a + b;
*cc = a - *c + b;
}
(and assume also that the value of *c used in the calculation of *cc is the
value actually stored, for some compilers you may need some flags for that.)
Assuming something about the arithmetic on the machine (which IEEE machines
do satisfy), after that operation, a + b == *c + *cc exactly, while *cc is much
less than *c.
It's only a minor point, but I find it more instructive to view
the above as:
void add(double a, double b, double *c, double *cc) {
*c = a + b;
double tmp = *c - a;
*cc = b - tmp;
}
I desperately wanted to add some comments to that to explain
what the final two steps (which you merged), but decided that
if my claim of more instructive were to be supportable, then
it should be so without comments!
This is a basic building block to (approximately) double the maxmimum precision
on a processor using build-in arithmetic only. Assuming that the 'c' stored
is an approximation and not assuming that the 'c' used in the calculation of
'cc' is not the value actually stored in 'c' voids the whole argument.
I may note that this technique (due to T.J. Dekker in the sixties) was used
when IBM implemented double precision on the PC/RT.
For this technique to work it is absolutely *not* necessary to know what the
value stored in 'c' actually is, it is only necessary to know that what did
get in also gets out, and some bounds on what did get in (that is: the
properly rounded result of the addition, called "faithful" in the original
article).
(The other basic block is a routine of the format:
void mul(double a, double b, double *c, double *cc)
where after the operation a * b == c + cc exactly. It uses similar basic
operations but is a bit concoluted.)
There are interesting techniques in full-precision fused multiply-add
architectures where you can create exact 104-bit products from 2 52-bit
mupliplicands. If you don't have such an architecture then it is indeed
more complicated, but with one it's trivial. (Power being my arch of
choice in this regard.)
Phil