Cosine algo, was Re: sqrt algo

Dik T. Winter · Oct 20, 2005

> I would have argued like this: With 64 bit doubles, there are roughly
> 2^62 double values >= 1 (half of the doubles are negative, so we don't
> care for the cosine function, and roughly another quarter is less than
> 1). Each of these is at most pi/2 away from (n + 1/2) * pi. We could
> expect one of those values within (pi/2) / 2^62 of (n + 1/2) pi.

This is a probabilistic argument. While it works in most cases, it may
not work in a particular case. Moreover, it does only show an expectation
that something is closer than some particular value to (n + 1/2) pi.
You need to show that *nothing* is closer than some (other) particular
value to (n + 1/2) pi. See my other article where I did a full analysis
of the IEEE double case.

Alas, continued fractions are not well-known. Quite a bit is told as
the only reasonable book on it in English by W. Olds is only a summary.
A complete reference is only vailable in German (O. Perron). The subject
is actually reasonably important in computer science.

Quite some time ago somebody came to me. He had programmed a system
in Ada, using fixed point. His problem was the multiplication of
numbers by pi; he got bad results. The solution is the "largest"
continued fraction convergent that fits (as a rational) in two fixed
point numbers of the given size. First multiply by the numerator of
the convergent giving a double precision result. Next divide that by
the denominator. This gives the best possible result in this situation.

I think I should write something about it that is reasonably available.

Dik T. Winter · Oct 20, 2005

> Keith Thompson wrote: ....
>
> DBL_MIN is defined in float.h at my house. Printing it "%.16e" gives..
>
> 2.2250738585072014e-308
> ..considerably smaller than 1e-37. It is FLT_MIN with "%.8e" which is..
> 1.17549435e-38

Yup. But the C standard allows for systems where a double has the same
exponent range as a float (like the Vax originally). On the Vax FLT_MIN
and DBL_MIN would be equal, there was only a difference in precision.
Similar for a few other systems. Most notably the IBM mainframes.

Arthur J. O'Dwyer · Oct 20, 2005

If and only if the pi you enter is used exactly by your calculator, which
is only possible if it is doing symbolic manipulation. If the pi your
calculator uses is not the exact value of pi, it shows that the
implementation of the cosine function is imprecise. It may work out
perfect in this formula, but will give suboptimal results for other
arguments.

No, that's not true. cos(pi/2) is precisely zero, and the fact that
Gerry's calculator says so is perfectly normal and to be expected.

Sure, /if/ Gerry's calculator used a stupid algorithm, or only kept as
many bits of precision as it bothered to show, it would be reasonable
to expect its estimate of cos(pi/2) to be off by some amount. And in
practice, you'll see that effect if you try calculating cos(10001*pi/2)
or the cosine of some even larger number. But that doesn't mean Gerry's
calculator is "wrong" or "imprecise" if it correctly answers 'zero' when
asked for the cosine of one-half pi. 'Zero' is the correct answer!

As for your mathematical arguments, they look highly suspect to me;
you're trying to use probabilistic arguments to prove that something
/won't/ happen --- i.e., that cos(k*pi/2) is /never/ less than DBL_MIN
for any integer k.

P.J. Plauger wrote that he has access to a program implementing the
IEEE() function I described earlier; I'm still waiting for someone to
post a link or code to such a program, because once we have that program,
we're done.

-Arthur

Keith Thompson · Oct 20, 2005

Arthur J. O'Dwyer said:
No, that's not true. cos(pi/2) is precisely zero, and the fact
that Gerry's calculator says so is perfectly normal and to be expected.

Certainly cos(pi/2) is *mathematically* zero, but calculators don't
use perfect mathematical numbers. Or, in a sense, they do, but they
can't represent pi/2 exactly.

Sure, /if/ Gerry's calculator used a stupid algorithm, or only kept as
many bits of precision as it bothered to show, it would be reasonable
to expect its estimate of cos(pi/2) to be off by some amount. And in
practice, you'll see that effect if you try calculating cos(10001*pi/2)
or the cosine of some even larger number. But that doesn't mean Gerry's
calculator is "wrong" or "imprecise" if it correctly answers 'zero'
when asked for the cosine of one-half pi. 'Zero' is the correct answer!

Consider a simpler example. Assume a decimal representation (as most
pocket calculators use), and assume, say, 10 significant digits.

Computing 1 / 9 gives 0.1111111111. Multiplying that by 9 gives
0.999999999. If the calculator displays fewer digits than it stores,
that might well appear as 1.0. Now substract 1.0; the result is
-0.0000000001, which might display as -1.0e-10.

If a calculator displayed 1/9*9-1 as 0.0, I'd be suspicious that it's
designed to give answers that *look* correct rather than answers that
*are* correct (unless it actually does the calculations symbolically
rather than numerically).

Applying cos() to a close but inexact approximation to pi/2 should
yield a result close to, but not exactly equal to, 0.0. Since
floating-point has much finer resolution near 0.0, it should be
displayed as a non-zero value. If it isn't, I might suspect that the
implementation is guessing that the given approximation of pi/2 is
intended to be exactly pi/2. So where does that leave me if I really
want to compute cos(pi/2+epsilon) for some very small epsilon?

Dik T. Winter · Oct 20, 2005

> As for your mathematical arguments, they look highly suspect to me;
> you're trying to use probabilistic arguments to prove that something
> /won't/ happen --- i.e., that cos(k*pi/2) is /never/ less than DBL_MIN
> for any integer k.

What is probabilistic about my mathematical arguments? It has been
*proven* that cos(k * pi/2) is /never/ less than DBL_MIN for any
integer k and IEEE double precision. Better, I have proven that it
is always larger (in absolute value) than 1e-34.

P.J. Plauger · Oct 20, 2005

P.J. Plauger wrote that he has access to a program implementing the
IEEE() function I described earlier; I'm still waiting for someone to
post a link or code to such a program, because once we have that program,
we're done.

We have access to such a program because we wrote it. We're a
commercial software house that licenses our work for a living.
I'm sure that others have done something similar, but you'll
have to wait for them to come forward.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Gerry Quinn · Oct 20, 2005

Applying cos() to a close but inexact approximation to pi/2 should
yield a result close to, but not exactly equal to, 0.0. Since
floating-point has much finer resolution near 0.0, it should be
displayed as a non-zero value. If it isn't, I might suspect that the
implementation is guessing that the given approximation of pi/2 is
intended to be exactly pi/2. So where does that leave me if I really
want to compute cos(pi/2+epsilon) for some very small epsilon?

In trouble if you think calculators are the best way to do it.

Differentiating cos( pi/2 + x ) gives - sin( pi/2 + x ) ~= -1 when x is
small, so in fact the answer to your problem is approximately -epsilon.

If you want it more accurate, you should analyse.

- Gerry Quinn

cs · Oct 22, 2005

I just wrote a cosine function called c_os, for no special reason.
It calls no functions that I haven't also written.

On my machine, it seems to work better than
my MS compiler's standard library cos function.
They don't get the zeros right. I do.

c_os(-123 * pi / 6) is 0.000000e+000
c_os(-129 * pi / 6) is 0.000000e+000

cos(-123 * pi / 6) is 7.839514e-015
cos(-129 * pi / 6) is -4.409261e-015

/* BEGIN c_os.c output */

const double pi = 4 * atan(1);

c_os(-120 * pi / 6) is 1.000000e+000
c_os(-121 * pi / 6) is 8.660254e-001
c_os(-122 * pi / 6) is 5.000000e-001
c_os(-123 * pi / 6) is 0.000000e+000
c_os(-124 * pi / 6) is -5.000000e-001
c_os(-125 * pi / 6) is -8.660254e-001
c_os(-126 * pi / 6) is -1.000000e+000
c_os(-127 * pi / 6) is -8.660254e-001
c_os(-128 * pi / 6) is -5.000000e-001
c_os(-129 * pi / 6) is 0.000000e+000
c_os(-130 * pi / 6) is 5.000000e-001
c_os(-131 * pi / 6) is 8.660254e-001

cos(-120 * pi / 6) is 1.000000e+000
cos(-121 * pi / 6) is 8.660254e-001
cos(-122 * pi / 6) is 5.000000e-001
cos(-123 * pi / 6) is 7.839514e-015
cos(-124 * pi / 6) is -5.000000e-001
cos(-125 * pi / 6) is -8.660254e-001
cos(-126 * pi / 6) is -1.000000e+000
cos(-127 * pi / 6) is -8.660254e-001
cos(-128 * pi / 6) is -5.000000e-001
cos(-129 * pi / 6) is -4.409261e-015
cos(-130 * pi / 6) is 5.000000e-001
cos(-131 * pi / 6) is 8.660254e-001

it seems this cosr() is better

(P,s,I)(-120*pi6)= +1.000000000000000 +1.000000000000000
+1.000000000000000
(P,s,I)(-121*pi6)= +0.866025403784437 +0.866025403784438
+0.866025403784439
(P,s,I)(-122*pi6)= +0.499999999999995 +0.499999999999997
+0.500000000000000
(P,s,I)(-123*pi6)= -0.000000000000000 -0.000000000000006
+0.000000000000000
(P,s,I)(-124*pi6)= -0.499999999999998 -0.499999999999996
-0.500000000000000
(P,s,I)(-125*pi6)= -0.866025403784439 -0.866025403784438
-0.866025403784439
(P,s,I)(-126*pi6)= -1.000000000000000 -1.000000000000000
-1.000000000000000
(P,s,I)(-127*pi6)= -0.866025403784435 -0.866025403784437
-0.866025403784439
(P,s,I)(-128*pi6)= -0.499999999999992 -0.499999999999994
-0.500000000000000
(P,s,I)(-129*pi6)= -0.000000000000000 -0.000000000000004
+0.000000000000000
(P,s,I)(-130*pi6)= +0.500000000000001 +0.499999999999999
+0.500000000000000
(P,s,I)(-131*pi6)= +0.866025403784441 +0.866025403784439
+0.866025403784439
(P,s,I)(-132*pi6)= +1.000000000000000 +1.000000000000000
+1.000000000000000
(P,s,I)(-133*pi6)= +0.866025403784441 +0.866025403784442
+0.866025403784439
Delta=5.745404e-15 x0=35878.400000000001460
c_os(x0)=0.158232478605360 cosr(x0)=0.158232478606736
cos(x0)=0.158232478606742

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <time.h>

#define R return
#define P printf
#define F for
#define W while
#define U unsigned

long double Kpi
=3.14159265358979323846264338327950288419716939937510L;
long double K2pi
=6.28318530717958647692528676655900576839433879875021L;
long double Kpi2
=1.57079632679489661923132169163975144209858469968755L;

long double Kpi4
=0.78539816339744830961566084581987572104929234984377L;
long double cKpi4
=0.70710678118654752440084436210484903928483593768847L;
long double sKpi4
=0.70710678118654752440084436210484903928483593768847L;

long double Kpi6
=0.52359877559829887307710723054658381403286156656251L;
long double cKpi6
=0.86602540378443864676372317075293618347140262690519L;
long double sKpi6
=0.50000000000000000000000000000000000000000000000000L;

long double Kpi3
=1.0471975511965977461542144610931676280657231331250352L;
long double
cKpi3=0.50000000000000000000000000000000000000000000000000L;
long double
sKpi3=0.86602540378443864676372317075293618347140262690519L;

long double Kpi10
=0.314159265358979323846264338327950288419716939937510L;
long double
cKpi10=0.951056516295153572116439333379382143405698634125750L;
long double
sKpi10=0.309016994374947424102293417182819058860154589902881L;

long double Kpi9
=0.3490658503988659153847381536977225426885743777083450L;
long double
cKpi9=0.9396926207859083840541092773247314699362081342644646L;
long double
sKpi9=0.3420201433256687330440996146822595807630833675141606L;

long double Kpi8
=0.3926990816987241548078304229099378605246461749218882L;
long double
cKpi8=0.9238795325112867561281831893967882868224166258636424L;
long double
sKpi8=0.3826834323650897717284599840303988667613445624856270L;

long double Kpi7
=0.4487989505128276054946633404685004120281670570535865L;
long double
cKpi7=0.9009688679024191262361023195074450511659191621318571L;
long double
sKpi7=0.4338837391175581204757683328483587546099907277874598L;

long double Kpi5
=0.6283185307179586476925286766559005768394338798750211L;
long double
cKpi5=0.8090169943749474241022934171828190588601545899028814L;
long double
sKpi5=0.5877852522924731291687059546390727685976524376431459L;

static double eps = 32*DBL_EPSILON;

double c_os(double x);
double p_i (void );
double sq_rt(double x);

double c_os(double x)
{
unsigned n, negative, sine;
double a, b, c;
static double pi, two_pi, four_pi, half_pi, quarter_pi;

if(x >= -DBL_MAX && DBL_MAX >= x){
if(1 > pi){
pi=p_i();
two_pi = 2*pi;
four_pi = 4*pi;
half_pi = pi/2;
quarter_pi= pi/4;
}
if(0 > x){
x=-x;
}
while(x > two_pi){
b=x/2;
a=two_pi;
while(b > a){
a*=2;
}
x-=a;
}
if(x > pi){
x = two_pi - x;
}
if(x > half_pi){
x = pi-x;
negative = 1;
} else {
negative = 0;
}
if(x > quarter_pi){
x = half_pi - x;
sine = 1;
} else {
sine = 0;
}
c = x*x;
x = n = 0;
a = 1;
do {
b = a;
a *= c/++n;
a /= ++n;
b -= a;
a *= c/++n;
a /= ++n;
x += b;
}while(b > DBL_EPSILON/2);
if(sine){
x=sq_rt(1-x*x);
}
if(negative){
x=-x;
}
} else {
x = -HUGE_VAL;
errno = EDOM;
}
return x;
}

double p_i(void)
{
unsigned n;
double p, a, b;

p = 0;
a = 3;
n = 1;
do {
a /= 9;
b = a/n;
a /= 9;
n += 2;
b -= a/n;
n += 2;
p += b;
}while(b > DBL_EPSILON/4);
a = 2;
n = 1;
do {
a /= 4;
b = a/n;
a /= 4;
n += 2;
b -= a/n;
n += 2;
p += b;
}while(b > DBL_EPSILON/2);
return 4*p;
}

double sq_rt(double x)
{
int n;
double a, b;

if(DBL_MAX >= x){
if(x > 0){
for(n = 0; x > 2; x/=4){
++n;
}
while(0.5 > x){
--n;
x *= 4;
}
a = x;
b = (1 + x)/2;
do {
x = b;
b = (a/x + x)/2;
}while(x>b);
while(n > 0){
x *= 2;
--n;
}
while(0 > n){
x /= 2;
++n;
}
} else {
if(0 > x){
errno = EDOM;
x = HUGE_VAL;
}
}
} else {
errno = ERANGE;
}
return x;
}

double cos3(double x)
{double x2, x4, x6, x8;
if(fabs(x) < eps) R 1.0;
if(fabs(x-Kpi3) < eps) R cKpi3;
if(fabs(x-Kpi4) < eps) R cKpi4;
if(fabs(x-Kpi5) < eps) R cKpi5;
if(fabs(x-Kpi6) < eps) R cKpi6;
if(fabs(x-Kpi7) < eps) R cKpi7;
if(fabs(x-Kpi8) < eps) R cKpi8;
if(fabs(x-Kpi9) < eps) R cKpi9;
if(fabs(x-Kpi10)< eps) R cKpi10;
x2=x*x; x4=x2*x2; x6=x4*x2; x8=x4*x4;
R (1.0 - x2/2.0L + x4/24.0 - x6/720.0 + x8/40320.0
- (x6*x4)/3628800.0 + (x6*x6)/479001600.0 -(x8*x6)/
87178291200.0);
}

double sin3(double x)
{double x2, x3, x5, x7, x4, x6, y, t;
if(fabs(x) < eps) R 0.0;
if(fabs(x-Kpi3) < eps) R sKpi3;
if(fabs(x-Kpi4) < eps) R sKpi4;
if(fabs(x-Kpi5) < eps) R sKpi5;
if(fabs(x-Kpi6) < eps) R sKpi6;
if(fabs(x-Kpi7) < eps) R sKpi7;
if(fabs(x-Kpi8) < eps) R sKpi8;
if(fabs(x-Kpi9) < eps) R sKpi9;
if(fabs(x-Kpi10)< eps) R sKpi10;
x2=x*x; x4=x2*x2; x3=x2*x; x5=x3*x2; x6=x4*x2; x7=x5*x2;
R x - x3/6.0 + x5/120.0 - x7/5040.0 +
(x7*x2)/362880 - (x7*x4)/39916800.0 + (x7*x6)/6227020800.0 -

(x7*x7*x)/1307674368000.0;
}

double cosr(const double x);
double sinr(const double x);

double cosr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) t = -t; // cos(t)=cos(-t)
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=Kpi) // cos(t)= -cos(pi-t)= -cos(t-pi)
{ t= t-Kpi; sign^=1;}
if(t>=Kpi2) // cos(t)= -cos(pi-t)
{ t= Kpi-t; sign^=1;}
if(t>Kpi4) // cos(t)= sin(pi/2 - t)
{ t=Kpi2-t;
y= sign ? -sinr(t): sinr(t);
R y;
}
y = cos3(t);
R y==0.0 ? -0.0: (sign ? -y: y);
// R sign && (y!=0.0) ? -y : y;
}

double sinr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) // sin(t)=-sin(-t)
{ t= -t; sign^=1; }
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=Kpi) // sin(t)= sin(pi-t)= -sin(t-pi)
{ t= t-Kpi; sign^=1;}
if(t>=Kpi2) // sin(t)= sin(pi-t)
t= Kpi-t;
if(t>=Kpi4) // sin(t)= cos(pi/2 - t)
{ t=Kpi2-t;
y = sign ? -cosr(t): cosr(t);
R y;
}
y = sin3(t);
R y==0.0 ? -0.0: (sign ? -y: y);
}

int main( void )
{int i;
double tmp, x, y, z0, z1, x0, delta;

F(i=-120; i>-134 ; --i)
{ P("(P,s,I)(%i*pi6)= %+.15f %+.15f ", i, c_os(i*Kpi6),
cos(i*Kpi6) );
P("%+.15f\n", cosr(i*Kpi6) );
}

srand((U)time(0));
F(i=0, x0=delta=0.0; i<500000; ++i)
{x=rand(); y=rand();
if(y!=0) x/=y;
x+=rand();
z0 = cos(x); z1 = cosr(x);
if( z0 >= z1)
{y = z0-z1;
if(y>delta) { x0=x; delta=y;}
}
else { y = z1 - z0;
if(y>delta) { x0=x; delta=y;}
}
}
P("Delta=%e x0=%.15f\n c_os(x0)=%.15f cosr(x0)=%.15f\n",
delta, x0, c_os(x0), cosr(x0));
P("cos(x0)=%.15f\n", cos(x0));

R 0;
}

che_nome · Oct 22, 2005

i forget to say that this code could have some Copyright by Satoshi
Tomabechi (from his code i have learnt to do functions like cosr sinr)

Dik T. Winter · Oct 23, 2005

> it seems this cosr() is better

Erm. I check sin3:

> double sin3(double x)
> {double x2, x3, x5, x7, x4, x6, y, t;
> if(fabs(x) < eps) R 0.0;

This should be "return x".

> R x - x3/6.0 + x5/120.0 - x7/5040.0 +
> (x7*x2)/362880 - (x7*x4)/39916800.0 + (x7*x6)/6227020800.0 -
>
> (x7*x7*x)/1307674368000.0;

But it can be improved quite a bit (for IEEE double):
double cos3(double x) {
double y;

y = x * x;
return ((((((-0.00000000001136800186 * y + 0.00000000208758867982) * y -
0.00000027557315566764) * y + 0.00002480158729369483) * y -
0.00138888888888807796) * y + 0.04166666666666662966) * y -
0.5) * y + 1.0;
}
double sin3(double x) {
double y;

y = x * x;
return ((((((0.00000000015918135277 * y - 0.00000002505113193827) * y +
0.00000275573161030613) * y - 0.00019841269836759707) * y +
0.00833333333333094971) * y - 0.16666666666666662966) * y +
1.0) * x;
}
these are tuned to a very good relative precision on [-pi/4,pi/4], and with
an exact 1.0 for cos(0.0). Precision is comparable to your routines, but
the number of operations is considerably less.

> if(t>=K2pi)
> {y = t / K2pi;
> modf(y, &z); // z=[y] con lo stesso segno
> t = t - z*K2pi;
> }

From a numerical standpoint, this range reduction is incorrect.

Some basic knowledge about numerical mathematics would be preferable
before somebody attempts to implement sine and cosine functions. You
may wonder why in the coefficients above there is such a large deviation
from 1/(n!). That belongs to the field of numerical mathematics.

cs · Oct 24, 2005

Erm. I check sin3:

This should be "return x".

the meaning is this: if i have a number x and here
|x|<eps=32*DBL_EPSILON=0.000000000000007
then sin(x) is 0.0
the same for other constants

But it can be improved quite a bit (for IEEE double):

this is my last sin3

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <time.h>

#define R return
#define P printf
#define F for
#define W while
#define U unsigned

long double
K2pi=6.28318530717958647692528676655900576839433879875021L;

long double kp[]=
{0.0L,
3.14159265358979323846264338327950288419716939937510L, // pi/1
1.57079632679489661923132169163975144209858469968755L, // pi/2
0.0L, // pi/3
0.78539816339744830961566084581987572104929234984377L, // pi/4
0.62831853071795864769252867665590057683943387987502L, // pi/5
0.52359877559829887307710723054658381403286156656251L, // pi/6
0.44879895051282760549466334046850041202816705705358L, // pi/7
0.39269908169872415480783042290993786052464617492188L, // pi/8
0.34906585039886591538473815369772254268857437770834L, // pi/9
0.31415926535897932384626433832795028841971693993751L // pi/10
};

long double kps[]=
{0.0L, 0.0L, 0.0L, 0.0L, // 0, pi, pi/2 pi/3
0.70710678118654752440084436210484903928483593768847L, // pi/4
0.58778525229247312916870595463907276859765243764314L, // pi/5
0.50000000000000000000000000000000000000000000000000L, // pi/6
0.43388373911755812047576833284835875460999072778745L, // pi/7
0.38268343236508977172845998403039886676134456248562L, // pi/8
0.34202014332566873304409961468225958076308336751416L, // pi/9
0.30901699437494742410229341718281905886015458990288L // pi/10
};

long double kpc[]=
{0.0L, 0.0L, 0.0L, 1.0L, // 0, pi, pi/2 pi/3
0.70710678118654752440084436210484903928483593768847L, // pi/4
0.80901699437494742410229341718281905886015458990288L, // pi/5
0.86602540378443864676372317075293618347140262690519L, // pi/6
0.90096886790241912623610231950744505116591916213185L, // pi/7
0.92387953251128675612818318939678828682241662586364L, // pi/8
0.93969262078590838405410927732473146993620813426446L, // pi/9
0.95105651629515357211643933337938214340569863412575L // pi/10
};

static double eps = 32*DBL_EPSILON;

double cos3(double x)
{int i;
double y;
static double h[] = { 1.0/2.0, 1.0/24.0, 1.0/720.0, 1.0/40320.0,
1.0/3628800.0, 1.0/479001600.0, 1.0/87178291200.0};
///////////////////////////

asm{ mov ecx, 30
mov esi, offset kp
fld eps // st = eps
fld x // st = x, eps
l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], x, eps
fsub st(0), st(1)
// st = kp[c]-x, x, eps
fabs // st = |kp[c]-x|, x, eps
fcomp st(2) // st = x, eps
fstsw ax
sahf
ja l1
fstp y // st = eps
fstp y // st =
mov esi, offset kpc
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = x, eps
fstp y
fstp y // st =
fld1 // st = 1
fld1 // st = 1, 1
fld x // st = x, 1, 1
fld x // st = x, x, 1, 1
fmulp st(1), st(0)
// st = x*x, 1, 1
fchs // st = -x*x, 1, 1
// x, y, z
mov ecx, 0
mov esi, offset h
l3:
fmul st(1), st(0) // st = x, y*x, z
fld qword ptr [esi+ecx]
// st = h[c], x, y, z
fmul st(0), st(2)
// st = y*h[c], x, y, z
faddp st(3), st(0)
// st = x, y, z+y/h[c]
add ecx, 8
cmp ecx, 56
jb l3
fstp y
fstp y // st = z 38
end:
}

}

double sin3(double x)
{int i;
double y;
static double h[] = {1.0/6.0, 1.0/120.0, 1.0/5040.0, 1.0/362880.0,
1.0/39916800.0, 1.0/6227020800.0, 1.0/1307674368000.0 };
///////////////////////////

asm{ mov ecx, 30
mov esi, offset kp
fld eps // st = eps
fld x // st = x, eps
l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], x, eps
fsub st(0), st(1)
// st = kp[c]-x, x, eps
fabs // st = |kp[c]-x|, x, eps
fcomp st(2) // st = x, eps
fstsw ax
sahf
ja l1
fstp y // st = eps
fstp y // st =
mov esi, offset kps
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = x, eps
fstp y
fstp y // st =
fld x // st = x
fld x // st = x, x
fld x // st = x, x, x
fld x // st = x, x, x, x
fmulp st(1), st(0)
// st = x*x, x, x
fchs // st = -x*x, x, x
// x, y, z
mov ecx, 0
mov esi, offset h
l3:
fmul st(1), st(0) // st = x, y*x, z
fld qword ptr [esi+ecx]
// st = h[c], x, y, z
fmul st(0), st(2)
// st = y*h[c], x, y, z
faddp st(3), st(0)
// st = x, y, z+y/h[c]
add ecx, 8
cmp ecx, 56
jb l3
fstp y
fstp y // st = z 38
end:
}
}

double cosr(const double x);
double sinr(const double x);

double cosr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) t = -t; // cos(t)=cos(-t)
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=kp[1]) // cos(t)= -cos(pi-t)= -cos(t-pi)
{ t= t-kp[1]; sign^=1;}
if(t>=kp[2]) // cos(t)= -cos(pi-t)
{ t= kp[1]-t; sign^=1;}
if(t>kp[4]) // cos(t)= sin(pi/2 - t)
{ t=kp[2]-t;
y = sin3(t);
y= sign && y!=0.0 ? -y: y;
R y;
}
y = cos3(t);
R sign && y!=0.0 ? -y: y;
}

double sinr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) // sin(t)=-sin(-t)
{ t= -t; sign^=1; }
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=kp[1]) // sin(t)= sin(pi-t)= -sin(t-pi)
{ t= t-kp[1]; sign^=1;}
if(t>=kp[2]) // sin(t)= sin(pi-t)
t= kp[1]-t;
if(t>=kp[4]) // sin(t)= cos(pi/2 - t)
{ t=kp[2]-t;
y = cos3(t);
y = sign && y!=0.0 ? -y: y;
R y;
}
y = sin3(t);
R sign && y!=0.0 ? -y: y;
}

int main( void )
{int i;
double tmp, x, y, z0, z1, x0, delta;

F(i=-120; i>-134 ; --i)
{ P("(s,I)(%i*pi6)= %+.15f ", i, cos(i*kp[6]) );
P("%+.15f\n", cosr(i*kp[6]) );
}

srand((U)time(0));
F(i=0, x0=delta=0.0; i<500000; ++i)
{x=rand(); y=rand();
if(y!=0) x/=y;
x+=rand();
z0 = cos(x); z1 = cosr(x);
if( z0 >= z1)
{y = z0-z1;
if(y>delta) { x0=x; delta=y;}
}
else { y = z1 - z0;
if(y>delta) { x0=x; delta=y;}
}
}
P("Delta=%e x0=%.15f\n cosr(x0)=%.15f\n",
delta, x0, cosr(x0));
P("cos(x0)=%.15f %.15f \n", cos(x0), 32.0*DBL_EPSILON);

R 0;
}

double cos3(double x) {
double y;

y = x * x;
return ((((((-0.00000000001136800186 * y + 0.00000000208758867982) * y -
0.00000027557315566764) * y + 0.00002480158729369483) * y -
0.00138888888888807796) * y + 0.04166666666666662966) * y -
0.5) * y + 1.0;
}
double sin3(double x) {
double y;

y = x * x;
return ((((((0.00000000015918135277 * y - 0.00000002505113193827) * y +
0.00000275573161030613) * y - 0.00019841269836759707) * y +
0.00833333333333094971) * y - 0.16666666666666662966) * y +
1.0) * x;
}
these are tuned to a very good relative precision on [-pi/4,pi/4], and with
an exact 1.0 for cos(0.0). Precision is comparable to your routines, but
the number of operations is considerably less.

for new ones i think the number of operation is almost the same

if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}

Click to expand...

From a numerical standpoint, this range reduction is incorrect.

why?

Some basic knowledge about numerical mathematics would be preferable
before somebody attempts to implement sine and cosine functions.

would you say that your routines are better than my ones?

Why?

(s,I)(-120*pi6)= +1.000000000000000 +1.000000000000000
(s,I)(-121*pi6)= +0.866025403784438 +0.866025403784439
(s,I)(-122*pi6)= +0.499999999999997 +0.500000000000000
(s,I)(-123*pi6)= -0.000000000000006 +0.000000000000000
(s,I)(-124*pi6)= -0.499999999999996 -0.500000000000000
(s,I)(-125*pi6)= -0.866025403784438 -0.866025403784439
(s,I)(-126*pi6)= -1.000000000000000 -1.000000000000000
(s,I)(-127*pi6)= -0.866025403784437 -0.866025403784439
(s,I)(-128*pi6)= -0.499999999999994 -0.500000000000000
(s,I)(-129*pi6)= -0.000000000000004 +0.000000000000000
(s,I)(-130*pi6)= +0.499999999999999 +0.500000000000000
(s,I)(-131*pi6)= +0.866025403784439 +0.866025403784439
(s,I)(-132*pi6)= +1.000000000000000 +1.000000000000000
(s,I)(-133*pi6)= +0.866025403784442 +0.866025403784439
Delta=8.340550e-15 x0=59739.000000000000000
cosr(x0)=0.044880582702773
cos(x0)=0.044880582702764 0.000000000000007

You
may wonder why in the coefficients above there is such a large deviation
from 1/(n!). That belongs to the field of numerical mathematics.

?

Dik T. Winter · Oct 24, 2005

> On Sun, 23 Oct 2005 01:27:16 GMT, "Dik T. Winter" <[email protected]>
> wrote: ....
>
> the meaning is this: if i have a number x and here
> |x|<eps=32*DBL_EPSILON=0.000000000000007
> then sin(x) is 0.0
> the same for other constants

If x is small, sin(x) is approximately equal to x, not equal to 0.
That is why I say you should return x in that case.

> asm{ mov ecx, 30
> mov esi, offset kp
> fld eps // st = eps
> fld x // st = x, eps

Sorry, does not work on the machine I am now typing on, nor on the machine
I have at home. Both are not Intel machines.

> > these are tuned to a very good relative precision on [-pi/4,pi/4], and with
> > an exact 1.0 for cos(0.0). Precision is comparable to your routines, but
> > the number of operations is considerably less.

Click to expand...

>
> for new ones i think the number of operation is almost the same

Possibly, I can not count it.

> > > if(t>=K2pi)
> > > {y = t / K2pi;
> > > modf(y, &z); // z=[y] con lo stesso segno
> > > t = t - z*K2pi;
> > > }

Click to expand...

> >
> > From a numerical standpoint, this range reduction is incorrect.

Click to expand...

>
> why?

That is not a range reduction mod pi, but mod some number close to pi,
and even not perfect there.

>
>
> would you say that your routines are better than my ones?
> Why?

I say comparable.

> (s,I)(-120*pi6)= +1.000000000000000 +1.000000000000000
> (s,I)(-121*pi6)= +0.866025403784438 +0.866025403784439 ULP difference
> (s,I)(-122*pi6)= +0.499999999999997 +0.500000000000000

ULP difference (and you can not state that it *should* be 0.5, because
there is no reason for that, there is no double precision floating point
number f such that sin(f) is exactly 0.5).

> (s,I)(-123*pi6)= -0.000000000000006 +0.000000000000000

See the earlier discussion about cos((n + 1/2).pi) that never should
deliver 0 when (n + 1/2).pi is in IEEE double precision. BTW, something
similar holds for sin(n.pi), unless n == 0.

>
> ?

Look up truncated chebyshov approximations in some textbook on numerical
analysis. They reduce the number of operations needed while precision
is retained.

cs · Oct 24, 2005

...

Click to expand...

....

asm{ mov ecx, 30
mov esi, offset kp
fld eps // st = eps
fld x // st = x, eps

Click to expand...

Sorry, does not work on the machine I am now typing on, nor on the machine
I have at home. Both are not Intel machines.

these are tuned to a very good relative precision on [-pi/4,pi/4], and with
an exact 1.0 for cos(0.0). Precision is comparable to your routines, but
the number of operations is considerably less.

Click to expand...

for new ones i think the number of operation is almost the same

Click to expand...

Possibly, I can not count it.

i have count 8 multiplication for the cos routine you have written and
15 multiplications for my cos3: so your should be 2x faster than mine

if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}

From a numerical standpoint, this range reduction is incorrect.

Click to expand...

why?

Click to expand...

That is not a range reduction mod pi, but mod some number close to pi,
and even not perfect there.

if i have well understood i add rounding errors in "t" variable

Keith Thompson · Oct 24, 2005

cs said:
#define R return
#define P printf
#define F for
#define W while
#define U unsigned

[snip]

If you think the second version of this sentence is as easy to read
as the first, you can continue using those macros; otherwise, please
spell out the entire word each time you use it.

I Y T T S V O T S I A E T R A T F, Y C C U T M; O, P S O T E W E T Y U I.

Dik T. Winter · Oct 25, 2005

> On Mon said:
> On Mon said:

> >In article said:

> > > On Sun, 23 Oct 2005 01:27:16 GMT, "Dik T. Winter" <[email protected]>
> > > wrote: ....
> > > > > if(t>=K2pi)
> > > > > {y = t / K2pi;
> > > > > modf(y, &z); // z=[y] con lo stesso segno
> > > > > t = t - z*K2pi;
> > > > > }
> > > >
> > > > From a numerical standpoint, this range reduction is incorrect.
> > >
> > > why?

Click to expand...

> >
> >That is not a range reduction mod pi, but mod some number close to pi,
> >and even not perfect there.

Click to expand...

>
> if i have well understood i add rounding errors in "t" variable

There are a few rounding errors here. First, K2pi is not exact. Second,
the multiplication "z * K2pi" induces a rounding error, and third, the
subtraction can induce a rounding error. Try the following:
double K1over3 = 1.0/3.0;
double t = 1.0;
double y, z;
y = t / K1over3;
modf(y, &z);
t = t - z * K1over3;
printf("%g %g\n", z, t);
and see that it does not print 0.0 (although z is correctly printed as 3).

cs · Oct 26, 2005

i have implemented in assembly Dik T. Winter sin333/cos333 routines
(with some change).
Do you see any error? (i'm not expert in FPU)
In your opinion what is the better and why?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <time.h>

#define R return
#define P printf
#define F for
#define W while
#define U unsigned

long double
K2pi=6.28318530717958647692528676655900576839433879875021L;

long double kp[]=
{0.0L,
3.14159265358979323846264338327950288419716939937510L, // pi/1
1.57079632679489661923132169163975144209858469968755L, // pi/2
0.0L, // pi/3
0.78539816339744830961566084581987572104929234984377L, // pi/4
0.62831853071795864769252867665590057683943387987502L, // pi/5
0.52359877559829887307710723054658381403286156656251L, // pi/6
0.44879895051282760549466334046850041202816705705358L, // pi/7
0.39269908169872415480783042290993786052464617492188L, // pi/8
0.34906585039886591538473815369772254268857437770834L, // pi/9
0.31415926535897932384626433832795028841971693993751L // pi/10
};

long double kps[]=
{0.0L, 0.0L, 0.0L, 0.0L, // 0, pi, pi/2 pi/3
0.70710678118654752440084436210484903928483593768847L, // pi/4
0.58778525229247312916870595463907276859765243764314L, // pi/5
0.50000000000000000000000000000000000000000000000000L, // pi/6
0.43388373911755812047576833284835875460999072778745L, // pi/7
0.38268343236508977172845998403039886676134456248562L, // pi/8
0.34202014332566873304409961468225958076308336751416L, // pi/9
0.30901699437494742410229341718281905886015458990288L // pi/10
};

long double kpc[]=
{0.0L, 0.0L, 0.0L, 1.0L, // 0, pi, pi/2 pi/3
0.70710678118654752440084436210484903928483593768847L, // pi/4
0.80901699437494742410229341718281905886015458990288L, // pi/5
0.86602540378443864676372317075293618347140262690519L, // pi/6
0.90096886790241912623610231950744505116591916213185L, // pi/7
0.92387953251128675612818318939678828682241662586364L, // pi/8
0.93969262078590838405410927732473146993620813426446L, // pi/9
0.95105651629515357211643933337938214340569863412575L // pi/10
};

static double eps = 32*DBL_EPSILON;

double cos7(double x)
{double y;
static double b[] =
{-0.00000000001136800186, 0.00000000208758867982,
-0.00000027557315566764, 0.00002480158729369483,
-0.00138888888888807796, 0.04166666666666662966,
-0.5 };
//////////
asm{ fld x // st = x
fld x // st = x, x
mov ecx, 8
fmulp st(1), st(0) // st = x*x
mov esi, offset b
fld qword ptr [esi] // st = a[0], x2
l2:
fmul st(0), st(1) // st = a[c]*x2, x2
fld qword ptr [esi+ecx] // st = b[c+1], a[c]*x2, x2
add ecx, 8
faddp st(1), st(0) // st = b[c+1]+a[c]*x2, x2
cmp ecx, 48
jbe l2

fmulp st(1), st(0) // st = (b[6]+a[5]*x2)*x2
fld1 // st = 1, (b[6]+a[5]*x2)*x2
faddp st(1), st(0) // st = (b[6]+a[5]*x2)*x2+1
}
}

double sin7(double x)
{double y;
static double b[] =
{0.00000000015918135277, -0.00000002505113193827,
0.00000275573161030613, -0.00019841269836759707,
0.00833333333333094971, -0.16666666666666662966,
1.0 };
////////////
asm{ fld x // st = x
fld x // st = x, x
mov ecx, 8
fmul st(0), st(1) // st = x*x, x
mov esi, offset b
fld qword ptr [esi] // st = a[0], x2, x
l2:
fmul st(0), st(1) // st = a[c]*x2, x2, x
fld qword ptr [esi+ecx] // st = b[c+1], a[c]*x2, x2, x
add ecx, 8
faddp st(1), st(0) // st = b[c+1]+a[c]*x2, x2, x
cmp ecx, 48
jbe l2

fmulp st(2), st(0) // st = x2, (b[6]+a[5]*x2)*x
fstp y // st = (b[6]+a[5]*x2)*x
}
}

double cos333(double x) {
double y;

y = x * x;
return ((((((-0.00000000001136800186 * y +
0.00000000208758867982) * y -
0.00000027557315566764) * y +
0.00002480158729369483) * y -
0.00138888888888807796) * y +
0.04166666666666662966) * y -
0.5) * y + 1.0;
}

double sin333(double x) {
double y;

y = x * x;
return ((((((0.00000000015918135277 * y -
0.00000002505113193827) * y +
0.00000275573161030613) * y -
0.00019841269836759707) * y +
0.00833333333333094971) * y -
0.16666666666666662966) * y +
1.0) * x;
}

double cos3(double x)
{double y;
static double b[] =
{-0.00000000001136800186, 0.00000000208758867982,
-0.00000027557315566764, 0.00002480158729369483,
-0.00138888888888807796, 0.04166666666666662966,
-0.5 };
//////////
asm{ mov ecx, 30
mov esi, offset kp
fld x // st = x
fld eps // st = eps, x

l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], eps, x
fsub st(0), st(2)
// st = kp[c]-x, eps, x
fabs // st = |kp[c]-x|, eps, x
fcomp st(1) // st = eps, x
fstsw ax
sahf
ja l1
fstp y // st = x
fstp y // st =
mov esi, offset kpc
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = eps, x
fstp y // st = x
fld x // st = x, x
mov ecx, 8
fmulp st(1), st(0) // st = x*x
mov esi, offset b
fld qword ptr [esi] // st = a[0], x2
l2:
fmul st(0), st(1) // st = a[c]*x2, x2
fld qword ptr [esi+ecx] // st = b[c+1], a[c]*x2, x2
add ecx, 8
faddp st(1), st(0) // st = b[c+1]+a[c]*x2, x2
cmp ecx, 48
jbe l2

fmulp st(1), st(0) // st = (b[6]+a[5]*x2)*x2
fld1 // st = 1, (b[6]+a[5]*x2)*x2
faddp st(1), st(0) // st = (b[6]+a[5]*x2)*x2+1
end:
}

}

double sin3(double x)
{double y;
static double b[] =
{0.00000000015918135277, -0.00000002505113193827,
0.00000275573161030613, -0.00019841269836759707,
0.00833333333333094971, -0.16666666666666662966,
1.0 };
////////////
asm{ mov ecx, 30
mov esi, offset kp
fld x // st = x
fld eps // st = eps, x
l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], eps, x
fsub st(0), st(2)
// st = kp[c]-x, eps, x
fabs // st = |kp[c]-x|, eps, x
fcomp st(1) // st = eps, x
fstsw ax
sahf
ja l1
fstp y // st = x
fstp y // st =
mov esi, offset kps
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = eps, x
fstp y // st = x
fld x // st = x, x
mov ecx, 8
fmul st(0), st(1) // st = x*x, x
mov esi, offset b
fld qword ptr [esi] // st = a[0], x2, x
l2:
fmul st(0), st(1) // st = a[c]*x2, x2, x
fld qword ptr [esi+ecx] // st = b[c+1], a[c]*x2, x2, x
add ecx, 8
faddp st(1), st(0) // st = b[c+1]+a[c]*x2, x2, x
cmp ecx, 48
jbe l2

fmulp st(2), st(0) // st = x2, (b[6]+a[5]*x2)*x
fstp y // st = (b[6]+a[5]*x2)*x
end:
}

}

double cos33(double x)
{double y;
static double h[] = { 1.0/2.0, 1.0/24.0, 1.0/720.0, 1.0/40320.0,
1.0/3628800.0, 1.0/479001600.0, 1.0/87178291200.0};
///////////////////////////

asm{ mov ecx, 30
mov esi, offset kp
fld eps // st = eps
fld x // st = x, eps
l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], x, eps
fsub st(0), st(1)
// st = kp[c]-x, x, eps
fabs // st = |kp[c]-x|, x, eps
fcomp st(2) // st = x, eps
fstsw ax
sahf
ja l1
fstp y // st = eps
fstp y // st =
mov esi, offset kpc
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = x, eps
fstp y
fstp y // st =
fld1 // st = 1
fld1 // st = 1, 1
fld x // st = x, 1, 1
fld x // st = x, x, 1, 1
fmulp st(1), st(0)
// st = x*x, 1, 1
fchs // st = -x*x, 1, 1
// x, y, z
mov ecx, 0
mov esi, offset h
l3:
fmul st(1), st(0) // st = x, y*x, z
fld qword ptr [esi+ecx]
// st = h[c], x, y, z
fmul st(0), st(2)
// st = y*h[c], x, y, z
faddp st(3), st(0)
// st = x, y, z+y*h[c]
add ecx, 8
cmp ecx, 56
jb l3
fstp y
fstp y // st = z 38
end:
}

}

double sin33(double x)
{double y;
static double h[] = {1.0/6.0, 1.0/120.0, 1.0/5040.0, 1.0/362880.0,
1.0/39916800.0, 1.0/6227020800.0, 1.0/1307674368000.0 };
///////////////////////////

asm{ mov ecx, 30
mov esi, offset kp
fld eps // st = eps
fld x // st = x, eps
l0:
fld tbyte ptr [esi+ecx]
// st = kp[c], x, eps
fsub st(0), st(1)
// st = kp[c]-x, x, eps
fabs // st = |kp[c]-x|, x, eps
fcomp st(2) // st = x, eps
fstsw ax
sahf
ja l1
fstp y // st = eps
fstp y // st =
mov esi, offset kps
fld tbyte ptr [esi+ecx]
jmp end
l1:
add ecx, 10
cmp ecx, 100
jbe l0
// st = x, eps
fstp y
fstp y // st =
fld x // st = x
fld x // st = x, x
fld x // st = x, x, x
fld x // st = x, x, x, x
fmulp st(1), st(0)
// st = x*x, x, x
fchs // st = -x*x, x, x
// x, y, z
mov ecx, 0
mov esi, offset h
l3:
fmul st(1), st(0) // st = x, y*x, z
fld qword ptr [esi+ecx]
// st = h[c], x, y, z
fmul st(0), st(2)
// st = y*h[c], x, y, z
faddp st(3), st(0)
// st = x, y, z+y*h[c]
add ecx, 8
cmp ecx, 56
jb l3
fstp y
fstp y // st = z 38
end:
}

}

double cosr(const double x);
double sinr(const double x);

double cosr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) t = -t; // cos(t)=cos(-t)
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=kp[1]) // cos(t)= -cos(pi-t)= -cos(t-pi)
{ t= t-kp[1]; sign^=1;}
if(t>=kp[2]) // cos(t)= -cos(pi-t)
{ t= kp[1]-t; sign^=1;}
if(t>=kp[4]) // cos(t)= sin(pi/2 - t)
{ t=kp[2]-t;
y = sin7(t);
R sign && y!=0.0 ? -y: y;
}
y = cos7(t);
R sign && y!=0.0 ? -y: y;
}

double sinr(const double x)
{U sign=0;
double t, y, z, k;
////////////////////
t = x;
if(t<0) // sin(t)=-sin(-t)
{ t= -t; sign^=1; }
if(t>=K2pi)
{y = t / K2pi;
modf(y, &z); // z=[y] con lo stesso segno
t = t - z*K2pi;
}
if(t>=kp[1]) // sin(t)= sin(pi-t)= -sin(t-pi)
{ t= t-kp[1]; sign^=1;}
if(t>=kp[2]) // sin(t)= sin(pi-t)
t= kp[1]-t;
if(t>kp[4]) // sin(t)= cos(pi/2 - t)
{ t=kp[2]-t;
y = cos7(t);
R sign && y!=0.0 ? -y: y;
}
y = sin7(t);
R sign && y!=0.0 ? -y: y;
}

int main( void )
{int i, j;
U h;
double tmp, x, y, z0, z1, x0, delta;
time_t ti, tf;

F(i=-120; i>-134 ; --i)
{ P("(s,I)(%i*pi6)= %+.15f ", i, cos(i*kp[6]) );
P("%+.15f\n", cosr(i*kp[6]) );
}

srand((U)time(0));
F(i=0, x0=delta=0.0; i<500000; ++i)
{j= rand(); j=j&2;
x=rand(); y=rand();
if(y!=0) x/=y;
x+=rand();
if(j) x=-x;
z0 = cos(x); z1 = cosr(x);
if( z0 >= z1)
{y = z0-z1;
if(y>delta) { x0=x; delta=y;}
}
else { y = z1 - z0;
if(y>delta) { x0=x; delta=y;}
}
}
P("Delta=%e x0=%.15f\n cosr(x0)=%.15f\n",
delta, x0, cosr(x0));
P("cos(x0)=%.15f %.15f \n", cos(x0), 32.0*DBL_EPSILON);

ti=time(0);
F(tmp=0.00000001; tmp<kp[4]; tmp+=0.00000001)
if(0.0==cos3(tmp)) {P("NO ");break;}
tf=time(0);
P("tmp=%g cos3_delta=%g \n", tmp, difftime(tf, ti));

ti=time(0);
F(tmp=0.00000001; tmp<kp[4]; tmp+=0.00000001)
if(0.0==cos33(tmp)) {P("NO ");break;}
tf=time(0);
P("tmp=%g cos33_delta=%g \n", tmp, difftime(tf, ti));

ti=time(0);
F(tmp=0.00000001, h=0; tmp<kp[4]; tmp+=0.00000001, ++h)
if(0.0==cos333(tmp)) {P("NO ");break;}
tf=time(0);
P("tmp=%g cos333_delta=%g giri=%u\n", tmp, difftime(tf, ti), h);

ti=time(0);
F(tmp=0.00000001, h=0; tmp<kp[4]; tmp+=0.00000001, ++h)
if(0.0==cos7(tmp)) {P("NO ");break;}
tf=time(0);
P("tmp=%g cos7_delta=%g giri=%u\n", tmp, difftime(tf, ti), h);

R 0;
}

////////////////////
(s,I)(-120*pi6)= +1.000000000000000 +1.000000000000000
(s,I)(-121*pi6)= +0.866025403784438 +0.866025403784438
(s,I)(-122*pi6)= +0.499999999999997 +0.499999999999997
(s,I)(-123*pi6)= -0.000000000000006 -0.000000000000006
(s,I)(-124*pi6)= -0.499999999999996 -0.499999999999996
(s,I)(-125*pi6)= -0.866025403784438 -0.866025403784438
(s,I)(-126*pi6)= -1.000000000000000 -1.000000000000000
(s,I)(-127*pi6)= -0.866025403784437 -0.866025403784437
(s,I)(-128*pi6)= -0.499999999999994 -0.499999999999994
(s,I)(-129*pi6)= -0.000000000000004 -0.000000000000005
(s,I)(-130*pi6)= +0.499999999999999 +0.499999999999999
(s,I)(-131*pi6)= +0.866025403784439 +0.866025403784439
(s,I)(-132*pi6)= +1.000000000000000 +1.000000000000000
(s,I)(-133*pi6)= +0.866025403784442 +0.866025403784442
Delta=6.217249e-15 x0=43202.000000000000000
cosr(x0)=0.378915528769058
cos(x0)=0.378915528769052 0.000000000000007
tmp=0.785398 cos3_delta=14
tmp=0.785398 cos33_delta=15
tmp=0.785398 cos333_delta=9 giri=78539816
tmp=0.785398 cos7_delta=8 giri=78539816

Keith Thompson · Oct 26, 2005

cs said:
i have implemented in assembly Dik T. Winter sin333/cos333 routines
(with some change).
Do you see any error? (i'm not expert in FPU)
In your opinion what is the better and why?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <time.h>

#define R return
#define P printf
#define F for
#define W while
#define U unsigned

This is where I stopped reading.

If you must, for some reason, use these silly little macros, please
expand them before you post your code for others to see. They only
make your code more difficult to read. We all know what "return"
means; don't make us refer back to the top of your program to figure
out what "R" means.

Richard Bos · Oct 26, 2005

Keith Thompson said:
This is where I stopped reading.

Oh, come on. cs == RoSsWhatever == av == Rosario == the same tired old
troll. Trying to educate it is like trying to civilise a politician.

Richard

Keith Thompson · Oct 26, 2005

Oh, come on. cs == RoSsWhatever == av == Rosario == the same tired old
troll. Trying to educate it is like trying to civilise a politician.

Ok, I didn't make the connection with previous trolling personas.

Default User · Oct 26, 2005

Richard Bos wrote:

Oh, come on. cs == RoSsWhatever == av == Rosario == the same tired old
troll. Trying to educate it is like trying to civilise a politician.

Yep, another plonk for this tiresome goof.

Brian

Text processing	29	Sep 26, 2011
for loop skips items	13	Feb 15, 2012
i=infinity;0= isin kpi, 1=cos kpi, k=m/n, n=4,m=0-00; cG=20=const, 1/sgrt2>G>0.5, 6<N = NA ^2su	13	Aug 8, 2006
It was just a memory aid when reading C code	2	Dec 24, 2007
having trouble with Discrete Cosine Transform program	1	Mar 24, 2006
integer sqrt() table implementation	4	Mar 11, 2005
multiple scnaf's problem !	2	Apr 13, 2011
Need Help speeding up mandelbrot algo	2	Oct 30, 2003

Cosine algo, was Re: sqrt algo

Dik T. Winter

Dik T. Winter

Arthur J. O'Dwyer

Keith Thompson

Dik T. Winter

P.J. Plauger

Gerry Quinn

cs

che_nome

Dik T. Winter

cs

Dik T. Winter

cs

Keith Thompson

Dik T. Winter

cs

Keith Thompson

Richard Bos

Keith Thompson

Default User

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads