Thanks to everyone who has posted in response to my original message.
Perhaps I should clarify what I am asking. I have a suite of
numerical codes that I recently profiled and found that the standard
sin and cos routines are a huge percentage of my total run time (which
is many days). A colleague told me that the standard math libraries
are optimized for size, not speed, and that since I always call sin
=============================
Usually they are optimised for accuracy.
and cos of the same argument, I should look for a speed optimized
sincos routine which shouldn't take much more time than either a sin
or cos take individually while maintaining the same level of accuracy.
BTW, I'm using Borland C++ Builder 6.0 on a Pentium IV in Win2k.
Have you looked at reducing the number of calls to sin and cos? For
example, consecutive values of sin (a + k * b), for k = 0, 1, 2, 3, etc.
can each be calculated from the two previous values with a single
multiplication and a single subtraction, once 2 * cos (b) is known.
Anything doing 3D graphics can usually be done with hardly any
trigonometric functions at all.
Do you have values that are very close together?
sin (x + eps) = sin (x) * cos (eps) + sin (eps) * cos (x)
cos (x + eps) = cos (x) * cos (eps) - sin (eps) * sin (x)
(You had better check these.)
If eps is small enough then you can replace cos (eps) with 1 and
sin (eps) with eps, which gives, to first order in eps:
sin (x + eps) = sin (x) + eps * cos (x)
cos (x + eps) = cos (x) - eps * sin (x)
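A minimal sketch of both variants follows, assuming you keep the current sin and cos in two variables and step the angle by eps. The function names are invented for this example. The first-order version has an error of roughly eps * eps / 2 per step, so whether it is acceptable depends on how small eps is and how many steps you chain together; the exact version needs sin (eps) and cos (eps) computed once up front.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: advance (*s, *c) from (sin x, cos x) to an
   approximation of (sin(x+eps), cos(x+eps)), first order in eps. */
void sincos_step_small(double *s, double *c, double eps)
{
    double s0, c0;

    assert(s != NULL && c != NULL);
    s0 = *s;
    c0 = *c;
    *s = s0 + eps * c0;
    *c = c0 - eps * s0;
}

/* Hypothetical helper: same step using the full addition formulas.
   sin_eps and cos_eps are sin(eps) and cos(eps), computed once by the
   caller and reused for every step. */
void sincos_step_exact(double *s, double *c, double sin_eps, double cos_eps)
{
    double s0, c0;

    assert(s != NULL && c != NULL);
    s0 = *s;
    c0 = *c;
    *s = s0 * cos_eps + sin_eps * c0;
    *c = c0 * cos_eps - sin_eps * s0;
}
```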
Grab the source of an existing implementation of sin and cos. They all
do three steps, for example for sin (x):
Step 1: Given x, find k such that abs (x - k * pi/2) <= pi/4.
Step 2: Let y = x - k * pi/2.
Step 3: Calculate one of sin(y), cos (y), -sin(y), -cos(y),
depending on the last two bits of k, using a polynomial.
By calculating sin and cos simultaneously, you know both will have the
same k and y. You also will have to calculate both sin(y) and cos(y)
using two polynomials, then just pick the right results and apply the
sign. So you win by just merging two such implementations.
If you have many calls, chances are the arguments are close together, so
many consecutive arguments will use the same value k. Try writing a
vectorised function:
void vec_sincos (double s[], double c[], double x[], size_t n);
where you will lose lots of the overhead and give the compiler a chance
of optimising.
BTW, profilers have been known to lie, especially for small function
calls. Just write a test program that does a billion calls to sin and
cos, profile it, and compare the results with stopwatch results to make
sure you are not going down the wrong path.