Complexity of trigonometric functions

Alf P. Steinbach /Usenet

* James Kanze, on 06.09.2010 13:33:
[...]
According to my 8087 manual (Intel 8086 manual, July 1981,
appendix S), the 8087 instruction set did not include FSIN or
FCOS.

The 8087 definitely had some support for trig, since I used it
back then. IIRC, it was in the form of an FSINCOS or FCOSSIN
function, which calculated both sin and cos (over a very limited
range).

My 8087 manual was lost in a move, many, many years ago; if
you've got one and can check it, I'd be curious to know.

I don't have an 8087 manual any longer, but my little Turbo Assembler Quick
Reference Guide says FSINCOS is "387 and i486 only".

However, it seems that the 8087 did know about PI (it supported FLDPI),
which is more than even the C++ standard library does today! <g>

I think back in 1978/79 it was pretty advanced just to have
floating-point support in a chipset for personal computers.


Cheers & hth.,

- Alf
 
SG

I have a piece of code that I am trying to optimize. It is nothing
special - just some calculations, including trigonometric functions
(a bunch of multiplications with sin and cos here and there). My code
is duplicated in some places, but that is OK.

In case you want to compute a sequence of sine values a la

f[k] = sin(a+b*k)

you can make use of a simple linear recurrence:

f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]

where 2*cos(b) is obviously a constant (the recurrence follows from
the angle-addition identity sin(x+b) + sin(x-b) = 2 cos(b) sin(x)).
And since cos(x)=sin(pi/2+x) you can use this to compute cosine
values as well, of course. Only two FLOPs per value is a very low
number of operations. The only drawback is rounding errors. Every
once in a while one should start anew or somehow "normalize" the
values so that the amplitude doesn't decrease or increase due to
rounding errors.
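
Here's a minimal sketch in C++ of what I mean (untested; the
re-seeding interval of 256 is an arbitrary choice of mine, not a
tuned value):

#include <cmath>
#include <vector>

// Computes f[k] = sin(a + b*k) for k = 0..n-1 with the recurrence
// above, re-seeding from std::sin every 'renorm' samples so that
// rounding errors don't accumulate.
std::vector<double> sin_sequence(double a, double b, std::size_t n,
                                 std::size_t renorm = 256)
{
    std::vector<double> f(n);
    const double c = 2.0 * std::cos(b);   // the constant 2*cos(b)
    for (std::size_t k = 0; k < n; ++k) {
        if (k % renorm < 2)               // seed two consecutive samples
            f[k] = std::sin(a + b * k);
        else                              // f[k] = 2 cos(b) f[k-1] - f[k-2]
            f[k] = c * f[k-1] - f[k-2];
    }
    return f;
}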

Cheers!
SG
 
Vladimir Jovic

SG said:
I have a piece of code that I am trying to optimize. It is nothing
special - just some calculations, including trigonometric functions
(a bunch of multiplications with sin and cos here and there). My code
is duplicated in some places, but that is OK.

In case you want to compute a sequence of sine values a la

f[k] = sin(a+b*k)

you can make use of a simple linear recurrence:

f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]

where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every once in a while one should start anew or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.

That is exactly what I am doing. Thanks for the tip.
BTW, will 1000 iterations (k=1000) produce a big error?
 
SG

SG said:
In case you want to compute a sequence of sine values a la
  f[k] = sin(a+b*k)
you can make use of a simple linear recurrence:
  f[0] = sin(a)
  f[1] = sin(a+b)
  f[k] = 2 cos(b) f[k-1] - f[k-2]
where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every once in a while one should start anew or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.

That is exactly what I am doing. Thanks for the tip.
BTW, will 1000 iterations (k=1000) produce a big error?

I'm not sure. The rounding error in the precomputed constant for
2*cos(b) obviously affects the frequency to some degree. And a
rounding error 'e' in one of the samples produces an infinite response
(a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
not correlated and ignoring the frequency error, the expected absolute
error for the k-th sample should be proportional to sqrt(k)/sin(b).
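
(To put a rough number on it: double precision has eps of about
1.1e-16, and sqrt(1000) is about 32, so for k = 1000 I'd expect an
absolute error on the order of 32*eps/sin(b), give or take a small
constant factor. Whether that is "big" depends mostly on your step b.)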

If the frequency dependency (1/sin(b)) is bugging you and you need
both sin AND cos, an alternative which is almost as fast would be to
use the following complex-valued 1st-order linear recurrence:

c[k] = exp(i*(a+b*k))
     = exp(i*b) * c[k-1]

where i*i = -1, real(c[k]) = cos(a+b*k), and imag(c[k]) = sin(a+b*k).
With this recurrence you need four multiplications and two additions
for a pair of sin/cos values. That's twice as many multiplications
compared to above. You still have the sqrt(k) scaling, but the
frequency dependency is gone now (yay!).
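
A minimal sketch of that recurrence in C++ (untested; identifiers are
mine):

#include <cmath>
#include <complex>
#include <vector>

// Fills c[k] = cos(a + b*k) and s[k] = sin(a + b*k) for k = 0..n-1
// via the rotation z <- exp(i*b) * z.
void sincos_sequence(double a, double b, std::size_t n,
                     std::vector<double>& s, std::vector<double>& c)
{
    s.resize(n);
    c.resize(n);
    std::complex<double> z(std::cos(a), std::sin(a));          // exp(i*a)
    const std::complex<double> rot(std::cos(b), std::sin(b));  // exp(i*b)
    for (std::size_t k = 0; k < n; ++k) {
        c[k] = z.real();
        s[k] = z.imag();
        z *= rot;  // one complex multiply = 4 MULs + 2 ADDs
    }
}

The same periodic re-seeding trick as before applies if you run it
for many samples.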

Since you're concerned about speed, let me share my recent experiences
with GCC's std::complex<double> implementation: Without a compiler
option like -ffast-math operator* maps to a function called __muldc3
(according to the profiler) which is terribly slow compared to 4 MULs
and 2 ADDs for some reason. I also tested operator* for std::complex<>
using a recent Microsoft compiler. According to my measurements
operator* was only half as fast compared to 4 MULs and 2 ADDs on an
old Pentium 4 HT. I guess there's some special case handling involved
with respect to NaN or Inf that is slowing down the multiplication.
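
If you don't want to rely on compiler options, a hand-rolled multiply
avoids the library call entirely. A sketch (plain 4 MULs + 2 ADDs, no
NaN/Inf rescue logic, so only for finite inputs):

// Rotates (re, im) by (rot_re, rot_im) in place.
inline void rotate(double& re, double& im, double rot_re, double rot_im)
{
    const double t = re * rot_re - im * rot_im;  // new real part
    im             = re * rot_im + im * rot_re;  // new imaginary part
    re = t;
}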

Cheers!
SG
 
Vladimir Jovic

SG said:
SG said:
In case you want to compute a sequence of sine values a la
f[k] = sin(a+b*k)
you can make use of a simple linear recurrence:
f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]
where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every once in a while one should start anew or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.
That is exactly what I am doing. Thanks for the tip.
BTW, will 1000 iterations (k=1000) produce a big error?

I'm not sure. The rounding error in the precomputed constant for
2*cos(b) obviously affects the frequency to some degree. And a
rounding error 'e' in one of the samples produces an infinite response
(a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
not correlated and ignoring the frequency error, the expected absolute
error for the k-th sample should be proportional to sqrt(k)/sin(b).

That means for small steps b, the error is higher. Then I cannot use
it, since my angles go around zero (±15 degrees) in small steps.
If the frequency dependency (1/sin(b)) is bugging you and you need
both sin AND cos, an alternative which is almost as fast would be to
use the following complex-valued 1st-order linear recurrence:

c[k] = exp(i*(a+b*k))
     = exp(i*b) * c[k-1]

where i*i = -1, real(c[k]) = cos(a+b*k), and imag(c[k]) = sin(a+b*k).
With this recurrence you need four multiplications and two additions
for a pair of sin/cos values. That's twice as many multiplications
compared to above. You still have the sqrt(k) scaling, but the
frequency dependency is gone now (yay!).

So far, I have implemented this using the straightforward method (the
standard library's sin/cos), and to be honest I hadn't thought of
using complex numbers, but I must admit...
Since you're concerned about speed, let me share my recent experiences
with GCC's std::complex<double> implementation: Without a compiler
option like -ffast-math operator* maps to a function called __muldc3
(according to the profiler) which is terribly slow compared to 4 MULs
and 2 ADDs for some reason. I also tested operator* for std::complex<>
using a recent Microsoft compiler. According to my measurements
operator* was only half as fast compared to 4 MULs and 2 ADDs on an
old Pentium 4 HT. I guess there's some special case handling involved
with respect to NaN or Inf that is slowing down the multiplication.

Thanks, I will play a bit with the profiler and see which method is
the best (for my purpose).
 
Miles Bader

SG said:
Without a compiler option like -ffast-math operator* maps to a
function called __muldc3 (according to the profiler) which is terribly
slow compared to 4 MULs and 2 ADDs for some reason. I also tested
operator* for std::complex<> using a recent Microsoft
compiler. According to my measurements operator* was only half as fast
compared to 4 MULs and 2 ADDs on an old Pentium 4 HT. I guess there's
some special case handling involved with respect to NaN or Inf that is
slowing down the multiplication.

Perhaps that's this (from the gcc manual):

`-fcx-limited-range'
When enabled, this option states that a range reduction step is not
needed when performing complex division. Also, there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case. The default is `-fno-cx-limited-range', but is
enabled by `-ffast-math'.

This option controls the default setting of the ISO C99
`CX_LIMITED_RANGE' pragma. Nevertheless, the option applies to
all languages.

`-fcx-fortran-rules'
Complex multiplication and division follow Fortran rules. Range
reduction is done as part of complex division, but there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case.

The default is `-fno-cx-fortran-rules'.
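
So something like

    g++ -O2 -fcx-limited-range prog.cc

should (if I read the manual right) get you the plain 4-MUL/2-ADD
multiply without dragging in the rest of -ffast-math.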

-miles
 
