cosine calcs

Niv · Aug 11, 2006

I may need to perform the following function:
Calculate the full included angle between Z axis given X axis angle & Y
axis angle
(I think this is cos(full) = cos(x).cos(Y) [confirm?]

I'm given X & Y (radians or degrees not yet known, but that's just
scaling)
I then have to test to see if the full angle is greater than a fixed
value, lets say 40 deg as an example.
So cos(40) is a constant which I can compare with cos(x).cos(Y).

How do I calc the cos of X & Y? , before I then multiply.
I can't use a lookup table as the X & Y are wide, which would build
a prohibitively large LUT. However, I have plenty of time to do some
sort
of iterative algorithm (200 us plus), so could calc cos(X), store, then
cos(Y), mult
and finally compare to my constant.

TIA, Niv.

Jonathan Bromley · Aug 11, 2006

Kevin,

How do I calc the cos of X & Y? , before I then multiply.
I can't use a lookup table as the X & Y are wide, which would build
a prohibitively large LUT. However, I have plenty of time to do some
sort
of iterative algorithm (200 us plus), so could calc cos(X), store, then
cos(Y), mult

Standard answer: see Ray Andraka's website www.andraka.com
for lots of great information on CORDIC algorithms. Sounds like
it's the perfect match for what you need. Given you have so
much time, a bit-serial CORDIC could be very compact indeed.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Niv · Aug 11, 2006

Jonathan said:
Kevin,

Standard answer: see Ray Andraka's website www.andraka.com
for lots of great information on CORDIC algorithms. Sounds like
it's the perfect match for what you need. Given you have so
much time, a bit-serial CORDIC could be very compact indeed.

OK, looked at Ray website about Cordics, and now my head hurts! (No
maths since Uni, back in the (late) 70's. Well, that's my excuse.
I've copied a 16 bit wide 15 deep pipeline cordic from some free
website & got it
simulating (seems to jump about near 0 though).

But, how to change it so it accepts a wider input value (think that's
easy), but takes more time to calculate because it does many more
iterations. Goal is for a small area implementation.

Niv.

Jonathan Bromley · Aug 11, 2006

OK, looked at Ray website about Cordics, and now my head hurts! (No
maths since Uni, back in the (late) 70's. Well, that's my excuse.

You and me both!

I've copied a 16 bit wide 15 deep pipeline cordic from some free
website & got it
simulating (seems to jump about near 0 though).

It shouldn't. I've a Verilog implementation somewhere that I'll
dig up and send to you - it seems to be pretty well behaved.
The only major issue is that you mustn't give it input angles
outside a certain range - it's close to +/- 100 degrees, can't
remember the exact number. Some implementations cheat
and restrict the range even further, to about half that, to save
one processing step - this is still useful, because you still
have a full quadrant of angle range.

But, how to change it so it accepts a wider input value (think that's
easy), but takes more time to calculate because it does many more
iterations.

I think all you need to do is make the arctan(2^(-N)) table longer
and wider. Once you've hit 16 bits, the "fudge factor" that you
get (product of the cosines of all the values in that table) is so
close to constant with increasing N that you stop worrying
about it. And of course all the datapaths get wider everywhere.

Goal is for a small area implementation.

Hence my comment about bit-serial implementation. I admit that
I found that part of Ray Andraka's description somewhat heavy
going, but the basic idea is easy - just doing the sum/difference
calculations one bit at a time.

I'll dig out what I have over the weekend. No promises about
quality of documentation though - I did it as an experiment,
not intended for public consumption. And I have a nasty feeling
it might be written in Verilog :-(
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Andy Ray · Aug 12, 2006

Jonathan said:
You and me both!

I've been working on a cordic circuit generator built using HDCaml.
It's pretty much working now so I have posted it at:

http://www.evilkid.pwp.blueyonder.co.uk/

Probably a few bugs to work through (and requires some decent
documentation) but for simple stuff (like sin/cos, mag/phase etc) it
seems to be working ok.

It shouldn't. I've a Verilog implementation somewhere that I'll
dig up and send to you - it seems to be pretty well behaved.
The only major issue is that you mustn't give it input angles
outside a certain range - it's close to +/- 100 degrees, can't
remember the exact number.

1.74 radians or 99.6 degrees so you were pretty close...

Some implementations cheat

and restrict the range even further, to about half that, to save
one processing step - this is still useful, because you still
have a full quadrant of angle range.

I think all you need to do is make the arctan(2^(-N)) table longer
and wider. Once you've hit 16 bits, the "fudge factor" that you
get (product of the cosines of all the values in that table) is so
close to constant with increasing N that you stop worrying
about it. And of course all the datapaths get wider everywhere.

If it's the CORDIC from opencores then that implements a fully unrolled
algorithm so each extra stage costs some adders, muxes and registers.
It's quite an expensive architecture but very fast.

Hence my comment about bit-serial implementation. I admit that
I found that part of Ray Andraka's description somewhat heavy
going, but the basic idea is easy - just doing the sum/difference
calculations one bit at a time.

I'll dig out what I have over the weekend. No promises about
quality of documentation though - I did it as an experiment,
not intended for public consumption. And I have a nasty feeling
it might be written in Verilog :-(

It doesn't offer a bit-serial implementation (yet) but it can generate
iterative or pipelined architectures of any precision (in vhdl and
verilog). The iterative version is reasonably efficient. Perhaps if
you have the chance you would compare the implementation to your hand
crafted one.

To generate a sincos core use something like:

cordic_gen -b 24 -f 4 -i 20 -a sincos -m iter -vhdl

For vhdl simulation and synthesis you also need to compile the package
in hdcaml.vhd.

Since I cant seem to get the ocaml native code compilers to work on my
system the program requires an installation of ocaml to work.

I hope someone finds it useful.

Cheers,

Andy.

Jonathan Bromley · Aug 15, 2006

I've copied a 16 bit wide 15 deep pipeline cordic from some free
website & got it
simulating (seems to jump about near 0 though).

But, how to change it so it accepts a wider input value (think that's
easy), but takes more time to calculate because it does many more
iterations. Goal is for a small area implementation.

Can you quantify "small"? I resuscitated my iterative CORDIC (one
clock cycle per bit of data, not pipelined); for 16 data bits and 3
guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
I haven't tried really hard to optimise it for area; for example,
the arctangent lookup table is not yet in a ROM, but is in general
logic. I get something like 100MHz clock rate without trying
too hard. There are plenty of opportunities for making the
logic a little smaller and faster, but not much chance of
reducing the FF count. The size scales linearly with bit width
of the data (or very nearly so - the ROM grows faster than
linearly, but it's not a big deal).

Bit-serial implementation would be smaller but needs the same
number of FFs; however, by careful shoehorning you could get
some of the storage into SRL16s and save quite a lot of space.
That sort of jumping-on-the-suitcase is not really my strength,
though!

I'll try to put the example on our website before long, but
meanwhile let me know if you want a copy.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Jan Decaluwe · Aug 15, 2006

Jonathan said:
Can you quantify "small"? I resuscitated my iterative CORDIC (one
clock cycle per bit of data, not pipelined); for 16 data bits and 3
guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
I haven't tried really hard to optimise it for area; for example,
the arctangent lookup table is not yet in a ROM, but is in general
logic. I get something like 100MHz clock rate without trying
too hard.

Readers may be interested in the following page which describes
an apparently similar design (in MyHDL/Python):

http://myhdl.jandecaluwe.com/doku.php/cookbook:sinecomp:synthesis

The page includes testbench, code, Verilog output and FPGA synthesis
report. MyHDL currently converts only to Verilog, but VHDL output is
in the works.

Regards,
Jan

Jan Decaluwe · Aug 15, 2006

Jan said:
Readers may be interested in the following page which describes
an apparently similar design (in MyHDL/Python):

http://myhdl.jandecaluwe.com/doku.php/cookbook:sinecomp:synthesis

Mm, that page should really be

http://myhdl.jandecaluwe.com/doku.php/cookbook:sinecomp

Niv · Aug 16, 2006

Jonathan said:
Can you quantify "small"? I resuscitated my iterative CORDIC (one
clock cycle per bit of data, not pipelined); for 16 data bits and 3
guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
I haven't tried really hard to optimise it for area; for example,
the arctangent lookup table is not yet in a ROM, but is in general
logic. I get something like 100MHz clock rate without trying
too hard. There are plenty of opportunities for making the
logic a little smaller and faster, but not much chance of
reducing the FF count. The size scales linearly with bit width
of the data (or very nearly so - the ROM grows faster than
linearly, but it's not a big deal).

Bit-serial implementation would be smaller but needs the same
number of FFs; however, by careful shoehorning you could get
some of the storage into SRL16s and save quite a lot of space.
That sort of jumping-on-the-suitcase is not really my strength,
though!

I'll try to put the example on our website before long, but
meanwhile let me know if you want a copy.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

The size you describe sounds plenty small enough, target device is
probably an Actel APA1000, but a lot of other stuff is going in there
as well. 100 MHz is plenty fast enough as well, our master clock is
only 33MHz, possible a bit more, but I have 1/2 ms to do all my calcs
(to be confirmed though).

I'm starting to get my head round the cordic, so I'll persevere a bit
more so I understand it and then code it up myself (VHDL), with loads
of generics to paramaterise it, if I can.
But looking at other designs may help my understanding.
It seems a word serial is only about twice the size of a bit serial, so
may look at both types of implementation. A fully parallel design
would be too resource hungry though, I think.

Regards, Kev P. (AKA Niv).

Jonathan Bromley · Aug 16, 2006

On 16 Aug 2006 00:13:00 -0700, Niv

[me]
whoops, typo, that should have been "about 330 LUTs". Sorry.
This count scales pretty much linearly with the number of bits
in the data path, and is only very slightly affected by the number
of iterations you perform.

[Kevin]

I'm starting to get my head round the cordic, so I'll persevere a bit
more so I understand it and then code it up myself (VHDL), with loads
of generics to paramaterise it, if I can.

I parameterised for...
* bit width of the coordinate and angle I/Os
* number of guard LSBs added to the internal datapaths
* number of iterations of the algorithm

Does it need any more? I'm not sure.

But looking at other designs may help my understanding.

I've emailed it to you separately. The testbench is ropey, but
it shows the thing working as a cos/sin calculator.

It seems a word serial is only about twice the size of a bit serial

Indeed. Unless you do very, very clever stuff with the bit shifters,
you need the same number of FFs for both. But the LUT count should
be significantly lower than for bit-serial. So maybe it could be got
down to half the size of word-serial, or thereabouts.

A fully parallel design
would be too resource hungry though, I think.

Yes, for any respectable accuracy. The fully-parallel design
is simpler because you don't need a barrel shifter, but bigger
because you replicate the adder hardware for each iteration.
I reckon the barel shifter is about as big as 3 adders, so you
can guess that a 16+3-bit, 19-iteration version would come
out around 5X bigger than the word-serial form. And given
you've got so much time to do the calculation, it would be
a completely pointless waste of space.

Thanks for provoking me into bringing that design back to
life - it gave me a chance to tidy it up, and it now exists in
both VHDL and Verilog flavours. --
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023
Picture Comparison Code Not Working Properly	1	Jul 24, 2021
Multi-function, universal, CORDIC IP available from the OpenCores website	0	Mar 7, 2014
Help me with output TEXTIO please?	0	Oct 30, 2006
Roots Module	10	Dec 25, 2009
[ANN] Multidimensional Array - MDArray 0.5.5	0	Nov 19, 2013
Immutable Geometry Types	2	Dec 17, 2007
Drawing Sinus curve in Python	2	Jun 1, 2014

cosine calcs

Niv

Jonathan Bromley

Niv

Jonathan Bromley

Andy Ray

Jonathan Bromley

Jan Decaluwe

Jan Decaluwe

Niv

Jonathan Bromley

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads