cosine calcs

Discussion in 'VHDL' started by Niv, Aug 11, 2006.

  1. Niv

    Niv Guest

    I may need to perform the following function:
    Calculate the full included angle between Z axis given X axis angle & Y
    axis angle
    (I think this is cos(full) = cos(x).cos(Y) [confirm?]

    I'm given X & Y (radians or degrees not yet known, but that's just
    scaling)
    I then have to test to see if the full angle is greater than a fixed
    value, lets say 40 deg as an example.
    So cos(40) is a constant which I can compare with cos(x).cos(Y).

    How do I calc the cos of X & Y? , before I then multiply.
    I can't use a lookup table as the X & Y are wide, which would build
    a prohibitively large LUT. However, I have plenty of time to do some
    sort
    of iterative algorithm (200 us plus), so could calc cos(X), store, then
    cos(Y), mult
    and finally compare to my constant.

    TIA, Niv.
     
    Niv, Aug 11, 2006
    #1
    1. Advertising

  2. On 11 Aug 2006 04:04:53 -0700, "Niv" <> wrote:

    Kevin,

    >How do I calc the cos of X & Y? , before I then multiply.
    >I can't use a lookup table as the X & Y are wide, which would build
    >a prohibitively large LUT. However, I have plenty of time to do some
    >sort
    >of iterative algorithm (200 us plus), so could calc cos(X), store, then
    >cos(Y), mult


    Standard answer: see Ray Andraka's website www.andraka.com
    for lots of great information on CORDIC algorithms. Sounds like
    it's the perfect match for what you need. Given you have so
    much time, a bit-serial CORDIC could be very compact indeed.
    --
    Jonathan Bromley, Consultant

    DOULOS - Developing Design Know-how
    VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

    Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK

    http://www.MYCOMPANY.com

    The contents of this message may contain personal views which
    are not the views of Doulos Ltd., unless specifically stated.
     
    Jonathan Bromley, Aug 11, 2006
    #2
    1. Advertising

  3. Niv

    Niv Guest

    Jonathan Bromley wrote:

    > On 11 Aug 2006 04:04:53 -0700, "Niv" <> wrote:
    >
    > Kevin,
    >
    > >How do I calc the cos of X & Y? , before I then multiply.
    > >I can't use a lookup table as the X & Y are wide, which would build
    > >a prohibitively large LUT. However, I have plenty of time to do some
    > >sort
    > >of iterative algorithm (200 us plus), so could calc cos(X), store, then
    > >cos(Y), mult

    >
    > Standard answer: see Ray Andraka's website www.andraka.com
    > for lots of great information on CORDIC algorithms. Sounds like
    > it's the perfect match for what you need. Given you have so
    > much time, a bit-serial CORDIC could be very compact indeed.
    > --
    > Jonathan Bromley, Consultant


    OK, looked at Ray website about Cordics, and now my head hurts! (No
    maths since Uni, back in the (late) 70's. Well, that's my excuse.
    I've copied a 16 bit wide 15 deep pipeline cordic from some free
    website & got it
    simulating (seems to jump about near 0 though).

    But, how to change it so it accepts a wider input value (think that's
    easy), but takes more time to calculate because it does many more
    iterations. Goal is for a small area implementation.

    Niv.
     
    Niv, Aug 11, 2006
    #3
  4. On 11 Aug 2006 08:11:39 -0700, Niv <> wrote:


    >OK, looked at Ray website about Cordics, and now my head hurts! (No
    >maths since Uni, back in the (late) 70's. Well, that's my excuse.


    You and me both!

    >I've copied a 16 bit wide 15 deep pipeline cordic from some free
    >website & got it
    >simulating (seems to jump about near 0 though).


    It shouldn't. I've a Verilog implementation somewhere that I'll
    dig up and send to you - it seems to be pretty well behaved.
    The only major issue is that you mustn't give it input angles
    outside a certain range - it's close to +/- 100 degrees, can't
    remember the exact number. Some implementations cheat
    and restrict the range even further, to about half that, to save
    one processing step - this is still useful, because you still
    have a full quadrant of angle range.

    >But, how to change it so it accepts a wider input value (think that's
    >easy), but takes more time to calculate because it does many more
    >iterations.


    I think all you need to do is make the arctan(2^(-N)) table longer
    and wider. Once you've hit 16 bits, the "fudge factor" that you
    get (product of the cosines of all the values in that table) is so
    close to constant with increasing N that you stop worrying
    about it. And of course all the datapaths get wider everywhere.

    > Goal is for a small area implementation.


    Hence my comment about bit-serial implementation. I admit that
    I found that part of Ray Andraka's description somewhat heavy
    going, but the basic idea is easy - just doing the sum/difference
    calculations one bit at a time.

    I'll dig out what I have over the weekend. No promises about
    quality of documentation though - I did it as an experiment,
    not intended for public consumption. And I have a nasty feeling
    it might be written in Verilog :-(
    --
    Jonathan Bromley, Consultant

    DOULOS - Developing Design Know-how
    VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

    Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK

    http://www.MYCOMPANY.com

    The contents of this message may contain personal views which
    are not the views of Doulos Ltd., unless specifically stated.
     
    Jonathan Bromley, Aug 11, 2006
    #4
  5. Niv

    Andy Ray Guest

    Jonathan Bromley wrote:
    > On 11 Aug 2006 08:11:39 -0700, Niv <> wrote:
    >
    >
    >> OK, looked at Ray website about Cordics, and now my head hurts! (No
    >> maths since Uni, back in the (late) 70's. Well, that's my excuse.

    >
    > You and me both!
    >




    I've been working on a cordic circuit generator built using HDCaml.
    It's pretty much working now so I have posted it at:

    http://www.evilkid.pwp.blueyonder.co.uk/

    Probably a few bugs to work through (and requires some decent
    documentation) but for simple stuff (like sin/cos, mag/phase etc) it
    seems to be working ok.



    >> I've copied a 16 bit wide 15 deep pipeline cordic from some free
    >> website & got it
    >> simulating (seems to jump about near 0 though).

    >
    > It shouldn't. I've a Verilog implementation somewhere that I'll
    > dig up and send to you - it seems to be pretty well behaved.
    > The only major issue is that you mustn't give it input angles
    > outside a certain range - it's close to +/- 100 degrees, can't
    > remember the exact number.



    1.74 radians or 99.6 degrees so you were pretty close...


    Some implementations cheat
    > and restrict the range even further, to about half that, to save
    > one processing step - this is still useful, because you still
    > have a full quadrant of angle range.
    >
    >> But, how to change it so it accepts a wider input value (think that's
    >> easy), but takes more time to calculate because it does many more
    >> iterations.

    >
    > I think all you need to do is make the arctan(2^(-N)) table longer
    > and wider. Once you've hit 16 bits, the "fudge factor" that you
    > get (product of the cosines of all the values in that table) is so
    > close to constant with increasing N that you stop worrying
    > about it. And of course all the datapaths get wider everywhere.
    >



    If it's the CORDIC from opencores then that implements a fully unrolled
    algorithm so each extra stage costs some adders, muxes and registers.
    It's quite an expensive architecture but very fast.


    >> Goal is for a small area implementation.

    >
    > Hence my comment about bit-serial implementation. I admit that
    > I found that part of Ray Andraka's description somewhat heavy
    > going, but the basic idea is easy - just doing the sum/difference
    > calculations one bit at a time.
    >
    > I'll dig out what I have over the weekend. No promises about
    > quality of documentation though - I did it as an experiment,
    > not intended for public consumption. And I have a nasty feeling
    > it might be written in Verilog :-(



    It doesn't offer a bit-serial implementation (yet) but it can generate
    iterative or pipelined architectures of any precision (in vhdl and
    verilog). The iterative version is reasonably efficient. Perhaps if
    you have the chance you would compare the implementation to your hand
    crafted one.

    To generate a sincos core use something like:

    cordic_gen -b 24 -f 4 -i 20 -a sincos -m iter -vhdl

    For vhdl simulation and synthesis you also need to compile the package
    in hdcaml.vhd.

    Since I cant seem to get the ocaml native code compilers to work on my
    system the program requires an installation of ocaml to work.

    I hope someone finds it useful.

    Cheers,

    Andy.
     
    Andy Ray, Aug 12, 2006
    #5
  6. On 11 Aug 2006 08:11:39 -0700,
    Niv <> wrote:

    >I've copied a 16 bit wide 15 deep pipeline cordic from some free
    >website & got it
    >simulating (seems to jump about near 0 though).
    >
    >But, how to change it so it accepts a wider input value (think that's
    >easy), but takes more time to calculate because it does many more
    >iterations. Goal is for a small area implementation.


    Can you quantify "small"? I resuscitated my iterative CORDIC (one
    clock cycle per bit of data, not pipelined); for 16 data bits and 3
    guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
    I haven't tried really hard to optimise it for area; for example,
    the arctangent lookup table is not yet in a ROM, but is in general
    logic. I get something like 100MHz clock rate without trying
    too hard. There are plenty of opportunities for making the
    logic a little smaller and faster, but not much chance of
    reducing the FF count. The size scales linearly with bit width
    of the data (or very nearly so - the ROM grows faster than
    linearly, but it's not a big deal).

    Bit-serial implementation would be smaller but needs the same
    number of FFs; however, by careful shoehorning you could get
    some of the storage into SRL16s and save quite a lot of space.
    That sort of jumping-on-the-suitcase is not really my strength,
    though!

    I'll try to put the example on our website before long, but
    meanwhile let me know if you want a copy.
    --
    Jonathan Bromley, Consultant

    DOULOS - Developing Design Know-how
    VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

    Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK

    http://www.MYCOMPANY.com

    The contents of this message may contain personal views which
    are not the views of Doulos Ltd., unless specifically stated.
     
    Jonathan Bromley, Aug 15, 2006
    #6
  7. Niv

    Jan Decaluwe Guest

    Jonathan Bromley wrote:
    > On 11 Aug 2006 08:11:39 -0700,
    > Niv <> wrote:
    >
    >
    >>I've copied a 16 bit wide 15 deep pipeline cordic from some free
    >>website & got it
    >>simulating (seems to jump about near 0 though).
    >>
    >>But, how to change it so it accepts a wider input value (think that's
    >>easy), but takes more time to calculate because it does many more
    >>iterations. Goal is for a small area implementation.

    >
    >
    > Can you quantify "small"? I resuscitated my iterative CORDIC (one
    > clock cycle per bit of data, not pipelined); for 16 data bits and 3
    > guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
    > I haven't tried really hard to optimise it for area; for example,
    > the arctangent lookup table is not yet in a ROM, but is in general
    > logic. I get something like 100MHz clock rate without trying
    > too hard.


    Readers may be interested in the following page which describes
    an apparently similar design (in MyHDL/Python):

    http://myhdl.jandecaluwe.com/doku.php/cookbook:sinecomp:synthesis

    The page includes testbench, code, Verilog output and FPGA synthesis
    report. MyHDL currently converts only to Verilog, but VHDL output is
    in the works.

    Regards,
    Jan

    --
    Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
    Losbergenlaan 16, B-3010 Leuven, Belgium
    From Python to silicon:
    http://myhdl.jandecaluwe.com
     
    Jan Decaluwe, Aug 15, 2006
    #7
  8. Niv

    Jan Decaluwe Guest

    Jan Decaluwe, Aug 15, 2006
    #8
  9. Niv

    Niv Guest

    Jonathan Bromley wrote:
    > On 11 Aug 2006 08:11:39 -0700,
    > Niv <> wrote:
    >
    > >I've copied a 16 bit wide 15 deep pipeline cordic from some free
    > >website & got it
    > >simulating (seems to jump about near 0 though).
    > >
    > >But, how to change it so it accepts a wider input value (think that's
    > >easy), but takes more time to calculate because it does many more
    > >iterations. Goal is for a small area implementation.

    >
    > Can you quantify "small"? I resuscitated my iterative CORDIC (one
    > clock cycle per bit of data, not pipelined); for 16 data bits and 3
    > guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.
    > I haven't tried really hard to optimise it for area; for example,
    > the arctangent lookup table is not yet in a ROM, but is in general
    > logic. I get something like 100MHz clock rate without trying
    > too hard. There are plenty of opportunities for making the
    > logic a little smaller and faster, but not much chance of
    > reducing the FF count. The size scales linearly with bit width
    > of the data (or very nearly so - the ROM grows faster than
    > linearly, but it's not a big deal).
    >
    > Bit-serial implementation would be smaller but needs the same
    > number of FFs; however, by careful shoehorning you could get
    > some of the storage into SRL16s and save quite a lot of space.
    > That sort of jumping-on-the-suitcase is not really my strength,
    > though!
    >
    > I'll try to put the example on our website before long, but
    > meanwhile let me know if you want a copy.
    > --
    > Jonathan Bromley, Consultant
    >
    > DOULOS - Developing Design Know-how
    > VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
    >
    > Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
    >
    > http://www.MYCOMPANY.com
    >
    > The contents of this message may contain personal views which
    > are not the views of Doulos Ltd., unless specifically stated.


    The size you describe sounds plenty small enough, target device is
    probably an Actel APA1000, but a lot of other stuff is going in there
    as well. 100 MHz is plenty fast enough as well, our master clock is
    only 33MHz, possible a bit more, but I have 1/2 ms to do all my calcs
    (to be confirmed though).

    I'm starting to get my head round the cordic, so I'll persevere a bit
    more so I understand it and then code it up myself (VHDL), with loads
    of generics to paramaterise it, if I can.
    But looking at other designs may help my understanding.
    It seems a word serial is only about twice the size of a bit serial, so
    may look at both types of implementation. A fully parallel design
    would be too resource hungry though, I think.

    Regards, Kev P. (AKA Niv).
     
    Niv, Aug 16, 2006
    #9
  10. On 16 Aug 2006 00:13:00 -0700, Niv
    <> wrote:

    [me]
    >> Can you quantify "small"? I resuscitated my iterative CORDIC (one
    >> clock cycle per bit of data, not pipelined); for 16 data bits and 3
    >> guard bits it needs 62 FFs and about 230 LUTs in a Spartan-3.


    whoops, typo, that should have been "about 330 LUTs". Sorry.
    This count scales pretty much linearly with the number of bits
    in the data path, and is only very slightly affected by the number
    of iterations you perform.

    [Kevin]
    >I'm starting to get my head round the cordic, so I'll persevere a bit
    >more so I understand it and then code it up myself (VHDL), with loads
    >of generics to paramaterise it, if I can.


    I parameterised for...
    * bit width of the coordinate and angle I/Os
    * number of guard LSBs added to the internal datapaths
    * number of iterations of the algorithm

    Does it need any more? I'm not sure.

    >But looking at other designs may help my understanding.


    I've emailed it to you separately. The testbench is ropey, but
    it shows the thing working as a cos/sin calculator.

    >It seems a word serial is only about twice the size of a bit serial


    Indeed. Unless you do very, very clever stuff with the bit shifters,
    you need the same number of FFs for both. But the LUT count should
    be significantly lower than for bit-serial. So maybe it could be got
    down to half the size of word-serial, or thereabouts.

    > A fully parallel design
    >would be too resource hungry though, I think.


    Yes, for any respectable accuracy. The fully-parallel design
    is simpler because you don't need a barrel shifter, but bigger
    because you replicate the adder hardware for each iteration.
    I reckon the barel shifter is about as big as 3 adders, so you
    can guess that a 16+3-bit, 19-iteration version would come
    out around 5X bigger than the word-serial form. And given
    you've got so much time to do the calculation, it would be
    a completely pointless waste of space.

    Thanks for provoking me into bringing that design back to
    life - it gave me a chance to tidy it up, and it now exists in
    both VHDL and Verilog flavours. --
    Jonathan Bromley, Consultant

    DOULOS - Developing Design Know-how
    VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

    Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK

    http://www.MYCOMPANY.com

    The contents of this message may contain personal views which
    are not the views of Doulos Ltd., unless specifically stated.
     
    Jonathan Bromley, Aug 16, 2006
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. pete

    Cosine algo, was Re: sqrt algo

    pete, Oct 15, 2005, in forum: C Programming
    Replies:
    108
    Views:
    1,736
  2. Replies:
    1
    Views:
    403
    mlimber
    Mar 24, 2006
  3. Grant Edwards

    Single precision floating point calcs?

    Grant Edwards, May 9, 2007, in forum: Python
    Replies:
    8
    Views:
    535
    Beliavsky
    May 10, 2007
  4. Niv (KP)

    LFSR calcs, passing params.

    Niv (KP), May 13, 2009, in forum: C Programming
    Replies:
    3
    Views:
    596
    Barry Schwarz
    May 16, 2009
  5. George

    Field calcs with OnBlur

    George, Jul 20, 2004, in forum: ASP General
    Replies:
    2
    Views:
    120
    Rob Collyer
    Jul 20, 2004
Loading...

Share This Page