newbie with timing problem, is adding pipeline stages the only optionto speed up?

Discussion in 'VHDL' started by festi.giorgio@gmail.com, Sep 14, 2012.

  1. Guest

    i am implementing a fir filter and it has to be(in my naive opinion) very fast.
    when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.
    Without going too much in the detail.. is adding pipeline stages the only option that i have to speed up the all system?

    thanks :)
     
    , Sep 14, 2012
    #1
    1. Advertising

  2. rickman Guest

    Re: newbie with timing problem, is adding pipeline stages the onlyoption to speed up?

    On 9/14/2012 2:57 AM, wrote:
    > i am implementing a fir filter and it has to be(in my naive opinion) very fast.
    > when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.
    > Without going too much in the detail.. is adding pipeline stages the only option that i have to speed up the all system?
    >
    > thanks :)



    No, there are other options.

    Rick
     
    rickman, Sep 14, 2012
    #2
    1. Advertising

  3. Guest

    Re: newbie with timing problem, is adding pipeline stages the onlyoption to speed up?

    It first depends on what you mean by "very fast".

    In some devices there are dedicated multipliers or MACC (e.g. DSP48) that are supposed to work up to ~400MHz-~600MHz.
    Although this is true, you would have to carefully tweak your architecture,by either correctly inferring the multiplier components and cascading themwith appropriate register (pipelining) stages. Also the max achievable speed might be limited by the actual physical number of multipliers in an FPGAcolumn e.g. 7 (cascading to another column might reduce the theoretical max speed).
    You could also use tools like the Core Generator (Xilinx) to generate such filters (I have not done that).

    If on the other hand you do not care too much about such details, and wouldrather want to speed-up your design's timing some rules of thumb are:
    - Trim your bit-width as soon as possible. (e.g. some dedicated multipliersor MACC like the DSP48 are typically SIGNED 18x25in and 48out, with a carry in of 48).
    - Add pipeline stages. At the input of the multipliers, at the output, and at the adder stages. (For most applications, these latencies are consideredinsignificant).
    - Consider using an "adder chain" instead of an "adder tree" for the final summation. (An adder is included in the DSP48 and can be efficiently cascaded using an adder chain).
    - Use a synchronous reset??? (Depending on the manufacturer, sometimes dedicated multipliers can only be inferred if you use a sync-reset, otherwise you'll end up using the slower FPGA fabric).

    For your case, it might be that you have a huge combinatorial adder tree at the output (e.g. SUM = A+B+C+D...+Y+Z), this is why when you increase the number of coefficients, the timing fails (but I’m just guessing at this point, I could not tell without looking at the code).

    You do not have to know all the details and inner workings, just a bit of understanding of your current platform can help you code efficiently and in a way that the synthesizer can better understand.

    You might also want to take a look at the synthesizer/platform user guides (e.g. the "XST User Guide") these sometimes contain VHDL templates for specific cases.

    I hope this helps ! :D
     
    , Sep 14, 2012
    #3
  4. Robert Miles Guest

    Re: newbie with timing problem, is adding pipeline stages the onlyoption to speed up?

    On Friday, September 14, 2012 1:57:40 AM UTC-5, wrote:
    > i am implementing a fir filter and it has to be(in my naive opinion) veryfast.
    >
    > when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.
    >
    > Without going too much in the detail.. is adding pipeline stages the onlyoption that i have to speed up the all system?
    >
    >
    >
    > thanks :)


    Adding carefully controlled delays in the appropriate places is POSSIBLE, but hard to design, partly since ASIC and FPGAs seldom have any way other than clocked stages to supply delays with a well-controlled value range. This would mean longer to get the result, but usually less than pipelining.

    Do ypu have a way to do more of the first part of the calculations in parallel, then combine them in another stage?
     
    Robert Miles, Sep 15, 2012
    #4
  5. Re: newbie with timing problem, is adding pipeline stages the onlyoption to speed up?

    Off the top of my head here are the options

    1 - pipelining (you can pipeline the outputs and inputs of the multipliers,and also pipeline the tree adder stage)

    The output for a generic fir filter

    y[n] = b[0]*x[0] + b[1]*x[1] + ... b[n]*x[n];

    you can pipeline the outputs of b[0]*x[0] to b[n]*x[n]. After, you need to add all the intermediate ouputs.

    Suppose, that your filter order is 15, so you have 16 coefficients in all.

    stage 1 :

    8 adders (b[0]*x[0] + b[1]*x[1], b[2]*x[2] + b[3]*x[3], ...), then pipeline

    stage 2 : 4 adders

    stage 3 : 2 adders

    stage 4 : 1 adder

    The number of adder stages stages is ceil(log2(order+1)). Don't forget to pipeline after each stage.

    If you pipelined the input before the multiplier, you'd have 1 pipeline delay, if you pipelined the multipliers outputs, you'd have another pipeline delays. Adding the pipelined adder tree would add another 4 pipeline delays.

    In total, you would have 6 pipeline delays for this implementation. This isnot usually a big deal and would give you a speed boost.

    2 - If your filter is symmetrical (or anti-symmetrical), you can get rid ofhalf the multipliers.

    y[n] = b[0]*x[0] + b[1]*x[1] + ... b[n]*x[n];

    if b[n] = b[0], b[1] = b[n-1], then you could rewrite the FIR equation as

    y[n] = b[0]*(x[0]+x[n]) + b[1]*(x[1]+x[n-1]) +

    3 - If you have access to the quantizer toolbox in Matlab you can simulate how many bits you actually need to implement the filter. You could then reduce the bitwidth of some intermediate signals and thus reduce the combinatorial delays. You don't actually need the fixed-point toolbox to perform this analysis, but it's a good tool if you have it.

    There might be some errors in my answer, but I think that the general ideasare present.
     
    Benjamin Couillard, Sep 25, 2012
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. santhosh
    Replies:
    1
    Views:
    1,114
    Mike Treseler
    Aug 21, 2003
  2. dlx to three stages

    , May 21, 2005, in forum: VHDL
    Replies:
    0
    Views:
    533
  3. Oren
    Replies:
    1
    Views:
    558
    Michael D. Ober
    Apr 29, 2007
  4. Cuthbert
    Replies:
    3
    Views:
    498
  5. capitan
    Replies:
    3
    Views:
    2,036
    capitan
    Sep 25, 2011
Loading...

Share This Page