newbie with timing problem, is adding pipeline stages the only optionto speed up?

festi.giorgio · Sep 14, 2012

i am implementing a fir filter and it has to be(in my naive opinion) very fast.
when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.
Without going too much in the detail.. is adding pipeline stages the only option that i have to speed up the all system?

thanks

rickman · Sep 14, 2012

i am implementing a fir filter and it has to be(in my naive opinion) very fast.
when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.
Without going too much in the detail.. is adding pipeline stages the only option that i have to speed up the all system?

thanks

No, there are other options.

Rick

antongunman · Sep 14, 2012

It first depends on what you mean by "very fast".

In some devices there are dedicated multipliers or MACC (e.g. DSP48) that are supposed to work up to ~400MHz-~600MHz.
Although this is true, you would have to carefully tweak your architecture,by either correctly inferring the multiplier components and cascading themwith appropriate register (pipelining) stages. Also the max achievable speed might be limited by the actual physical number of multipliers in an FPGAcolumn e.g. 7 (cascading to another column might reduce the theoretical max speed).
You could also use tools like the Core Generator (Xilinx) to generate such filters (I have not done that).

If on the other hand you do not care too much about such details, and wouldrather want to speed-up your design's timing some rules of thumb are:
- Trim your bit-width as soon as possible. (e.g. some dedicated multipliersor MACC like the DSP48 are typically SIGNED 18x25in and 48out, with a carry in of 48).
- Add pipeline stages. At the input of the multipliers, at the output, and at the adder stages. (For most applications, these latencies are consideredinsignificant).
- Consider using an "adder chain" instead of an "adder tree" for the final summation. (An adder is included in the DSP48 and can be efficiently cascaded using an adder chain).
- Use a synchronous reset??? (Depending on the manufacturer, sometimes dedicated multipliers can only be inferred if you use a sync-reset, otherwise you'll end up using the slower FPGA fabric).

For your case, it might be that you have a huge combinatorial adder tree at the output (e.g. SUM = A+B+C+D...+Y+Z), this is why when you increase the number of coefficients, the timing fails (but I’m just guessing at this point, I could not tell without looking at the code).

You do not have to know all the details and inner workings, just a bit of understanding of your current platform can help you code efficiently and in a way that the synthesizer can better understand.

You might also want to take a look at the synthesizer/platform user guides (e.g. the "XST User Guide") these sometimes contain VHDL templates for specific cases.

I hope this helps !

Robert Miles · Sep 15, 2012

i am implementing a fir filter and it has to be(in my naive opinion) veryfast.

when i grow up the number of coefficient some timing problem arise, at first i have tried to add some pipeline stages and it get something better.

Without going too much in the detail.. is adding pipeline stages the onlyoption that i have to speed up the all system?

thanks

Adding carefully controlled delays in the appropriate places is POSSIBLE, but hard to design, partly since ASIC and FPGAs seldom have any way other than clocked stages to supply delays with a well-controlled value range. This would mean longer to get the result, but usually less than pipelining.

Do ypu have a way to do more of the first part of the calculations in parallel, then combine them in another stage?

Benjamin Couillard · Sep 25, 2012

Off the top of my head here are the options

1 - pipelining (you can pipeline the outputs and inputs of the multipliers,and also pipeline the tree adder stage)

The output for a generic fir filter

y[n] = b[0]*x[0] + b[1]*x[1] + ... b[n]*x[n];

you can pipeline the outputs of b[0]*x[0] to b[n]*x[n]. After, you need to add all the intermediate ouputs.

Suppose, that your filter order is 15, so you have 16 coefficients in all.

stage 1 :

8 adders (b[0]*x[0] + b[1]*x[1], b[2]*x[2] + b[3]*x[3], ...), then pipeline

stage 2 : 4 adders

stage 3 : 2 adders

stage 4 : 1 adder

The number of adder stages stages is ceil(log2(order+1)). Don't forget to pipeline after each stage.

If you pipelined the input before the multiplier, you'd have 1 pipeline delay, if you pipelined the multipliers outputs, you'd have another pipeline delays. Adding the pipelined adder tree would add another 4 pipeline delays.

In total, you would have 6 pipeline delays for this implementation. This isnot usually a big deal and would give you a speed boost.

2 - If your filter is symmetrical (or anti-symmetrical), you can get rid ofhalf the multipliers.

y[n] = b[0]*x[0] + b[1]*x[1] + ... b[n]*x[n];

if b[n] = b[0], b[1] = b[n-1], then you could rewrite the FIR equation as

y[n] = b[0]*(x[0]+x[n]) + b[1]*(x[1]+x[n-1]) +

3 - If you have access to the quantizer toolbox in Matlab you can simulate how many bits you actually need to implement the filter. You could then reduce the bitwidth of some intermediate signals and thus reduce the combinatorial delays. You don't actually need the fixed-point toolbox to perform this analysis, but it's a good tool if you have it.

There might be some errors in my answer, but I think that the general ideasare present.

Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
10 Easy Steps to Speed Up Your Computer - Without Upgrading	0	Jun 8, 2009
Timing problem on (overly) complex page.	2	Jun 26, 2007
ISO: What is the "best practice" for getting error info from a pipeline?	5	Aug 23, 2007
An idea for heap allocation at near stack allocation speed	14	Feb 13, 2011
Is a Metaclass the appropriate way to solve this problem?	0	Aug 7, 2013
PATCH: Speed up direct string concatenation by 20+%!	34	Sep 28, 2006
problem with microsoft speed optimization	21	Dec 29, 2005

newbie with timing problem, is adding pipeline stages the only optionto speed up?

festi.giorgio

rickman

antongunman

Robert Miles

Benjamin Couillard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads