It first depends on what you mean by "very fast".
Some devices have dedicated multipliers or MACC blocks (e.g. the DSP48) that are specified to run at roughly 400-600 MHz.
To get anywhere near that, though, you have to tune your architecture carefully: the multiplier components must be inferred correctly and cascaded with appropriate register (pipelining) stages. The maximum achievable speed may also be limited by the physical number of multipliers in an FPGA column, e.g. 7 (cascading into another column might reduce the theoretical maximum speed).
You could also use tools like the Core Generator (Xilinx) to generate such filters (I have not done that).
If, on the other hand, you do not care too much about such details and would rather just improve your design's timing, some rules of thumb are:
- Trim your bit widths as early as possible (e.g. dedicated multipliers or MACC blocks such as the DSP48 are typically signed, 18x25 in and 48 out, with a 48-bit cascade input).
- Add pipeline stages: at the inputs of the multipliers, at their outputs, and at the adder stages. (For most applications, the added latency is considered insignificant.)
- Consider using an "adder chain" instead of an "adder tree" for the final summation. (An adder is included in the DSP48 and can be efficiently cascaded using an adder chain).
- Possibly use a synchronous reset. (Depending on the manufacturer, dedicated multipliers can sometimes only be inferred if you use a sync reset; otherwise you end up in the slower FPGA fabric.)
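
To make these rules of thumb concrete, here is a minimal VHDL sketch of a single multiply-add stage. It is not a vendor template: the entity name fir_tap and all port and signal names are made up for illustration, and the 18/25/48-bit widths simply follow the DSP48 figures mentioned above. Whether your synthesizer actually packs this into a dedicated block depends on the tool and the device, so check the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single FIR tap (names are illustrative only):
-- registered inputs, registered product, registered sum,
-- and a synchronous reset.
entity fir_tap is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;              -- synchronous, active high
    sample  : in  signed(17 downto 0);    -- data, trimmed to 18 bits
    coeff   : in  signed(24 downto 0);    -- coefficient, trimmed to 25 bits
    sum_in  : in  signed(47 downto 0);    -- partial sum from the neighbouring tap
    sum_out : out signed(47 downto 0)     -- partial sum to the next tap
  );
end entity fir_tap;

architecture rtl of fir_tap is
  signal sample_r : signed(17 downto 0) := (others => '0');
  signal coeff_r  : signed(24 downto 0) := (others => '0');
  signal prod_r   : signed(42 downto 0) := (others => '0');  -- 18 + 25 = 43 bits
  signal sum_r    : signed(47 downto 0) := (others => '0');
begin

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        -- Reset is tested inside the clocked branch, so it is synchronous.
        sample_r <= (others => '0');
        coeff_r  <= (others => '0');
        prod_r   <= (others => '0');
        sum_r    <= (others => '0');
      else
        sample_r <= sample;                          -- input register stage
        coeff_r  <= coeff;
        prod_r   <= sample_r * coeff_r;              -- product register stage
        sum_r    <= sum_in + resize(prod_r, 48);     -- adder register stage
      end if;
    end if;
  end process;

  sum_out <= sum_r;

end architecture rtl;
```

Each register stage adds one clock cycle of latency, but no path between two registers contains more than one multiplier or one adder.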
In your case, you may have a huge combinatorial adder tree at the output (e.g. SUM = A+B+C+D...+Y+Z), which would explain why timing fails as you increase the number of coefficients. I'm only guessing at this point, though; I could not tell without looking at the code.
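
As an illustration of the alternative, here is a rough sketch of a registered adder chain (a transposed-form FIR), where each partial sum is registered before it is passed on. Everything here is an assumption for the sake of the example: the entity name fir_chain, the generic N_TAPS, and the placeholder coefficients (all set to 1) are mine, not taken from your code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical transposed-form FIR: the input is broadcast to all taps
-- and the partial sums travel through a chain of registers.
entity fir_chain is
  generic (
    N_TAPS : positive := 8                -- illustrative default
  );
  port (
    clk : in  std_logic;
    x   : in  signed(17 downto 0);
    y   : out signed(47 downto 0)
  );
end entity fir_chain;

architecture rtl of fir_chain is
  type coeff_array_t is array (0 to N_TAPS - 1) of signed(24 downto 0);
  type acc_array_t   is array (0 to N_TAPS - 1) of signed(47 downto 0);

  -- Placeholder coefficients; replace with your own filter taps.
  constant COEFFS : coeff_array_t := (others => to_signed(1, 25));

  signal acc : acc_array_t := (others => (others => '0'));
begin

  process (clk)
  begin
    if rising_edge(clk) then
      -- The last tap starts the chain with its own product only.
      acc(N_TAPS - 1) <= resize(x * COEFFS(N_TAPS - 1), 48);

      -- Every other tap adds its product to the registered sum of
      -- its neighbour, so there is one adder per clock cycle.
      for k in 0 to N_TAPS - 2 loop
        acc(k) <= acc(k + 1) + resize(x * COEFFS(k), 48);
      end loop;
    end if;
  end process;

  y <= acc(0);

end architecture rtl;
```

With this structure the longest register-to-register path is one multiplier plus one adder, regardless of N_TAPS, whereas a single-cycle SUM = A+B+...+Z grows deeper with every coefficient you add. The broadcast input x has a high fan-out here, and the product is not registered separately, so further pipelining may still be needed at high clock rates.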
You do not have to know all the details and inner workings; even a little understanding of your current platform helps you write code that is efficient and that the synthesizer can map well.
You might also want to take a look at the synthesizer/platform user guides (e.g. the "XST User Guide"), which sometimes contain VHDL templates for specific cases.
I hope this helps!
