Hello all,
I have written, synthesized, and tested (in behavioral simulation) a generic adder tree that takes a bit string of arbitrary length and computes the sum of all its bits. It is implemented as a recursive function. My code is the following:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity AdderTree is
    Generic (InputStringSize : integer := 80;
             OutputWidth     : integer := 7);
    Port ( BitString : in  STD_LOGIC_VECTOR (InputStringSize-1 downto 0);
           Sum       : out STD_LOGIC_VECTOR (OutputWidth-1 downto 0);
           Clk       : in  STD_LOGIC);
end AdderTree;

architecture Behavioral of AdderTree is

    -- Recursively sums the bits of BitStr. iter is the recursion depth:
    -- the result at depth iter is OutputWidth-iter bits wide.
    function adder_tree (BitStr : unsigned; iter : integer) return unsigned is
        variable result        : unsigned (OutputWidth-iter-1 downto 0) := (others=>'0');
        variable bitstring_tmp : unsigned (BitStr'length-1 downto 0) := (others=>'0');
    begin
        bitstring_tmp := BitStr;
        if BitStr'length = 2 then
            -- Base case: add two single bits, each zero-extended to 2 bits.
            result := ('0' & bitstring_tmp(1 downto 1)) + ('0' & bitstring_tmp(0 downto 0));
        elsif BitStr'length = 1 then
            -- Base case: a single bit is its own sum.
            result := bitstring_tmp(0 downto 0);
        else
            if (BitStr'length /= 3) then
                -- Split in half, sum each half one level deeper, zero-extend by one bit.
                result := ('0' & adder_tree(bitstring_tmp(BitStr'length-1 downto BitStr'length/2), iter+1))
                        + ('0' & adder_tree(bitstring_tmp((BitStr'length/2)-1 downto 0), iter+1));
            else
                -- Three bits: the upper two are summed at the same depth (no extension),
                -- the lowest bit one level deeper (zero-extended by one bit).
                result := (adder_tree(bitstring_tmp(BitStr'length-1 downto BitStr'length/2), iter))
                        + ('0' & adder_tree(bitstring_tmp((BitStr'length/2)-1 downto 0), iter+1));
            end if;
        end if;
        return result;
    end adder_tree;

begin
    -- Purely combinational: Clk is not used yet.
    Sum <= std_logic_vector(adder_tree(unsigned(BitString), 0));
end Behavioral;
The problem is that such a combinational design has a long critical path, which limits the clock frequency of the rest of my design. To shorten it, I would like to add pipeline stages, but I don't know exactly how to do it given the code above. Functions are C-like sequential constructs, so I don't see how to insert flip-flops inside one. I have thought of building the adder tree in parts, calling the function once per pipeline stage and placing flip-flops in between, but I would like to know if there is a better or easier solution. Thanks in advance.
P.S. I'm using Xilinx ISE.
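For reference, the idea of placing flip-flops between the levels of the tree could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a tested or synthesized design: it replaces the recursive function with a loop inside a clocked process, it registers every level (fewer stages would need the loop to be restructured), and it keeps every partial sum at the full OutputWidth for simplicity. The names PipelinedAdderTree, Levels, level_t and tree_t are made up for illustration, and Levels is assumed to satisfy 2**Levels >= InputStringSize.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity PipelinedAdderTree is
    Generic (InputStringSize : integer := 80;
             OutputWidth     : integer := 7;
             Levels          : integer := 7);  -- assumption: 2**Levels >= InputStringSize
    Port ( BitString : in  STD_LOGIC_VECTOR (InputStringSize-1 downto 0);
           Sum       : out STD_LOGIC_VECTOR (OutputWidth-1 downto 0);
           Clk       : in  STD_LOGIC);
end PipelinedAdderTree;

architecture Pipelined of PipelinedAdderTree is
    constant Leaves : integer := 2**Levels;
    -- One row of partial sums per tree level; row 0 holds the registered input bits.
    type level_t is array (0 to Leaves-1) of unsigned(OutputWidth-1 downto 0);
    type tree_t  is array (0 to Levels) of level_t;
    signal tree : tree_t := (others => (others => (others => '0')));
begin
    process (Clk)
    begin
        if rising_edge(Clk) then
            -- Level 0: register each input bit, zero-extended; pad unused leaves with 0.
            for i in 0 to Leaves-1 loop
                if i < InputStringSize then
                    tree(0)(i) <= resize(unsigned(BitString(i downto i)), OutputWidth);
                else
                    tree(0)(i) <= (others => '0');
                end if;
            end loop;
            -- Levels 1..Levels: each entry is the registered sum of two entries
            -- from the previous level, so there is one adder between flip-flops.
            for lvl in 1 to Levels loop
                for i in 0 to Leaves/(2**lvl) - 1 loop
                    tree(lvl)(i) <= tree(lvl-1)(2*i) + tree(lvl-1)(2*i+1);
                end loop;
            end loop;
        end if;
    end process;

    -- The root of the tree holds the total.
    Sum <= std_logic_vector(tree(Levels)(0));
end Pipelined;
```

With this structure the result would appear Levels+1 clock cycles after the corresponding input (one cycle for the input register plus one per adder level), and a new BitString could be accepted every cycle. Padding to a power of two and using full-width adders everywhere is wasteful on paper; whether the synthesis tool trims the constant-zero bits is something I would have to check in the report.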