Latches in pipeline design and numeric logic

Discussion in 'VHDL' started by andyesquire@hotmail.com, Apr 6, 2005.

1. Guest

Hi All,

I'm working on a pipeline design (for Xilinx Virtex) and I'd like to
get some opinions on whether it's ok to use a latch inside a
combinatoral process squeezed between two sequential processes that
hold the pipeline registers.

As the input to the latch is registered, and the output from the logic
containing the latch isn't used until it's been registered, is this a
'good'
latch design? What other issues should be considered?

The same pipeline design without inferring a latch is over 50Mhz slower
so I'm keen to hitch my trailer to the 'latches can be great when they
are not inferred by mistake' wagon train

A simplified example is shown below, COMB_B process contains the latch
and does some trivial calculation. Whereas COMB_A does the same
calculation, but without a latch, and a lot slower.

Or is there is a way to make XST avoid inferring two lots of (a_reg -
b_reg) logic in COMB_A? It's this logic that causes the 50Mhz hit and
why I used a latch COMB_B.

Thanks,

Andy.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity test_latch is
port (clk : in std_logic;
rst : in std_logic;
d_in : in std_logic;
a_in : in std_logic_vector(31 downto 0);
b_in : in std_logic_vector(31 downto 0);
x_in : in std_logic_vector(31 downto 0);
r_out : out std_logic_vector(31 downto 0)
);
end test_latch;

architecture Behavioral of test_latch is
signal a_reg,b_reg,x_reg : std_logic_vector(31 downto 0);
signal d_reg : std_logic;
signal latch : std_logic_vector(31 downto 0);
signal r : std_logic_vector(31 downto 0);
begin

SEQ1 : process (clk,rst) is
begin
if (rst ='1') then
a_reg <= (others => '0');
b_reg <= (others => '0');
x_reg <= (others => '0');
d_reg <= '0';
elsif (rising_edge(clk)) then
if (d_reg = '1') then
a_reg <= a_in;
b_reg <= b_in;
x_reg <= x_in;
d_reg <= d_in;
end if;
end if;
end process;

-- COMB_A : process (d_reg,x_reg,a_reg,b_reg) is
-- begin
-- if (d_reg = '1') then
-- if (signed(x_reg) > (signed(a_reg) - signed(b_reg))) then
-- r <= x_reg;
-- else
-- r <= std_logic_vector(signed(a_reg) - signed(b_reg));
-- end if;
-- else
-- r <= (others => '0');
-- end if;
-- end process;

COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
begin
if (d_reg = '1') then
latch <= std_logic_vector(signed(a_reg) - signed(b_reg));
if (signed(x_reg) > signed(latch)) then
r <= x_reg;
else
r <= latch;
end if;
else
r <= (others => '0');
-- dont specify value for 'latch' as that's what
-- we want to infer
end if;
end process;

SEQ2 : process (clk,rst) is
begin
if (rst ='1') then
elsif (rising_edge(clk)) then
r_out <= r;
end if;
end process;

end Behavioral;

, Apr 6, 2005

2. NeoGuest

I dont understand how you can better your timings by putting a latch
inbetween and its bad for timing analysis also.The preferred way is to
go for pipelining.

Neo, Apr 6, 2005

3. Paul UiterlindenGuest

wrote:

> A simplified example is shown below, COMB_B process contains the latch
> and does some trivial calculation. Whereas COMB_A does the same
> calculation, but without a latch, and a lot slower.

Signal 'latch' is missing in the sensitivity list. This could cause
differences in simulation results (RTL versus synthesis).

> Or is there is a way to make XST avoid inferring two lots of (a_reg -
> b_reg) logic in COMB_A? It's this logic that causes the 50Mhz hit and
> why I used a latch COMB_B.

Just make it a signal, assigned by a concurrent signal assignment:

latch <= std_logic_vector(signed(a_reg) - signed(b_reg));
COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
begin
if d_reg = '1' then
...

Or make it completely local (my preference) by assigning the common
expression to a variable:

COMB_B : process (x_reg,a_reg,b_reg,d_reg) is
variable a_min_b : std_logic_vector(31 downto 0);
begin
a_min_b := (others => '0'); -- Avoid latch
if (d_reg = '1') then
a_min_b := std_logic_vector(signed(a_reg) - signed(b_reg));
if (signed(x_reg) > signed(a_min_b)) then
r <= x_reg;
else
r <= a_min_b;
end if;
else
r <= (others => '0');
end if;
end process;

A further (textual) optimization would be declaring r, a_reg, b_reg,
x_reg and a_min_b as signed, in stead of std_logic_vector. This would
avoid most of the conversions. The resulting code (untested):

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity test_latch is
port (clk : in std_logic;
rst : in std_logic;
d_in : in std_logic;
a_in : in std_logic_vector(31 downto 0);
b_in : in std_logic_vector(31 downto 0);
x_in : in std_logic_vector(31 downto 0);
r_out : out std_logic_vector(31 downto 0)
);
end test_latch;

architecture Behavioral of test_latch is
signal a_reg,b_reg,x_reg : signed(31 downto 0);
signal d_reg : std_logic;
signal r : signed(31 downto 0);
begin

SEQ1 : process (clk, rst) is
begin
if rst ='1' then
a_reg <= (others => '0');
b_reg <= (others => '0');
x_reg <= (others => '0');
d_reg <= '0';
elsif rising_edge(clk) then
if d_reg = '1' then
a_reg <= signed(a_in);
b_reg <= signed(b_in);
x_reg <= signed(x_in);
d_reg <= d_in;
end if;
end if;
end process;

COMB_B : process (x_reg, a_reg, b_reg, d_reg) is
variable a_min_b : signed(31 downto 0);
begin
a_min_b := (others => '0'); -- Avoid latch
if d_reg = '1' then
a_min_b := a_reg - b_reg;
if x_reg > a_min_b then
r <= x_reg;
else
r <= a_min_b;
end if;
else
r <= (others => '0');
end if;
end process;

SEQ2 : process (clk, rst) is
begin
if rst ='1' then
r_out <= (others => '0'); -- This was missing (intentional?)
elsif rising_edge(clk) then
r_out <= std_logic_vector(r);
end if;
end process;

end Behavioral;

--

Paul Uiterlinden, Apr 6, 2005
4. AndyGuest

Thanks Paul, I guess a variable was what I needed all along ! I tried
it out and it gives the same path delay as the latch but the variable
is obviously much nicer.

Btw, I got into the habit of using std_logic_vector everywhere because
Xilinx XST Guide (I think) says only to use slv in port declarations,
so I was just being lazy The above code was just a example so sorry
for sloppy mistakes.

Neo - The latch itself doesn't do anything to improve the timing, it's
not clocked so it cannot reduce the number of logic levels, but in this
case it provided faster logic than the alternative code.

Andy.

Andy, Apr 6, 2005