Cross Clock Domain Control

M. Norton

Well, the IDE vs emacs vs vi vs writing everything out by hand in
pencil vs adjusting the universal constants to create cosmic rays that
strike the hard disk platter in just the right way to write a one or
zero discussion has been entertaining, but I thought I might throw out
something more designish.

I've got a structure in this Arria II GX device where I've got a PCI
core backend bus (from an Actel) doing its thing with memory in the
Arria. Some memory regions are large enough where I just use the
altsyncram function from Quartus to implement the large scale memory.
I've got a few little discrete registers lying around where it
doesn't seem to make sense to implement large components and I thought
I'd write my own little cross-clock domain protected register. In
this case, the PCI backend is 33 MHz and this little peripheral SPI
interface is 10 MHz. The registers are supposed to be just
unidirectional -- haven't decided to implement anything more elaborate
just yet.

The problem is, I think I'm missing something kind of vital.
Especially when both clocks are incident at just about the same time
and I'm wondering if I need metastability registers in between to aid
in the data transfer. Perhaps only the request/ack registers need
metastability. Anyhow, this is what I've got and if it amuses someone
to take a look and comment (or laugh, I'm good with that too), I'd be
very interested in outside thoughts. The basic idea is that clock
domain A writes an internal register, and raises a request flag.
Clock domain B, as long as it's not being read from, sees the request
and registers the data and raises its acknowledge flag. Clock domain
A sees the acknowledge and lowers its request. Clock domain B sees
the request rescind and lowers its acknowledge, and we're back to
normal. It seems to me like that ought to carefully lock the data
through the clock domain without anything getting dropped... but
something is still tickling at the back of my brain and I can't quite
decide if there's a problem or not.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity xckd_reg is
  generic (
    WIDTH : integer := 32
  );
  port (
    reset  : in  std_logic;
    clk_a  : in  std_logic;
    d_a    : in  std_logic_vector(WIDTH-1 downto 0);
    wren_a : in  std_logic;
    clk_b  : in  std_logic;
    rden_b : in  std_logic;
    q_b    : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity xckd_reg;

architecture rtl of xckd_reg is
  signal reg_a : std_logic_vector(WIDTH-1 downto 0);
  signal req   : std_logic;
  signal ack   : std_logic;
begin
  process (clk_a, clk_b, reset)
  begin
    if (reset = '1') then
      reg_a <= (others => '0');
      req   <= '0';
    elsif rising_edge(clk_a) then
      if (wren_a = '1' and req /= '1') then
        reg_a <= d_a;
        req   <= '1';
      elsif (req = '1' and ack = '1') then
        req <= '0';
      end if;
    end if;

    if (reset = '1') then
      q_b <= (others => '0');
    elsif rising_edge(clk_b) then
      if (req = '1' and ack = '0' and rden_b /= '1') then
        q_b <= reg_a;
        ack <= '1';
      elsif (req = '0' and ack = '1') then
        ack <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
 
Mike Treseler

M. Norton said:
> Clock domain
> A sees the acknowledge and lowers its request. Clock domain B sees
> the request rescind and lowers its acknowledge, and we're back to
> normal. It seems to me like that ought to carefully lock the data
> through the clock domain without anything getting dropped... but
> something is still tickling at the back of my brain and I can't quite
> decide if there's a problem or not.

For synthesis you will probably need two processes, one for each clock.
Inputs from the opposite domain need input registers.

-- Mike Treseler
 
Andy

> For synthesis you will probably need two processes, one for each clock.
> Inputs from the opposite domain need input registers.
>
>    -- Mike Treseler

Some tools do not require separate processes, but all will require
that no resulting register is clocked by both clocks.

And those input registers' outputs need sufficient additional slack
to account for metastable delays in settling to a final value. Usually
I add a 2nd "metastable rejection" register after the 1st input
(synchronization) register (with no logic between them) prior to
feeding the functional logic in the target clock domain.

As long as the synchronization register is not driving (either
directly or indirectly through combinatorial logic) a causal input
(e.g. a clock or async reset), the metastable rejection register is
not strictly required, but you must have some means of ensuring
sufficient additional slack in the timing to the next register(s).

Andy
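
A minimal sketch of the two-register arrangement Andy describes, for a single-bit control signal such as req or ack. The entity name sync_2ff and its port names are hypothetical, not from the thread, and the sketch assumes the input is driven directly by a register in the source domain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: synchronizes ONE single-bit signal into clk's domain.
entity sync_2ff is
  port (
    clk   : in  std_logic;  -- destination-domain clock
    d_in  : in  std_logic;  -- signal arriving from the other clock domain
    q_out : out std_logic   -- version that is safe to use in clk's domain
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta : std_logic := '0';  -- 1st (synchronization) register, may go metastable
  signal sync : std_logic := '0';  -- 2nd ("metastable rejection") register
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta <= d_in;
      sync <= meta;  -- no logic between the two registers
    end if;
  end process;

  q_out <= sync;
end architecture rtl;
```

This only suits single-bit signals. A multi-bit bus run through it bit by bit can arrive with some bits one cycle later than others, which is why the data itself is usually held stable and qualified by a synchronized control signal instead.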
 
M. Norton

On Apr 6, 9:24 am, Mike Treseler wrote:

Just wanted to say thanks guys. I had thought it just sort of floated
away on the USENET. I had to sideline that part of the design to
spend time on getting the PCI local bus interface integrated and
working on a prototype (which went very well) and just now getting
back to the guts of the design, including the need for my little cross
clock syncing component to aid communication between the local bus and
a slow little SPI interface. I will dig into the comments here in the
morning, see if it all rattles around and lodges into appropriate
spots in my head.

Mark Norton
 
M. Norton

> Some tools do not require separate processes, but all will require
> that no resulting register is clocked by both clocks.

Well I'm fairly certain that the registers are unique. I should split
them into separate processes anyhow. It's my usual style, believe I
just got in a bit of a hurry here.

> And those input registers' outputs need sufficient additional slack
> to account for metastable delays in settling to a final value. Usually
> I add a 2nd "metastable rejection" register after the 1st input
> (synchronization) register (with no logic between them) prior to
> feeding the functional logic in the target clock domain.
>
> As long as the synchronization register is not driving (either
> directly or indirectly through combinatorial logic) a causal input
> (e.g. a clock or async reset), the metastable rejection register is
> not strictly required, but you must have some means of ensuring
> sufficient additional slack in the timing to the next register(s).

I think from reading what you and Mike wrote, the main thing I don't
have is that the control signals need their own metastability
registers for safety. The contents of this register are just purely
for moving data between a 33 MHz domain and a 10 MHz domain, though I
think I will still add in the metastability registers for good design
practice. I don't have any latency requirements here where additional
pipeline delay for safe data handling is going to cause any trouble.
I just want to achieve safe, trouble-free data handoff.

Thanks for the thoughts, back into the editor.

Mark Norton
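
A sketch of what that revision might look like: the same req/ack handshake, split into one process per clock, with each control signal passed through two registers before the other domain uses it. The entity name xckd_reg_sync and the signals ack_m, ack_s, req_m and req_s are hypothetical. Two things go beyond what the thread states and are assumptions: a write is also refused while the synchronized ack is still high (otherwise a new request could be cancelled by the previous acknowledge), and the single reset is assumed to be safe to use in both domains.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity xckd_reg_sync is
  generic (
    WIDTH : integer := 32
  );
  port (
    reset  : in  std_logic;
    clk_a  : in  std_logic;
    d_a    : in  std_logic_vector(WIDTH-1 downto 0);
    wren_a : in  std_logic;
    clk_b  : in  std_logic;
    rden_b : in  std_logic;
    q_b    : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity xckd_reg_sync;

architecture rtl of xckd_reg_sync is
  signal reg_a        : std_logic_vector(WIDTH-1 downto 0);
  signal req          : std_logic;  -- owned by clk_a
  signal ack          : std_logic;  -- owned by clk_b
  signal ack_m, ack_s : std_logic;  -- ack synchronized into clk_a
  signal req_m, req_s : std_logic;  -- req synchronized into clk_b
begin
  dom_a : process (clk_a, reset)
  begin
    if reset = '1' then
      reg_a <= (others => '0');
      req   <= '0';
      ack_m <= '0';
      ack_s <= '0';
    elsif rising_edge(clk_a) then
      ack_m <= ack;    -- 1st stage, may go metastable
      ack_s <= ack_m;  -- 2nd stage, used by the logic below
      if wren_a = '1' and req = '0' and ack_s = '0' then
        reg_a <= d_a;  -- held stable for as long as req is high
        req   <= '1';
      elsif req = '1' and ack_s = '1' then
        req <= '0';
      end if;
    end if;
  end process dom_a;

  dom_b : process (clk_b, reset)
  begin
    if reset = '1' then
      q_b   <= (others => '0');
      ack   <= '0';
      req_m <= '0';
      req_s <= '0';
    elsif rising_edge(clk_b) then
      req_m <= req;
      req_s <= req_m;
      if req_s = '1' and ack = '0' and rden_b /= '1' then
        q_b <= reg_a;  -- reg_a has been stable since req rose
        ack <= '1';
      elsif req_s = '0' and ack = '1' then
        ack <= '0';
      end if;
    end if;
  end process dom_b;
end architecture rtl;
```

The data bus itself is not synchronized here; it is only sampled after the synchronized request says it has stopped changing. As in the original, a write that arrives while a transfer is in flight is silently dropped.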
 
rickman

> Well I'm fairly certain that the registers are unique. I should split
> them into separate processes anyhow. It's my usual style, believe I
> just got in a bit of a hurry here.
>
> I think from reading what you and Mike wrote, the main thing I don't
> have is that the control signals need their own metastability
> registers for safety. The contents of this register are just purely
> for moving data between a 33 MHz domain and a 10 MHz domain, though I
> think I will still add in the metastability registers for good design
> practice. I don't have any latency requirements here where additional
> pipeline delay for safe data handling is going to cause any trouble.
> I just want to achieve safe, trouble-free data handoff.
>
> Thanks for the thoughts, back into the editor.
>
> Mark Norton

This is getting too complicated for me to read. Clock domain crossing
is very simple. Many years ago someone showed me a circuit using
three FFs that is guaranteed to work and is adaptable to many
situations. The handshake is just a request and an acknowledge, but
does not go up and down for each transfer, it just changes state.
Then this state change is detected to generate a clock enable on the
receiver. Because the edge detection uses a couple of FFs it is
inherently metastability resolved. This design is going from the
slower clock (sysclk) to the faster clock (fstclk), but works
regardless of clock speed. You can also use the input enable as a
clock to the FF in that clock domain rather than an enable if that
suits your application better. The only requirement is that the input
is not clocking faster than the feedback timing of the loop.
Separately I had a register that used SysDataEn as a register enable
to hold the data, but that is only needed if the data might otherwise
change.

If you need a handshake back to the origin, you can add a second FF on
the source side that will register the return handshake. You can then
use the xor of the incoming and outbound signals as a "wait" signal
(or xnor as the case may be). Or add two FFs and detect the returning
edge.

-- Timing diagram
--
-- SysClk      ____-----_____-----_____-----_____-----_____-----_____-----___
-- SysDataEn   ______----------____________________----------________________
-- SysDataXmt  ===============x=======================================x======
-- FstClk      __--__--__--__--__--__--__--__--__--__--__--__--__--__--__--__
-- DataEn      _______________|-----------------------------|________________
-- DataEnSync  __________________|---------------------------|_______________
-- DataEnDly   ______________________|---------------------------|___________
-- FstDataEn   ___________________|---|_______________________|---|__________
-- FstDataRcv  ________________________|---------------------------|_________

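library ieee;
use ieee.std_logic_1164.all;

-- Clock/reset ports, the END of the entity, the architecture header and
-- the signal declarations were missing from the posted code; they are
-- reconstructed here from the names the processes below use.
ENTITY NRZResync is
  port(
    SysClk    : in  std_logic;
    SysRst    : in  std_logic;
    SysDataEn : in  std_logic;
    FstClk    : in  std_logic;
    FstReset  : in  std_logic;
    FstDataEn : out std_logic
  );
END NRZResync;

ARCHITECTURE RTL of NRZResync is
  signal DataEn     : std_logic;  -- toggles once per transfer (SysClk domain)
  signal DataEnSync : std_logic;  -- DataEn sampled by FstClk
  signal DataEnDly  : std_logic;  -- DataEnSync delayed by one FstClk
begin

  -- Source side: change the state of DataEn whenever SysDataEn is high.
  -- Note that DataEnSync is read here from the FstClk domain.
  SysLogic : process (SysClk, SysRst)
  begin
    if (SysRst = '1') then
      DataEn <= '0';
    else
      if (rising_edge(SysClk)) then
        if ('1' = SysDataEn) then
          DataEn <= not DataEnSync;
        end if;
      end if;
    end if;
  end process SysLogic;

  -- Receive side: two FFs resample DataEn in the FstClk domain.
  FstLogic : process (FstClk, FstReset)
  begin
    if (FstReset = '1') then
      DataEnSync <= '0';
      DataEnDly  <= '0';
    elsif (rising_edge(FstClk)) then
      DataEnSync <= DataEn;
      DataEnDly  <= DataEnSync;
    end if;
  end process FstLogic;

  -- One-FstClk-wide enable pulse on every change of state.
  FstDataEn <= DataEnSync xor DataEnDly;

END RTL; -- NRZResync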


It doesn't get much simpler than this does it?

Rick
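
The return handshake rickman mentions (bring the state back to the source and xor it with the outbound signal to get a "wait") could be sketched roughly as below. This is a variant under stated assumptions rather than his circuit: the entity name NRZResyncAck, the SysBusy output and the AckMeta/AckSync registers are hypothetical, the return path uses two registers instead of one, and the source toggles its own DataEn (rather than loading not DataEnSync) because new requests are blocked while busy.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity NRZResyncAck is
  port (
    SysClk    : in  std_logic;
    SysRst    : in  std_logic;
    SysDataEn : in  std_logic;
    SysBusy   : out std_logic;  -- hypothetical "wait" flag for the source
    FstClk    : in  std_logic;
    FstReset  : in  std_logic;
    FstDataEn : out std_logic
  );
end entity NRZResyncAck;

architecture rtl of NRZResyncAck is
  signal DataEn, DataEnSync, DataEnDly : std_logic;
  signal AckMeta, AckSync              : std_logic;  -- return path, SysClk domain
  signal Busy                          : std_logic;
begin
  -- Outbound and returned states differ while a transfer is in flight.
  Busy    <= DataEn xor AckSync;
  SysBusy <= Busy;

  SysLogic : process (SysClk, SysRst)
  begin
    if SysRst = '1' then
      DataEn  <= '0';
      AckMeta <= '0';
      AckSync <= '0';
    elsif rising_edge(SysClk) then
      AckMeta <= DataEnDly;  -- state returned from the receiver
      AckSync <= AckMeta;
      if SysDataEn = '1' and Busy = '0' then
        DataEn <= not DataEn;  -- one state change per accepted request
      end if;
    end if;
  end process SysLogic;

  FstLogic : process (FstClk, FstReset)
  begin
    if FstReset = '1' then
      DataEnSync <= '0';
      DataEnDly  <= '0';
    elsif rising_edge(FstClk) then
      DataEnSync <= DataEn;
      DataEnDly  <= DataEnSync;
    end if;
  end process FstLogic;

  FstDataEn <= DataEnSync xor DataEnDly;
end architecture rtl;
```

Returning DataEnDly rather than DataEnSync means SysBusy only drops after the receiver has finished producing its FstDataEn pulse. A request presented while SysBusy is high is ignored in this sketch, so the source is expected to hold SysDataEn until the flag clears.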
 
M. Norton

> It doesn't get much simpler than this does it?

I try to follow the group but then I get caught up with work efforts
and weeks go by without the proper attention. I wanted to thank you
for the above suggestion. It is indeed a very clever method and I'll
give it a shot. I had a far more complex version that I think is
probably causing some timing errors so the simpler variant is
appreciated and I'm hoping will result in a better design.

Best regards,
Mark Norton
 
rickman

> I try to follow the group but then I get caught up with work efforts
> and weeks go by without the proper attention. I wanted to thank you
> for the above suggestion. It is indeed a very clever method and I'll
> give it a shot. I had a far more complex version that I think is
> probably causing some timing errors so the simpler variant is
> appreciated and I'm hoping will result in a better design.
>
> Best regards,
> Mark Norton

Cool, let us know how it turns out. Giving advice helps me too if I
know that it worked for you. I have a customer who was going to
design a large FPGA with something like 80 clock domains. I explained
this method of crossing clock domains and how he would be better
bringing all the I/O clocks into one central domain. He said he got
what I was explaining, but I haven't heard how it went. If you get
what I wrote, then my customer likely got it too.

Rick
 
Andy

Just beware if one reset can go active without the other going
active... I assume those two are properly synchronized (deasserting
edge only) versions of the same root reset signal.

Andy
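
For reference, a common way to get the "deasserting edge only" synchronization Andy mentions is a small per-domain reset synchronizer. The sketch below is an assumption about what he has in mind, not something posted in the thread; the entity name reset_sync and its port names are hypothetical. The idea would be to instantiate it once per clock domain from the same root reset, producing SysRst and FstReset for rickman's circuit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: asserts reset asynchronously, releases it
-- synchronously to clk.
entity reset_sync is
  port (
    clk       : in  std_logic;
    rst_async : in  std_logic;  -- root reset, active high
    rst_out   : out std_logic   -- reset for use inside clk's domain
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal r1, r2 : std_logic := '1';
begin
  process (clk, rst_async)
  begin
    if rst_async = '1' then
      -- Assertion takes effect immediately, without waiting for a clock.
      r1 <= '1';
      r2 <= '1';
    elsif rising_edge(clk) then
      -- Release ripples through two registers, so rst_out falls on a
      -- clk edge two cycles after rst_async goes low.
      r1 <= '0';
      r2 <= r1;
    end if;
  end process;

  rst_out <= r2;
end architecture rtl;
```

Because both resets come from one root, they always assert together. If only the receive side were reset while DataEn was high, DataEnSync and DataEnDly would resample it afterwards and produce a spurious FstDataEn pulse, which is presumably the hazard being pointed out.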
 