Hi Jerker,
From reading your various postings, I believe the summary is:
You have a state machine.
It gets into illegal states.
It does it about 1 out of 10 times on startup.
You do not have an async reset to the FSM.
Your async inputs to the state machine go through a single flipflop.
You are investigating ways of detecting illegal states and want to get back to a valid state.
You specify initial values (in VHDL).
Your static timing analysis indicates no problems.
Your clock frequency is 20MHz.
You wrote: "Hi all! I'd like to once again bring up the subject of state machines
running into illegal states (illegal in the sense that the state vector does
not correspond to any of the states defined in the VHDL code), ...
1. Most discussions cover how to recover from illegal states, but few cover
how it actually happens. ...
2. How can I force Xilinx XST (6.2 SP3) to produce a safe FSM that recovers
from an illegal state? ...
/Jerker"
In summary from other postings:
This might be metastables.
This might be a timing problem.
There is an async reset, which occurs when your chip goes active.
(FPGAs and CPLDs do this differently, but the effect is similar)
There are various noise sources that could cause this:
kai: "Internal noise coupling in the chip (crosstalk), power drops, alpha
particles, not properly double-sync'ing an async signal before using
it in two different places ... the list goes on!"
Phil Hays wrote: "You do have an asynchronous reset, you just didn't know that
you did. When a Xilinx FPGA finishes the program download, it has all
initial values held until an internal signal is released. This release
is asynchronous to your clock. To avoid problems with this add a counter
that is reset to all zeros. Until that counter counts to 15, keep the
state machine in the initial state."
Rickman wrote: "Figure out what is wrong and deal with the cause of the problem."
You wrote: "I doubt that it's about static timing in my case since my clock is 20
MHz, and XST's post-layout static timing analysis doesn't complain.
Metastability could be an issue, but it's strange that it happens so
often. On one particular design, it happens about once every ten times i
startup the system. All inputs are synchronized with one FF each, but
I'll try adding a second one to see if it helps."
Here is my analysis:
Trying to change your design to get out of illegal states is nearly pointless,
since
A) it is hard to do
B) the tools work against you
C) you may not catch all possible cases
D) by the time you detect it, damage has already been done
E) if the cause is gross signal integrity problems such as unreliable power, then
your FSM is the least of your problems.
(There are exceptions to this, such as remote systems (no one to push the reset
button) and ultra high reliability systems (which must tolerate rare alpha
particle upsets).)
Rickman's quote above is spot on.
Since this happens 10% of the time in a system at 20MHz, this is not metastability.
If you want to learn more about metastability, this is my favorite URL:
http://www.fpga-faq.com/FAQ_Pages/0017_Tell_me_about_metastables.htm
Even though your problem is not metastability, once your current problem is fixed,
the much rarer problem of metastability may cause problems. A double synchronizer
on all your async inputs is cheap insurance.
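For reference, a double synchronizer can be sketched in VHDL roughly as
follows. This is a minimal sketch, not your actual code: the entity and signal
names (sync2, async_in, sync_out) are made up for illustration, and it assumes
your synthesis tool treats the signal initial values as the power-up state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two-flip-flop synchronizer for one asynchronous input.
-- All names here are illustrative, not taken from the original design.
entity sync2 is
  port (
    clk      : in  std_logic;  -- destination clock (20 MHz in this thread)
    async_in : in  std_logic;  -- raw asynchronous input
    sync_out : out std_logic   -- synchronized copy, safe to use in the clk domain
  );
end entity sync2;

architecture rtl of sync2 is
  -- Assumption: the tool maps these initial values to the power-up state.
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, feeds the logic
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta   <= async_in;  -- sample the asynchronous input
      stable <= meta;      -- give the first stage a full cycle to resolve
    end if;
  end process;

  sync_out <= stable;
end architecture rtl;
```

Only sync_out should fan out to the rest of the design. Nothing else should
look at async_in or at the first stage, otherwise two places can see two
different values of the same input in the same cycle.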
You wrote: "I doubt that it's about static timing in my case since my clock
is 20 MHz, and XST's post-layout static timing analysis doesn't complain."
Your assertion that static timing analysis indicates that there are no problems
is insufficient. I have seen far too many designs by engineers who proudly show
the static timing report showing that there are no errors, but they have not
generated the "unconstrained paths" report. The static timing analyzer tells you
that of the paths you have constrained, these all meet timing, but the delay on
the unconstrained paths is unbounded. You need to identify all unconstrained
paths and either be able to explain why they don't need a timing constraint
(such as a push button input), or add constraints so that the paths are covered.
Phil Hays' quote above is almost certainly identifying your problem, and gives
a fine solution. Let me expand on it. The problem is that when the chip goes
active, you have logic signals that go into the state machine that cause it to
transition to a next state immediately. Since the going active is asynchronous
to the 20MHz clock, you may have anywhere from 50 to 0 ns to do this. This
represents a race condition (not a metastability), and in 10% of your startups,
you lose the race. As others have described, not all parts of the state machine
have enough time (when the available time is less than 50 ns) to transition to
the next valid state. Phil's (and Philip's) solution is to hold off the first
transitions of the state machine until a few cycles after the chip goes active.
Phil's solution suggests 15 cycles; probably anything over 4 would be rock solid.
As an example, I usually use a 4 bit shift register to do this. Either way, it
works like this:
The hold-off circuit is initialized to 0000 (counter or shifter). The release of
reset (chip going active) allows either to start changing. Phil's counter counts
up, in my case the D input to the shifter is tied high, so I start to shift in
'1's (0000->1000->1100->1110->1111->1111 ...)
For Phil's counter, you probably would want to make it dead-end at 1111, and not
wrap back to 0000.
Neither the counter nor the shifter can get to its terminal state other than
through multiple cycles of the clock.
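The shifter version described above might look something like this in VHDL.
Treat it as a sketch under assumptions: the names (holdoff, shreg, done) are
invented for the example, and it relies on the "0000" initial value being
honored as the state held until the chip goes active.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 4-bit shift-register hold-off circuit.
-- Names are illustrative only.
entity holdoff is
  port (
    clk  : in  std_logic;
    done : out std_logic   -- '1' once the shifter has reached 1111
  );
end entity holdoff;

architecture rtl of holdoff is
  -- Assumption: this initial value is the state held during configuration.
  signal shreg : std_logic_vector(3 downto 0) := "0000";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- D input tied high: 0000 -> 1000 -> 1100 -> 1110 -> 1111 -> 1111 ...
      shreg <= '1' & shreg(3 downto 1);
    end if;
  end process;

  -- Terminal-state detect. It stays at '1' forever, since only '1's are
  -- ever shifted in.
  done <= '1' when shreg = "1111" else '0';
end architecture rtl;
```

Because only '1's are shifted in, the shifter dead-ends at 1111 on its own and
needs no extra logic to stop it wrapping.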
In your FSM, the initial state is set in your VHDL. Depending on what your FSM
does, the transition out of this state may be to one or more states. For ALL of
these exit conditions, you need to add an additional signal, the detection of
the terminal state of the hold-off circuit. The result is that the FSM can't
leave the initial state until several clocks after the chip goes active, because
the same logic that initializes the FSM is also holding the hold-off circuit
in its initial state. By the time the FSM is allowed to make its first
transition, it will have stable input signals (through the double synchronizers)
and it will have a full cycle to do its transition.
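To make the gating concrete, here is a sketch of an FSM whose exits from the
initial state are all qualified by the hold-off signal. The states (S_INIT,
S_RUN, S_ERROR) and inputs (start, abort) are hypothetical, since I don't know
what your FSM actually does. holdoff_done is assumed to be the terminal-state
detect of the hold-off circuit, and start/abort are assumed to be already
synchronized.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FSM: every exit from the initial state waits for holdoff_done.
entity fsm_example is
  port (
    clk          : in  std_logic;
    holdoff_done : in  std_logic;  -- terminal state of the hold-off circuit
    start        : in  std_logic;  -- hypothetical input, already synchronized
    abort        : in  std_logic;  -- hypothetical input, already synchronized
    busy         : out std_logic
  );
end entity fsm_example;

architecture rtl of fsm_example is
  type state_t is (S_INIT, S_RUN, S_ERROR);
  -- Assumption: the tool honors this initial value as the power-up state.
  signal state : state_t := S_INIT;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when S_INIT =>
          -- ALL exit conditions include holdoff_done, not just one of them.
          if holdoff_done = '1' and abort = '1' then
            state <= S_ERROR;
          elsif holdoff_done = '1' and start = '1' then
            state <= S_RUN;
          end if;
        when S_RUN =>
          if abort = '1' then
            state <= S_ERROR;
          elsif start = '0' then
            state <= S_INIT;
          end if;
        when S_ERROR =>
          state <= S_INIT;
      end case;
    end if;
  end process;

  busy <= '1' when state = S_RUN else '0';
end architecture rtl;
```

Only the S_INIT branch needs the extra term. Once the hold-off circuit has
reached its terminal state it never leaves it, so the later states are
unaffected.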
Additional answers to some of your other questions:
"But your point is still interesting in case I would need to introduce an
asynchronous reset some day. Does that mean one should avoid them if illegal
states are a concern?"
Yes, you should avoid async signals and resets, regardless of whether you are
concerned about illegal states. If you must have broadly used resets, then the
common recommendation is async assertion, and sync de-assertion.
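One common way to get async assertion with sync de-assertion is a small reset
synchronizer like the sketch below. The names (reset_sync, arst_n, rst_n_out)
and the active-low polarity are assumptions made for the example, not
something from your design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Reset synchronizer: asserts asynchronously, de-asserts synchronously.
-- Names and active-low polarity are illustrative assumptions.
entity reset_sync is
  port (
    clk       : in  std_logic;
    arst_n    : in  std_logic;  -- raw asynchronous reset, active low
    rst_n_out : out std_logic   -- asserts at once, releases on a clk edge
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal ff1 : std_logic := '0';
  signal ff2 : std_logic := '0';
begin
  process (clk, arst_n)
  begin
    if arst_n = '0' then
      -- Asynchronous assertion: both stages clear immediately.
      ff1 <= '0';
      ff2 <= '0';
    elsif rising_edge(clk) then
      -- Synchronous de-assertion: the release ripples through two stages.
      ff1 <= '1';
      ff2 <= ff1;
    end if;
  end process;

  rst_n_out <= ff2;
end architecture rtl;
```

The downstream flip-flops then all see the reset released a known time after
a clock edge, instead of at a random point within the 50 ns period.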
You wrote: "Metastability could be an issue, but it's strange that it happens so
often. On one particular design, it happens about once every ten times i
startup the system. All inputs are synchronized with one FF each, but
I'll try adding a second one to see if it helps."
Right. The 1 in 10 occurrence rate is far too high to be metastables in a system
running at 20MHz.
Your async inputs to the state machine should have at least a double synchronizer
(read the above URL). Double synchronizers are just good design practice.
In summary:
Add the hold-off circuit, and check the unconstrained paths report in the static
timing analyzer.
Good Luck,
Philip
Philip Freidin
Fliptronics