Chasing Bugs in the Fog

rickman

I have a bug in a test fixture that is FPGA-based. I had thought it was
in the software that controls it, but after many hours of chasing it
around I've concluded it must be in the FPGA code.

I didn't think it was in the VHDL because it had been simulated well and
the nature of the bug is an occasional dropped character on the receive
side. Who can't design a UART? Well, it could be in the handshake with
the state machine, but still...

So I finally got around to adding some debug signals which I would
monitor on an analyzer and guess what, the bug is gone! I *hate* when
that happens. I can change the code so the debug signals only appear
when a control register is set to enable them, but still, I don't like
this. I want to know what is causing this DURN THING!

Anyone see this happen to them before?

Oh yeah, someone in another thread (that I can't find, likely because I
don't recall the group I posted it in) suggested I add synchronizing FFs
to the serial data in. Sure enough I had forgotten to do that. Maybe
that was the fix... of course! It wasn't metastability, I bet it was
feeding multiple bits of the state machine! Durn, I never make that
sort of error. Thanks to whoever it was that suggested the obvious that
I had forgotten.
 
Rob Gaddi

rickman wrote:
> So I finally got around to adding some debug signals which I would
> monitor on an analyzer and guess what, the bug is gone! I *hate* when
> that happens. I can change the code so the debug signals only appear
> when a control register is set to enable them, but still, I don't like
> this. I want to know what is causing this DURN THING!
>
> Anyone see this happen to them before?
>
> Oh yeah, someone in another thread (that I can't find, likely because I
> don't recall the group I posted it in) suggested I add synchronizing FFs
> to the serial data in. Sure enough I had forgotten to do that. Maybe
> that was the fix... of course! It wasn't metastability, I bet it was
> feeding multiple bits of the state machine! Durn, I never make that
> sort of error. Thanks to whoever it was that suggested the obvious that
> I had forgotten.

Not metastability, a race condition. Asynchronous external input
headed to multiple clocked elements, each of which it reaches via a
different path with a different delay.
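
To put rough numbers on that race, here is a minimal Python sketch. It is
not the poster's design: the delays, the clock period and the function names
are all made up for illustration. It models a 0-to-1 edge arriving at a
random time and two flip-flops that see it through different routing delays,
and it ignores setup/hold and metastability entirely.

```python
import random


def sampled_value(edge_time, path_delay, clock_edge):
    """Level a flip-flop captures at clock_edge for a 0->1 input edge.

    The flip-flop sees the new level only if the edge, delayed by its
    routing path, arrives no later than the clock edge.
    """
    return 1 if edge_time + path_delay <= clock_edge else 0


def count_disagreements(delay_a, delay_b, period=10.0, trials=100_000, seed=1):
    """Count trials in which two flip-flops capture different levels.

    Each trial drops one input edge at a uniformly random time within a
    clock period (hypothetical units, e.g. ns) and samples it at the
    clock edge that ends the period.
    """
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(trials):
        edge_time = rng.uniform(0.0, period)
        a = sampled_value(edge_time, delay_a, period)
        b = sampled_value(edge_time, delay_b, period)
        if a != b:
            disagreements += 1
    return disagreements
```

Under these assumptions the two flip-flops disagree whenever the edge lands
in the window between the two arrival times, so the failure rate is roughly
the delay difference divided by the clock period. Comparing
count_disagreements(1.0, 2.5) with count_disagreements(1.0, 1.05) shows the
effect of a smaller skew: the mismatch gets rarer, but it does not go away.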

When you added debugging signals you changed the netlist, which changed
the place and route, making unpredictable changes to those delays. In
this case, it happened to push it into a place where _as far as you
tested_, it seems happy. But it's still unsafe, because as you change
other parts of the design, the P&R of that section will still change
anyhow, and you start getting my favorite situation, the problem that
comes and goes based on entirely unrelated factors.

The fix you made fixes it. When you resynchronized the input on the same
clock that runs the rest of the logic, you forced the paths downstream of
that FF to become timing constrained. The P&R then takes it upon itself to
make sure every one of those routes settles within a clock period, so the
routing delays stop mattering and your problem goes away for good.
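
As a rough illustration of why the synchronizing FF helps, the sketch below
compares two hypothetical consumers, A and B, fed either directly from the
asynchronous input or from a single synchronizing flip-flop. Everything here
(names, delays, toggle rate) is an assumption for the sake of the example,
and metastability is not modeled, which is the usual reason real designs
chain two FFs rather than one.

```python
import random


def captured(old, new, change_time, path_delay, period):
    """Level a flip-flop captures at the clock edge ending this cycle.

    The signal changes from old to new at change_time (measured from the
    previous clock edge) and needs path_delay to reach the flip-flop.
    """
    return new if change_time + path_delay <= period else old


def compare(cycles=20_000, period=10.0, delay_a=1.0, delay_b=2.5, seed=1):
    """Return (direct_mismatches, synced_mismatches) between A and B."""
    rng = random.Random(seed)
    rx = 0          # asynchronous input level
    prev_sync = 0   # synchronizer output before the last clock edge
    sync_q = 0      # synchronizer output after the last clock edge
    direct_mismatches = 0
    synced_mismatches = 0
    for _ in range(cycles):
        old_rx = rx
        if rng.random() < 0.1:            # input toggles during this cycle
            rx ^= 1
        toggle_time = rng.uniform(0.0, period)

        # Unsynchronized: A and B each sample the raw input via their own path.
        a = captured(old_rx, rx, toggle_time, delay_a, period)
        b = captured(old_rx, rx, toggle_time, delay_b, period)
        direct_mismatches += int(a != b)

        # Synchronized: A and B sample the synchronizer output, which only
        # changes right at a clock edge (change_time 0.0), so any path
        # delay shorter than the period delivers the same level to both.
        a = captured(prev_sync, sync_q, 0.0, delay_a, period)
        b = captured(prev_sync, sync_q, 0.0, delay_b, period)
        synced_mismatches += int(a != b)

        # The synchronizer itself samples the raw input through one path.
        prev_sync, sync_q = sync_q, captured(
            old_rx, rx, toggle_time, delay_a, period
        )
    return direct_mismatches, synced_mismatches
```

Calling compare() with the default values should report some mismatches on
the direct path and none on the synchronized one, because the only
asynchronous sampling left is done by a single flip-flop and everything
after it is an ordinary clock-to-clock path.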
 
rickman

Rob Gaddi wrote:
> Not metastability, a race condition. Asynchronous external input
> headed to multiple clocked elements, each of which it reaches via a
> different path with a different delay.
>
> When you added debugging signals you changed the netlist, which changed
> the place and route, making unpredictable changes to those delays.

No, when changing the debug output I added the synchronization FFs, which
fixed the problem.

My point was that when the other poster suggested that I needed to sync to
the clock, I mistook that for metastability, forgetting that the input
went to multiple sections of logic. So actually I made the same mistake
twice... lol

> In this case, it happened to push it into a place where _as far as you
> tested_, it seems happy. But it's still unsafe, because as you change
> other parts of the design, the P&R of that section will still change
> anyhow, and you start getting my favorite situation, the problem that
> comes and goes based on entirely unrelated factors.
>
> The fix you fixed fixes it. When you resynchronized it on the same
> clock as you're running around the rest of the logic, you forced that
> path to become timing constrained. As such, the P&R takes it upon
> itself to make sure that the timing of that route is irrelevant with
> respect to the clock period, and your problem goes away for good.

Just to make sure of what was what (it has been two years since I last
worked with this design) I pulled the FFs out and added back just one.
Sure enough, the bug reappears with no FFs but goes away with just one.
The added debug info let me see exactly what the error is: when a start
bit comes in, there is a chance that the two counters are not properly
set, and the error shows up in the center of the bit, where the current
contents of the shift register are moved into the holding register as a
new char.

I guess what most likely happened is that when I wrote the UART code I
assumed the sync FFs would be external and when I wrote the wrapper code
I assumed the FFs were inside the UART. In other words, I didn't have a
proper spec and never gave this problem proper consideration.

I will revisit this design and look at the other inputs. No reason to
assume I didn't make the same mistake elsewhere.
 
Nicolas Matringe

On 18/06/2013 23:45, rickman wrote:
> I guess what most likely happened is that when I wrote the UART code I
> assumed the sync FFs would be external and when I wrote the wrapper code
> I assumed the FFs were inside the UART. In other words, I didn't have a
> proper spec and never gave this problem proper consideration.

Several years ago a young engineer reused my long-proven UART code and
modified it, carelessly removing the synchronizing FF. He came to see me
and complained that my UART didn't work: it hung after some
unpredictable time.
I thought for a few minutes, guessed he had probably removed the FF, and
fixed his problem right away.

Nicolas
 
