Chasing Bugs in the Fog

Discussion in 'VHDL' started by rickman, Jun 18, 2013.

  1. rickman

    rickman Guest

    I have a bug in an FPGA-based test fixture. I had thought it was in
    the software that controls it, but after many hours of chasing it
    around I've concluded it must be in the FPGA code.

    I didn't think it was in the VHDL because it had been simulated well,
    and the bug is an occasional dropped character on the receive side.
    Who can't design a UART? Well, it could be in the handshake with the
    state machine, but still...

    So I finally got around to adding some debug signals to monitor on a
    logic analyzer and, guess what, the bug is gone! I *hate* it when that
    happens. I can change the code so the debug signals only appear when a
    control register enables them, but still, I don't like this. I want
    to know what is causing this DURN THING!

    Anyone see this happen to them before?

    Oh yeah, someone in another thread (which I can't find, likely because
    I don't recall which group I posted it in) suggested I add
    synchronizing FFs to the serial data input. Sure enough, I had
    forgotten to do that. Maybe that was the fix... of course! It wasn't
    metastability; I bet the unsynchronized input was feeding multiple
    bits of the state machine! Durn, I never make that sort of error.
    Thanks to whoever it was that suggested the obvious thing I had
    forgotten.
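    A minimal sketch of such a synchronizer in VHDL, assuming a single
    clock clk drives all the receive logic; the entity and signal names
    are hypothetical. It uses two stages, the conventional choice when
    metastability is also a concern. For the race described in this
    thread, what matters is that everything downstream reads one
    registered copy of the pin rather than the pin itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage input synchronizer: brings the asynchronous serial
-- line into the clk domain before any other logic looks at it.
entity serial_in_sync is
  port (
    clk      : in  std_logic;
    rx_async : in  std_logic;  -- raw serial input pin, not related to clk
    rx       : out std_logic   -- synchronized copy; changes only on clk edges
  );
end entity serial_in_sync;

architecture rtl of serial_in_sync is
  -- A UART line idles at '1'; starting both stages at '1' avoids a false
  -- start bit after configuration (assumes the FPGA flow honors initial values).
  signal meta   : std_logic := '1';  -- first stage: may go metastable
  signal stable : std_logic := '1';  -- second stage: the only copy logic should use
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta   <= rx_async;
      stable <= meta;
    end if;
  end process;

  rx <= stable;
end architecture rtl;
```

    Downstream logic should read rx only; rx_async should not appear
    anywhere else in the design.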
    rickman, Jun 18, 2013

  2. Rob Gaddi

    Rob Gaddi Guest

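    Not metastability, a race condition. An asynchronous external input
    goes to multiple clocked elements, each of which it reaches via a
    different path with a different delay. On the clock edge nearest an
    input transition, some of those elements can capture the new value
    while others still capture the old one.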
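    A rough way to see why the failure is occasional rather than constant,
    under a deliberately simplified model that is an assumption rather
    than anything stated in the thread: the input edge arrives at time
    t_0, element i is reached through a path delay d_i (setup time folded
    in), the clock period is T, and the input edge is equally likely to
    land anywhere within a clock period. Two elements capture different
    values only if the edge falls inside a window as wide as the spread
    of the path delays:

```latex
\begin{aligned}
\text{element } i \text{ captures the new value at edge } kT
  &\iff t_0 + d_i \le kT \\
\text{the elements disagree at edge } kT
  &\iff kT - d_{\max} < t_0 \le kT - d_{\min} \\
P(\text{disagree, per input edge})
  &= \frac{d_{\max} - d_{\min}}{T}, \qquad 0 \le d_{\max} - d_{\min} < T
\end{aligned}
```

    With made-up numbers, 0.5 ns of skew between the two paths at a
    50 MHz clock (T = 20 ns) gives 0.5/20 = 2.5% per start bit. Every P&R
    run moves d_max and d_min, which is why the symptom can come and go
    with unrelated changes. Metastability is ignored entirely here.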

    When you added debugging signals you changed the netlist, which changed
    the place and route, making unpredictable changes to those delays. In
    this case, it happened to push them into a combination that, _as far
    as you tested_, seems happy. But it's still unsafe: as you change other
    parts of the design, the P&R of that section will keep changing, and
    you start getting my favorite situation, the problem that comes and
    goes based on entirely unrelated factors.

    The fix you fixed fixes it. When you resynchronized the input on the
    same clock that runs the rest of the logic, you forced that path to
    become timing constrained. P&R then makes sure every route from the
    synchronizing FF to the logic meets the clock period, so the exact
    delay of each route no longer matters, and your problem goes away for
    good.
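    Once the input is registered first, each path from that register to
    the rest of the logic is an ordinary register-to-register path, and the
    timing tools check it against the usual setup inequality. The symbols
    are generic, not from the thread, and clock skew and jitter are
    ignored: t_co is the synchronizing FF's clock-to-output delay,
    t_path,i the routing plus logic delay to destination i, and t_su,i
    that destination's setup time.

```latex
t_{co} + t_{\mathrm{path},i} + t_{su,i} \le T \qquad \text{for every destination } i
```

    The register's output only changes just after a clock edge, and when
    this inequality holds it has settled at every destination before the
    next edge, so all destinations capture the same value on the same edge
    however P&R shuffles the individual routes. The first FF can still go
    metastable, which is the separate reason a second stage is commonly
    added.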
    Rob Gaddi, Jun 18, 2013

  3. rickman

    rickman Guest

    No, when I changed the debug output I also added the synchronization
    FFs, and that is what fixed the problem.

    My point was that when the other poster suggested I needed to sync to
    the clock, I took it to be about metastability and forgot that the
    input went to multiple sections of logic. So I actually made the same
    mistake twice... lol

    Just to make sure of what was what (it has been two years since I last
    worked with this design), I pulled the FFs out and added back just one.
    Sure enough, the bug reappears with no FFs but goes away with just one.
    The added debug info let me see exactly what the error was: when a
    start bit comes in, there is a chance that the two counters are not
    both set properly, and the error shows up at the center of the bit
    where the current contents of the shift register are moved into the
    holding register as a new char.
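    To make the failure mode concrete, here is a minimal receiver sketch.
    It is not rickman's code: the entity, generic, and signal names are
    hypothetical, it assumes 8N1 framing with CLKS_PER_BIT clocks per bit,
    and it does no oversampling or majority voting. The point is
    structural: the raw pin is registered exactly once, and the start-bit
    check, both counters, and the shift-register-to-holding-register
    transfer all read only that registered copy, so they always agree on
    which clock edge the start bit arrived.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8N1 UART receiver sketch (not the code from the thread).
entity uart_rx_sketch is
  generic (
    CLKS_PER_BIT : positive := 16  -- clock cycles per serial bit (assumed)
  );
  port (
    clk      : in  std_logic;
    rx_async : in  std_logic;                     -- raw, asynchronous pin
    data     : out std_logic_vector(7 downto 0);  -- holding register
    valid    : out std_logic                      -- one-clock pulse per char
  );
end entity uart_rx_sketch;

architecture rtl of uart_rx_sketch is
  signal rx      : std_logic := '1';  -- the only copy of the pin used below
  signal busy    : boolean := false;
  signal clk_cnt : natural range 0 to CLKS_PER_BIT - 1 := 0;
  signal bit_cnt : natural range 0 to 9 := 0;  -- 0 = start, 1..8 = data, 9 = stop
  signal shreg   : std_logic_vector(7 downto 0) := (others => '0');
  signal hold    : std_logic_vector(7 downto 0) := (others => '0');
  signal strobe  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      rx     <= rx_async;  -- single synchronizing register
      strobe <= '0';

      if not busy then
        if rx = '0' then
          -- Start edge: both counters are loaded on the same edge from the
          -- same registered sample, so they cannot disagree.
          busy    <= true;
          clk_cnt <= CLKS_PER_BIT / 2;  -- wait about half a bit to reach mid-start
          bit_cnt <= 0;
        end if;
      elsif clk_cnt /= 0 then
        clk_cnt <= clk_cnt - 1;
      else
        clk_cnt <= CLKS_PER_BIT - 1;  -- next sample one bit time later
        if bit_cnt = 0 then
          if rx = '1' then
            busy <= false;  -- glitch, not a real start bit
          end if;
          bit_cnt <= 1;
        elsif bit_cnt <= 8 then
          shreg   <= rx & shreg(7 downto 1);  -- LSB arrives first
          bit_cnt <= bit_cnt + 1;
        else
          -- Middle of the stop bit: move the shift register to the holding register.
          if rx = '1' then
            hold   <= shreg;
            strobe <= '1';
          end if;
          busy <= false;
        end if;
      end if;
    end if;
  end process;

  data  <= hold;
  valid <= strobe;
end architecture rtl;
```

    In the buggy arrangement the thread describes, the start check and the
    counter loads would each read rx_async directly, and each could see the
    falling edge on a different clock.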

    What most likely happened is that when I wrote the UART code I assumed
    the sync FFs would be external, and when I wrote the wrapper code I
    assumed they were inside the UART. In other words, I didn't have a
    proper spec and never gave this problem proper consideration.

    I will revisit this design and look at the other inputs. No reason to
    assume I didn't make the same mistake elsewhere.
    rickman, Jun 18, 2013
  4. Nicolas Matringe

    Replying to rickman's post of 18/06/2013 23:45:

    Several years ago a young engineer reused my long-proven UART code and
    modified it, carelessly removing the synchronizing FF. He came to see me
    and complained that my UART didn't work: it hung after some
    unpredictable time.
    I thought for a few minutes, guessed he had probably removed the FF,
    and fixed his problem right away.

    Nicolas Matringe, Jun 18, 2013