Chasing Bugs in the Fog

Discussion in 'VHDL' started by rickman, Jun 18, 2013.

  1. rickman

    rickman Guest

    I have a bug in a test fixture that is FPGA based. I had thought it was
    in the software which controls it, but after many hours of chasing it
    around I've concluded it must be in the FPGA code.

    I didn't think it was in the VHDL because it had been simulated well and
    the nature of the bug is an occasional dropped character on the receive
    side. Who can't design a UART? Well, it could be in the handshake with
    the state machine, but still...

    So I finally got around to adding some debug signals which I would
    monitor on an analyzer and guess what, the bug is gone! I *hate* when
    that happens. I can change the code so the debug signals only appear
    when a control register is set to enable them, but still, I don't like
    this. I want to know what is causing this DURN THING!

    Anyone see this happen to them before?

    Oh yeah, someone in another thread (that I can't find, likely because I
    don't recall the group I posted it in) suggested I add synchronizing FFs
    to the serial data in. Sure enough I had forgotten to do that. Maybe
    that was the fix... of course! It wasn't metastability, I bet it was
    feeding multiple bits of the state machine! Durn, I never make that
    sort of error. Thanks to whoever it was that suggested the obvious that
    I had forgotten.

    --

    Rick
    rickman, Jun 18, 2013
    #1

  2. Rob Gaddi

    Rob Gaddi Guest

    On Mon, 17 Jun 2013 20:00:01 -0400
    rickman wrote:

    > So I finally got around to adding some debug signals which I would
    > monitor on an analyzer and guess what, the bug is gone! I *hate* when
    > that happens. I can change the code so the debug signals only appear
    > when a control register is set to enable them, but still, I don't like
    > this. I want to know what is causing this DURN THING!
    >
    > Anyone see this happen to them before?
    >
    > Oh yeah, someone in another thread (that I can't find, likely because I
    > don't recall the group I posted it in) suggested I add synchronizing FFs
    > to the serial data in. Sure enough I had forgotten to do that. Maybe
    > that was the fix... of course! It wasn't metastability, I bet it was
    > feeding multiple bits of the state machine! Durn, I never make that
    > sort of error. Thanks to whoever it was that suggested the obvious that
    > I had forgotten.
    >
    > --
    >
    > Rick


    Not metastability, a race condition. Asynchronous external input
    headed to multiple clocked elements, each of which it reaches via a
    different path with a different delay.

    When you added debugging signals you changed the netlist, which changed
    the place and route, making unpredictable changes to those delays. In
    this case, it happened to push it into a place where _as far as you
    tested_, it seems happy. But it's still unsafe, because as you change
    other parts of the design, the P&R of that section will still change
    anyhow, and you start getting my favorite situation, the problem that
    comes and goes based on entirely unrelated factors.

    The fix you fixed fixes it. When you resynchronized it on the same
    clock that runs the rest of the logic, every path from that
    flip-flop onward became timing constrained. The P&R now has to make
    each of those routes meet setup within one clock period, so every
    destination sees the same value on the same edge, and your problem
    goes away for good.
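
    A minimal sketch of that kind of resynchronization, for illustration
    only. The entity and signal names (rx_synchronizer, rx_async,
    rx_synced) are hypothetical and do not come from the design discussed
    in this thread. Two flip-flops are assumed here as the conventional
    choice, since the second stage also gives the first one time to settle.
    The point is that only the registered output fans out to the rest of
    the receiver.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical two-stage input synchronizer for a UART receive line.
    entity rx_synchronizer is
      port (
        clk       : in  std_logic;
        rx_async  : in  std_logic;  -- raw serial input, asynchronous to clk
        rx_synced : out std_logic   -- registered copy, safe to fan out
      );
    end entity rx_synchronizer;

    architecture rtl of rx_synchronizer is
      -- A UART line idles high, so the chain starts at '1' (assumes the
      -- target device honors signal initial values).
      signal sync_ff : std_logic_vector(1 downto 0) := (others => '1');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          -- Bit 0 samples the pin; bit 1 resamples bit 0 one clock later.
          sync_ff <= sync_ff(0) & rx_async;
        end if;
      end process;

      -- Counters and state machines should read only this signal,
      -- never rx_async directly.
      rx_synced <= sync_ff(1);
    end architecture rtl;
    ```

    The cost is two clocks of latency on the input, which is normally
    negligible when the clock runs well above the baud rate.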

    --
    Rob Gaddi, Highland Technology -- www.highlandtechnology.com
    Email address domain is currently out of order. See above to fix.
    Rob Gaddi, Jun 18, 2013
    #2

  3. rickman

    rickman Guest

    On 6/17/2013 8:14 PM, Rob Gaddi wrote:
    > Not metastability, a race condition. Asynchronous external input
    > headed to multiple clocked elements, each of which it reaches via a
    > different path with a different delay.
    >
    > When you added debugging signals you changed the netlist, which changed
    > the place and route, making unpredictable changes to those delays.


    No, when changing the debug output I added the synchronization FFs which
    fixed the problem.

    My point was that when the other poster suggested that I needed to sync
    to the clock, I mistook that for a metastability fix, forgetting that the
    input went to multiple sections of logic. So actually I made the same
    mistake twice... lol


    > In
    > this case, it happened to push it into a place where _as far as you
    > tested_, it seems happy. But it's still unsafe, because as you change
    > other parts of the design, the P&R of that section will still change
    > anyhow, and you start getting my favorite situation, the problem that
    > comes and goes based on entirely unrelated factors.
    >
    > The fix you fixed fixes it. When you resynchronized it on the same
    > clock that runs the rest of the logic, every path from that
    > flip-flop onward became timing constrained. The P&R now has to make
    > each of those routes meet setup within one clock period, so every
    > destination sees the same value on the same edge, and your problem
    > goes away for good.


    Just to make sure of what was what (it has been two years since I last
    worked with this design) I pulled the FFs out and added back just one.
    Sure enough, the bug reappears with no FFs but goes away with just one.
    The added debug info let me see exactly what the error is. When a start
    bit comes in, there is a chance that the two counters are not both set
    properly. The error then shows up in the center of the bit, where the
    current contents of the shift register are moved into the holding
    register as a new char.

    I guess what most likely happened is that when I wrote the UART code I
    assumed the sync FFs would be external and when I wrote the wrapper code
    I assumed the FFs were inside the UART. In other words, I didn't have a
    proper spec and never gave this problem proper consideration.

    I will revisit this design and look at the other inputs. No reason to
    assume I didn't make the same mistake elsewhere.

    --

    Rick
    rickman, Jun 18, 2013
    #3
  4. On 18/06/2013 23:45, rickman wrote:

    > I guess what most likely happened is that when I wrote the UART code I
    > assumed the sync FFs would be external and when I wrote the wrapper code
    > I assumed the FFs were inside the UART. In other words, I didn't have a
    > proper spec and never gave this problem proper consideration.


    Several years ago a young engineer reused my long-proven UART code and
    modified it, carelessly removing the synchronizing FF. He came to see me
    and complained that my UART didn't work: it hung after some
    unpredictable time.
    I thought for a few minutes, guessed he had probably removed the FF, and
    fixed his problem right away.

    Nicolas
    Nicolas Matringe, Jun 18, 2013
    #4
