Why is this code persnickety?

K

KK6GM

I'm a software guy with a decent hardware understanding, just
beginning to teach myself VHDL and FPGAs. I've got a Nexys2 board and
in playing with the multiplexed 7-seg displays I've run across a
situation where my code to delay and switch between digits runs for
some count values and doesn't run for others. Here's the code section
in question:

prescale_to_100us: process (clk_50M)
variable counter : natural range 0 to 2500 := 1;
begin
if (rising_edge(clk_50M)) then
counter := counter - 1;
if (counter = 0) then
counter := 2500;
clk_100us <= not clk_100us;
end if;
end if;
end process;


drive_digits: process (clk_100us, sw)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_100us)) then
counter := counter - 1;
if (counter = 0) then
case sw is
when "001" => counter := Scan_Count / 8;
when "010" => counter := Scan_Count / 4;
when "100" => counter := Scan_Count / 2;
when others => counter := Scan_Count;
end case;

case digs7_var is
when "1110" => digs7_var := "1101";
when "1101" => digs7_var := "1011";
when "1011" => digs7_var := "0111";
when "0111" => digs7_var := "1110";
when others => digs7_var := "1110";
end case;

digs7 <= digs7_var;
end if;
end if;
end process;

This code consistently works for e.g. Scan_Count = 300, 302 or 305,
but fails (no multiplexing, just a solid 0 in the LSD position) for
301, 303 and 304. Nothing else is changed, just the value of
Scan_Count. Naturally I assume that the problem lies in my code, but
I have no idea what it might be. Any ideas? Thanks.

BTW, do I need "sw" in the sensitivity list for drive_digits?

Mike
 
J

Jonathan Bromley

I'm a software guy with a decent hardware understanding, just
beginning to teach myself VHDL and FPGAs. I've got a Nexys2 board and
in playing with the multiplexed 7-seg displays I've run across a
situation where my code to delay and switch between digits runs for
some count values and doesn't run for others. Here's the code section
in question:

prescale_to_100us: process (clk_50M)
variable counter : natural range 0 to 2500 := 1;
begin
if (rising_edge(clk_50M)) then
counter := counter - 1;
if (counter = 0) then
counter := 2500;
clk_100us <= not clk_100us;
end if;
end if;
end process;


drive_digits: process (clk_100us, sw)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_100us)) then
counter := counter - 1;
if (counter = 0) then
......... reload scan counter, update multiplex output
end if;
end if;
end process;

This code consistently works for e.g. Scan_Count = 300, 302 or 305,
but fails (no multiplexing, just a solid 0 in the LSD position) for
301, 303 and 304. Nothing else is changed, just the value of
Scan_Count. Naturally I assume that the problem lies in my code, but
I have no idea what it might be. Any ideas? Thanks.

Your code looks broadly OK to me (have you simulated it?).
It's good that you hide locals inside the processes as variables.
Symbolic names for the various constants would help readability
and maintainability, I'd suggest, and (see your latter question)
no, you don't want 'sw' in the clocked process's sensitivity
list - but it shouldn't do any harm, because the whole body
of the process runs only on rising_edge.

Carefully check out the synthesis report. My first suspicion
would be that the clk_100us signal is not being routed on a
clock net. Possibly it hasn't been recognized as a clock
because of that bad sensitivity list - I don't know - but
if the tools didn't recognise the clock then the results
you see are not too surprising because you'll get clock
skew between the various FFs of the second process's
counter and this can give rise to hold time violations.

As a sanity check on this, consider reworking the clock
divider to generate a clock enable so that you can clock
both processes from the common 50MHz.

Also, note that your "if counter=0" tests (in both
processes) are testing NEXT-STATE values of the counter,
and therefore will give slower logic than if you had
tested "counter" before decrementing it. Try this:

prescale_to_100us: process (clk_50M)
variable counter: natural range 0 to 2500 := 1;
begin
if counter=1 then -- test first, tests current state
counter := 2500;
enable_100us <= '1';
else
counter := counter-1;
enable_100us <= '0';
end if;
end process;

drive_digits: process (clk_50M)
constant Scan_Count : natural := 300;
variable counter : natural range 0 to Scan_Count := 1;
variable digs7_var : std_logic_vector (3 downto 0) := "1110";
begin
if (rising_edge(clk_50M)) then
if (enable_100us) then
-- same logic as before.

Don't try to roll the two "if" tests into a single test;
some synthesis tools are rather picky about keeping the
rising_edge test in a separate statement.
BTW, do I need "sw" in the sensitivity list for drive_digits?

No, definitely not - see above.

Hope this helps, but I fear it may not...
 
K

KK6GM

........ reload scan counter, update multiplex output



Your code looks broadly OK to me (have you simulated it?).  
It's good that you hide locals inside the processes as variables.  
Symbolic names for the various constants would help readability
and maintainability, I'd suggest, and (see your latter question)
no, you don't want 'sw' in the clocked process's sensitivity
list - but it shouldn't do any harm, because the whole body
of the process runs only on rising_edge.

Carefully check out the synthesis report.  My first suspicion
would be that the clk_100us signal is not being routed on a
clock net.  Possibly it hasn't been recognized as a clock
because of that bad sensitivity list - I don't know - but
if the tools didn't recognise the clock then the results
you see are not too surprising because you'll get clock
skew between the various FFs of the second process's
counter and this can give rise to hold time violations.

As a sanity check on this, consider reworking the clock
divider to generate a clock enable so that you can clock
both processes from the common 50MHz.

Also, note that your "if counter=0" tests (in both
processes) are testing NEXT-STATE values of the counter,
and therefore will give slower logic than if you had
tested "counter" before decrementing it.  Try this:

  prescale_to_100us: process (clk_50M)
    variable counter: natural range 0 to 2500 := 1;
  begin
    if counter=1 then  -- test first, tests current state
      counter := 2500;
      enable_100us <= '1';
    else
      counter := counter-1;
      enable_100us <= '0';
    end if;
  end process;

  drive_digits: process (clk_50M)
    constant Scan_Count : natural := 300;
    variable counter : natural range 0 to Scan_Count := 1;
    variable digs7_var : std_logic_vector (3 downto 0) := "1110";
  begin
    if (rising_edge(clk_50M)) then
      if (enable_100us) then
        -- same logic as before.

Don't try to roll the two "if" tests into a single test;
some synthesis tools are rather picky about keeping the
rising_edge test in a separate statement.


No, definitely not - see above.

Hope this helps, but I fear it may not...

That did indeed solve the problem - many thanks! Clearly I need to
get a better understanding of the proper idioms for good, reliable
clocking. Guess I'm lucky to have stumbled across this whole issue
early on.

Mike
 
J

jacko

In some of my code I had a divided clock. I found that it was easier
to detect and latch the end count 1 cycle before into a single bit and
then use that single bit on the next cycle to do the clock inversion.
This speeded up the design by over 4 times. I think this is because
the clock enable of the divided clock FF was the output of an FF
instead of 2 layers of 4LUTs.
 
A

Andy

As has been mentioned, your code compares count to zero after it has
been decremented, meaning both combinatorial circuits must settle in
series before you get a valid answer, and that takes more time than
would otherwise be necessary.

There are a couple of ways around this. The first is to check and then
only decrement if safe to do so:

if counter = 0 then -- parallel comparison of each bit
counter := half_100us_period; -- use a constant (hint: 2500 - 1)!
clk_100us <= not clk_100us;
else
counter := counter - 1;
end if;

This method compares the output of the count register to zero, before
it is decremented, thus avoiding the serial delays of decrement and
compare within one clock cycle.

The second is related to this, but uses a trait of integer arithmetic
to avoid the parallel comparison of the count to zero:

if count - 1 < 0 then -- checked the carry bit
count := half_100us_period;
clk_100us <= not clk_100us;
else
count := count - 1; -- share decrementer with the comparison
end if;

The above ONLY WORKS WITH INTEGER SUBTYPE COUNTERS! No matter what the
integer subtype range of the operands, the - operator (or any other
integer operator) returns a full integer range result. That is not an
error. It is only an error if you try to store the result in a
subtype, and it is outside of that subtype's range. So, even though
count cannot be less than zero, count - 1 can be less than zero, and
can be used to check the carry bit.

What's more, the synthesis tool will share the two decrementers (the
one for the comparison, and the one for the counting action),
resulting in only one decrementer being produced.

In real life, if the counter is plenty fast (and/or the clock is
plenty slow), use the method that is most easily understood by the
reader, that also still meets timing. That reader could be a reviewer,
a verification engineer, a test engineer, or another developer that
needs to update the design, and could even be you in another 6 weeks
after you have forgotten why you did it the way you did.

Andy
 
K

KK6GM

As has been mentioned, your code compares count to zero after it has
been decremented, meaning both combinatorial circuits must settle in
series before you get a valid answer, and that takes more time than
would otherwise be necessary.

Right, this is obvious now that it has been pointed out to me. :)

To ask a question related to my OP, what is the preferred method (or
methods) of driving FFs at a rate derived from a master clock? For
example, suppose I want to clock some FFs at 1kHz from a 50MHz clock.
Is it to use the 1-clock-out-of-N enable as was discussed above? Are
there other "correct" ways as well? I guess the question is, do FFs
being updated at rates of MasterClock/N normally have their clock
inputs driven by MasterClock with suitable enables, or is it common to
actually derive other clocks from MasterClock?
 
K

KJ

To ask a question related to my OP, what is the preferred method (or
methods) of driving FFs at a rate derived from a master clock?  

In FPGAs or CPLDs, generate a clock enable. In ASICs, an actual gated
clock might be considered appropriate. The reason gated clocks are
generally not acceptable in FPGAs or CPLDs is because then the design
becomes sensitive to *minimum* propogation delays through logic due to
the inevitable need to reliably transfer signals from one clock domain
to the other, but there are no resources to control this delay. This
will create hold time issues where the signals from one clock domain
can switch prior to and be accepted by the gated clock one clock cycle
'early' or can cause incorrect logic because hold times were
violated. While a timing analysis will reveal the problem, the only
recourse is to change a random number seed and try to re-route and get
something different and hope. Every iteration of the design will
require this same sort of hand holding...most people have better
things to do with their time.
For
example, suppose I want to clock some FFs at 1kHz from a 50MHz clock.
Is it to use the 1-clock-out-of-N enable as was discussed above?  
Yes

Are
there other "correct" ways as well?  

If the division is fixed and known at design time, then there are
better forms of counters. As an example, an LFSR counter would
minimize the amount of logic needed to create the counter.

Another technique is to break the counter up into smaller pieces to
improve the clock cycle performance. Consider a 24 bit counter.
Rather than having to decode all 24 bits to decide that the count has
rolled over, another technique would be to break the counter up into
two (or three) sections of 8-12 bits. When the 'more significant'
bits sections counts down it sets a flop. Then the final rollover
occurs when all higher order sections have their bit set, and the
'least significant' bit section also counts down to 0.
I guess the question is, do FFs
being updated at rates of MasterClock/N normally have their clock
inputs driven by MasterClock with suitable enables,

In CPLDs and FPGAs...yes
or is it common to
actually derive other clocks from MasterClock?

In ASICs...yes (clock enables also work here also)

Kevin Jennings
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top