More synthesis myths?


Tricky

I just overheard the following (or thereabouts)

using the following template:
process(clk, en)
begin
if en = '1' then
if rising_edge(clk) then
d <= b;
end if;
end if;
end process;

is better than the "normal" way

process(clk)
begin
if rising_edge(clk) then
if en = '1' then
c <= a;
end if;
end if;
end process;

because the 2nd can produce latches where the clock is gated with
enable? Has this ever been the case? Running either through Quartus
produces the same (expected) thing - a D-type with enable.

Are there other legends out there that still influence design
today? Were they really a problem, and have they actually been fixed?
 

JimLewis

The normal way is the only way I code.
process(clk)
begin
if rising_edge(clk) then
if en = '1' then
c <= a;
end if;
end if;
end process;
Historically some ASIC tools have/had a switch that would
allow the enable (en) to be transformed into a clock gate -
to be used for low power applications. Has anyone seen a
synthesis tool that will do this transformation without a
setting? I would consider this to be an error.

WRT the other coding template: that coding style was not
in the 1076.6-1999 RTL coding styles, but is in the 1076.6-2004
RTL coding styles. I would still be concerned that there may be
some tools (such as ASIC synthesis tools) that do not support it.
Furthermore, since the code is logically the same, I would be
concerned that any misbehavior with the "normal" coding template
would also occur with this coding template.

Cheers,
Jim
SynthWorks
 

Andy

A third style is:

process (clk) is
begin
if rising_edge(clk) and en = '1' then
c <= a;
end if;
end process;

Which produces a clock enable and is the behavioral equivalent of
either style. The inclusion of en in the sensitivity list accomplishes
absolutely nothing, since nothing happens unless there was a
rising-edge event on clk. I sure hope we don't get back into the old
days when the order of nested if-then statements indicated priority
from an implementation/timing POV.
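As a sketch, Andy's third style can be wrapped in a complete design unit. The entity name dff_ce and its port names are hypothetical; only the process body comes from the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper for the third style: a D flip-flop with a
-- synchronous clock enable (names dff_ce, en, a, c are illustrative).
entity dff_ce is
  port (
    clk : in  std_logic;
    en  : in  std_logic;
    a   : in  std_logic;
    c   : out std_logic
  );
end entity dff_ce;

architecture rtl of dff_ce is
begin
  -- Only clk is in the sensitivity list; en is sampled on the rising edge
  -- like any other synchronous input, i.e. a clock enable, not a clock gate.
  reg : process (clk) is
  begin
    if rising_edge(clk) and en = '1' then
      c <= a;
    end if;
  end process reg;
end architecture rtl;
```

Because en is only ever looked at when rising_edge(clk) is true, this is the same pattern as the "normal" nested template, and it is the shape tools typically map onto a clock-enable flip-flop.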

Note it does not say "rising_edge(clk and en)", if that were even
permissible with rising_edge(), which would directly imply a gated
clock. There are some FPGA synthesis tools that will convert clock
enables into gated clocks, but only on devices that have "enabled
clock buffers" that are "safe". But you still have to set an option
for it to do that.

The "other" form, with clock and enable in the sensitivity list, could
also be a drain on simulation performance in large systems, since such
processes cannot be merged with others that are not clock-enabled,
are enabled in another way, and/or are enabled by other signals.

Andy
 
I checked it on the Xilinx tools too: both of the first two styles synthesize into identical circuits.
 

Dave Farrance

Tricky said:
because the 2nd can produce latches where the clock is gated with

Looking at that, I'd guess that you meant the 1st, not the 2nd.
 

K

The second example is the way to write it if you want a regular DFF
without asynchronous reset and with a clock enable. Usually the clock
enable is synthesised with a feedback mux. However, today most tools
have the option to implement it with a clock-gating latch instead
(i.e. the clock to the DFF is gated when en = '0' and the old value is
kept). I know that some tools do this by default (the FPGA tool we use
does this, and we usually turn it off to improve timing). However, for
ASIC synthesis, automatic clock gating is disabled by default. We
work in a low-power process, where automatic clock gating is a simple
and safe way to save power (for a minor penalty in timing), so we use
it.

The first version you gave I'm less certain about; it doesn't match
any of the default DFF or DLAT patterns I've seen. But I guess that
since the en signal is in the sensitivity list and is tested before the
clock, it can be considered an asynchronous signal, so synthesis tools
would not try clock gate insertion on it, since the clock gating has to
be synchronous with the clock.
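The "clock gate latch" K describes is commonly built as a latch that is transparent while the clock is low, followed by an AND gate, so that en cannot change the gated clock during the high phase. A behavioral sketch of that structure, with hypothetical entity and signal names; it models what an ASIC tool might insert, not something to hand-code in an FPGA design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Behavioral sketch of a latch-based clock gate (hypothetical names).
-- Not intended for FPGA use; shown only to illustrate the structure.
entity clock_gate is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;
    gclk : out std_logic
  );
end entity clock_gate;

architecture behav of clock_gate is
  signal en_latched : std_logic := '0';
begin
  -- Transparent while clk is low, holding while clk is high, so a change
  -- on en during the high phase cannot glitch gclk.
  latch : process (clk, en) is
  begin
    if clk = '0' then
      en_latched <= en;
    end if;
  end process latch;

  -- The flip-flops behind this gate see no clock edge when en_latched = '0',
  -- so they keep their old value without a feedback mux.
  gclk <= clk and en_latched;
end architecture behav;
```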
 

Paul

Why should 'en' be included in the sensitivity list in the first
template? It just does not make sense to me. Or does this fall under
the "or thereabouts"?
 

Marc Guardiani

Paul said:
Why should 'en' be included in the sensitivity list in the first
template? It just does not make sense to me. Or does this fall under
the "or thereabouts"?

As I understand processes, 'en' is in the sensitivity list because you
want the process to "run", so to speak, whenever it changes.
 

KJ

As I understand processes, 'en' is in the sensitivity list because you
want the process to "run", so to speak, whenever it changes.

Since the very next statement is "if rising_edge(clk) then..." the process
certainly won't be running too far...

From a synthesis perspective, including (or not including) 'en' makes no
difference. From a simulation perspective, including 'en' in the
sensitivity list will make that process chew up some minuscule extra bit
of processor time, but everything will simulate exactly the same.
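The claim that both templates simulate the same is easy to check. A sketch that puts both templates from the original post in one hypothetical entity (two_templates), driven by the same inputs:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity holding both templates side by side.
entity two_templates is
  port (
    clk, en, din : in  std_logic;
    q_en_outer   : out std_logic;  -- template 1: en tested outside the edge
    q_clk_outer  : out std_logic   -- template 2 ("normal"): edge tested first
  );
end entity two_templates;

architecture rtl of two_templates is
begin
  -- Template 1: en in the sensitivity list. A change on en wakes the
  -- process, but rising_edge(clk) is then false, so nothing is assigned.
  p_en_outer : process (clk, en) is
  begin
    if en = '1' then
      if rising_edge(clk) then
        q_en_outer <= din;
      end if;
    end if;
  end process p_en_outer;

  -- Template 2: only clk in the sensitivity list.
  p_clk_outer : process (clk) is
  begin
    if rising_edge(clk) then
      if en = '1' then
        q_clk_outer <= din;
      end if;
    end if;
  end process p_clk_outer;
end architecture rtl;
```

Driving en and din with any pattern and comparing q_en_outer with q_clk_outer after every clock should show no difference; the only cost of template 1 is the extra process wake-ups.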

Kevin
 

Paul

Marc said:
As I understand processes, 'en' is in the sensitivity list because
you want the process to "run", so to speak, whenever it changes.

Yes, that's correct. But why on earth would you like to run the
process when 'en' changes? Functionally it just adds nothing (as KJ
rightfully explained). The only thing that is added is obfuscation.

For a pure synchronous process, my favorite template is:

process is
begin
wait until clk = '1'; -- or: wait until rising_edge(clk);

if en = '1' then
q <= d;
end if;
end process;

Major advantage (IMHO): at first glance you see this is a
synchronous process. No doubt possible. Also, no long-winded
if/end-if is needed, with an additional indentation level.
 

Andy

Just to add more options to the mix (if you don't like additional
levels of if-then statements or indentation):

(I've not tried this, so I don't know if any synthesis tools will "get
it right" or not)

process is
begin
wait until rising_edge(clk) and en = '1';
q <= d;
end process;

Or its concurrent behavioral equivalent:

q <= d when rising_edge(clk) and en = '1';
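Whether synthesis tools accept these two forms is left open in the post, and nothing here settles that, but they can at least be compared in simulation. A sketch wrapping both in one hypothetical entity (enable_forms):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity comparing the two untried forms in simulation only.
entity enable_forms is
  port (
    clk, en, d : in  std_logic;
    q_wait     : out std_logic;  -- from the wait-until process
    q_conc     : out std_logic   -- from the concurrent statement
  );
end entity enable_forms;

architecture rtl of enable_forms is
begin
  -- Resumes after an event on clk or en, but only when the whole
  -- condition is true, i.e. at a rising clk edge with en = '1'.
  p_wait : process is
  begin
    wait until rising_edge(clk) and en = '1';
    q_wait <= d;
  end process p_wait;

  -- Conditional assignment with no final "else": when the condition is
  -- false no assignment happens, so q_conc keeps its value.
  q_conc <= d when rising_edge(clk) and en = '1';
end architecture rtl;
```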

About the sensitivity list issue: some simulators use an optimization
whereby multiple processes that share the same sensitivity list are
merged into one process in order to save setup/teardown overhead
associated with multiple processes. Adding an enable to the
sensitivity list would defeat this optimization in most cases. The
same is true for the concurrent statement's implied sensitivity list.

There is also a process template that uses variables for storage, with
assignments to signals after the end of the clocked if-then statement
to infer combinatorial logic outputs (no combo in->out paths). It may
not be recognized by all synthesis tools, but at least Quartus,
Synplify and Precision handle it.

An example would be (ignoring reset):

process (clk) is
variable count: natural range 0 to 2**n-1;
begin
if rising_edge(clk) then
count := (count - 1) mod 2**n;
end if;
output <= count = 2; -- combinatorial decode
end process;

I'm not sure how you would/could do this with a wait statement.
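To compile, the example also needs n declared (a generic is assumed here) and output declared as a boolean, since it receives the result of "count = 2". A sketch under those assumptions; the entity name down_count_decode is hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: n is assumed to be a generic, output a boolean.
entity down_count_decode is
  generic (n : positive := 3);
  port (
    clk    : in  std_logic;
    output : out boolean
  );
end entity down_count_decode;

architecture rtl of down_count_decode is
begin
  process (clk) is
    -- Read before it is written on each edge, so it infers a register.
    variable count : natural range 0 to 2**n-1;
  begin
    if rising_edge(clk) then
      count := (count - 1) mod 2**n;  -- wraps from 0 to 2**n-1
    end if;
    output <= count = 2;              -- combinatorial decode of the register
  end process;
end architecture rtl;
```

With n = 3 and count starting at 0, count reads 7, 6, 5, ... after successive edges, so output is true only after the sixth edge of each eight.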

Andy
 

Pieter Hulshoff

Andy said:
process (clk) is
variable count: natural range 0 to 2**n-1;
begin
if rising_edge(clk) then
count := (count - 1) mod 2**n;
end if;
output <= count = 2; -- combinatorial decode
end process;

Personally I would avoid these constructions:
1. You generate FFs from variables, which are often hard to find due to name
changing during synthesis.
2. Your logic path consists of FF, - operator, comparator; from a timing
perspective it's better to use the comparator directly on the FF output, and
adjust the expected value accordingly.
3. Your output is not a FF, which may also create timing problems.

Kind regards,

Pieter Hulshoff
 

Mike Treseler

Pieter said:
Personally I would avoid these constructions:
1. You generate FFs from variables, which are often hard to find due to name
changing during synthesis.

In modelsim, I use an 'add wave' command for each process
to make the variables visible. Quartus uses the variable
names directly, when they represent flops rather than wires.
2. Your logic path consists of FF, - operator, comparator; from a timing
perspective it's better to use the comparator directly on the FF output, and
adjust the expected value accordingly.

In a simple example like this, you have a point.
In my processes, I may have 30 variables, and
these are mostly internal registers.
3. Your output is not a FF, which may also create timing problems.

The output is indeed a flip flop.
The signal assignment represents
the wire from Q to the output port.
Try it and see.

-- Mike Treseler
 

Mike Treseler

Andy wrote:

Pieter said:
3. Your output is not a FF, which may also create timing problems.

Sorry, I read what I expected,
"output <= count;"
not what he wrote.

I agree with you, that with rare exceptions,
process outputs should be registers.

-- Mike
 

Pieter Hulshoff

Mike,
In a simple example like this, you have a point.
In my processes, I may have 30 variables, and
these are mostly internal registers.

This second point had more to do with the logic generated by the compiler than
with the use of variables. Take for example:

WAIT UNTIL clk = '1';
counter := counter + 1;
IF counter = 5 THEN
counter := 0;
END IF;

vs

WAIT UNTIL clk = '1';
IF counter = 4 THEN
counter := 0;
END IF;
counter := counter + 1;

or

WAIT UNTIL clk = '1';
counter <= counter + 1;
IF counter = 4 THEN
counter <= 0;
END IF;

The last 2 examples will usually synthesize into faster logic than the 1st,
since the first assumes a + followed by a compare while the last two do the
compare directly on the FF output.

Kind regards,

Pieter Hulshoff
 

Andy

Personally I would avoid these constructions:
1. You generate FFs from variables, which are often hard to find due to name
changing during synthesis.
2. Your logic path consists of FF, - operator, comparator; from a timing
perspective it's better to use the comparator directly on the FF output, and
adjust the expected value accordingly.
3. Your output is not a FF, which may also create timing problems.

#1: I've never had problems finding variable-inferred register names.
The hierarchical naming works the same for signals or variables,
there's just an additional level of hierarchy for the process with
variables. Use descriptive process names and you won't have any
problems.

#2: I think you misunderstood what happens with signal assignments
from variables. For instance, the initial example I gave and this
one are cycle-accurately identical to each other WRT the output
signal:

process (clk) is
variable count: natural range 0 to 2**n-1;
begin
if rising_edge(clk) then
count := (count - 1) mod 2**n;
output <= count = 2; -- registered decode of combo count
end if;
end process;

The difference is where the register is implemented. In the initial
example, the register is after the decrement, splitting the decrement
and comparison. In this example, the register is after both the
decrement and the comparison, and is a separate register. The
cycle-based timing for output in both is identical. Depending on where
the output is needed, the advantage generally lies with the former.
Naturally this is a trivial example which could easily be re-coded
behaviorally to compensate for the additional clock delay of a
registered output, but that is not the point. Re-coding for such
compensation often obfuscates the overall behavior that is intended.

When I specify two output signals, using the same expression, but one
within and one after the clocked clause, Synplify will recognize they
are functionally identical, and optimize the combinatorial output
version away. However, I've never seen it convert the combinatorial
output to a registered output unless such duplication was being
addressed, or register retiming was invoked.

Both simulate the same (WRT cycle-based timing on output), both behave
the same after synthesis.
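The two placements can be put in one process so a simulator compares them directly. A sketch with hypothetical names (decode_placement, out_comb, out_reg, and n as a generic): out_comb is decoded after the clocked if, as in the first example, and out_reg inside it, as in the second.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: both decode placements from the same count variable.
entity decode_placement is
  generic (n : positive := 3);
  port (
    clk      : in  std_logic;
    out_comb : out boolean;  -- count register -> compare (no output register)
    out_reg  : out boolean   -- decrement -> compare -> separate register
  );
end entity decode_placement;

architecture rtl of decode_placement is
begin
  process (clk) is
    variable count : natural range 0 to 2**n-1;
  begin
    if rising_edge(clk) then
      count := (count - 1) mod 2**n;
      out_reg <= count = 2;   -- registered decode of the just-updated count
    end if;
    out_comb <= count = 2;    -- combinatorial decode of the count register
  end process;
end architecture rtl;
```

In simulation both outputs change together right after the edge at which count becomes 2, which is the cycle-for-cycle equivalence described above; the difference is only where the register sits.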

#3: These examples are not intended as a verdict on the
appropriateness for all applications of combinatorial outputs from
synchronous processes, but rather an example of how to generate one
without introducing an additional process (implied or explicit).

I use signals only for inter-process communication. My processes tend
to be large and complex to minimize both the number of processes and
the signal-based communication between them, both of which contribute
to simulation efficiency. All intra-process communication uses
variables, whether the behavior implies a register or not. I prefer
not to focus on the explicit location of registers, but on the cyclic
behavior of the process, which is easier to read and debug from a
truly sequential description of variables than a pseudo-sequential
description of signals. Register re-timing optimizations change the
register/logic locations anyway, and usually do it better than I can
afford to. Just make sure you disable such optimizations (as well as
register replication, etc.) around synchronization boundaries (and
don't ask me how I know that!).

Andy
 

Mike Treseler

Pieter said:
The last 2 examples will usually synthesize into faster logic than the 1st,
since the first assumes a + followed by a compare while the last two do the
compare directly on the FF output.

I can't benchmark synthesis
until there is an entity with port assignments.

A real design also needs a reset strategy.

Synthesis sometimes creates duplicate registers
at the front end that are taken out
during mapping.

Because of these complications, I stick with
a well-tested, "known good" synchronous template
for my designs.

-- Mike Treseler
 

JimLewis

This second point had more to do with the logic generated by the compiler than
with the use of variables. Take for example:

WAIT UNTIL clk = '1';
counter := counter + 1;
IF counter = 5 THEN
  counter := 0;
END IF;

vs

WAIT UNTIL clk = '1';
IF counter = 4 THEN
  counter := 0;
END IF;
counter := counter + 1;

or

WAIT UNTIL clk = '1';
counter <= counter + 1;
IF counter = 4 THEN
  counter <= 0;
END IF;

The last 2 examples will usually synthesize into faster logic than the 1st,
since the first assumes a + followed by a compare while the last two do the
compare directly on the FF output.

Interesting example. While I agree with your conclusions above, there
is another contributing factor with the last 2 examples. With an
incrementer and a smart synthesis tool, the condition "counter = 4" is
the same as (converting to unsigned for notation only)
"counter(2) = '1'".
Cheers,
Jim
 

KJ

another contributing factor with the last 2 examples. With an
incrementer and a smart synthesis tool, the condition "counter = 4" is
the same as (converting to unsigned for notation only)
"counter(2) = '1'"

Which is why it is usually better to code it as "counter >= 4". Then you
don't need as smart a synthesis tool in order to reach the
conclusion that only bit 2 of the counter is needed.
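The reasoning can be made concrete with a 3-bit unsigned counter: for any 3-bit value, "counter >= 4" is true exactly when bit 2 is set, whereas "counter = 4" only reduces to bit 2 if the tool can prove that 5 to 7 never occur. A sketch with hypothetical names, using numeric_std and an if/else wrap rather than Pieter's exact code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical modulo-5 counter (0, 1, 2, 3, 4, 0, ...) using ">=".
entity mod5_counter is
  port (
    clk   : in  std_logic;
    count : out unsigned(2 downto 0)
  );
end entity mod5_counter;

architecture rtl of mod5_counter is
  signal counter : unsigned(2 downto 0) := (others => '0');
begin
  process is
  begin
    wait until rising_edge(clk);
    -- For a 3-bit unsigned, "counter >= 4" is exactly "counter(2) = '1'",
    -- independent of which values the counter can actually reach.
    if counter >= 4 then
      counter <= (others => '0');
    else
      counter <= counter + 1;
    end if;
  end process;

  count <= counter;
end architecture rtl;
```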

KJ
 

Andy

I've always felt "safer" with '>=' or '<=' comparisons rather than '='
on counters, especially when dealing with non-modulo-2^n counters.

However, it should be noted that the three examples given do not
behave identically. Examples 1 and 3 count from 0 to 4 and repeat.
Example 2 counts from 1 to 4 and repeats!

Small, fast and wrong is still just wrong.
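A quick way to check this observation is to simulate Pieter's three variants side by side. The sketch below wraps them in one hypothetical entity (three_counters), with a range of 0 to 7 so the transient value 5 in the first variant stays in range:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity holding the three counter variants from the thread.
entity three_counters is
  port (
    clk        : in  std_logic;
    c1, c2, c3 : out natural range 0 to 7
  );
end entity three_counters;

architecture rtl of three_counters is
  signal counter3 : natural range 0 to 7 := 0;
begin
  -- Variant 1: increment, then wrap when the incremented value reaches 5.
  ex1 : process is
    variable counter : natural range 0 to 7 := 0;
  begin
    wait until clk = '1';
    counter := counter + 1;
    if counter = 5 then
      counter := 0;
    end if;
    c1 <= counter;
  end process ex1;

  -- Variant 2: compare first, then increment; 4 goes to 0 and straight on
  -- to 1, so after the first clock the register never holds 0.
  ex2 : process is
    variable counter : natural range 0 to 7 := 0;
  begin
    wait until clk = '1';
    if counter = 4 then
      counter := 0;
    end if;
    counter := counter + 1;
    c2 <= counter;
  end process ex2;

  -- Variant 3: signal version; the compare sees the registered value and
  -- the later assignment overrides the increment.
  ex3 : process is
  begin
    wait until clk = '1';
    counter3 <= counter3 + 1;
    if counter3 = 4 then
      counter3 <= 0;
    end if;
  end process ex3;

  c3 <= counter3;
end architecture rtl;
```

After successive clock edges, c1 and c3 read 1, 2, 3, 4, 0, 1, ... while c2 reads 1, 2, 3, 4, 1, 2, ..., matching the observation that the second variant counts from 1 to 4.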

Andy
 
