Identity-conversion of the clock signal


valentin tihhomirov

Hello,

what happens if you clock one flip-flop with a std_logic CLK and another
with to_bit(CLK)? And what if another clock is to_stdulogic(to_bit(std))?
In that case we have a std-to-bit-to-std converter on the clock line. Any
VHDL hardware engineer has the feeling that the conversion is redundant
and will be "optimized out" by the synthesizer. In my case, the
intermediate value is of a three-valued type instead of bit, but the idea
is the same.

Below is code that converts a std_logic CLK into a multi-valued type and
then applies the inverse conversion. The clock never takes the third
value; the conversion is just a convenient way to pass a signal from a
two-valued circuit into a multi-valued one whose clock is also
multi-valued for convenience. During logic optimization, the synthesizer
should replace the 'do-nothing' converters with a plain wire, so the
behaviour should be the same as a single clock net.

To demonstrate the equivalence of the original CLK and the final bitCLK,
I put two registers in a pipeline and expect the trailing register Q to
reproduce the leading D with a single clock cycle of delay.
Unfortunately, all the RTL simulators I tried (Symphony, ModelSim and my
favourite Active-HDL) agree on something different: Q picks up the same
value as D simultaneously, without the one-cycle shift. That is because D
is updated before the clock event reaches Q (I suspect the converters
delay the evaluation of the clock at Q by two delta cycles).
Nevertheless, the synthesizer does not disappoint me: XST removes the
unnecessary conversion functions and gives the implementation the desired
pipeline behaviour.

The experiment reveals that the simulators 1) do not model the FF
behaviour (which requires sampling the value present at the register
input just before the CLK rising edge), and 2) do not aim to predict the
synthesized hardware behaviour. Nevertheless, the simulators have no
difficulty handling the widely used std-to-bit conversion:

bitCLK <= ieee.std_logic_1164.to_bit(stdlogicCLK);
process begin wait until bitCLK='1'; ...

It is the deceptive success of this sample that misled me in my design.
Can you explain why the simulators balk at my conversion?

I always have trouble finding my way around the LRM. Does it require that
synchronous clocks be the same wire (not just logically identical)? I
have resolved the issue by balancing the delays (the clock of the first
register is also null-converted), but that looks fragile, and I do not
understand why no balancing is needed in the std-bit conversion case. Or
am I mistaken, and the latter is also unreliable?


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity BIST is
  port (
    CLK: in std_logic;
    Q: out std_logic
  );
end entity;

architecture RTL of BIST is

  type TRIVAL is ('U', '0', '1');

  function BIT_TO_TRIVAL(b: bit) return TRIVAL is -- returns the TRIVAL equivalent of a bit
  begin
    if b = '0' then return '0';
    else return '1';
    end if;
  end function;

  signal triCLK: TRIVAL;
  signal D: std_logic := '0';

begin

  SQUARE_GEN: process
  begin
    wait until CLK = '1';
    D <= not D;
  end process;

  triCLK <= BIT_TO_TRIVAL(to_bit(CLK));

  TRIVAL_CLOCKED: block
    signal bitCLK: bit;
    function To_bit (a: TRIVAL; umap: BIT := '0') return BIT is
    begin
      case a is
        when '0' => return '0';
        when '1' => return '1';
        when others => return umap;
      end case;
    end;
  begin
    bitCLK <= To_bit(triCLK); -- TRIVAL clock to bit

    process begin
      --wait until CLK = '1'; -- this is OK
      wait until bitCLK = '1'; -- this causes problems
      Q <= D;
    end process;
  end block;

end RTL;



library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity BIST_TB is
end entity;

architecture BEH of BIST_TB is
  signal CLK: std_logic := '0';
begin
  process begin
    loop
      CLK <= not CLK; wait for 50 ns;
    end loop;
  end process;

  BIST_U: entity work.BIST
    port map(CLK);

end architecture;
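
For reference, the delay-balancing workaround mentioned above can be
sketched as an alternative architecture (an illustration of the idea
rather than my exact code): both registers wait on the same
doubly-converted clock, so both see the edge the same two delta cycles
after CLK.

architecture RTL_BALANCED of BIST is

  type TRIVAL is ('U', '0', '1');

  function BIT_TO_TRIVAL(b: bit) return TRIVAL is
  begin
    if b = '0' then return '0'; else return '1'; end if;
  end function;

  function TRIVAL_TO_BIT(a: TRIVAL) return bit is
  begin
    if a = '1' then return '1'; else return '0'; end if;
  end function;

  signal triCLK: TRIVAL;
  signal bitCLK: bit;
  signal D: std_logic := '0';

begin

  triCLK <= BIT_TO_TRIVAL(to_bit(CLK)); -- first delta cycle
  bitCLK <= TRIVAL_TO_BIT(triCLK);      -- second delta cycle

  -- Both processes wait on the same doubly-delayed clock, so the
  -- pipeline simulates with the expected one-cycle shift.
  SQUARE_GEN: process
  begin
    wait until bitCLK = '1';
    D <= not D;
  end process;

  PIPE: process
  begin
    wait until bitCLK = '1';
    Q <= D;
  end process;

end RTL_BALANCED;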


Thank you for participating.
 

valentin tihhomirov

Thank you for the response.

> The conversions don't matter - either in synthesis or simulation.
> The signal assignments do. You can rewrite this with the same type
> throughout for your intermediate clock signals - no conversions -
> and it will do the same. Try it.

You are right. I've checked, and I confirm that it is the assignments,
rather than the conversion or the non-standard logic, that create the
skew problem. It is hard to believe, since an assignment looks like a
plain conductor, which is far simpler than the multilevel conversion,
which is so complex that you may not even recognize the trivial identity
operation in it.

> The simulators are effectively modelling what you wrote as a race
> condition and alerting you that this code gives a likelihood of clock
> skew. In other words, they are correct.

Unfortunately, they do not warn. XST sometimes does, but not in this case.

> You are lucky that XST implemented it as you expect; it could have
> decided to insert a clock buffer between "clk" and "bitCLK" and then
> what would happen? (XST probably won't ... though different XST
> versions have put clock buffers in different places in the past ...
> but what about other tools?)

I think all synthesizers are the same: they all do logic optimization.
And if they insert a buffer, they know the logic is identical. Actually,
I believe they always put buffers on the clock -- its fanout is so large,
and what would happen if they did not put the buffers? ;) Proper buffer
placement is the FPGA vendor's concern.


> The LRM doesn't, but good design practice does.
> Generate the clock; then use the same clock for both
> the producer and consumer (of D in this example).

Thank you. Unfortunately, there are cases where the clock passes through
a block of non-standard logic, as it does in mine. And shouldn't it be
the LRM that tells us what good design practice is?
 

valentin tihhomirov

I was surprised to learn that it is the assignment, rather than the
conversion, that causes the simulated clock skew. It is all the more
surprising when I think of instantiation. A port mapping looks very much
like assigning signals to I/O, yet hierarchies of unbalanced depth are
not known to cause clock skew.
 

Jonathan Bromley

> I was surprised to learn that it is the assignment, rather than the
> conversion, that causes the simulated clock skew. It is all the more
> surprising when I think of instantiation. A port mapping looks very
> much like assigning signals to I/O, yet hierarchies of unbalanced depth
> are not known to cause clock skew.

AAAAAARGH!!!!!!!!!!!!

Yes, your "discovery" is correct, and in fact is a very
fundamental idea in VHDL: when you connect a signal to
a port, the signal OUTSIDE the port and the signal
INSIDE the port are completely merged, and become
one and the same signal.

This is true even if there is a type conversion in the
port map.

This identity of signals on either side of a port
makes it possible, for example, to detect and use
the 'TRANSACTION attribute of a signal in a different
module than that containing the driver.

It is an important difference between VHDL and
Verilog; in Verilog, a unidirectional port generally
looks like a continuous assignment (a buffer) across
the module boundary. Bidirectional ports in Verilog
are merged, just as in VHDL. Unfortunately it's
not that simple, because simulators are entitled
to merge unidirectional ports as an optimization.
Luckily this does not lead to impossible skew
problems, because the buffer delay in Verilog is
actioned earlier than the nonblocking assignment
delay that is normally used for flipflops.

HOWEVER.................

under pressure from <insert names of your enemies here>
VHDL 2008 has permitted expressions in port maps, and
such expressions DO create a delta-cycle delay across
the port. Personally I regard this as a very misguided
move, breaking an important and elegant feature of the
language for the sake of a trivial improvement in
convenience for some kinds of RTL design. But that
is a completely different rant, for another day...
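
The merging is easy to see in simulation. A minimal sketch (my own
illustration, not code from this thread): both report statements fire in
the very same simulation cycle, because clk inside the instance and clk
in the testbench are one and the same signal.

entity inner is
  port (clk: in bit);
end;
architecture a of inner is
begin
  process (clk)
  begin
    report "inner sees clk = " & bit'image(clk); -- same delta as outer
  end process;
end;

entity merge_tb is end;
architecture a of merge_tb is
  signal clk: bit := '0';
begin
  clk <= '1' after 10 ns;
  u: entity work.inner port map (clk => clk);
  process (clk)
  begin
    report "outer sees clk = " & bit'image(clk); -- same delta as inner
  end process;
end;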
 

Andy

under pressure from <insert names of your enemies here>
VHDL 2008 has permitted expressions in port maps, and
such expressions DO create a delta-cycle delay across
the port.  Personally I regard this as a very misguided
move, breaking an important and elegant feature of the
language for the sake of a trivial improvement in
convenience for some kinds of RTL design.  But that
is a completely different rant, for another day...

Jonathan,

How does a pre-2008 port type conversion function execute without a
delta delay? How are events passed from one type to the other without a
delta between them?

What if one or both types are resolved types?

I was pretty sure that the pre-2008 port conversions incurred a delta
delay, but I could very easily be wrong about that... I could see how
an implicit conversion between similar types (e.g. unsigned <-> SLV, a
"cast") could work without a delta, but not an arbitrary, explicit
conversion function (e.g. SL to boolean).

Andy
 

valentin tihhomirov

Jonathan said:
> Yes, your "discovery" is correct, and in fact is a very
> fundamental idea in VHDL: when you connect a signal to
> a port, the signal OUTSIDE the port and the signal
> INSIDE the port are completely merged, and become
> one and the same signal. [...]
> under pressure from <insert names of your enemies here>
> VHDL 2008 has permitted expressions in port maps, and
> such expressions DO create a delta-cycle delay across
> the port.


One of the "enemies" desired the expressions was me:
http://groups.google.ee/group/comp.lang.vhdl/browse_thread/thread/54964f76d0b64cc2
and
http://groups.google.ee/group/comp.lang.vhdl/browse_thread/thread/244b103de6179efa

Yet my confusion deepens when VHDL experts say strange things, like that
expressions must incur a delay (why?), when Brian has just pointed out
that the conversions do not impose any. Maybe this is the appropriate
place to inform the reader that I have resolved my problem by using the

wait until multivaluedCLK = '1';

construction. It is considerably more concise and less fragile than
keeping the assignments balanced.
 

Jonathan Bromley

> How does a pre-2008 port type conversion function execute without a
> delta delay? How are events passed from one type to the other without
> a delta between them?

Well... if asked to hazard a guess, I'd say that...
- on input, the signal inside the module is rewritten to be
of the "outside" type, and all readers inside the module
get the incoming conversion automatically applied;
- on output, the signal inside is likewise rewritten to be
of the "outside" type, and the value presented by each
driver inside the module automatically suffers the outgoing
conversion before being applied to the signal.
In this way, events pass through unharmed but values are mapped.
> What if one or both types are resolved types?
See above; the conversion can be per-reader and per-writer.
I don't actually know, but I guess on output the inside type's
resolution function could be applied to all the inside drivers,
and then the result converted before being sent to the outside
type's resolution function.
> I was pretty sure that the pre-2008 port conversions incurred a delta
> delay,

You panicked me with your post, so I tried it just to be sure.
Try this... the signal assignment incurs a delta delay (of course)
and so you see differences between s_bo and p_bo, but the
converted port p_bi and unconverted p_bo always match.
And it's a custom conversion function :-0

package p is
  function to_bit(b: boolean) return bit;
end;

package body p is
  function to_bit(b: boolean) return bit is
  begin
    if b then return '1'; else return '0'; end if;
  end;
end;

use work.p.all;
entity e is
  port (p_bi: in bit; p_bo: in boolean);
end;
architecture a of e is
  signal s_bo: boolean;
begin
  s_bo <= p_bo;
  process(p_bi, p_bo)
  begin
    if p_bi = to_bit(p_bo) then
      report "OK: " & bit'image(p_bi) & ", " & boolean'image(p_bo);
    else
      report "??: " & bit'image(p_bi) & ", " & boolean'image(p_bo);
    end if;
  end process;
  process(s_bo, p_bo)
  begin
    if s_bo = p_bo then
      report "OK: " & boolean'image(s_bo) & ", " & boolean'image(p_bo);
    else
      report "??: " & boolean'image(s_bo) & ", " & boolean'image(p_bo);
    end if;
  end process;
end;

use work.p.all;
entity tb is end;
architecture a of tb is
  signal b: boolean;
begin
  b <= TRUE after 1 ns, FALSE after 2 ns, TRUE after 3 ns;
  test: entity work.e port map (p_bi => to_bit(b), p_bo => b);
end;
 

Andy

> You panicked me with your post, so I tried it just to be sure.
> Try this... the signal assignment incurs a delta delay (of course)
> and so you see differences between s_bo and p_bo, but the
> converted port p_bi and unconverted p_bo always match.
> And it's a custom conversion function :-0

Fair enough; thanks for the explanation!

"Learn something new every day."

Andy
 

Jonathan Bromley

"Learn something new every day."

I try; honestly, I try. But as I get older, the
learning process becomes less comfortable.
Increasingly I depend on the youngsters to keep
me honest and to show me the places I still need
to learn more...
 

Jonathan Bromley

> Yet my confusion deepens when VHDL experts say strange things, like
> that expressions must incur a delay (why?)

Expressions *in a port map* incur a delta delay, because
the port map

instance: some_component port map (e_port => expression, ...);

is equivalent to

temp_signal <= expression; -- here is the delta delay
instance: some_component port map (e_port => temp_signal, ...);

Note that the delay is caused by the signal assignment -
it is NOT caused by the expression itself. An expression
is merely a calculation and it takes place in zero time,
without a delta delay.
> I have resolved my problem by using
>
> wait until multivaluedCLK = '1';

How is this different from any other way of
sensing an event on multivaluedCLK ???
 

valentin tihhomirov

> Note that the delay is caused by the signal assignment -
> it is NOT caused by the expression itself. An expression
> is merely a calculation and it takes place in zero time,
> without a delta delay.

Yes, I see, but it is not obvious that expressions infer an assignment.
And I still haven't got any explanation of why the hell a simple
assignment, a data-movement operator that infers no more than a wire,
takes far more time to simulate, while a conversion, a data-processing
operation that inevitably involves data movement, takes zero time?

> How is this different from any other way of
> sensing an event on multivaluedCLK ???

Excuse me; the point was that VHDL allows waiting for an event on
non-standard logic. This is what enables the assignment-free approach.
The style of sensing does not matter.

One of the blocks in my design is custom-logic based: all its signals,
including the clock, are multivalued. This block is a netlist that
passes its multivalued CLK to flip-flop primitives. Originally, the
netlist was std_logic-based and the FF primitives used the following
construction:

BitClk <= to_bit(CLK);
process begin
wait until BitClk = '1';

I obtained it from academia without any skew problems and therefore
started to think this was proper VHDL style. It also served me as a hint
to go with multivalued logic: just define a function converting the
custom type to bit and you get a register controlled by a multivalued
clock (the clock uses only two of the values). Suddenly, my conversion
produced the skew problem. I initially believed it was due to the custom
conversion, because to_bit(std) revealed no problem. Then Brian explained
that the assignments are the source of the skew and proposed a single
clock net as "good design practice". Since it is difficult in my case to
pass std_logic through a custom-typed CUT, I first decided to balance the
conversions. It is verbose and fragile to ensure that all registers have
balanced clock assignments. I also realized that there are hierarchical
ports on the way. A port mapping also looks like an assignment and
promises the same problem. Fortunately, this discussion has revealed that
zero-delay port mappings and conversions let me manage without the
detrimental assignments. That would not be possible, though, if VHDL
forbade waiting on a custom-logic event.
 

Jonathan Bromley

> Yes, I see, but it is not obvious that expressions infer an assignment.

I'm not quite sure what you mean here - expressions don't
themselves imply an assignment. The exception is the new
syntax allowing an expression in a port map:

instance: thing port map (input => (A and B), ...

is the same as

temp_signal <= A and B;
instance: thing port map (input => temp_signal, ...

which, of course, introduces a delta delay thanks to
the implicit signal assignment to temp_signal.
> And I still haven't got any explanation of why the hell a simple
> assignment, a data-movement operator that infers no more than a wire,
> takes far more time to simulate, while a conversion, a data-processing
> operation that inevitably involves data movement, takes zero time?

We're talking about simulated time here. VHDL has strict
update/execute semantics: whenever a <= signal assignment is
executed, the updating of the target signal is postponed
until the update phase - after all processes have finished
their currently-running execute phase and have reached a
wait of some kind. That's why VHDL does not suffer the
absurd read/write races that plague Verilog. Variable
assignments, however, occur inline during the execute
phase, in zero simulated time no matter how complex the
calculation.
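
A minimal sketch of the difference (my illustration): within one process
run, a variable update is visible immediately, while the signal keeps its
old value until the update phase.

entity upd_demo is end;
architecture a of upd_demo is
  signal s: integer := 0;
begin
  process
    variable v: integer := 0;
  begin
    v := v + 1; -- variable: updated inline, in zero simulated time
    s <= v;     -- signal: update postponed to the update phase
    report "v = " & integer'image(v); -- prints 1
    report "s = " & integer'image(s); -- still prints 0 here
    wait;
  end process;
end;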
> Excuse me; the point was that VHDL allows waiting for an event on
> non-standard logic. This is what enables the assignment-free approach.
> [...]
> BitClk <= to_bit(CLK);
> process begin
> wait until BitClk = '1';
>
> I obtained it from academia without any skew problems.

That was luck rather than good academia, as I'm sure you
now are aware. Any delta delay on a clock net can easily
give rise to skew trouble.

Note, however, that even this can easily be rewritten to
avoid the assignment's delta-skew:

process begin
wait until to_bit(CLK) = '1';

And, once again, this solution is correct no matter how
complicated the conversion function might be. Indeed,
the common synthesis idiom
rising_edge(clock)
is just such a conversion function, evaluating a boolean
based on the current and immediately previous value of
the std_logic clock.
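
Following the same principle, an edge detector for a custom type can be
written as a function and used directly in the wait condition; no
intermediate signal, hence no delta delay. A sketch against the TRIVAL
type from the first post (my illustration; the function name is mine, and
whether a synthesis tool accepts it is a separate question, as discussed
below):

-- hypothetical rising_edge-style helper, declared next to the TRIVAL type
function trival_rising(signal c: TRIVAL) return boolean is
begin
  return c'event and c = '1'; -- true exactly on a transition to '1'
end function;

-- used as:
process begin
  wait until trival_rising(triCLK);
  Q <= D;
end process;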
> Since it is difficult in my case to pass std_logic through a
> custom-typed CUT, I first decided to balance the conversions. It is
> verbose and fragile to ensure that all registers have balanced clock
> assignments.

Indeed - close to impossible in practice, I guess.
Of course, ASIC place-and-route tools must necessarily
do precisely that, by constructing appropriate clock
distribution networks - but that is a very different problem.
> I also realized that there are hierarchical ports on the way. A port
> mapping also looks like an assignment and promises the same problem.
> Fortunately, this discussion has revealed that zero-delay port
> mappings and conversions let me manage without the detrimental
> assignments.

Hierarchical design of synchronous logic would be miserably
difficult if this were not so.

Sometimes, however, the needs of modelling mean that you
will be forced into having some delta delay on your clock.
Although this is tiresome, and it has caused many people
(including me) some trouble, it is possible to deal with it.

There is an alternative approach which is much less fragile.
For every synchronous (registered) output from any block,
introduce a small time delay. This models the flipflops'
clock to output delay, and ensures enough hold time for
other flipflops that use the same clock even if their
clock has a delta delay. It is easy to add a time delay
in a signal assignment from the block's internal signal
to its output port:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity shifter is
  port (clk: in std_logic; d: in std_logic; q: out std_logic);
end;
architecture two_stage of shifter is
  signal shift_register: std_logic_vector(0 to 1);
begin
  process (clk) begin
    if rising_edge(clk) then
      shift_register <= d & shift_register(0);
    end if;
  end process;

  -- This assignment models clock-to-output delay
  -- and provides enough hold time for other flops
  -- even if their clock is delayed by many delta cycles
  q <= shift_register(1) after 100 ps;
end;
 

valentin tihhomirov

> I'm not quite sure what you mean here - expressions don't
> themselves imply an assignment. The exception is the new
> syntax allowing an expression in a port map: [...]
> which, of course, introduces a delta delay thanks to
> the implicit signal assignment to temp_signal.


This helped me understand that the words "to imply" and "implicit" are
related; they share a common root. I got that even though you use them
contradictorily. Do you see that you first deny the implication and then
reintroduce it again?

> We're talking about simulated time here. VHDL has strict
> update/execute semantics: whenever a <= signal assignment is
> executed, the updating of the target signal is postponed until the
> update phase [...] Variable assignments, however, occur inline during
> the execute phase, in zero simulated time no matter how complex the
> calculation.


Why use assignments to bring some order into the evaluation logic if you
yourself admit that it is almost impossible in practice to rely on it?
IMO, assignments exist in all languages for one important thing: you
compute once and save/share the result.

The benefit of manually inserting the register hold delay is that you can
do it almost thoughtlessly. But doing it for every FF is also quite
tiresome. Having simulators do this automatically is what I wanted when I
complained about their discrepancy from the hardware.

> That was luck rather than good academia, as I'm sure you
> now are aware. Any delta delay on a clock net can easily
> give rise to skew trouble.
>
> Note, however, that even this can easily be rewritten to
> avoid the assignment's delta-skew:
>
> process begin
> wait until to_bit(CLK) = '1';

This is indeed an alternative to waiting on custom logic. Unfortunately,
nothing besides the 'academic style' is supported by XST, as experiments
show:
http://forums.xilinx.com/xlnx/board/message?board.id=SYNTHBD&thread.id=1747
 

KJ

> Why use assignments to bring some order into the evaluation logic if
> you yourself admit that it is almost impossible in practice to rely on
> it? IMO, assignments exist in all languages for one important thing:
> you compute once and save/share the result.

You're correct that one uses a concurrent signal assignment or a
variable assignment as a method to save (*1) a result to be used by
something else. The reason for doing this is to improve code
maintainability. It can also help with debugging when there is a clear
(and correct) match between the name of the signal and what it logically
represents.

Where I differ with you is when you say "almost impossible in practice
to rely on it". Rely on it to do what? The "HD" in VHDL does stand
for 'hardware description', but that does not necessarily imply that
source code literally describes the precise implementation. If it
did, then one would expect the source code that someone writes to
describe functionality by instantiating look up tables (and defining
their contents), flip flops and whatever other primitives are
available in a particular FPGA. That description would be completely
useless and have to be completely re-written to use and/or arrays and
flops if targetting a CPLD type device. Both of those descriptions
would also be completely useless if the target implementation was
discrete logic parts.

So one must accept the paradox that the hardware that is being
described in the original source code is most likely NOT a description
of the actual implemented hardware. What is the point of a 'hardware
description' language that is typically not used by an end user to
describe the actual hardware implementation? In a word,
'productivity'. Having to change source code to describe different
physical implementations (i.e. targeting different types of devices)
is not as productive as describing a mythical hardware implementation
and then using tools to translate that into something that is
physically realizable in multiple forms.

Given that the original source describes mythical and not necessarily
real hardware, one has to consider what is the purpose of the various
tools that one will use to interpret that source code. Here you seem
to be thinking that the simulator should be simulating an actual
hardware description and if an assignment will end up resulting in no
hardware, just a wire connection, then there should be no simulation
induced delay. But the simulator is simulating the description of the
mythical hardware (i.e. what is actually in the source code). The
translation of the mythical hardware description into a description of
actual hardware is the job of a synthesis tool. Those are two
independent paths.

Both synthesis and simulation are acting on the same source code
input, but performing completely different functions. Simulation
shows you what will occur if you were to build the mythical hardware
as literally described in the source code. Synthesis cobbles together
a functional near equivalent to the source code description using only
the primitives that it has available to it. The fork in the road is
that the two tools are performing completely different functions on
the same input so expecting them to behave in some coordinated manner
is not a realistic expectation.

It is important to understand just what the different tools are trying
to accomplish, and how one should write source code so that each tool
can do its job, so that you can use the tools to produce the intended
design and be confident that the results from both are valid. One can
describe things in source code that truly are mythical and cannot be
physically realized...but that's not a fault
of the language, that's the fault of the person who thinks it could be
synthesized in the first place...but it will simulate just fine
regardless. Engineers know how to use tools in a manner that results
in a description that can be used by someone else to actually build
something.
> The benefit of manually inserting the register hold delay is that you
> can do it almost thoughtlessly. But doing it for every FF is also
> quite tiresome. Having simulators do this automatically is what I
> wanted when I complained about their discrepancy from the hardware.

This added delay is just a hack that in certain situations may have
some merit...nothing more.
> This is indeed an alternative to waiting on custom logic.
> Unfortunately, nothing besides the 'academic style' is supported by
> XST,

Each tool has its own unique constraints. You either live within
those constraints or don't use that tool. Those constraints also
change over time. Generally speaking, those changes over time are for
the better (i.e. simulators support updated standards, synthesis tools
accept more abstract descriptions than they used to). Each tool has
known as well as undiscovered bugs at any point in time.

Kevin Jennings

(*1): In this context, 'save' does not mean physical memory storage.
 

valentin tihhomirov

KJ said:
> You're correct that one uses a concurrent signal assignment or a
> variable assignment as a method to save (*1) a result to be used by
> something else. The reason for doing this is to improve code
> maintainability. It can also help with debugging when there is a clear
> (and correct) match between the name of the signal and what it
> logically represents.

Intermediate variables also hint to the compiler the intermediate type
to use in a composite function. I believe that this, along with improved
debugging, makes intermediate variables a preferred coding standard in
software companies. Yet I have often preferred the conciseness of
immediate composition. Jonathan explained why -- I have a background in
VHDL, where assignments incur delays :)

> Where I differ with you is when you say "almost impossible in practice
> to rely on it". Rely on it to do what?

Jonathan pointed out that the delta delay on assignment brings some
order into the simulator's evaluation of events. We rely on VHDL to
produce predictable behaviour.


The "HD" in VHDL does stand
for 'hardware description', but that does not necessarily imply that
source code literally describes the precise implementation. If it
did, then one would expect the source code that someone writes to
describe functionality by instantiating look up tables (and defining
their contents), flip flops and whatever other primitives are
available in a particular FPGA. That description would be completely
useless and have to be completely re-written to use and/or arrays and
flops if targetting a CPLD type device. Both of those descriptions
would also be completely useless if the target implementation was
discrete logic parts.

So one must accept the paradox that the hardware that is being
described in the original source code is most likely NOT a description
of the actual implemented hardware. What is the point of a 'hardware
description' language that is typically not used by an end user to
describe the actual hardware implementation? In a word,
'productivity'. Having to change source code to describe different
physical implementations (i.e. targetting different types of devices)
is not as productive as describing a mythical hardware implementation
and then using tools to translate that into something that is
physically realizable in multiple forms.

Given that the original source describes mythical and not necessarily
real hardware, one has to consider what is the purpose of the various
tools that one will use to interpret that source code. Here you seem
to be thinking that the simulator should be simulating an actual
hardware description and if an assignment will end up resulting in no
hardware, just a wire connection, then there should be no simulation
induced delay. But the simulator is simulating the description of the
mythical hardware (i.e. what is actually in the source code). The
translation of the mythical hardware description into a description of
actual hardware is the job of a synthesis tool. Those are two
independent paths.

Nevertheless, synthesizers manage to produce hardware with the same
behaviour from the mythical description. They ensure skewless clock
distribution and an output hold delay when they meet a synchronous FF
wait pattern. Moreover, synthesizers do not demand the added-delay hack.

> This added delay is just a hack that in certain situations may have
> some merit...nothing more.

This merit is confidence, which may not mean much to you, but I would
prefer it built in.

In general, your speech sounds like an attempt to justify the
imperfection and the discrepancy between simulation and synthesis.
Regards.
 

Jonathan Bromley

> Do you see that you first deny the implication and then reintroduce it
> again?

Certainly, but I also made it clear that the second case
was an exception.

I'm guessing that you are working in English as a
second language. Please accept my apologies if
I have been too idiomatic or insufficiently precise.
> Why use assignments to bring some order into the evaluation logic if
> you yourself admit that it is almost impossible in practice to rely on
> it?

I didn't say that. I said that it was impossible in practice to
balance delta delays for all receivers of a clock. That's a very
different problem.

If all receivers of a clock see it without delta delays, there
is no problem at all, and the delta-delay signal assignment
behaviour of VHDL does precisely what we need. It gives us
a straightforward, efficient abstraction for simulation
whilst ensuring that the same behaviour can be preserved
by synthesis. Without the delta delay on signal assignment,
it is very hard to maintain that matching behaviour.
> IMO, assignments exist in all languages for one important thing: you
> compute once and save/share the result.

Variable assignment meets that need in VHDL.
> The benefit of manually inserting the register hold delay is that you
> can do it almost thoughtlessly. But doing it for every FF is also
> quite tiresome. Having simulators do this automatically is what I
> wanted when I complained about their discrepancy from the hardware.

I'm sorry, I don't really understand your complaint here. VHDL,
like other hardware description languages, does not pretend to
be a perfect model of digital hardware. But it does provide
modelling that is sufficiently accurate for most designers'
needs, providing they follow certain rules. Those rules
are neither complicated nor onerous.
> This is indeed an alternative to waiting on custom logic.
> Unfortunately, nothing besides the 'academic style' is supported by
> XST, as experiments show:
> http://forums.xilinx.com/xlnx/board/message?board.id=SYNTHBD&thread.id=1747

Ah, now I understand the problem.

Did you by any chance learn your VHDL from Navabi's book? It makes
extensive use of custom enumerated data types to model logic.
Unfortunately, it is academic and does not match the real world.
In general, enumerated types in VHDL synthesise to hardware signals
that have enough bits to provide a unique 1/0 binary value for
each enumeration literal. So, for example, your "trit" data type
with its three values 'U', '0', '1' will synthesise to at least
two digital signals (possibly three, if the tool chooses one-hot
coding). That explains the curious error messages you got
from XST. Simulation would handle the code just fine.

There is just one exception to this. The built-in data types
boolean and bit, and the predefined type std_ulogic, are handled
specially by synthesis because they were created specifically
in order to model the behaviour of a single digital signal.

I don't really understand why you need your "trit" data type;
std_(u)logic has all the values you need, and there is already
a definition in ieee.std_logic_1164 that may serve you well:

subtype X01 is resolved std_ulogic range 'X' to '1';

If you use that type, it will reliably synthesise to a single
wire and any conversion functions' behaviour will match
simulation.
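
A small sketch of that suggestion (my illustration; the entity name is
mine): the register keeps a three-valued view of its signals through the
X01 subtype, yet everything remains a single std_logic-compatible wire,
so there is nothing to convert on the clock path.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity x01_reg is
  port (clk: in X01; d: in X01; q: out X01);
end;
architecture rtl of x01_reg is
begin
  process begin
    wait until clk = '1'; -- no conversion, no assignment: no delta skew
    q <= d;
  end process;
end;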
 

KJ

> Nevertheless, synthesizers manage to produce hardware with the same
> behaviour from the mythical description.

You're mistaken, and apparently didn't understand my post. Synthesizers
do *not* produce an implementation with the exact same behavior as
specified in the original source unless your source code consists of
nothing but instantiations of primitives that are available in the
target device. In particular, assignments (b <= a;
b <= Some_function(a)) are not implemented at all; they are transformed
into a logical equivalent which is implemented with those primitives.
Two things that are logically equivalent are not necessarily the same.
Logic does not consider delays at all; this is covered in Boolean Logic
101.

Any FPGA or CPLD *could* implement the assignment b <= a; in
hardware. They do not do so because that is (almost) never what the
design engineer intends to be implemented in the hardware.
Implementing the code as literally written would wildly waste device
resources and would cause the engineer to quickly find a better
synthesis tool.
> They ensure skewless clock distribution and an output hold delay when
> they meet a synchronous FF wait pattern. Moreover, synthesizers do not
> demand the added-delay hack.

Right...like I said, synthesizers don't actually implement what is
literally described in the source code...which implies they implement
'something else'. A simulator simulates what is literally described,
not 'something else'. Someone skilled in using both synthesis and
simulation tools knows how to use them (i.e. write code) so that the
simulation of 'literally described source code' will be close enough
to the 'something else' that the synthesizer actually implements to
have high confidence that they are describing essentially the same
thing, even though they are not exactly the same.
> This merit is confidence, which may not mean much to you, but I would
> prefer it built in.

OK, so find a synthesis tool that implements your source code as
literally written. You'd probably find that you would be the only one
in the world that would use it. Alternatively, choose to write your
code as only instantiations of primitives supported by the target
device.
> In general, your speech sounds like an attempt to justify the
> imperfection and the discrepancy between simulation and synthesis.

No, just trying to enlighten you on what the differences are.
Synthesis tools and simulation tools are both 'tools'. Tools can be
used by both those skilled in their use and those that are unskilled.

Kevin Jennings
 

Kenn Heinrich

valentin tihhomirov said:
> Jonathan pointed out that the delta delay on assignment brings some
> order into the simulator's evaluation of events. We rely on VHDL to
> produce predictable behaviour.

This is a key point: the language has well-defined semantics. But
Kevin's "mythical implementation" analogy is very accurate - these
well-defined semantics are for the mythical (simulation) aspect, not for
the physical hardware.

The problem you stumbled on, with a signal assignment causing clock
skew, is one of the classic nuisance problems in VHDL. I don't know
whether to call it a design mistake or just an oversight - perhaps the
design committee should have foreseen this scenario. It appears
trivially obvious to a real hardware guy, yet it simulates with
counter-intuitive behaviour. I think all hardware guys worth their salt
have run across this one before and made a mental note: "Just don't do
that".
The "HD" in VHDL does stand

> Nevertheless, synthesizers manage to produce hardware with the same
> behaviour from the mythical description. [...]
> In general, your speech sounds like an attempt to justify the
> imperfection and the discrepancy between simulation and synthesis.

I think there are two things you can take away from this discussion. The
first is Kevin's well-described case of simulation with well-defined
semantics versus synthesis as a parallel-but-not-quite-identical
process. The second is that the delta-cycle delay problem you saw is
both (1) well understood under the LRM language semantics, and (2) in
your specific case completely counterintuitive to what an old-school
breadboard and wire-wrap hardware engineer will expect.

As far as the suggestion to add

foo <= bar after 100 ps;

you must be very careful here. The LRM allows you to set a time
resolution limit for any simulation, and will interpret time expressions
less than the limit as zero. Therefore, the above code simulated with a
resolution limit of 1 ns will behave the same as

foo <= bar after 0 ns;

which is the same as

foo <= bar;

Watch out!

As far as your example working well from academia, one thing to note
about the skew problem is that it really only happens one way. I.e. when
clk2 <= clk1; you will find that a process sensitive to clk2 will see
data produced by a "flop" (a process) on clk1 a cycle early, but NOT the
other way around. It might be the case that your academic example didn't
exhibit this problem because there was only a one-way crossing from the
one "clock domain" to the other. Or perhaps it was lucky delta
balancing.

If you absolutely must keep two different data types for clocks, the
most timing-robust (although admittedly ugly) solution is to pass, in
parallel, two versions of the same clock through your entire hierarchy
and produce both the std_logic and TRI versions out of one single,
coherent, delta-balanced clock generator process. That puts all of your
balancing in one easy-to-manage place and gives you the illusion of
being able to pass cleanly back and forth between the two domains, which
mimics your real hardware pretty accurately.
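
A minimal sketch of such a generator (my illustration, reusing the TRIVAL
type and signal names from the first post): because both targets are
assigned in the same process run, they are updated in the same delta
cycle and stay balanced forever.

-- assuming: signal stdCLK : std_logic := '0';
--           signal triCLK : TRIVAL   := '0';
CLKGEN: process
begin
  -- both assignments are executed in the same process run, so both
  -- clocks change in the same simulation (delta) cycle
  stdCLK <= '1';  triCLK <= '1';
  wait for 50 ns;
  stdCLK <= '0';  triCLK <= '0';
  wait for 50 ns;
end process;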

Another thing you might want to try is to make use of the fact that
'X','0', and '1' in the std_logic enum are consecutive, and there's an
actual subtype X01 and conversions (cvt_to_x01, To_x01) that can let you
deal with the surjection. Of course, this would mean re-engineering
your codebase to remove your custom clock type.

As a general principle, though, it's often good software engineering
practice (abstraction, conciseness, type-safety) to encapsulate your
data in user types with conversion functions. However, the case of
clocks is an exception - it's just plain dangerous, for all the reasons
discussed above.


And as a further slight digression on some of the earlier posts, the
reason that simple (pre-2008 style) port mappings do not cause delta
delays while signal assignments do is the following:

Every port and declared signal defines a new, distinct signal. However,
the simulator (and hence the LRM) deals with _groups_ of signals that
are tied together, called "nets" in the LRM. When you do simple port
maps, the signals get combined to belong to the same "net". In the LRM,
an entire _net_ gets updated within one simulation cycle. Thus, you can
tie arbitrary levels of hierarchy together, and every port lives on the
same net, so every flop sees the rising edge on the same simulation
cycle. This is slightly different from just joining signals together,
due to in/out/buffer, type conversion, resolution, and so forth.
However, a signal assignment is actually a VHDL process, and NOT a port
mapping (even though the use of symmetric arrows (<= and =>) would make
you think otherwise).

For your clock problem, this is what happens according to the LRM:

- The language says that first you update the net on clk1 (and all
signals attached to it). Then you queue up every process sensitive to
all signals on clk1's net. That's the update part of the first
simulation cycle.

- The PROCESS defined by "clk2 <= clk1" fires because it's sensitive to
clk1 and schedules a pending transaction on clk2. That will be seen in
the *next* simulation cycle, but first we have to finish running every
process (i.e. clock every flop) sensitive to clk1's net.

- This means every *other* process sensitive to clk1 is also running,
assigning your Q outputs to every clk1-domain DFF. Now the first sim
cycle is complete.

- On the *next* simulation cycle, every signal associated with the clk2
net gets updated, triggering every process sensitive to clk2 to
run. These processes will see the values just written by the clk1
processes, resulting in your phantom early clocking. Since two sim
cycles elapsed but no time did, this was a delta cycle.
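
A minimal self-contained sketch of exactly this sequence (my
illustration; pure std_logic, no conversions involved):

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity skew_demo is end;
architecture sim of skew_demo is
  signal clk1: std_logic := '0';
  signal clk2: std_logic;
  signal d, q: std_logic := '0';
begin
  clk1 <= not clk1 after 50 ns;
  clk2 <= clk1; -- the single delta of clock skew happens right here

  process (clk1) begin -- "flop" in the clk1 domain
    if rising_edge(clk1) then d <= not d; end if;
  end process;

  process (clk2) begin -- "flop" in the clk2 domain
    if rising_edge(clk2) then
      q <= d; -- sees the NEW d: the phantom early clocking
    end if;
  end process;
end;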


Hope this helps a little,

- Kenn
 

valentin tihhomirov

> I didn't say that. I said that it was impossible in practice to
> balance delta delays for all receivers of a clock. That's a very
> different problem.
>
> If all receivers of a clock see it without delta delays, there
> is no problem at all, and the delta-delay signal assignment
> behaviour of VHDL does precisely what we need. It gives us
> a straightforward, efficient abstraction for simulation
> whilst ensuring that the same behaviour can be preserved
> by synthesis. Without the delta delay on signal assignment,
> it is very hard to maintain that matching behaviour.


I concluded this from the way you contrast VHDL event evaluation with
Verilog's "mess".

> Variable assignment meets that need in VHDL.

Variables are used to generate logic inside a process. They represent
different signals at different times. And they are hard to debug in a
simulator.

> I'm sorry, I don't really understand your complaint here. VHDL,
> like other hardware description languages, does not pretend to
> be a perfect model of digital hardware. But it does provide
> modelling that is sufficiently accurate for most designers'
> needs, providing they follow certain rules. Those rules
> are neither complicated nor onerous.

I was talking about the tiresomeness of extending every register
assignment with an 'after' clause. The rule "do it everywhere" is not
complex at all per se ;)

> Ah, now I understand the problem.
>
> Did you by any chance learn your VHDL from Navabi's book? [...]
> So, for example, your "trit" data type with its three values 'U', '0',
> '1' will synthesise to at least two digital signals (possibly three,
> if the tool chooses one-hot coding). That explains the curious error
> messages you got from XST. Simulation would handle the code just fine.

Do you mean "Bad condition in wait statement, or only one clock per
process" or "at line 0, operands of <AND> are not of the same size"? How
can "wait until to_bit(multivalued)='1'", which converts to a single
bit, produce a multibit wait? XST is more cryptic than informative. The
Quartus output, on the other hand, is really descriptive.

> I don't really understand why you need your "trit" data type;
> std_(u)logic has all the values you need, and there is already
> a definition in ieee.std_logic_1164 that may serve you well:
>
> subtype X01 is resolved std_ulogic range 'X' to '1';
>
> If you use that type, it will reliably synthesise to a single
> wire and any conversion functions' behaviour will match
> simulation.

The special "single bit" treatment of the std types is the reason I
introduced a custom enumeration. I will try to emulate truly multivalued
data lines in one of my sub-blocks.
 

valentin tihhomirov

Thank you for sharing your expertise. I'm asking, however, about the
fully synchronous, FPGA-internal case.
 
