Describing pipelined hardware

  • Thread starter Jonathan Bromley

Ben Jones

I am, however, against doing documentation too early in development. As
requirements often change, you will find yourself with lots of
additional work to do just to keep the documentation up-to-date. Do a
minimal amount of documentation at the early stages of the development,
then when the product matures, spend more time documenting the internals.

To me, this argument (writing docs/comments early on causes extra
maintenance) is like saying "don't install lead shielding in nuclear
processing facilities, because it's a health hazard". The alternative is
much, much worse.

When I design a moderately complicated circuit, I much prefer to start by
documenting the design. This often throws up problems that I wouldn't have
seen coming if I'd just sat down and started coding - particularly in
interfacing and flow control. I'll only start coding a module when I have a
good idea not only what it's supposed to do, but how it's going to work,
often (but by no means always) down to the clock cycle.

I've found this tends to lead to a more stable architecture that doesn't
have to change radically over its lifetime (so the
documentation-effort-mountain never materializes). Still, I'd agree that
it's definitely possible to overdo it. :)

Cheers,

-Ben-
 

KJ

Kim Enkovaara said:
The problem with pure VHDL is that the pipeline functionality is usually
really hard to extract from the code afterwards by another designer. When
you have long pipelines and the different pipeline stages fight for access
to shared resources (memories, multipliers) then tracking that
whole mess becomes really hard.
I've found that this usually tends to happen when the algorithm
implementation isn't properly partitioned into clearly describable functions
each using a decent I/O interface model specification. (See my June 7 post
for more of my blabber on that).
That spreadsheet is also a good tool if pipeline structure must be
changed.
It's easy to see all the dependencies and they are not forgotten during
the change.
And if you design to that spreadsheet (i.e. treat it as a specification that
you need to meet) then that's a good approach for breaking the problem down.
I have seen a few times where, during an update of the pipeline sequencing,
one small place was forgotten and the bug was hidden for a long time.
Again, having the entire algorithm properly partitioned into clearly
describable functions each using a decent I/O interface model specification
helps since it becomes a bit more straightforward to unit test each
sub-block if necessary. Such unit testing 'should' allow more rigorous
testing of boundary and corner conditions in the design than a more
monolithic testbench of the entire function would. Even if it doesn't,
though, having multiple testbenches to run the sub-functions can generally
help catch the more subtle errors.
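
As an illustration only (the entity and port names below are invented, not
taken from any design discussed in this thread), one possible shape for such
a sub-block interface is a simple data/valid/ready contract. Give every
pipeline stage the same contract and each one can be dropped into its own
unit testbench:

library ieee;
use ieee.std_logic_1164.all;

entity stage_iface is
   generic (WIDTH : positive := 16);
   port (
      clk       : in  std_logic;
      rst       : in  std_logic;
      -- upstream side
      in_data   : in  std_logic_vector(WIDTH-1 downto 0);
      in_valid  : in  std_logic;
      in_ready  : out std_logic;
      -- downstream side
      out_data  : out std_logic_vector(WIDTH-1 downto 0);
      out_valid : out std_logic;
      out_ready : in  std_logic
   );
end entity stage_iface;

architecture rtl of stage_iface is
begin
   -- Pass-through register stage: accept a word whenever the downstream
   -- side is ready, register it, and present it with its valid flag.
   in_ready <= out_ready;

   process (clk)
   begin
      if rising_edge(clk) then
         if rst = '1' then
            out_valid <= '0';
         elsif out_ready = '1' then
            out_data  <= in_data;
            out_valid <= in_valid;
         end if;
      end if;
   end process;
end architecture rtl;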

Kevin Jennings
 

KJ

I am, however, against doing documentation too early in development. As
requirements often change, you will find yourself with lots of
additional work to do just to keep the documentation up-to-date. Do a
minimal amount of documentation at the early stages of the development,
then when the product matures, spend more time documenting the internals.

Maybe we're just thinking differently about the *level* of documentation
that you're talking about but I couldn't disagree more about documenting
early. As a designer it's your responsibility to turn requirements into an
implementation and the first step in that process 'should' be documentation
of what that implementation is. As that documentation gets worked on and
actually thought about, many of the design issues will get worked through.
That's not to say that oopses won't be found as the coding begins, but that
is a measure of the quality of the documentation; a well-produced design
document will not have many things pop up, and hopefully no major gotchas.

As for changing requirements, that's a whole different problem. I know it
always happens and always will because it will probably always take longer
to design something to meet a set of requirements than it takes to come up
with a new set. But if the requirements are constantly changing, that is a
measure of the quality of the work being done by the folks producing the
requirements, they are not able to translate their market needs into a
product requirement very well. That being the issue, that's the
problem that should be addressed first. And yes, I realize we're all in
that 'changing quickly, gotta react to that market quickly' mindset, and you
can't wait for the requirements document to be totally finalized because we'd
miss the market window. There can be overlap between requirements definition
and design, but that is still no excuse for botching the requirements. A
'bad' design can come out of a 'not so hot'
designer, and the same can be said for 'requirements' and 'not so hot
marketing' folks.

Kevin Jennings
 

KJ

For more complex circuits I'll use a bitmap graphics editor and save in a
standard format (done carefully, this doesn't lead to as many maintenance
problems as you might think). I don't like any of the vector drawing
packages I've tried recently which has led me to start thinking about
doing something with SVG...
Unless this diagram is taken as something to design to, though, it becomes
obsolete and out of date real quickly since it doesn't get
maintained. If it IS something that will be designed to then it should be
part of the design specification.
For timing diagrams I have a neat web-based tool I wrote myself (based on
an idea I stole shamelessly from Frank Vorstenbosch). It's not actually on
the public Internet anywhere, although every time someone posts on one of
these groups asking for a free timing diagram authoring tool I have an urge
to post it somewhere! The beauty of this tool is that the source is ASCII
text, and is fairly readable (and writeable) in its own right, so you're not
forever fretting about making lines join up and getting spacing and angles
right.
Yet another proprietary format!!! (Sorry, couldn't resist, even if you do
have a wonderful tool)

Kevin Jennings
 

Ben Jones

KJ said:
Unless this diagram is taken as something to design to, though, it becomes
obsolete and out of date real quickly since it doesn't get
maintained.

Under the assumption that the designer is a lazy weasel, yes you're right.
:) It's too often the case, but it ain't necessarily so.
If it IS something that will be designed to then it should be
part of the design specification.

100% agreed.
Yet another proprietary format!!! (Sorry, couldn't resist, even if you do
have a wonderful tool)

Yet another format: yes, OK. Proprietary: no, I wouldn't have thought so.
Surely that would mean I kept it secret in some way?

Cheers,

-Ben-
 

KJ

Jonathan said:
but there's a big part of me that wants to go with Mike Treseler's "do everything
in one clocked process" plan.

Then while debugging somebody else's code, you drag a signal from the
Modelsim wave window to the dataflow window and cringe when you see
that the signal you're interested in is one of the 20 outputs of a
process with 42 inputs. Then you scroll down to find the equation that
you're interested in and find that it's a relatively simple combination
of three signals. After pondering on those you use dataflow to take
you back to the source of one of those three signals and cringe again
when you find that it too is one of 26 outputs of a process with 37
inputs and start muttering curses against those "do everything in one
clocked process" approach people. Either that or mutter something to
Mentor Graphics to have them improve the dataflow window to have some
way to filter out only those signals in the process that are actually
used by the selected signal.
The latter gives you (at best) almost software-like
clarity
Haven't you heard though? Software is going parallel. Now that
Moore's law can no longer deliver increased clock speed, and the road to
high performance resides in parallel programming of multiple cores, the
mindset that comes along with understanding VHDL and the lack of any
ordering of concurrent statements/processes means that software is
beginning the move away from the sequential mindset ;)
at the expense of requiring the synthesis tool to chase
around a large piece of code trying to find widely-scattered
opportunities for resource sharing.

Any benchmark examples of what that expense might be? Like it took 2
hours to synthesize with one approach, 1:57 with another? Or maybe
only 15 minutes? Or just shooting from the hip maybe based on things
you've seen?
Over the years I've
become reasonably skilled at second-guessing what styles
a synthesis tool will optimise well, but that's not a very
reliable way to proceed!

I find it hard to believe that it is a 'style' thing that is optimizing
well or not. What you may consider style I think probably would
translate into actual logical differences and therefore different
synthesis results. An example incorrectly labelled 'style' could be using
'case' versus nested 'if/elsif'. The 'if/elsif' implies a priority
encoding whereas 'case' implies that the cases are distinct and
non-overlapping. But this is not really a 'style' difference at all,
it's different logic. So in any case, I'm not sure what styles you see
that optimize well versus another style of equivalent logic that
optimizes not so well.
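
For what it's worth, here is a small, self-contained illustration of that
point (the signal and entity names are made up): the if/elsif chain builds
a priority encoder, while the case statement tells the tool the alternatives
are mutually exclusive, so the two really are different logic rather than
different style.

library ieee;
use ieee.std_logic_1164.all;

entity prio_vs_case is
   port (
      req     : in  std_logic_vector(3 downto 1);
      sel     : in  std_logic_vector(1 downto 0);
      grant_p : out std_logic_vector(1 downto 0);  -- from if/elsif
      grant_c : out std_logic_vector(1 downto 0)   -- from case
   );
end entity;

architecture rtl of prio_vs_case is
begin
   -- if/elsif: req(3) beats req(2) beats req(1), i.e. a priority encoder
   priority_enc : process (req)
   begin
      if    req(3) = '1' then grant_p <= "11";
      elsif req(2) = '1' then grant_p <= "10";
      elsif req(1) = '1' then grant_p <= "01";
      else                    grant_p <= "00";
      end if;
   end process;

   -- case: the selector values are distinct and non-overlapping, a plain mux
   parallel_sel : process (sel)
   begin
      case sel is
         when "11"   => grant_c <= "11";
         when "10"   => grant_c <= "10";
         when "01"   => grant_c <= "01";
         when others => grant_c <= "00";
      end case;
   end process;
end architecture;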

Kevin Jennings
 

Mike Treseler

KJ said:
I've found that this usually tends to happen when the algorithm
implementation isn't properly partitioned into clearly describable functions
each using a decent I/O interface model specification.

What is it about a data pipeline that I
can't see clearly in the simulation
waveforms and document with testbench
code and a pdf of the waveforms?

-- Mike Treseler
 

Mike Treseler

KJ said:
Then while debugging somebody else's code, you drag a signal from the
Modelsim wave window to the dataflow window and cringe when you see
that the signal you're interested in is one of the 20 outputs of a
process with 42 inputs.

That's exactly the reason that I use a single process per entity.
There are no directionless signals from who knows where.
There are no signals at all. No need to trace data flow. Just code.
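
For readers who haven't seen the style, here is a minimal sketch of the sort
of module being described (the entity and the names are mine, and this is
only one reading of the approach, not an actual template from this thread):
all the pipeline state lives in process variables, so there are no internal
signals to trace.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipe3 is
   port (
      clk  : in  std_logic;
      a, b : in  unsigned(7 downto 0);
      y    : out unsigned(15 downto 0)
   );
end entity;

architecture rtl of pipe3 is
begin
   main : process (clk)
      -- pipeline registers are variables, not signals
      variable a_r, b_r : unsigned(7 downto 0)  := (others => '0');
      variable prod_r   : unsigned(15 downto 0) := (others => '0');
   begin
      if rising_edge(clk) then
         y      <= prod_r;       -- stage 3: register the output
         prod_r := a_r * b_r;    -- stage 2: multiply
         a_r    := a;            -- stage 1: register the inputs
         b_r    := b;
      end if;
   end process;
end architecture;
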
Any benchmark examples of what that expense might be? Like it took 2
hours to synthesize with one approach, 1:57 with another? Or maybe
only 15 minutes? Or just shooting from the hip maybe based on things
you've seen?

I have not seen any difference for synthesis.
Simulation is very quick however for a module
without signals.
I find it hard to believe that it is a 'style' thing that is optimizing
well or not. What you may consider style I think probably would
translate into actual logical differences and therefore different
synthesis results.

I don't agree. Try it and see.

-- Mike Treseler
 

Jonathan Bromley

Jonathan said:
but there's a big part of me that wants to go with Mike Treseler's "do everything
in one clocked process" plan. [...]
The latter gives you (at best) almost software-like
clarity
Haven't you heard though? Software is going parallel.

I wish. I've been trying to do, and encourage, parallel software
since long before it was even a teeny little bit fashionable.
We hardware folk do parallel all the time, and thank heavens
we have languages that let us describe static instantiation
and parallel execution rather easily. However, if you are
trying to see the overall story about what happens to some
data as it flows through the system, a serialised description
is often more lucid. My concern - the one I was asking
people to share in the original post - was that it's almost
impossible to preserve such a serialised description across
a pipelined design. Several contributors have suggested
ways to help with this, but I still see it as an issue.
Any benchmark examples of what that expense might be? Like it took 2
hours to synthesize with one approach, 1:57 with another? Or maybe
only 15 minutes? Or just shooting from the hip maybe based on things
you've seen?

A bit more than shooting from the hip; rather, the experience that
synth tools often fail to find opportunities to simplify a datapath
when the operations that imply that datapath are deeply entangled
in control code. To take a trivial example: suppose an address
counter is incremented in each of several branches of a case
statement. It's obviously easier for the synth tool to optimise
that if the programmer sets an increment-enable flag in each
branch, and uses that flag to enable the increment operation,
than if the increment is specified independently in each branch.
This example probably doesn't cause any trouble, but I have
plenty of experience of more complex examples of arithmetic
or logic actions *not* being optimised in situations like this.
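
To spell that out in code (the entity, the state encoding and the widths
below are invented purely for illustration, and whether a particular tool
actually shares the adder is tool- and version-dependent): the first
architecture writes the increment out in every branch, the second sets an
enable flag in each branch and applies the increment once.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_ctr is
   port (
      clk   : in  std_logic;
      state : in  std_logic_vector(1 downto 0);
      addr  : out unsigned(9 downto 0)
   );
end entity;

architecture scattered of addr_ctr is
   signal a : unsigned(9 downto 0) := (others => '0');
begin
   -- increment written separately in each branch: the tool must discover
   -- for itself that the three adders can be one shared adder
   process (clk)
   begin
      if rising_edge(clk) then
         case state is
            when "00"   => a <= a + 1;
            when "01"   => a <= a + 1;
            when "10"   => a <= a + 1;
            when others => null;   -- hold
         end case;
      end if;
   end process;
   addr <= a;
end architecture;

architecture shared of addr_ctr is
   signal a : unsigned(9 downto 0) := (others => '0');
begin
   -- each branch only sets an enable flag; one increment, one adder
   process (clk)
      variable inc : std_logic;
   begin
      if rising_edge(clk) then
         inc := '0';
         case state is
            when "00"   => inc := '1';
            when "01"   => inc := '1';
            when "10"   => inc := '1';
            when others => null;
         end case;
         if inc = '1' then
            a <= a + 1;
         end if;
      end if;
   end process;
   addr <= a;
end architecture;
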
I find it hard to believe that it is a 'style' thing that is optimizing
well or not.

See above. Perhaps "style" was the wrong word, but there's
no doubt that some ways of writing the code optimise better
than others. And in non-trivial cases it's sometimes tough to
predict in advance what will cause trouble and what won't.

Thanks for all the other interesting insights.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 

Ralf Hildebrandt

KJ wrote:

Then while debugging somebody else's code, you drag a signal from the
Modelsim wave window to the dataflow window and cringe when you see
that the signal you're interested in is one of the 20 outputs of a
process with 42 inputs.

But would it be easier to understand the circuit if you had to
look at 12 different processes, connected in a "wild" way with
several signals?

I guess a clear style is independent of the "one process or many
processes" question. You should always build "blocks" that have a
well-defined purpose. The purpose of these "black boxes" should be known.
If you want to have a look into a box, you should again find "boxes"
that have a clear purpose, and so on.

I personally use a lot of processes (sometimes one for every flipflop),
but this depends on the circuit I have to model. Most of my circuits are
small ones that have to be highly optimized and have a "difficult"
behavior. For other circuits that include, e.g., more arithmetic, Mike's
approach of "one process" may be much clearer.

Ralf
 

Jason Zheng

Jonathan said:
but there's a big part of me that wants to go with Mike Treseler's "do everything
in one clocked process" plan. [...]
The latter gives you (at best) almost software-like
clarity
Haven't you heard though? Software is going parallel.

I wish. I've been trying to do, and encourage, parallel software
since long before it was even a teeny little bit fashionable.
We hardware folk do parallel all the time, and thank heavens
we have languages that let us describe static instantiation
and parallel execution rather easily. However, if you are
trying to see the overall story about what happens to some
data as it flows through the system, a serialised description
is often more lucid. My concern - the one I was asking
people to share in the original post - was that it's almost
impossible to preserve such a serialised description across
a pipelined design. Several contributors have suggested
ways to help with this, but I still see it as an issue.
I agree with your assessment. But I disagree with blindly applying Mike
Treseler's approach. The thinking that by simply describing pipeline
design in a software-like fashion we can avoid thinking in parallel is
plain wrong. There are many advantages to Mike Treseler's approach, but
it's not the golden key that solves the basic problem that you are
describing.

Perhaps Verilog's programming level is too low to tackle the
"easily-readable, self-documenting" pipeline design problem. Perhaps we
need a higher-level logic description language/scheme to have a true
clean solution. Imagine describing a database query in x86 assembly
language; I would much rather read that description written in SQL. One
thing for sure, rewriting that query in C won't help much either.

No offense to anyone, just my 2 cents.

~jz
 

Mike Treseler

Jason said:
I agree with your assessment. But I disagree with blindly applying Mike
Treseler's approach. The thinking that by simply describing pipeline
design in a software-like fashion we can avoid thinking in parallel is
plain wrong.

True. Without simulation I can't see what I'm doing.
However, with simulation I can keep the ducks
lined up as I go and see everything at once.

-- Mike Treseler
 

KJ

Mike said:
What is it about a data pipeline that I
can't see clearly in the simulation
waveforms and document with testbench
code and a pdf of the waveforms?
For every possible corner, boundary and flow control condition that can
ever occur? Pictures speak 1000 words, but you would need a lot of
pictures to cover all of this.

Besides, in doing that, you're not so much documenting what it is that
the design does as characterizing what it does under certain test
conditions.

My point was that if you force yourself to have a clear functional
description of each of the sub-functions that basically define the
pipeline in the first place, and couple that with adherence to a defined
flow control specification, then the effort "to extract from the code
afterwards by another designer" is probably not required. When you
don't have this clear definition then "tracking that whole mess becomes
really hard."

Kevin Jennings
 

KJ

Mike said:
That's exactly the reason that I use a single process per entity.
There are no directionless signals from who knows where.
There are no signals at all. No need to trace data flow. Just code.
Maybe I should have been clearer: what I was referring to was Modelsim's
dataflow window itself as a debug tool when confronting code that is a
single process per entity. I doubt that most of those 20 outputs
depend directly on more than a handful of those 42 inputs. The
Modelsim dataflow window works well with the source code to help you
navigate through the code, and is much easier to use when you can see
that the signal you're interested in depends only on these 3 signals
(and not the other 39); you can immediately see the state of the
signals and the code and then wave them (or not). Then click on the
input and it takes you back to the driver, and repeat until the root
cause is found.
I don't agree. Try it and see.
Try what? Do you have two examples of 'good' and functionally
equivalent code where the style makes a difference?

Kevin Jennings
 

KJ

Jonathan said:
See above. Perhaps "style" was the wrong word, but there's
no doubt that some ways of writing the code optimise better
than others. And in non-trivial cases it's sometimes tough to
predict in advance what will cause trouble and what won't.
I'd be interested in seeing some examples of good code that are
functionally equivalent but optimize differently. I can certainly see
that if you have some poorly written code that happens to be
functionally equivalent to well-written code that the optimizer might
churn a bit...or is that what you meant by 'style'? (Light bulb
might've just turned on here).
Thanks for all the other interesting insights.

You're welcome

Kevin Jennings
 

KJ

Ralf said:
KJ wrote: [...]

But would it be easier to understand the circuit if you had to
look at 12 different processes, connected in a "wild" way with
several signals?
My comment was more directed towards the after-the-fact situation where
you're debugging somebody else's code to fix a problem, not towards the
effort of trying to get some overall picture of the data flow of the
algorithm itself.

The way the Modelsim dataflow window, source window and wave window
work together makes it very easy to navigate through almost any code,
well written or not. The 'almost' exception is when you're interested
in tracking backwards from symptom to root cause and hit a process
with lots of ins and outs, where the signal on the path that you're
following doesn't depend on all of those inputs, generally only
a handful.

Twas off on a tangent
I personally use a lot of processes (sometimes one for every flipflop),
but this depends on the circuit I have to model. Most of my circuits are
small ones that have to be highly optimized and have a "difficult"
behavior. For other circuits that include, e.g., more arithmetic, Mike's
approach of "one process" may be much clearer.
Actually I don't think the one process versus several processes
approach makes anything 'clearer', just different. Some of those
differences are perceived as good, some as bad, whereas somebody else
would have the opposite opinion. The clarity comes from the skill of the
person writing the code.

Kevin Jennings
 

Mike Treseler

KJ said:
Maybe I should have been clearer: what I was referring to was Modelsim's
dataflow window itself as a debug tool when confronting code that is a
single process per entity.

Perhaps I misunderstood what you were saying.
In the dataflow view each process is a box,
so I don't see any reason to use it with a
single process design. Like you, I use the dataflow viewer
sometimes to decode designs by others, but
it is not the right tool to debug a single process module.
Try what? Do you have two examples of 'good' and functionally
equivalent code where the style makes a difference?

No. I don't think the style makes any difference.
I have never seen any "different synthesis results".

-- Mike Treseler
 

Kim Enkovaara

KJ said:
My point was that if you force yourself to have a clear functional
description of each of the sub-functions that basically define the
pipeline in the first place, and couple that with adherence to a defined
flow control specification, then the effort "to extract from the code
afterwards by another designer" is probably not required. When you
don't have this clear definition then "tracking that whole mess becomes
really hard."

The problem is making the functional description accurate enough that
it describes all the functionality in the pipeline. At least in my
opinion, tracking dependencies between pipeline stages is hard. With
dependencies I mean something like this as an example:

"Stages 3 and 25 share a common memory and stalls in the pipeline are
not allowed. When the format of incoming data is known we know that
when s3 is accessing memory s25 has propagated data that doesn't need
that access".

Now suppose a change is needed and one pipeline stage is added.
What are all the dependencies that also have to be changed, what are
the new hazards, does the new condition add new hazards?

Those dependencies are really hard to handle. Formal model checking can
be a good tool to prove that hazards are not possible with the
constrained incoming data that is used. Of course there can be error conditions and
the design must get over them and heal itself, or at least indicate that
pipeline reset is needed.

A pure dataflow pipeline without dependencies is not so hard to
document. It is just defined transactions between stages. Also, if
the stages can stall sometimes, and there are FIFOs to handle that,
then hazard handling becomes much easier.
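
As a rough sketch of that last point (the entity, port and generic names
below are invented for illustration), a small FIFO along these lines can
sit between two stages, so a downstream stall turns into back-pressure
rather than a hazard that every earlier stage has to know about:

library ieee;
use ieee.std_logic_1164.all;

entity stage_fifo is
   generic (
      WIDTH : positive := 16;
      DEPTH : positive := 4
   );
   port (
      clk, rst : in  std_logic;
      wr_data  : in  std_logic_vector(WIDTH-1 downto 0);
      wr_valid : in  std_logic;
      wr_ready : out std_logic;   -- back-pressure to the upstream stage
      rd_data  : out std_logic_vector(WIDTH-1 downto 0);
      rd_valid : out std_logic;
      rd_ready : in  std_logic    -- the downstream stage may stall
   );
end entity;

architecture rtl of stage_fifo is
   type mem_t is array (0 to DEPTH-1) of std_logic_vector(WIDTH-1 downto 0);
   signal mem            : mem_t;
   signal wr_ptr, rd_ptr : natural range 0 to DEPTH-1 := 0;
   signal count          : natural range 0 to DEPTH   := 0;
begin
   wr_ready <= '1' when count < DEPTH else '0';
   rd_valid <= '1' when count > 0     else '0';
   rd_data  <= mem(rd_ptr);

   process (clk)
      variable inc, dec : natural range 0 to 1;
   begin
      if rising_edge(clk) then
         if rst = '1' then
            wr_ptr <= 0;
            rd_ptr <= 0;
            count  <= 0;
         else
            inc := 0;
            dec := 0;
            if wr_valid = '1' and count < DEPTH then
               mem(wr_ptr) <= wr_data;
               wr_ptr      <= (wr_ptr + 1) mod DEPTH;
               inc := 1;
            end if;
            if rd_ready = '1' and count > 0 then
               rd_ptr <= (rd_ptr + 1) mod DEPTH;
               dec := 1;
            end if;
            count <= count + inc - dec;
         end if;
      end if;
   end process;
end architecture;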

--Kim
 

Ben Jones

KJ said:
I'd be interested in seeing some examples of good code that are
functionally equivalent but optimize differently.

I think there are a lot of people talking at cross purposes here, but here's
an example of two functionally identical pieces of code which will often
synthesize to something different:

X: process (clock)
begin
   if rising_edge(clock) then
      if (a and b) = '1' then
         c <= d;
      end if;
   end if;
end process;

Y: process (clock)
begin
   if rising_edge(clock) then
      if a = '1' then
         c <= (c and b) or (d and not b);
      end if;
   end if;
end process;

Process X will usually map to a single register, with a two-input AND
function driving the clock enable pin. Process Y will usually map to a
single register with a 3-input MUX function on the input, and the clock
enable driven simply by signal 'a'. In FPGA technology it's quite likely
that the latter circuit is faster.

This is a simple example - real situations are often much more extreme.

-Ben-
 

KJ

Minor error in the equation for 'c' in the 'Y' process, but simple enough to
try on a few different tools....thanks, we'll see what happens.
X: process (clock)
begin
   if rising_edge(clock) then
      if (a and b) = '1' then
         c <= d;
      end if;
   end if;
end process;

Y: process (clock)
begin
   if rising_edge(clock) then
      if a = '1' then
         c <= (c and b) or (d and not b);
Should be
         c <= (d and b) or (c and not b);
 
