Describing pipelined hardware

  • Thread starter Jonathan Bromley

Jonathan Bromley

Not a specific question, not a request for help, just an
invitation to share ideas about something that I've always
found tricky - and I suspect I'm not alone.

Using HDLs you can elegantly describe quite complicated logic in
a clocked process - we've had several discussions about that
here, and we know there are many popular styles.

Mostly, though, we need to describe things that are pipelined.
Sometimes that pipelining is from choice, sometimes it's
forced upon us by the behaviour of things outside our
control (such as pipelined synchronous RAMs in an FPGA).

As soon as you have a pipelined design, it's rather easy to
describe the behaviour of each pipeline stage as an HDL
clocked process (or, indeed, as part of a process that
describes multiple stages) but as soon as that happens
you tend to lose sight of the overall algorithm that's
being implemented. Sometimes the design nicely
suits a description in which each pipeline stage stands
alone, but if there is any feedback from later pipeline
stages to earlier ones then it's usually much harder
to see what's going on.

So, here's my question: When writing pipelined designs,
what do all you experts out there do to make the overall
data and control flow as clear and obvious as possible?

Thanks in advance
 

Mike Treseler

Jonathan said:
Mostly, though, we need to describe things that are pipelined.
Sometimes that pipelining is from choice, sometimes it's
forced upon us by the behaviour of things outside our
control (such as pipelined synchronous RAMs in an FPGA).

It can also be forced by the design requirements.
I can't shift in a serial packet
in one rx_clk tick, for example.

It can also be forced by timing requirements.
If the system clock is 100 MHz,
that's 10 ns a tick, without exception.

There is top level pipelining
from module instances
and internal pipelining using
cases of variable/register values
inside the process/block.

For example, a serial interface
stats counter might have single
process/block instances like this:

-[serial/sync/hdlc]-[octet2packetbus]-[statsCounters]-[cpu bus]-

A synchronous process/block always provides
at least one level of pipeline on the output.

Internal state or counter variables/registers
can add more latency as needed, either (a) by design
or (b) to meet timing. With recent devices
I have found few requirements for type (b)
pipelining, but this is very dependent on
the design requirements.

For example, if I have access to serial
data and clock, a crc check is straightforward.
However if I must process a word per tick,
I have no choice but to use a FOR loop
to process multiple bits per clock.
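Such a word-per-tick CRC can be sketched like this (a rough illustration, not from any particular design; the CCITT polynomial and the signal names are my own assumptions):

```vhdl
-- Sketch: CRC-16-CCITT over one 8-bit word per clock tick.
-- The FOR loop unrolls the bit-serial LFSR into parallel logic.
process (clk)
  variable fb  : std_logic;
  variable crc : std_logic_vector(15 downto 0);
begin
  if rising_edge(clk) then
    if init = '1' then
      crc := (others => '1');
    elsif word_valid = '1' then
      for i in 7 downto 0 loop            -- MSB first
        fb  := crc(15) xor data_word(i);
        crc := crc(14 downto 0) & '0';    -- shift left one bit
        if fb = '1' then
          crc := crc xor x"1021";         -- x^16 + x^12 + x^5 + 1
        end if;
      end loop;
    end if;
    crc_out <= crc;                       -- registered output
  end if;
end process;
```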

Jonathan said:
As soon as you have a pipelined design, it's rather easy to
describe the behaviour of each pipeline stage as an HDL
clocked process (or, indeed, as part of a process that
describes multiple stages) but as soon as that happens
you tend to lose sight of the overall algorithm that's
being implemented. Sometimes the design nicely
suits a description in which each pipeline stage stands
alone, but if there is any feedback from later pipeline
stages to earlier ones then it's usually much harder
to see what's going on.

I keep any such feedback inside the same process/block
even if this means a variable/register array declaration.

Jonathan said:
So, here's my question: When writing pipelined designs,
what do all you experts out there do to make the overall
data and control flow as clear and obvious as possible?

Good question.

The short answer is,
by using synchronous blocks and single
cycle control strobes at the module interfaces.
It's much simpler to design modules
to respond to a strobe (and maybe handshake it)
than it is to make some poor module
responsible for all cases of the full system timing.

The text books all say that
separating the data path is essential,
but I have never found any evidence
to support this assertion.
I like to let it all flow through
the same stream.


-- Mike Treseler
 

Kai Harrekilde-Petersen

Mike Treseler said:
The text books all say that
separating the data path is essential,
but I have never found any evidence
to support this assertion.
I like to let it all flow through
the same stream.

I have found that separating the datapath can tremendously help DC to
optimize the logic on the datapath - especially if you need to do
several almost identical operations on the datapath, depending on the
state.

In these and other similar cases I have found that creating a set of
flags in the control path, and then using the flags in the datapath to
determine how to manipulate the data, yields superior synthesis
results.


Kai
 

Andy

I think that may be more of a limitation of DC than anything else. At
least for FPGA synthesis, Synplicity does not seem to mind combining
control and dataflow logic. I quit using DC (or FC2) a long time ago
because Synplicity was soooo much better, both in vhdl language
support, and in QOR. Judging from their simulator, which I still have
to use from time to time, Synopsys still crashes on '93 standard
features that others gobble up with no problem, or at least
give you an error report you can chase.

Andy
 

Andy

In clocked vhdl processes, every assignment from one _signal_ to
another is a clock cycle (a register or pipeline stage). This is
completely different from how software behaves.

Using variables instead of signals, you write the process the way you
would in software, and order references relative to assignments to
create clock delays (register/pipeline stages).

Some people like the descriptions using signals better, some like the
variable descriptions better. I like the flexibility of
moving/adding/deleting registers by moving variable assignments
relative to references in the process.
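A minimal sketch of the difference (signal and port names are illustrative):

```vhdl
-- Signal style: each signal-to-signal assignment infers a
-- register, so this is a three-stage pipeline.
process (clk)
begin
  if rising_edge(clk) then
    s1 <= a xor b;     -- stage 1
    s2 <= s1 and c;    -- stage 2 (sees last cycle's s1)
    y  <= s2 or d;     -- stage 3
  end if;
end process;

-- Variable style: v1 and v2 are assigned before they are read,
-- so they are just wires; only y is registered. Moving an
-- assignment below its reference would add a pipeline stage.
process (clk)
  variable v1, v2 : std_logic_vector(7 downto 0);
begin
  if rising_edge(clk) then
    v1 := a xor b;
    v2 := v1 and c;
    y  <= v2 or d;     -- one stage, same logic
  end if;
end process;
```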

Another approach is to use pipelining and retiming features of your
synthesis tool. You may be able to describe the process all in one
cycle, and then delay the outputs by several clocks (through
registers), then let the synthesis tool redistribute registers
according to timing constraints. Synthesis tools have their
limitations here though... And of course, this has problems when
handling feedback.

Andy
 

Mike Treseler

Andy said:
I think that may be more of a limitation of DC than anything else. At
least for FPGA synthesis, Synplicity does not seem to mind combining
control and dataflow logic.

I agree, and would add Quartus, ISE, Leonardo, Modelsim, and NC-Sim
to the list of tools proven useful for VHDL'93 designs.

If I had to use DC, I would code in verilog instead of VHDL.

-- Mike Treseler
 

Aditya Ramachandran

For pipelined logic where it's not clear what each stage should do
exactly, I find it easier to code the logic first, add multiple
pipeline registers at the end of the logic, and then synthesize
using balance_registers in Design Compiler.

Ex: AND AND AND FLOP FLOP FLOP
becomes
AND FLOP AND FLOP AND FLOP
after synthesis using balance_registers
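In HDL terms, that style might look like the following sketch (names illustrative), where all the logic sits in front of a chain of output registers for the tool to redistribute:

```vhdl
-- Before retiming: AND AND AND, then FLOP FLOP FLOP.
process (clk)
begin
  if rising_edge(clk) then
    p1 <= a and b and c and d;  -- all the logic in one lump
    p2 <= p1;                   -- spare register
    p3 <= p2;                   -- spare register
  end if;
end process;
-- balance_registers / retiming may then move p2 and p3 back
-- into the AND chain: AND FLOP AND FLOP AND FLOP.
```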

Aditya
 

Ben Jones

Jonathan said:
So, here's my question: When writing pipelined designs,
what do all you experts out there do to make the overall
data and control flow as clear and obvious as possible?

Comments.

Lots and lots and lots of comments. Oh, and a diagram.

-Ben-
 

Marcus Harnisch

Jonathan,

I've been using register balancing in Synopsys DC successfully for
about eight years. In order to notify the other end about when there's
work to be done, it is often a good idea to pass a synchronization
signal (e.g. data valid, deasserted reset) through the pipeline as
well.
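That synchronization signal simply rides along with the data; a minimal sketch (f1 and f2 stand for whatever each stage computes):

```vhdl
-- Sketch: a data-valid strobe travels with the data, so the
-- consumer knows exactly which output cycles carry real results.
process (clk)
begin
  if rising_edge(clk) then
    stage1_data  <= f1(data_in);      -- f1, f2: placeholder
    stage1_valid <= valid_in;         -- combinational functions
    stage2_data  <= f2(stage1_data);
    stage2_valid <= stage1_valid;
    data_out     <= stage2_data;
    valid_out    <= stage2_valid;     -- valid_in, 3 clocks later
  end if;
end process;
```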

Don't forget your post-synthesis verification though (gate-level or
formal). We never completely trust the tools, right?

Regards,
Marcus
 

KJ

Jonathan said:
Mostly, though, we need to describe things that are pipelined.
Sometimes that pipelining is from choice,
Not sure I can think of any "from choice" examples except for places
where:
- It doesn't matter if the signal is combinatorial or delayed by a clock
cycle.
- The cleanest form for writing the logic (in VHDL) is a
statement only available inside a process (i.e. a case or if).
- There would be more than a couple of signals in the sensitivity list.
In that situation I would choose a clocked process over a process with a
laundry list of signals in the sensitivity list, of which I'll invariably
forget at least one.

Jonathan said:
sometimes it's
forced upon us by the behaviour of things outside our
control (such as pipelined synchronous RAMs in an FPGA).
Dang those pesky constraints anyway.

Jonathan said:
As soon as you have a pipelined design, it's rather easy to
describe the behaviour of each pipeline stage as an HDL
clocked process (or, indeed, as part of a process that
describes multiple stages) but as soon as that happens
you tend to lose sight of the overall algorithm that's
being implemented.
That's the point where I would go back and rethink how I've partitioned the
design and ponder a bit on...
- Is the algorithm itself really what needs to be implemented, or is there
a different algorithm that accomplishes the same/similar goals that might
be more amenable to implementation, since I've wrapped myself around the
axle on this one? If not, then move on to the following point.
- Rethink the partitioning of the design. Sometimes my first guess at how
things should be partitioned turns out to be rather clumsy and now after
having "lost sight of the overall algorithm that's being implemented" is a
good time to go back and redraw the boundary lines.

As for the boundary lines themselves, I'm generally talking about at the
VHDL entity level. Any decently complex algorithm that needs to be
pipelined probably is composed of some form of cascaded blocks. Each
cascaded block will have a clear definition of what it is trying to
accomplish. This pretty much then defines what the I/O (in terms of
algorithm information flow) is. Based on that choose an appropriate set of
control/status signals to move that information in and out of the blocks.
For that, of late I've been using Altera's Avalon bus specification as a
model. I looked at OpenCores' Wishbone spec as well and wasn't terribly
impressed, but Avalon seems to have an interface definition that scales
really well (not just for the top level, but all the way down to
'simple blocks' without any appreciable 'overhead' in terms of wasted
logic). By that I mean that not only can I use it for the top level of the
algorithm implementation's I/O, but it can also be used for interconnecting
those cascaded blocks. Not sales-pitching Altera; I'm sure Xilinx, Actel et
al. probably have some equivalent as well, but over the last 5 years I've
pretty much been all Altera. The SOPC Builder tool sucks and I no longer
use it for real design, but the Avalon specification itself is good.

In any case, I've found that having 'some' block I/O interface signal
specification instead of your own "well thought out, but still kinda in your
head but it works for me and it's so clear that I'm sure you'll get it too"
version is a key to not getting lost in your pipelining (second only to
having the individual sub-blocks implementing the correct
functionality...i.e. drawing the right boundaries in the first place).
Since these are 'sub-blocks' I'll tend to generalize the data signals to fit
the true need. For example, Avalon data are all std_logic_vectors but I'll
change that to be a VHDL record so that the interface between blocks is of
the appropriate type for that interface. At the top level of the algorithm
implementation you're generally constrained in what you can use but the
internal block to block interfaces generally don't have that constraint.
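As a sketch of that record-based interface idea (the field names are my own, not from the Avalon spec):

```vhdl
-- An Avalon-like streaming link wrapped in records, so each
-- block-to-block interface carries the appropriate data type.
type pixel_stream_t is record
  data  : unsigned(11 downto 0);  -- whatever this link carries
  valid : std_logic;
  sop   : std_logic;              -- start of packet
  eop   : std_logic;              -- end of packet
end record;

-- Back-pressure travels the other way, sink to source:
--   port (
--     din        : in  pixel_stream_t;
--     din_ready  : out std_logic;
--     dout       : out pixel_stream_t;
--     dout_ready : in  std_logic
--   );
```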

Once inside a particular block, if I'm finding myself "losing sight of the
overall algorithm within the local space" I'll generally follow the same
steps and re-factor. Maybe that means that this particular block should be
decomposed into a parent/child structure, or maybe it needs to be split into
two cascaded 'siblings'.

Jonathan said:
Sometimes the design nicely
suits a description in which each pipeline stage stands
alone, but if there is any feedback from later pipeline
stages to earlier ones then it's usually much harder
to see what's going on.
'Most' of the time in the past, I've found that this feedback is usually
something of the form 'slow down, I can't take the data so quickly' or 'OK,
I'm ready to accept data'. That feedback needs to get from the data
consumer back to whatever it is that is ultimately sourcing the data. This
particular type of feedback, though, is exactly the data flow control that
specifications like Avalon are designed to handle, so if you've designed each
sub-block to that interface then the flow-control type of feedback will take
care of itself. I'm pondering what other types of feedback there might be
to feed from a later to an earlier stage, but I guess it's too early in the
morning.

Jonathan said:
So, here's my question: When writing pipelined designs,
what do all you experts out there do to make the overall
data and control flow as clear and obvious as possible?
1. Partition entities into clearly describable functions and don't be afraid
to go back and re-partition them into different clearly describable
functions if you get wrapped around the axle.
2. Choose an I/O interface model specification (Avalon, wishbone, etc.) and
use it not just for the top block but for sub-blocks as well. Since you'd
like to use this I/O model all the way from the top to bottom in your design
don't pick something that carries a lot of baggage with it that causes you
to abandon it. An outlandish example would be choosing PCI as your model.
While great for connecting 'big' things, you probably wouldn't want to
outfit each entity with a PCI interface. Look for something that scales
well DOWNWARD (i.e. not logic wasteful), so you're not forced to abandon it
because of the overhead.
3. Re-factor an entity into a parent/child or sibling/sibling pair of
entities when you find yourself getting 'lost'.

Jonathan said:
Thanks in advance
Thanks for the soapbox ;)

Kevin Jennings
 

KJ

Andy said:
Some people like the descriptions using signals better, some like the
variable descriptions better. I like the flexibility of
moving/adding/deleting registers by moving variable assignments
relative to references in the process.

Gee, that's the one thing I don't like: that moving the order around changes
everything....except when I'm writing non-synthesizable test bench
code....I'm a waffler.

Andy said:
Another approach is to use pipelining and retiming features of your
synthesis tool. You may be able to describe the process all in one
cycle, and then delay the outputs by several clocks (through
registers), then let the synthesis tool redistribute registers
according to timing constraints. Synthesis tools have their
limitations here though... And of course, this has problems when
handling feedback.
I've never had much luck with re-timing....but it probably has more to do
with me maybe not really quite understanding something.

Kevin Jennings
 

KJ

Jonathan said:
So, here's my question: When writing pipelined designs,
Ben Jones said:
Comments.

Lots and lots and lots of comments. Oh, and a diagram.
And if you go that route....keeping them up to date and accurate is the main
issue since many times they are anything but. Comments are good, but since
they are completely unsynthesized you really need to have a methodology that
produces clean 'live' code.

Kevin Jennings
 

Andy

What DC calls balancing registers, Synplicity calls retiming.
Synplicity has limits as to how many registers away it will move
original logic.

Andy
 

Jonathan Bromley

[Andy Jones]
I'm inclined to agree, from experience, but there's a big part of me
that wants to go with Mike Treseler's "do everything in one clocked
process" plan. The latter gives you (at best) almost software-like
clarity, at the expense of requiring the synthesis tool to chase
around a large piece of code trying to find widely-scattered
opportunities for resource sharing. Over the years I've
become reasonably skilled at second-guessing what styles
a synthesis tool will optimise well, but that's not a very
reliable way to proceed!

[Kai Harrekilde-Petersen]
I think that may be more of a limitation of DC than anything else. At
least for FPGA synthesis, Synplicity does not seem to mind combining
control and dataflow logic. I quit using DC (or FC2) a long time ago
because Synplicity was soooo much better, both in vhdl language
support, and in QOR.

In fairness I think we should take FC2 out of this equation - it's
not actively promoted now, I think, having been replaced by DC-FPGA.
My experience is rather different: DC seems to be astonishingly
good at finding optimisation opportunities, given the right care
and feeding. I do agree, though, that for the most part the
FPGA-oriented tools are way ahead of the ASIC-oriented tools
in language feature support. The obvious major exception is
support for synthesisable SystemVerilog, but that's another
discussion entirely...
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 

Mike Treseler

KJ said:
Gee, that's the one thing I don't like that moving the order around changes
everything....except when I'm writing non-synthesizable test bench
code....

I reduce this sort of confusion by using
variables only as registers -- always
use before update. The easiest way to
do this is to combine usage and update
on one line (reg := f(reg);) whenever possible:

reg := reg + 1;
reg := shift_left(reg);

I like the fact that all the register
values are logically in phase with the
input ports and that tracing code in simulation
becomes a useful debugging tool.
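A minimal sketch of that discipline (names illustrative):

```vhdl
-- Variables are used strictly as registers: each one is read and
-- updated together, so its value stays in phase with the ports.
process (clk)
  variable count : unsigned(7 downto 0)         := (others => '0');
  variable sreg  : std_logic_vector(7 downto 0) := (others => '0');
begin
  if rising_edge(clk) then
    count := count + 1;                     -- reg := f(reg)
    sreg  := sreg(6 downto 0) & serial_in;  -- reg := f(reg)
    q_out <= sreg;                          -- registered output port
  end if;
end process;
```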

-- Mike Treseler
 

Mike Treseler

Jonathan said:
[Andy Jones]
[ make that Kai ]
I'm inclined to agree, from experience, but there's a big part of me
that wants to go with Mike Treseler's "do everything in one clocked
process" plan. The latter gives you (at best) almost software-like
clarity, at the expense of requiring the synthesis tool to chase
around a large piece of code trying to find widely-scattered
opportunities for resource sharing. Over the years I've
become reasonably skilled at second-guessing what styles
a synthesis tool will optimise well, but that's not a very
reliable way to proceed!

Yes. The trend in synthesis is better and cheaper.
The trend in devices is bigger, faster and cheaper.
The value of an isolated optimization is on the wane.
The value of a clean logic description that simulates quickly is waxing.
[Kai Harrekilde-Petersen] [ make that Andy ]
I think that may be more of a limitation of DC than anything else. At
least for FPGA synthesis, Synplicity does not seem to mind combining
control and dataflow logic. I quit using DC (or FC2) a long time ago
because Synplicity was soooo much better, both in vhdl language
support, and in QOR.
In fairness I think we should take FC2 out of this equation - it's
not actively promoted now, I think, to be replaced by DC-FPGA.
My experience is rather different: DC seems to be astonishingly
good at finding optimisation opportunities, given the right care
and feeding. I do agree, though, that for the most part the
FPGA-oriented tools are way ahead of the ASIC-oriented tools
in language feature support. The obvious major exception is
support for synthesisable SystemVerilog, but that's another
discussion entirely...

I am skeptical about object-oriented descriptions
taking off while even the use of variables
is still so misunderstood. In the meantime, some
of the old tools in the shed are cleaned up and
working pretty well.

-- Mike Treseler
 

Dave Higton

In message <[email protected]>
Ben Jones said:
Comments.

Lots and lots and lots of comments. Oh, and a diagram.

I have one concern about any diagrams: they are always (IME) in a
proprietary format, so you're locked in to a set of tools. One of
VHDL's strengths is that it's text only, so you can freely move
your source around all the tools you like, from the editor up.

Dave
 

Jason Zheng

Dave said:
I have one concern about any diagrams: they are always (IME) in a
proprietary format, so you're locked in to a set of tools. One of
VHDL's strengths is that it's text only, so you can freely move
your source around all the tools you like, from the editor up.

Dave

You are only going to have a format lock-in problem if you lock yourself
in with proprietary software (such as M$ Visio) in the first place, as
there are plenty of platform-independent and text-based open formats
available. With the help of our good friend ImageMagick, you can convert
just about any graphics format to Encapsulated PostScript (EPS). You can
easily include that in a LaTeX documentation file. Not to mention both
ImageMagick and LaTeX are free and open source ;-). If you prefer
OpenOffice, you can save the vector graphics in an XML format, which is
equally portable.

I am, however, against doing documentation too early in development. As
requirements often change, you will find yourself with lots of
additional work to do just to keep the documentation up-to-date. Do a
minimal amount of documentation at the early stages of development,
then when the product matures, spend more time documenting the internals.
 

Kim Enkovaara

Dave said:
I have one concern about any diagrams: they are always (IME) in a
proprietary format, so you're locked in to a set of tools. One of
VHDL's strengths is that it's text only, so you can freely move
your source around all the tools you like, from the editor up.

The problem with pure VHDL is that the pipeline functionality is usually
really hard to extract from the code afterwards by another designer. When
you have long pipelines and the different pipeline stages fight for access
to shared resources (memories, multipliers), tracking that
whole mess becomes really hard.

At least in my opinion, just drawing a spreadsheet where clock cycles are
one axis and the dependencies are another, and writing in each cell
which process or functionality accesses that dependency on that cycle,
makes understanding easier. Also, color coding makes it easier to follow
the flow of data in the pipeline. For example, if many data "packets"
are in the pipeline at the same time, they can be color coded to visualize
how the data is interleaved. The eye easily catches places where the colors
are the wrong way around and hazards in the pipeline are a possibility.

That spreadsheet is also a good tool if the pipeline structure must be changed.
It's easy to see all the dependencies, and they are not forgotten during
the change. I have seen a few cases where, during an update of the pipeline,
the sequencing of one small place was forgotten and the bug stayed
hidden for a long time.
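To make that concrete, a fragment of such a spreadsheet might look like this (the resources and packet labels are invented for illustration):

```
cycle | RAM port A        | RAM port B      | multiplier
------+-------------------+-----------------+-------------
  n   | rd coeff  (pkt A) | rd data (pkt A) | idle
 n+1  | rd coeff  (pkt B) | rd data (pkt B) | mult (pkt A)
 n+2  | wr result (pkt A) | rd data (pkt C) | mult (pkt B)
```

A swapped pair of packet labels in a column like this is exactly the kind of hazard that jumps out visually.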

--Kim
 

Ben Jones

Dave Higton said:
I have one concern about any diagrams: they are always (IME) in a
proprietary format, so you're locked in to a set of tools. One of
VHDL's strengths is that it's text only, so you can freely move
your source around all the tools you like, from the editor up.

I draw my diagrams as far as possible in ASCII, so they can stay attached to
the corresponding piece of code.

For more complex circuits I'll use a bitmap graphics editor and save in a
standard format (done carefully, this doesn't lead to as many maintenance
problems as you might think). I don't like any of the vector drawing
packages I've tried recently which has led me to start thinking about doing
something with SVG...

For timing diagrams I have a neat web-based tool I wrote myself (based on an
idea I stole shamelessly from Frank Vorstenbosch). It's not actually on the
public Internet anywhere, although every time someone posts on one of these
groups asking for a free timing diagram authoring tool I have an urge to
post it somewhere! The beauty of this tool is that the source is ASCII text,
and is fairly readable (and writeable) in its own right, so you're not
forever fretting about making lines join up and getting spacing and angles
right.

Cheers,

-Ben-
 
