Pipelining tutorial wanted

Andrea Campi

Hi,

I'm teaching myself some VHDL; I've already worked my way through
what I expect to be the most common entry-level projects (decoders,
counters etc). I also grabbed a Pegasus (Spartan II) board, just
to make things a bit more interesting. Now I'd like to go a few
steps farther.

One of the topics I'm most interested in is pipelining. I'd like
to better understand the tradeoffs involved, the techniques, the
design choices... you get the idea.

Is there any material out there you would suggest? Websites would
be best, books are OK. Googling didn't turn up much (suitable for
a beginner). Keep in mind that until now I've mostly worked from
the Cookbook and examples grabbed off the web, plus a great
deal of experimenting, so buying one or at most two books would
not be an unwelcome suggestion--as long as you can agree on
which books would be better *grin*

In particular, I was pondering over this:

Mehdi R. Zargham
Computer Architecture: Single and Parallel Systems


Thanks in advance.

Bye,
Andrea
 
Mike Treseler

Andrea said:
One of the topics I'm most interested in is pipelining. I'd like
to better understand the tradeoffs involved, the techniques, the
design choices... you get the idea.

Yes, it's very simple in concept.
Pipelining allows me to fix up a complex logic function
that fails Fmax in static timing. Note that
pipelining does not speed up the complex function.
It just allows it to work with a faster
system clock than it could tolerate otherwise.
The upside goes to other, less complex functions.

Let's say that a 200 MHz clock passes Fmax
everywhere except for this 64-input gate
made from 4-input, 2 ns logic cells:

____64 input unpipelined gate________
one tick through
 64    16   4     1
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]=[LC]\
[DQ]==[LC]=[LC]-=[LC]--[DQ]
[DQ]==[LC]=[LC]/
[DQ]==[LC]=[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
    |<------6 ns----->|
Fmax ~166 MHz

I insert registers somewhere in the middle
of the logic cells like this:

____64 input pipelined gate__________
two ticks through
 64    16   4           1
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]\
[DQ]==[LC]=[LC]\
[DQ]==[LC]=[LC]-=[DQ]--[LC]--[DQ]
[DQ]==[LC]=[LC]/ ^pipe
[DQ]==[LC]=[LC]
[DQ]==[LC]/
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
[DQ]==[LC]
    <---4 ns--->     <-2 ns->
Fmax ~250 MHz
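
The arithmetic behind the two Fmax figures can be checked with a few lines of Python. This is only a rough sketch: it assumes the clock period is set purely by the number of logic-cell levels between registers at 2 ns per cell, and it ignores setup time, clock-to-Q delay and routing. The names `levels_needed` and `fmax_mhz` are made up for this illustration.

```python
import math

CELL_NS = 2.0  # delay of one 4-input logic cell, taken from the example above


def levels_needed(inputs: int, fan_in: int) -> int:
    """Count the logic-cell levels needed to reduce `inputs` signals to one."""
    levels = 0
    while inputs > 1:
        inputs = math.ceil(inputs / fan_in)
        levels += 1
    return levels


def fmax_mhz(stage_levels, cell_delay_ns=CELL_NS):
    """Fmax is limited by the slowest register-to-register stage.

    Assumption: only logic-cell delay counts (no setup, clock-to-Q, routing).
    """
    worst_ns = max(stage_levels) * cell_delay_ns
    return 1000.0 / worst_ns


total_levels = levels_needed(64, 4)      # 64 -> 16 -> 4 -> 1
unpipelined = fmax_mhz([total_levels])   # all levels in one clock period
pipelined = fmax_mhz([2, 1])             # register after the second level

print(f"levels: {total_levels}")
print(f"unpipelined Fmax: {unpipelined:.1f} MHz")
print(f"pipelined Fmax:   {pipelined:.1f} MHz")
```

Under these assumptions the unpipelined gate misses a 200 MHz clock and the pipelined one passes, but only because the slowest stage shrank from three levels to two; the result itself now appears one clock later.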


Note that pipelining has no upside if
static timing is ok, so I let synthesis
have a go at it first. If pipelining
is indicated, I can turn on options
to balance the LCs, but I do have
to add a variable or signal to infer
the extra registers.

-- Mike Treseler
 
Hal Murray

Andrea said:
One of the topics I'm most interested in is pipelining. I'd like
to better understand the tradeoffs involved, the techniques, the
design choices... you get the idea.

One good example is some modern RISC CPUs. I've seen at least
one good writeup, but don't remember where.

Another example that used to be much more interesting before
FPGAs added dedicated carry logic is carry-save adders. Google
gets many hits. The general idea is that you trade a larger number
of cycles to get the answer for a faster cycle time. Suppose
you want a 32-bit adder/counter to run at X ns, but your carry
chain only gets through 10 bits in that time. Put a FF at 10, 20, and 30
bits. Or maybe 8, 16, and 24, because you need some setup time for
the FF. On the first cycle, the bottom 8 bits are accurate, but
you might have a carry trapped in the first FF. On the next cycle,
the next 8 bits get updated, but maybe the second carry FF gets set.
....
Of course, this is only interesting if you can wait a few cycles
to get the answer. If you are doing something like counting to
N to make a delay of N cycles, you can preload the counter with N minus
a few to correct for the pipeline delays.
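
The segmented counter described above can be modelled cycle by cycle in Python to see where the value goes stale. This is a behavioural sketch, not synthesizable code: it assumes four 8-bit segments with a carry flip-flop between neighbouring segments, and the class name `SegmentedCounter` and its methods are invented for this illustration.

```python
SEG_BITS = 8
SEGMENTS = 4
MASK = (1 << SEG_BITS) - 1


class SegmentedCounter:
    """32-bit counter split into 8-bit segments with registered carries."""

    def __init__(self):
        self.seg = [0] * SEGMENTS            # segment registers, LSB first
        self.carry = [0] * (SEGMENTS - 1)    # carry FFs between segments

    def clock(self, enable=1):
        """One clock edge: every register updates from the *old* state."""
        new_seg = list(self.seg)
        new_carry = list(self.carry)
        for i in range(SEGMENTS):
            # Segment 0 counts; higher segments add last cycle's carry.
            carry_in = enable if i == 0 else self.carry[i - 1]
            total = self.seg[i] + carry_in
            new_seg[i] = total & MASK
            if i < SEGMENTS - 1:
                new_carry[i] = total >> SEG_BITS
        self.seg, self.carry = new_seg, new_carry

    def value(self):
        """Raw register contents; may lag while carries are in flight."""
        return sum(s << (SEG_BITS * i) for i, s in enumerate(self.seg))


def run(n):
    counter = SegmentedCounter()
    for _ in range(n):
        counter.clock()
    return counter


c = run(256)
stale = c.value()        # a carry is still trapped in the first carry FF
c.clock(enable=0)        # one idle cycle lets the carry move up
settled = c.value()
print(stale, settled)
```

With three carry flip-flops, at most three idle cycles are needed before the registers hold the exact count, which is the "wait a few cycles" cost mentioned above.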
 
Mark Bottomley

Hal Murray said:
One good example is some modern RISC CPUs. I've seen at least
one good writeup, but don't remember where.

The industry standard for this sort of description is the Patterson and
Hennessy book Computer Organization and Design: The Hardware/Software
Interface (not to be confused with the Hennessy and Patterson book Computer
Architecture: A Quantitative Approach). I would highly recommend both,
although the first one deals specifically with pipelining in a CPU, with data
forwarding, stall hazards and other pipelining considerations.

Mark...
 
Andrea Campi

Hi Mark,

Mark Bottomley said:
The industry standard for this sort of description is the Patterson and
Hennessy book Computer Organization and Design: The Hardware/Software
Interface (not to be confused with the Hennessy and Patterson book Computer
Architecture: A Quantitative Approach). I would highly recommend both,
although the first one deals specifically with pipelining in a CPU with data ...

I had a look at the book on Amazon. It looks good, but I'm afraid it could
be a little too much for me. Actually, I'd like to start out with something
simpler than a modern CPU. I gather pipelining can be (and is) successfully
applied in all kinds of logic networks, like in Mike's example.

Do you know of any more generic and more basic text that takes a more
general approach--and discusses implications for VHDL in particular?

TIA, bye,
Andrea
 
mike_treseler

The only thing that a pipeline implies for VHDL
is the requirement to add a few lines of code to
infer the pipeline registers if static timing proves
that they are needed. I see this as more of a "kludge"
to meet timing than a subject for a text book.

Consider getting a simulator and learning how
to use the full VHDL language for simulation.
Then you can try out whatever you like without
having to find a book.

-- Mike Treseler
 
Neo

Hi there,
I think the best way to get the hang of pipelining is to design a
simple 4-tap (or more, depending on your comfort) FIR digital filter.
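
To show what pipelining such a filter means in practice, here is a small Python model rather than VHDL. It is a sketch under assumptions: one pipeline register after the multipliers and one after the first adder level, and integer coefficients. The function names are hypothetical, and a real design might place the registers differently.

```python
def fir_direct(samples, taps):
    """Unpipelined FIR: multiply and sum everything in one clock period."""
    delay = [0] * len(taps)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        out.append(sum(h * d for h, d in zip(taps, delay)))
    return out


def fir_pipelined(samples, taps):
    """4-tap FIR with registers after the multipliers and first adder level."""
    assert len(taps) == 4
    delay = [0] * 4
    prod_reg = [0] * 4   # pipeline stage 1: registered products
    sum_reg = [0, 0]     # pipeline stage 2: registered partial sums
    out = []
    for x in samples:
        # Output is the final add of what stage 2 currently holds.
        out.append(sum_reg[0] + sum_reg[1])
        # Next-state values are all computed from the old register contents,
        # as they would be on a single clock edge.
        next_sum = [prod_reg[0] + prod_reg[1], prod_reg[2] + prod_reg[3]]
        delay = [x] + delay[:-1]
        next_prod = [h * d for h, d in zip(taps, delay)]
        prod_reg, sum_reg = next_prod, next_sum
    return out


taps = [1, 2, 3, 4]
samples = [1, 2, 3, 4, 5, 0, 0, 0]
direct = fir_direct(samples, taps)
piped = fir_pipelined(samples, taps)
print(direct)
print(piped)
```

The pipelined output is the same sequence delayed by two clocks. Because a FIR filter has no feedback, that latency is the only functional change, which is what makes it a comfortable first exercise.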
 
Andrea Campi

Neo said:
Hi there,
I think the best way to get the hang of pipelining is to design a
simple 4-tap (or more, depending on your comfort) FIR digital filter.

Thanks Neo... I had a quick look on Google, and from the material I
found it looks like this is a great idea indeed!

I'll start there and come back here in case I need any more help.

Bye,
Andrea
 
rickman

mike_treseler said:
The only thing that a pipeline implies for VHDL
is the requirement to add a few lines of code to
infer the pipeline registers if static timing proves
that they are needed. I see this as more of a "kludge"
to meet timing than a subject for a text book.

Consider getting a simulator and learning how
to use the full VHDL language for simulation.
Then you can try out whatever you like without
having to find a book.

Pipelining is far from a "kludge". It is also a lot more than just
adding registers. If your design has any feedback, the pipelining makes
that *much* more complex and this is where the textbook aspect comes
in. There is a great deal written about how to design pipelined logic
and most of it is not easy to reinvent.
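
A small Python model can illustrate why feedback complicates things. It is only a sketch with an invented example: a running-sum accumulator whose adder gets one extra register inside the loop. The recombination step shown is one possible repair for this particular circuit, not a general recipe, and all function names are hypothetical.

```python
def accumulate(samples):
    """Reference: acc[n] = acc[n-1] + x[n], one register in the loop."""
    acc, out = 0, []
    for x in samples:
        acc = acc + x
        out.append(acc)
    return out


def accumulate_two_stage_loop(samples):
    """Same adder with an extra register in the feedback path.

    The feedback now arrives two clocks late, so the circuit computes
    acc[n] = acc[n-2] + x[n]: two interleaved sums, not the running sum.
    """
    stage = [0, 0]  # stage[0] is the newest value, stage[1] is one clock older
    out = []
    for x in samples:
        new = stage[1] + x
        stage = [new, stage[0]]
        out.append(new)
    return out


def recombine(partial):
    """Add each value to the previous one, outside the feedback loop."""
    prev, out = 0, []
    for p in partial:
        out.append(p + prev)
        prev = p
    return out


samples = [1, 2, 3, 4, 5, 6]
plain = accumulate(samples)
naive = accumulate_two_stage_loop(samples)
fixed = recombine(naive)
print(plain)
print(naive)
print(fixed)
```

In a feed-forward path an extra register only adds latency, but here it changes the function being computed. The repair needs a second adder, placed outside the loop where it can be pipelined freely.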

--

Rick "rickman" Collins

 
Mike Treseler

rickman said:
Pipelining is far from a "kludge". It is also a lot more than just
adding registers. If your design has any feedback, the pipelining makes
that *much* more complex and this is where the textbook aspect comes
in. There is a great deal written about how to design pipelined logic
and most of it is not easy to reinvent.

I suppose it does depend on the application.
For digital signal processing, you may have
to step up a level from the VHDL process
for a logic description.

-- Mike Treseler
 
