pipeline

Amit · Sep 26, 2008

hello group,

can somebody please tell me what makes a circuit operate faster using
a pipeline? Let's say there is a normal full adder and another full
adder which is implemented based on a pipeline. Or, at least, could
someone give me some links to follow on this?

thanks.
amit

Ralf Hildebrandt · Sep 27, 2008

Ok, let's assume the operation a+b+c, which can be implemented as
s1=a+b;
s2=s1+c;

Now we have a combinational path from a over s1 to s2. The same holds
for b. Let us assume that this path is too slow for your clock. The
unpipelined description would be:

process(clk)
  variable s1 : unsigned(a'high+1 downto a'low);
begin
  if rising_edge(clk) then
    s1 := resize(a, s1'length) + b;  -- widen a so the carry of a+b is kept
    s2 <= s1 + c;
    -- we would get the same result if we would write:
    -- s2 <= (a+b)+c;
  end if;
end process;

Now we can break this path into two pieces - we create a pipeline:

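process(clk)
begin
  if rising_edge(clk) then
    s1  <= resize(a, s1'length) + b;  -- s1 is a signal
    c_d <= c;  -- c_d (a signal): c delayed one clock so it stays aligned with a+b
    s2  <= s1 + c_d;  -- note that the OLD values of s1 and c_d are used!
    -- s1, c_d and s2 will all be implemented as flipflops
  end if;
end process;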

Now we have two combinational paths: from a,b to s1 and from s1,c to s2.
Each path is approximately half as long as the initial path from a,b
to s2.
This results in two facts:
1) a single operation takes two clocks (the latency)
2) if operands arrive continuously, we still get one result with every
single clock!
=> We can continuously add operands with the fast clock, but each single
operation has a latency of 2 clocks. If we have a continuous data
stream, we can process it with the fast clock, and then the latency is
usually not a problem.

Ralf
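A complete, self-contained version of the two-stage a+b+c pipeline might look like the sketch below. It assumes ieee.numeric_std unsigned operands of a generic width WIDTH; the entity name add3_pipe and the signal name c_d are hypothetical, chosen for illustration. Besides the s1 register, it delays c by one clock through the register c_d, so that the c presented together with a and b is added to their own sum rather than to the sum of the previous operands. The output s2 is two bits wider than the inputs so no carry is lost.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: two-stage pipelined a+b+c (latency 2, one result per clock)
entity add3_pipe is
  generic (WIDTH : positive := 8);
  port (
    clk     : in  std_logic;
    a, b, c : in  unsigned(WIDTH-1 downto 0);
    s2      : out unsigned(WIDTH+1 downto 0)  -- a+b+c needs two extra bits
  );
end entity add3_pipe;

architecture rtl of add3_pipe is
  signal s1  : unsigned(WIDTH downto 0);    -- stage-1 register: a+b with carry
  signal c_d : unsigned(WIDTH-1 downto 0);  -- stage-1 register: c delayed one clock
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- stage 1: first adder; c is only registered so it stays aligned
      s1  <= resize(a, WIDTH+1) + b;
      c_d <= c;
      -- stage 2: second adder, fed only from the stage-1 flip-flops
      s2  <= resize(s1, WIDTH+2) + c_d;
    end if;
  end process;
end architecture rtl;
```

With new operands applied on every clock, the sum of the operands captured at one rising edge appears at s2 after the following rising edge (two clocks of latency), while a new result still comes out on every clock.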

Amit · Sep 28, 2008

Hi Ralf,

thanks for the explanation. So now I have a 4-bit RCA (ripple-carry
adder built from full adders, each made up of two half adders) and want
to implement it using the pipeline method. This is what I'm doing:

I will need 5 levels of flip-flops, such that the first level has 9
flip-flops, the second level has 8, and so on:

level 1: 9 flip-flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
level 2: 8 flip-flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
level 3: 7 flip-flops (y3 x3 y2 x2 c_out1 sum1 sum0)
level 4: 6 flip-flops (y3 x3 c_out2 sum2 sum1 sum0)
level 5: 5 flip-flops (c_out3 sum3 sum2 sum1 sum0)

Is that right?

Please advise me on this.

thanks,
amit
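The 5-level plan above can be sketched in VHDL as follows. This is only a sketch under assumptions: the entity name rca4_pipe, the std_logic_vector ports and the array signals xs, ys, ss and cs are hypothetical. Level 1 registers the 9 inputs; each later level computes one full-adder bit (sum = x xor y xor c, carry = x·y + c·(x xor y)) and passes the not-yet-used operand bits and the finished sum bits on. For simplicity every level is declared 4 bits wide; the register bits that are never read or that stay constant are the ones missing from the 8/7/6/5 counts in the list, and synthesis tools typically trim them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit ripple-carry adder with one register level per full adder
entity rca4_pipe is
  port (
    clk   : in  std_logic;
    x, y  : in  std_logic_vector(3 downto 0);
    c_in  : in  std_logic;
    sum   : out std_logic_vector(3 downto 0);
    c_out : out std_logic
  );
end entity rca4_pipe;

architecture rtl of rca4_pipe is
  -- array index 0 = level 1 (registered inputs), index k = level k+1
  type nibble_array is array (0 to 4) of std_logic_vector(3 downto 0);
  signal xs, ys : nibble_array;               -- operands delayed level by level
  signal ss     : nibble_array;               -- finished sum bits
  signal cs     : std_logic_vector(0 to 4);   -- carry between levels
begin
  process(clk)
    variable xi, yi, ci : std_logic;
  begin
    if rising_edge(clk) then
      -- level 1: register the 9 inputs (no sum bits exist yet)
      xs(0) <= x;
      ys(0) <= y;
      cs(0) <= c_in;
      ss(0) <= (others => '0');
      -- levels 2..5: index k adds bit k-1 with one full adder
      for k in 1 to 4 loop
        xi := xs(k-1)(k-1);
        yi := ys(k-1)(k-1);
        ci := cs(k-1);
        xs(k) <= xs(k-1);  -- pass operands on (lower bits are no longer read)
        ys(k) <= ys(k-1);
        ss(k) <= ss(k-1);  -- pass finished sum bits on
        ss(k)(k-1) <= xi xor yi xor ci;                    -- full-adder sum
        cs(k)      <= (xi and yi) or (ci and (xi xor yi));  -- full-adder carry
      end loop;
    end if;
  end process;

  sum   <= ss(4);
  c_out <= cs(4);
end architecture rtl;
```

The result for a set of operands appears after the fifth rising edge, and because every level works on a different set of operands, a new result still comes out on every clock.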

Amit · Sep 28, 2008

Brian wrote:

And for completeness, to illustrate that you can use registers in a
pipeline...

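process(clk)
  variable s1  : unsigned(a'high+1 downto a'low);
  variable c_d : unsigned(c'range);  -- c delayed one clock to stay aligned
begin
  if rising_edge(clk) then
    s2 <= s1 + c_d;
    -- s1 and c_d still contain the OLD values
    s1  := resize(a, s1'length) + b;
    c_d := c;
  end if;
end process;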

... but you must describe the pipeline backwards.
This is one place where use of signals in a process really does help
clarity.

- Brian

hi Brian,

thanks for your response. As far as I know, when I use those 5
processes above they will be synthesized as 5 flip-flops. I'm not sure
what you meant by registers. Do I need to add registers? If yes, what
should I do?
Also, I'm not sure what you meant by backwards.

thanks,
amit
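To see what describing the pipeline "backwards" does, compare two processes that differ only in statement order. The sketch is hypothetical (entity var_order_demo, ports s2_fwd and s2_bwd, 8-bit unsigned operands are all assumed). In the forwards process the variable s1 is written before it is read within the same clock, so it is just a wire and only s2_fwd is registered (latency 1). In the backwards process s1 and c_d are read before they are written, so they must hold their values from the previous clock and are synthesized as registers (latency 2). This last-stage-first ordering appears to be what Brian means by "backwards".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo: the same statements in two orders give different pipelines
entity var_order_demo is
  port (
    clk     : in  std_logic;
    a, b, c : in  unsigned(7 downto 0);
    s2_fwd  : out unsigned(9 downto 0);  -- variable written before it is read
    s2_bwd  : out unsigned(9 downto 0)   -- variable read before it is written
  );
end entity var_order_demo;

architecture rtl of var_order_demo is
begin
  -- Forwards: s1 is combinational, only s2_fwd is a register (1 clock latency)
  fwd : process(clk)
    variable s1 : unsigned(8 downto 0);
  begin
    if rising_edge(clk) then
      s1     := resize(a, 9) + b;
      s2_fwd <= resize(s1, 10) + c;
    end if;
  end process;

  -- Backwards: s1 and c_d keep last clock's values, so they become registers
  -- (2 clock latency, shorter combinational paths)
  bwd : process(clk)
    variable s1  : unsigned(8 downto 0);
    variable c_d : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      s2_bwd <= resize(s1, 10) + c_d;  -- last stage first
      s1     := resize(a, 9) + b;     -- then the stage that feeds it
      c_d    := c;
    end if;
  end process;
end architecture rtl;
```

Both outputs compute a+b+c; they differ only in where the flip-flops end up.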

Amit · Sep 28, 2008

hi again,

I was looking at http://www.cs.umbc.edu/help/VHDL/samples/samples.shtml#stall
and it seems it is not only flip-flops; registers need to be added too.
Right? Does this mean I have to add registers to my design?

Is there any online source where I can learn about this?

thanks!

Ralf Hildebrandt · Sep 28, 2008

Amit wrote:

thanks for your response. As far as I know, when I use those 5
processes above they will be synthesized as 5 flip-flops. I'm not sure
what you meant by registers.

Registers are storage elements. In most cases this means "flipflops",
sometimes "latches", and in some very special cases it can be RAM / ROM
or whatever.

Ralf
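As a rough illustration of the two common kinds of register, the hypothetical entity below (storage_demo, with assumed port names) shows the usual VHDL coding patterns that synthesis tools infer as an edge-triggered flip-flop (assignment under rising_edge) and as a level-sensitive latch (an if without else in a combinational process, so the output has to keep its old value). In synchronous pipelines like the ones in this thread the flip-flop form is the one normally wanted; latches are often inferred by accident.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo of the two usual register types
entity storage_demo is
  port (
    clk, en, d : in  std_logic;
    q_ff       : out std_logic;  -- edge-triggered flip-flop
    q_latch    : out std_logic   -- level-sensitive latch
  );
end entity storage_demo;

architecture rtl of storage_demo is
begin
  -- Flip-flop: q_ff only changes on a rising edge of clk
  ff : process(clk)
  begin
    if rising_edge(clk) then
      q_ff <= d;
    end if;
  end process;

  -- Latch: transparent while en = '1'; with no else branch,
  -- q_latch keeps its value while en = '0'
  latch : process(en, d)
  begin
    if en = '1' then
      q_latch <= d;
    end if;
  end process;
end architecture rtl;
```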

Ralf Hildebrandt · Sep 28, 2008

Amit said:
thanks for the explanation. So now I have a 4-bit RCA (ripple-carry
adder built from full adders, each made up of two half adders) and want
to implement it using the pipeline method. This is what I'm doing:

Is this some "homework" from the university, or do you want to model
something for a real chip? For the latter, use the "+" operator. It
synthesizes fine to an error-free and close-to-optimal result.

For special hand-made adder designs I suggest the pen and paper method.
Draw every full or half adder and, for pipelining, every flipflop you
want to use in your design.

Ralf
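Using "+" as suggested could look like this for the 4-bit case with carry-in and carry-out. The entity name add4 and the port names are assumptions, chosen to match Amit's x, y, c_in naming. The operands are zero-extended to 5 bits with numeric_std resize, so the carry-out ends up in bit 4; which adder structure is actually built is left to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit adder written with "+" instead of hand-built full adders
entity add4 is
  port (
    x, y  : in  std_logic_vector(3 downto 0);
    c_in  : in  std_logic;
    sum   : out std_logic_vector(3 downto 0);
    c_out : out std_logic
  );
end entity add4;

architecture rtl of add4 is
  signal cin_u : unsigned(0 downto 0);  -- carry-in as a 1-bit number
  signal full  : unsigned(4 downto 0);  -- 4 sum bits plus the carry-out
begin
  cin_u(0) <= c_in;
  -- zero-extend to 5 bits so the carry-out is not lost
  full  <= resize(unsigned(x), 5) + unsigned(y) + cin_u;
  sum   <= std_logic_vector(full(3 downto 0));
  c_out <= full(4);
end architecture rtl;
```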

Ralf Hildebrandt · Sep 28, 2008

Amit said:
I will need 5 levels of flip-flops, such that the first level has 9
flip-flops, the second level has 8, and so on:

level 1: 9 flip-flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
level 2: 8 flip-flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
level 3: 7 flip-flops (y3 x3 y2 x2 c_out1 sum1 sum0)
level 4: 6 flip-flops (y3 x3 c_out2 sum2 sum1 sum0)
level 5: 5 flip-flops (c_out3 sum3 sum2 sum1 sum0)

Is that right?

Let me add: pipelining only makes sense if your combinational circuit
is too slow for your clock. Otherwise it is a big waste of area
(additional flipflops), power (a lot of switching activity in the
flipflops) and an unnecessary introduction of latency.

So in most cases it only makes sense to split a combinational block into
two or three parts - not more.

Ralf
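As an example of splitting a block into just two parts, the sketch below pipelines an 8-bit adder with a single register level that cuts the carry chain in the middle. The entity name add8_2stage and all signal names are hypothetical. Stage 1 adds the low nibbles and registers their carry; stage 2 adds the high nibbles plus that carry. The extra cost is one register level (the 5-bit low-half sum plus the two delayed 4-bit high halves), and the latency is 2 clocks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit adder split into two pipeline stages of 4 bits each
entity add8_2stage is
  port (
    clk  : in  std_logic;
    x, y : in  unsigned(7 downto 0);
    sum  : out unsigned(8 downto 0)   -- bit 8 is the carry-out
  );
end entity add8_2stage;

architecture rtl of add8_2stage is
  signal lo_r : unsigned(4 downto 0);  -- stage 1: low-nibble sum with its carry
  signal xh_r : unsigned(3 downto 0);  -- stage 1: high nibbles, delayed one clock
  signal yh_r : unsigned(3 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- stage 1: add the low halves; only register the high halves
      lo_r <= resize(x(3 downto 0), 5) + y(3 downto 0);
      xh_r <= x(7 downto 4);
      yh_r <= y(7 downto 4);
      -- stage 2: add the high halves plus the registered carry
      sum(8 downto 4) <= resize(xh_r, 5) + yh_r + lo_r(4 downto 4);
      sum(3 downto 0) <= lo_r(3 downto 0);
    end if;
  end process;
end architecture rtl;
```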

Amit · Oct 1, 2008

hello group,

thanks! Yes, I finally got the point of what you explained the
other day. My adder is working fine.

thanks again
amit
