pipeline

Amit

Hello group,

Can somebody please tell me what makes a circuit operate faster when
using a pipeline? Let's say there is a normal full adder and another
full adder which is implemented based on a pipeline. Or could someone
at least give me some links to follow on this?

thanks.
amit
 
Ralf Hildebrandt

Amit said:
Can somebody please tell me what makes a circuit operate faster when
using a pipeline? Let's say there is a normal full adder and another
full adder which is implemented based on a pipeline. Or could someone
at least give me some links to follow on this?

OK, let's assume the operation a+b+c, which can be implemented as
s1=a+b;
s2=s1+c;

Now we have a combinational path from a through s1 to s2. The same holds
for b. Let us assume that this path is too slow for your clock. Without
pipelining, the description looks like this:

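process(clk)
    -- one bit wider than a, to hold the carry of a+b
    variable s1 : unsigned(a'high+1 downto a'low);
begin
    if rising_edge(clk) then
        s1 := resize(a, s1'length) + b;  -- variable: updated immediately
        s2 <= s1 + c;  -- uses the NEW s1 (s2 assumed as wide as s1)
        -- we would get the same result if we wrote:
        -- s2<=(a+b)+c;
    end if;
end process;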

Now we can break this path into two pieces - we create a pipeline:

process(clk)
begin
    if rising_edge(clk) then
        s1 <= resize(a, s1'length) + b;  -- s1 is a signal, one bit wider than a
        s2 <= s1 + c;  -- note that the OLD value of s1 is used!
        -- both s1 and s2 will be implemented as flipflops
    end if;
end process;

Now we have two combinational paths: from a,b to s1 and from s1,c to s2.
Both paths are approximately half as long as the initial path from a,b
to s2, so the clock period can be roughly halved.
This has two consequences:
1) a single operation takes two clocks (the latency)
2) if the operation runs continuously, we get one result with every
single clock!
=> We can continuously add operands with the fast clock, but each single
operation has a latency of 2 clocks. If we have a continuous data
stream, we can process it with the fast clock, and then the latency is
usually not a problem.
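
One detail the short process above glosses over: once s1 is a flipflop, s2 adds the current c to the a+b of the previous clock. If a, b and c of one operation arrive in the same clock, c has to be delayed by one clock too. The following is only a minimal sketch of a complete entity under that assumption; the entity name add3_pipelined, the generic WIDTH and the signal c_d are made up for illustration and do not come from the posts in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add3_pipelined is
    generic (WIDTH : positive := 8);  -- hypothetical operand width
    port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(WIDTH-1 downto 0);
        s2      : out unsigned(WIDTH+1 downto 0)  -- two extra bits, so a+b+c cannot overflow
    );
end entity add3_pipelined;

architecture rtl of add3_pipelined is
    signal s1  : unsigned(WIDTH downto 0);    -- stage 1 register: a+b including carry
    signal c_d : unsigned(WIDTH-1 downto 0);  -- c delayed by one clock to stay aligned with s1
begin
    process(clk)
    begin
        if rising_edge(clk) then
            -- stage 1: first addition, and carry c along to the next stage
            s1  <= resize(a, s1'length) + b;
            c_d <= c;
            -- stage 2: second addition, uses the OLD values of s1 and c_d
            s2  <= resize(s1, s2'length) + c_d;
        end if;
    end process;
end architecture rtl;
```

With this description a new set of operands can be applied every clock, and the sum for a given set appears at s2 two clocks later.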

Ralf
 
Amit

Hi Ralf,

Thanks for the explanation. So now I have an RCA (ripple-carry adder,
each full adder made up of two half adders) and want to implement a
4-bit version using the pipeline method. This is what I'm doing:

I will need 5 levels of flip-flops, such that the first level has 9
flip-flops, the second level has 8, and so on:

1-level 9 flip flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
2-level 8 flip flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
3-level 7 flip flops (y3 x3 y2 x2 c_out1 sum1 sum0)
4-level 6 flip flops (y3 x3 c_out2 sum2 sum1 sum0)
5-level 5 flip flops (c_out3 sum3 sum2 sum1 sum0)

is that right?

please advise me on this.

thanks,
amit
 
Amit

Brian said:
And for completeness, to illustrate that you can use variables as
registers in a pipeline...

process(clk)
    variable s1 : unsigned(a'high+1 downto a'low);
begin
    if rising_edge(clk) then
        s2 <= s1 + c;
        -- s1 still contains the OLD value
        s1 := resize(a, s1'length) + b;
    end if;
end process;

... but you must describe the pipeline backwards.
This is one place where use of signals in a process really does help
clarity.

- Brian


Hi Brian,

Thanks for your response. As far as I know, when I use those 5
processes above they will be synthesized as 5 flip-flops. I'm not sure
what you meant by registers. Do I need to add registers? If yes, what
should I do?
Also, I'm not sure what you meant by backwards.

thanks,
amit
 
Amit

Hi again,

I was looking at http://www.cs.umbc.edu/help/VHDL/samples/samples.shtml#stall
and it seems a pipeline needs not only flip-flops but also registers.
Right? Does this mean I have to add registers to my design?

Is there any online source where I can learn about this?

Thanks!
 
Ralf Hildebrandt

Amit said:

Thanks for your response. As far as I know, when I use those 5
processes above they will be synthesized as 5 flip-flops. I'm not sure
what you meant by registers.

Registers are storage elements. In most cases this means "flipflops",
sometimes "latches", and in some very special cases it can be RAM, ROM
or whatever.
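
As for "backwards": a variable is updated immediately, so whether it ends up as a flipflop depends on whether the process reads it before or after writing it. The sketch below puts both orders side by side. It is only an illustration; the entity name variable_order_demo, the port names s2_comb and s2_pipe and the fixed 8-bit width (sums wrap modulo 256) are assumptions, not something from the earlier posts.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity variable_order_demo is
    port (
        clk     : in  std_logic;
        a, b, c : in  unsigned(7 downto 0);
        s2_comb : out unsigned(7 downto 0);  -- (a+b)+c of the same clock
        s2_pipe : out unsigned(7 downto 0)   -- (a+b) of the previous clock, plus the current c
    );
end entity variable_order_demo;

architecture rtl of variable_order_demo is
begin
    -- Variable written BEFORE it is read: it is just a name for a wire.
    -- Only s2_comb becomes a flipflop, and a -> v -> s2_comb is one long path.
    no_pipeline : process(clk)
        variable v : unsigned(7 downto 0);
    begin
        if rising_edge(clk) then
            v := a + b;
            s2_comb <= v + c;
        end if;
    end process no_pipeline;

    -- Variable read BEFORE it is written: it must remember the value from
    -- the previous clock, so v becomes a flipflop (a pipeline register).
    with_pipeline : process(clk)
        variable v : unsigned(7 downto 0) := (others => '0');
    begin
        if rising_edge(clk) then
            s2_pipe <= v + c;  -- last stage first: uses the OLD v
            v := a + b;        -- first stage last: prepares v for the next clock
        end if;
    end process with_pipeline;
end architecture rtl;
```

So "backwards" means writing the last pipeline stage first and the first stage last, so that every variable is read before it is overwritten. With signals the order of the assignments does not matter.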

Ralf
 
Ralf Hildebrandt

Amit said:
Thanks for the explanation. So now I have an RCA (ripple-carry adder,
each full adder made up of two half adders) and want to implement a
4-bit version using the pipeline method. This is what I'm doing:

Is this some "homework" from the university, or do you want to model
something for a real chip? For the latter, use the "+" operator. It
synthesizes fine to an error-free and quite optimal result.

For special hand-made adder designs I suggest the pen-and-paper method.
Draw every full or half adder and, for pipelining, every flipflop you
want to use in your design.
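
For the hand-made 4-bit ripple-carry case, such a drawing could be written down roughly as follows. This is only a sketch under assumptions: level 1 registers the inputs and levels 2 to 5 each add one bit, and the entity name rca4_pipelined and the arrays xr, yr, sr and cr are hypothetical names chosen for illustration. Register bits that are never read or are constant are normally trimmed by synthesis, which should leave 9, 8, 7, 6 and 5 flipflops per level, although the exact result depends on the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity rca4_pipelined is
    port (
        clk   : in  std_logic;
        x, y  : in  std_logic_vector(3 downto 0);
        c_in  : in  std_logic;
        sum   : out std_logic_vector(3 downto 0);
        c_out : out std_logic
    );
end entity rca4_pipelined;

architecture rtl of rca4_pipelined is
    -- index 0 is level 1 (input registers), index 4 is level 5 (final result)
    type nibble_array is array (0 to 4) of std_logic_vector(3 downto 0);
    signal xr, yr, sr : nibble_array;
    signal cr         : std_logic_vector(0 to 4);
begin
    process(clk)
        variable p, g : std_logic;  -- outputs of the first half adder
    begin
        if rising_edge(clk) then
            -- level 1: register the operands and the carry input
            xr(0) <= x;
            yr(0) <= y;
            cr(0) <= c_in;
            sr(0) <= (others => '0');
            -- levels 2 to 5: one full adder (two half adders) per level
            for i in 0 to 3 loop
                p := xr(i)(i) xor yr(i)(i);
                g := xr(i)(i) and yr(i)(i);
                -- pass everything on to the next level ...
                xr(i+1) <= xr(i);
                yr(i+1) <= yr(i);
                sr(i+1) <= sr(i);
                -- ... then override sum bit i with the newly computed value
                sr(i+1)(i) <= p xor cr(i);
                cr(i+1)    <= g or (p and cr(i));
            end loop;
        end if;
    end process;

    sum   <= sr(4);
    c_out <= cr(4);
end architecture rtl;
```

A result appears five clocks after its operands are applied, and a new addition can start every clock.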

Ralf
 
Ralf Hildebrandt

Amit said:
I will need 5 levels of flip-flops, such that the first level has 9
flip-flops, the second level has 8, and so on:

1-level 9 flip flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
2-level 8 flip flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
3-level 7 flip flops (y3 x3 y2 x2 c_out1 sum1 sum0)
4-level 6 flip flops (y3 x3 c_out2 sum2 sum1 sum0)
5-level 5 flip flops (c_out3 sum3 sum2 sum1 sum0)

is that right?


Let me add: pipelining only makes sense if your combinational circuit
is too slow for your clock. Otherwise it is a big waste of area
(additional flipflops) and power (a lot of switching activity in the
flipflops), and an unnecessary introduction of latency.

So in most cases it only makes sense to split a combinational block into
two or three parts, not more.

Ralf
 
Amit

Hello group,

Thanks! Yes, I finally got the point of what you explained the
other day. My adder is working fine.

Thanks again,
amit
 
