pipeline

Discussion in 'VHDL' started by Amit, Sep 26, 2008.

  1. Amit

    Amit Guest

    hello group,

    can somebody please tell me what makes a circuit operate faster using
    a pipeline? let's say there is a normal full adder and another full
    adder which is implemented as a pipeline. or could someone at least
    give me some links to follow on this?

    thanks.
    amit
    Amit, Sep 26, 2008
    #1

  2. Amit schrieb:

    > can somebody please tell me what makes a circuit operate faster using
    > a pipeline? let's say there is a normal full adder and another full
    > adder which is implemented as a pipeline. or could someone at least
    > give me some links to follow on this?


    OK, let's assume the operation a+b+c, which can be implemented as
    s1=a+b;
    s2=s1+c;

    Now we have a combinational path from a over s1 to s2. The same holds
    for b. Let us assume that this path is too slow for your clock. Without
    pipelining, the whole path sits in front of a single register:

    process(clk)
    variable s1 : unsigned(a'high+1 downto a'low);
    begin
    if rising_edge(clk) then
        s1:=resize(a, s1'length)+b; -- resize so the width matches s1
        s2<=s1+c;                   -- s1 is a variable: the NEW value is used
        -- we would get the same result if we wrote:
        -- s2<=(a+b)+c;
    end if;
    end process;

    Now we can break this path into two pieces - we create a pipeline:

    process(clk)
    begin
    if rising_edge(clk) then
        s1<=a+b; -- s1 is a signal
        s2<=s1+c; -- note that the OLD value of s1 is used!
        -- both s1 and s2 will be implemented as flipflops
    end if;
    end process;

    Now we have two combinational paths: from a,b to s1 and from s1,c to s2.
    Both paths are approximately half as long as the initial path from a,b
    to s2.
    This has two consequences:
    1) one single operation takes two clocks (the latency)
    2) if this operation can work continuously, we get one result with every
    single clock!
    => Because each stage is shorter, the clock can run faster. We can
    continuously add operands with the fast clock, but each single
    operation has a latency of 2 clocks. With a continuous data stream,
    the latency is usually not a problem.
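
    For anyone who wants something that compiles on its own, here is a
    minimal sketch of the two-stage version, assuming ieee.numeric_std.
    The entity name add3_pipelined, the WIDTH generic and the c_dly
    register are assumptions made for this sketch and are not from the
    post. c_dly is only needed when a, b and c arrive on the same clock;
    the process above instead expects c one clock after a and b.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical entity: names and the WIDTH generic are illustrative only.
    entity add3_pipelined is
      generic (WIDTH : positive := 8);
      port (
        clk : in  std_logic;
        a   : in  unsigned(WIDTH-1 downto 0);
        b   : in  unsigned(WIDTH-1 downto 0);
        c   : in  unsigned(WIDTH-1 downto 0);
        sum : out unsigned(WIDTH+1 downto 0)  -- two extra bits, so a+b+c cannot overflow
      );
    end entity add3_pipelined;

    architecture rtl of add3_pipelined is
      signal s1    : unsigned(WIDTH downto 0);    -- stage-1 register holding a+b
      signal c_dly : unsigned(WIDTH-1 downto 0);  -- c delayed by one clock to line up with s1
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          -- Stage 1: first adder, and carry c along with it
          s1    <= resize(a, s1'length) + b;
          c_dly <= c;
          -- Stage 2: second adder, reading the OLD values of s1 and c_dly
          sum   <= resize(s1, sum'length) + c_dly;
        end if;
      end process;
    end architecture rtl;
    ```

    A new set of operands can be applied on every clock; the matching sum
    appears two rising edges after the operands were sampled.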

    Ralf
    Ralf Hildebrandt, Sep 27, 2008
    #2

  3. Amit

    Amit Guest

    Hi Ralf,

    thanks for the explanation. so now i have an RCA (ripple-carry adder
    whose full adders are each made up of two half adders) and want to
    implement a 4-bit version of it using the pipeline method. this is what
    i'm doing:

    i will need 5 levels of flip-flops such that the first level will have
    9 f/f, the second level will have 8, and so on:

    1-level 9 flip flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
    2-level 8 flip flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
    3-level 7 flip flops (y3 x3 y2 x2 c_out1 sum1 sum0)
    4-level 6 flip flops (y3 x3 c_out2 sum2 sum1 sum0)
    5-level 5 flip flops (c_out3 sum3 sum2 sum1 sum0)

    is that right?
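
    For reference, a rough VHDL sketch of the register plan listed above.
    The entity name rca4_pipelined, the signal names and the helper
    function maj are all hypothetical. Each level registers the operand
    bits that still have to be added, the carry so far, and the sum bits
    produced so far, which gives the 9/8/7/6/5 flip-flop counts.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical entity; all names are illustrative only.
    entity rca4_pipelined is
      port (
        clk   : in  std_logic;
        x, y  : in  std_logic_vector(3 downto 0);
        c_in  : in  std_logic;
        sum   : out std_logic_vector(3 downto 0);
        c_out : out std_logic
      );
    end entity rca4_pipelined;

    architecture rtl of rca4_pipelined is
      -- Carry-out of a full adder is the majority of its three inputs.
      function maj (a, b, c : std_logic) return std_logic is
      begin
        return (a and b) or (a and c) or (b and c);
      end function maj;

      -- Operand bits still waiting for their full adder, per level.
      signal x1, y1 : std_logic_vector(3 downto 0);
      signal x2, y2 : std_logic_vector(3 downto 1);
      signal x3, y3 : std_logic_vector(3 downto 2);
      signal x4, y4 : std_logic_vector(3 downto 3);
      -- Carry into the next full adder, per level (c5 is the final carry).
      signal c1, c2, c3, c4, c5 : std_logic;
      -- Sum bits produced so far, per level.
      signal s2 : std_logic_vector(0 downto 0);
      signal s3 : std_logic_vector(1 downto 0);
      signal s4 : std_logic_vector(2 downto 0);
      signal s5 : std_logic_vector(3 downto 0);
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          -- Level 1 (9 FFs): register the inputs.
          x1 <= x;
          y1 <= y;
          c1 <= c_in;
          -- Level 2 (8 FFs): full adder for bit 0.
          s2(0) <= x1(0) xor y1(0) xor c1;
          c2    <= maj(x1(0), y1(0), c1);
          x2    <= x1(3 downto 1);
          y2    <= y1(3 downto 1);
          -- Level 3 (7 FFs): full adder for bit 1.
          s3 <= (x2(1) xor y2(1) xor c2) & s2;
          c3 <= maj(x2(1), y2(1), c2);
          x3 <= x2(3 downto 2);
          y3 <= y2(3 downto 2);
          -- Level 4 (6 FFs): full adder for bit 2.
          s4 <= (x3(2) xor y3(2) xor c3) & s3;
          c4 <= maj(x3(2), y3(2), c3);
          x4 <= x3(3 downto 3);
          y4 <= y3(3 downto 3);
          -- Level 5 (5 FFs): full adder for bit 3.
          s5 <= (x4(3) xor y4(3) xor c4) & s4;
          c5 <= maj(x4(3), y4(3), c4);
        end if;
      end process;

      sum   <= s5;
      c_out <= c5;
    end architecture rtl;
    ```

    Because every assignment is a signal assignment, each level reads the
    values registered on the previous clock edge, so the order of the
    levels inside the process does not matter.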

    please advise me on this.

    thanks,
    amit
    Amit, Sep 28, 2008
    #3
  4. Amit

    Amit Guest

    On Sep 27, 5:08 am, Brian Drummond wrote:
    > And for completeness, to illustrate you can use registers in a
    > pipeline...
    >
    > process(clk)
    > variable   s1:   unsigned(a'high+1 downto a'low);
    > begin
    > if rising_edge(clk) then
    >         -- stage 2 is written first, so it reads s1 before the update
    >         s2<=s1+c;
    >         -- s1 still contains the OLD value
    >         s1:=resize(a, s1'length)+b;
    > end if;
    > end process;
    >
    > ... but you must describe the pipeline backwards.
    > This is one place where use of signals in a process really does help
    > clarity.
    >
    > - Brian



    hi Brian,

    thanks for your response. as far as i know, when I use the 5
    processes above they will be synthesized as 5 flip flops. I'm not sure
    what you meant by registers. Do I need to add registers? if yes, what
    should I do?
    also, I'm not sure what you meant by backwards.

    thanks,
    amit
    Amit, Sep 28, 2008
    #4
  5. Amit

    Amit Guest

    hi again,

    i was looking at http://www.cs.umbc.edu/help/VHDL/samples/samples.shtml#stall
    and it seems it needs not only flip flops but also registers.
    Right? does this mean I have to add registers to my design?

    is there any online source where I can learn about this?

    thanks!
    Amit, Sep 28, 2008
    #5
  6. Amit schrieb:


    > thanks for your response. as far as i know, when I use the 5
    > processes above they will be synthesized as 5 flip flops. I'm not sure
    > what you meant by registers.


    Registers are storage elements. In most cases this means "flipflops",
    sometimes "latches", and in some very special cases it can be RAM / ROM
    or whatever.
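
    To make the flipflop/latch distinction concrete, here is a small
    sketch of the two coding patterns that synthesis tools typically map
    to each kind of storage element. The entity storage_demo and its port
    names are made up for this illustration.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical demo entity; names are illustrative only.
    entity storage_demo is
      port (
        clk     : in  std_logic;
        en      : in  std_logic;
        d       : in  std_logic;
        q_ff    : out std_logic;
        q_latch : out std_logic
      );
    end entity storage_demo;

    architecture rtl of storage_demo is
    begin
      -- Edge-triggered: q_ff only changes on the rising clock edge,
      -- which is normally inferred as a D flipflop.
      ff_proc : process (clk)
      begin
        if rising_edge(clk) then
          q_ff <= d;
        end if;
      end process ff_proc;

      -- Level-sensitive with no else branch: q_latch follows d while
      -- en is '1' and holds its value otherwise, which is normally
      -- inferred as a latch.
      latch_proc : process (en, d)
      begin
        if en = '1' then
          q_latch <= d;
        end if;
      end process latch_proc;
    end architecture rtl;
    ```

    The pipeline processes earlier in the thread all follow the first
    pattern, so their registers are flipflops; nothing extra has to be
    added to the design.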

    Ralf
    Ralf Hildebrandt, Sep 28, 2008
    #6
  7. Amit schrieb:

    > thanks for the explanation. so now i have a rca (full-adder which is
    > made up of two half-adders) and want to implement it using the
    > pipeline method, 4-bit. this is what i'm doing:


    Is this some "homework" from the university or do you want to model
    something for a real chip? For the latter, use the "+" operator. It
    synthesizes fine to an error-free and quite optimal result.

    For special hand-made adder designs I suggest the pen and paper method.
    Draw every full or half adder and, for pipelining, every flipflop you
    want to use in your design.

    Ralf
    Ralf Hildebrandt, Sep 28, 2008
    #7
  8. Amit schrieb:

    > i will need 5 levels of flip flops such that first level will have 9
    > f/f and then second level will have 8 and so on or:
    >
    > 1-level 9 flip flops (y3 x3 y2 x2 y1 x1 y0 x0 c_in)
    > 2-level 8 flip flops (y3 x3 y2 x2 y1 x1 c_out0 sum0)
    > 3-level 7 flip flops (y3 x3 y2 x2 c_out1 sum1 sum0)
    > 4-level 6 flip flops (y3 x3 c_out2 sum2 sum1 sum0)
    > 5-level 5 flip flops (c_out3 sum3 sum2 sum1 sum0)
    >
    > is that right?



    Let me add: Pipelining only makes sense if your combinational circuit
    is too slow for your clock. Otherwise it is a big waste of area
    (additional flipflops) and power (a lot of switching activity in the
    flipflops), and an unnecessary introduction of latency.

    So in most cases it only makes sense to split a combinational block into
    two or three parts - not more.
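
    As a rough worked example of "too slow for your clock": the clock
    period has to cover the flipflop clock-to-output delay, the
    combinational logic and the setup time of the next flipflop. The
    figures below are assumed purely for illustration (0.5 ns
    clock-to-output, 0.5 ns setup, 4 ns per adder) and do not come from
    any real device.

    ```latex
    \begin{align*}
    T_{\text{clk}} &\ge t_{cq} + t_{\text{logic}} + t_{su} \\[4pt]
    \text{unpipelined:}\quad T_{\text{clk}} &\ge 0.5 + (4 + 4) + 0.5 = 9\ \text{ns},
      & f_{\max} &\approx 111\ \text{MHz}, & \text{latency} &= 9\ \text{ns} \\
    \text{two stages:}\quad T_{\text{clk}} &\ge 0.5 + 4 + 0.5 = 5\ \text{ns},
      & f_{\max} &= 200\ \text{MHz}, & \text{latency} &= 2 \times 5 = 10\ \text{ns}
    \end{align*}
    ```

    With these assumed numbers the pipelined version accepts new operands
    almost twice as often, while a single result takes slightly longer.
    If the target clock period is already 9 ns or more, the extra stage
    buys nothing.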

    Ralf
    Ralf Hildebrandt, Sep 28, 2008
    #8
  9. Amit

    Amit Guest

    hello group,

    thanks! yes finally i got the point of what you had explained the
    other day. my adder is working fine.

    thanks again
    amit
    Amit, Oct 1, 2008
    #9