Pipelining question

Discussion in 'VHDL' started by Divyang M, Mar 31, 2005.

  1. Divyang M

    Divyang M Guest

    Hi,
    I am looking for some insight into how I can go about pipelining my
    system.

    The system is an image interpolator which contains a buffer (on-chip
    dual-port RAM) and an interpolator block.
    The buffer stores incoming data (real-time say at X MHz). The
    interpolator requires 4 data elements from this buffer to produce 1
    output.

    To keep the systems real-time, I am running my system at X MHz and also
    write to the buffer at this rate. But read the data out of the buffer
    at 4X MHz so that at each clock cycle I have all the 4 data elements
    that the intepolator unit needs. This has limited X to 50 MHz because
    the internal block RAM max out at 200 MHz (well, according to Altera it
    can go upto 287 MHz but I hit the wall at some point or another).

    I was wondering if there is a way to pipeline the design so that I can
    run the whole system at a single clock frequency but still not have a
    huge backlog of data accumulation (since there is finite amount of
    on-chip storage) or if there are examples of such in any books?

    Thanks,
    Divyang M
     
    Divyang M, Mar 31, 2005
    #1
    1. Advertising

  2. "Divyang M" <> writes:

    > Hi,
    > I am looking for some insight into how I can go about pipelining my
    > system.
    >
    > The system is an image interpolator which contains a buffer (on-chip
    > dual-port RAM) and an interpolator block.
    > The buffer stores incoming data (real-time say at X MHz). The
    > interpolator requires 4 data elements from this buffer to produce 1
    > output.
    >
    > To keep the systems real-time, I am running my system at X MHz and also
    > write to the buffer at this rate. But read the data out of the buffer
    > at 4X MHz so that at each clock cycle I have all the 4 data elements
    > that the intepolator unit needs. This has limited X to 50 MHz because
    > the internal block RAM max out at 200 MHz (well, according to Altera it
    > can go upto 287 MHz but I hit the wall at some point or another).
    >
    > I was wondering if there is a way to pipeline the design so that I can
    > run the whole system at a single clock frequency but still not have a
    > huge backlog of data accumulation (since there is finite amount of
    > on-chip storage) or if there are examples of such in any books?


    If you have the storage available, write to four RAMs in parallel and
    read 1 element from each RAM per clock cycle.
    Other combinations are also possible.

    Regards,


    Kai
    --
    Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>
     
    Kai Harrekilde-Petersen, Mar 31, 2005
    #2
    1. Advertising

  3. Divyang M

    Divyang M Guest

    Hello Kai,

    I do not have that much storage space to store the data 4 times since
    my data is a 240x320 gray level (8-bit/pixel) image.

    The other option I was thinking of is to delay the first of the four
    output by 4 cycles (registers), the second by 3 cycles, the third
    output by 2 cycles, and the fourth output by 1 cycle. These four
    points will then be aligned, but if I use this strategy, then I get a
    valid output out of my system once every 4 cycles, so a throughput of
    0.25 (if I'm using the definition of throughput correctly). But I would
    ideally like the throughput to be 1.

    Any others suggestions you have would be welcome.

    Thanks,
    Divyang M
     
    Divyang M, Mar 31, 2005
    #3
  4. Divyang M

    info_ Guest

    Are the four element consecutive ?
    Then it's a simple pipelined FIR, and it runs at full speed,
    producing data at X Ms/s with a X MHz clock.

    This become a little bit more interesting when you come to 2D
    correlation :) with 8 "adjacent" pixels spread over 3 lines.

    Bert Cuzeau

    Divyang M wrote:
    > Hello Kai,
    >
    > I do not have that much storage space to store the data 4 times since
    > my data is a 240x320 gray level (8-bit/pixel) image.
    >
    > The other option I was thinking of is to delay the first of the four
    > output by 4 cycles (registers), the second by 3 cycles, the third
    > output by 2 cycles, and the fourth output by 1 cycle. These four
    > points will then be aligned, but if I use this strategy, then I get a
    > valid output out of my system once every 4 cycles, so a throughput of
    > 0.25 (if I'm using the definition of throughput correctly). But I would
    > ideally like the throughput to be 1.
    >
    > Any others suggestions you have would be welcome.
    >
    > Thanks,
    > Divyang M
    >
     
    info_, Mar 31, 2005
    #4
  5. Divyang M

    Divyang M Guest

    Hi Bert,

    The four elements are not consecutive. I am actually working with an
    image. So the four elements are essentially 2 elements each in 2 rows,
    something like A11, A12, A21, A22.

    But these can be any 4 "adjacent" elements in the image, so they do not
    go by a particular order either.

    Thanks,
    Divyang
     
    Divyang M, Mar 31, 2005
    #5
  6. Hi Divyang,

    Is there some scrambled addressing way that you can write the data so that
    the words you need to read are always consecutive? That way you could set
    your write port width to 8, and your read port width to 32.

    If this is possible, your write bandwidth would be limited to the Tpd of the
    logic that sets up the write address, but at least you will be able to
    clock both ends at write speed.

    Best regards,


    Ben
     
    Ben Twijnstra, Mar 31, 2005
    #6
  7. Divyang M

    info_ Guest

    I think you just need to pipeline almost one line (N-2 depth), and pipeline it on two
    FF stages at the entrance and the exit. So at a single clock cycle, you would have
    your four elements available for combining, every clock cyle.

    The big pipeline fits nicely in a dual port memory used as circular buffer,
    or simply a ready-made Fifo.

    Assuming the pixels per line is n+1 :

    FF -> FF -> [[[[[ Fifo(n-2) ]]]]] -> FF -> FF
    D2n D2n-1 D2n-2 ... D21 D20 D1n D1n-1

    so you have D1n, D1n-1, D2n, D2n-1 available at the same time.
    Just be careful at the edges...

    What can we not do with pipelining :) ?

    It's late here, I hope I didn't goof.

    Bert

    Divyang M wrote:

    > Hi Bert,
    >
    > The four elements are not consecutive. I am actually working with an
    > image. So the four elements are essentially 2 elements each in 2 rows,
    > something like A11, A12, A21, A22.
    >
    > But these can be any 4 "adjacent" elements in the image, so they do not
    > go by a particular order either.
    >
    > Thanks,
    > Divyang
    >
     
    info_, Mar 31, 2005
    #7
  8. Divyang M

    Divyang M Guest

    Hi Bert,

    That would work if I was doing a "forward mapping" (ie taking 4 know
    input pixels to comupte an output pixel which can end up anywhere in
    the image), but I am doing "inverse mapping" (ie I know the output
    pixel I am computing, but the 4 input pixels can be located anywhere in
    the input image).

    Thanks for your time and help.

    Divyang
     
    Divyang M, Mar 31, 2005
    #8
  9. Divyang M

    Divyang M Guest

    Hi Ben,

    That's what I am thinking now. There seems to be no straight-forward
    solution to "inverse mapping" problems (due to the uncertainity of
    which input pixels to access) even in any of the literatures. So I
    might have to go this way.

    Thanks,
    Divyang M
     
    Divyang M, Mar 31, 2005
    #9
  10. Hi Divyang M,

    > The four elements are not consecutive. I am actually working with an
    > image. So the four elements are essentially 2 elements each in 2 rows,
    > something like A11, A12, A21, A22.
    >
    > But these can be any 4 "adjacent" elements in the image, so they do not
    > go by a particular order either.


    I once wrote a linear interpolation algorithm for a Bayer-matrix camera
    interface for one of my customers. I can't publish the code due to legal
    reasons, but the idea is as follows:

    There is no frame buffer. You use an FSM to capture video into one of four
    line buffers when writing. Thus, three buffers are available for reading,
    and the fourth is being updated.

    Thus, assuming that your destination pixel (xd,yd) is a function of one or
    two (x+d,y-1), one or two (x+d,y) and one or two (x+d, y+1) pixels (where d
    is some arbitrary X distance) you can read pixels at write speed from the
    three line buffers.

    In my case, the worst case computation was
    p(x-1,y-1)*C1+p(x+1,y-1)*C2+p(x-1,y+1)*C3+p(x+1,y+1)*C4, which would
    normally cause 4 fetches from a frame buffer, 2*2 fetches using the
    abovementioned technique, or simply 2*1 fetch if you store the x-1 and x
    fetches in 8-bit 'cache' registers.

    In this case I could simply work from left to right, so I used 6*8 DFFs to
    hold

    (x-1, y-1), (x, y-1)
    (x-1, y) , (x, y )
    (x-1, y+1), (x, y+1)

    and compute my formula by setting the address on the line buffers to x+1.
    The data outputs would then yield

    (x+1, y-1)
    (x+1, y )
    (x+1, y+1)

    giving me all the necessary date to compute RGB values for every pixel in
    the Bayer pattern, or basically any 3x3 Fourier kernel on grayscale data.

    The whole idea yields a 3-scanline delay between input and output, but this
    way you can run at high clock speeds and be creative about the
    interpolation method.

    Best regards,


    Ben
     
    Ben Twijnstra, Mar 31, 2005
    #10
  11. Divyang M

    Divyang M Guest

    Hi Ben,

    Thanks for the insight. I will look carefully at your suggestion and
    hopefully it will yield something for my problem as well :).

    Thanks,
    Divyang
     
    Divyang M, Mar 31, 2005
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. koko

    Pipelining in VHDL

    koko, Apr 29, 2004, in forum: VHDL
    Replies:
    2
    Views:
    5,690
    mizocom
    Apr 29, 2004
  2. Peggy
    Replies:
    0
    Views:
    642
    Peggy
    Sep 8, 2004
  3. tulip

    pipelining

    tulip, Nov 5, 2004, in forum: VHDL
    Replies:
    0
    Views:
    643
    tulip
    Nov 5, 2004
  4. Andrea Campi

    Pipelining tutorial wanted

    Andrea Campi, Nov 14, 2004, in forum: VHDL
    Replies:
    9
    Views:
    2,206
    Mike Treseler
    Nov 19, 2004
  5. panatha

    Question about http pipelining

    panatha, Aug 17, 2007, in forum: Java
    Replies:
    2
    Views:
    776
Loading...

Share This Page