Help on 2-D array vs. register file

Discussion in 'VHDL' started by systolic, Oct 23, 2004.

  1. systolic

    systolic Guest

    Some more questions here:

    Inside my top-level design, I have a 32x32 8-bit data block flowing
    through several modules, some modules are in sequence, some in parallel.
    Inside each module, I need to process the data block as a 2-D array,
    like 4x4 block-based operations, etc.

    How can I pass the 32x32 data block efficiently among those modules,
    in terms of system speed and logic element utilization?

    Is it possible, and efficient, to define a 2-D array in the top-level
    design and pass it among those modules? If so, how do I do it, and will
    it consume too many resources?

    Or do I need a small memory or register file built with lpm_ram, and
    let each module access the memory over a bus? In that case, how do I
    process the data as a 2-D array inside each module? Do I need to buffer
    the data inside each module for array-wise operations? Would that be
    slow and consume extra resources?

    Maybe I am on the wrong track. I am not very familiar with VHDL; I am
    still more of a C programmer. :( :p

    Please help me with this. Thanks a lot. :)
    systolic, Oct 23, 2004
    #1

  2. systolic wrote:

    > Inside my top-level design, I have a 32x32 8-bit data block flowing
    > through several modules, some modules are in sequence, some in parallel.
    > Inside each module, I need to process the data block as a 2-D array,
    > like 4x4 block-based operations, etc.


    Write your top level entity before you start slicing.
    I expect that there are no 1024 bit interfaces at the top.
    Maybe a dot clock and video data in and out?
    Next, work out the top architecture signals.
    Do you need to count out rows and columns?
    Are you processing everything live?
    Line buffers? Frame buffers?

    -- Mike Treseler
    Mike Treseler, Oct 23, 2004
    #2

  3. systolic

    systolic Guest

    Mike, thank you for the reply.

    Yes, I assume there is a frame buffer, which feeds data into my
    top-level design over a 32-bit interface (4 pixels at a time).
    Then I need to perform 32x32 block-based operations inside the
    top-level design, across several modules. In total, I have 4 modules in
    three levels. The last one needs to perform block-based operations from
    the 32x32 block all the way down to 4x4 blocks.

    I think I could pass everything among those modules on a 32-bit bus,
    then reformat the data into a 32x32 block inside each module. But that
    would consume more memory and hurt system speed.

    I was hoping to pass the whole 32x32 block through each module, but I
    am not sure whether I can do that, or how. I guess such a huge
    interface between modules may not be worth it even if it is possible.

    I would like some suggestions or hints.

    Maybe I still have to go back to a 32-bit bus and reformat the 32x32
    block inside the modules. Is this the normal way to do it? Is there no
    way around it?
    systolic, Oct 23, 2004
    #3
  4. systolic wrote:

    > Mike, thank you for the reply.
    >
    > Yes, I assume there is a frame buffer, which feeds data into my
    > top-level design over a 32-bit interface (4 pixels at a time).


    Consider verifying this before you proceed.

    > Then I need to perform 32x32 block-based operations inside the
    > top-level design, across several modules. In total, I have 4 modules in
    > three levels. The last one needs to perform block-based operations from
    > the 32x32 block all the way down to 4x4 blocks.


    Are those bit blocks or pixel blocks?

    > I think I could pass everything among those modules on a 32-bit bus,
    > then reformat the data into a 32x32 block inside each module. But that
    > would consume more memory and hurt system speed.


    What is the speed requirement?
    Do you have to keep up with each frame,
    or are you post-processing a single frame?
    If you are planning to put this in an FPGA,
    a 1024-bit input bus is unrealistic.

    > I was hoping to pass the whole 32x32 block through each module, but I
    > am not sure whether I can do that, or how. I guess such a huge
    > interface between modules may not be worth it even if it is possible.


    Once you have shifted in the data block, processing 1024 bits in
    parallel is possible.
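
    As a rough illustration of shifting the block in (a sketch only, not
    taken from the actual design): the loader below collects 256 words of
    32 bits into a 32x32 array of 8-bit pixels, so that later logic can see
    every pixel at once. With 8-bit pixels that is 1024 x 8 = 8192 register
    bits. The names block_pkg, block_t, block_loader, din_valid and
    blk_ready are made up for the example, the byte order within a word is
    an assumption, and the array is filled by write index rather than by a
    literal shift register. It also assumes the synthesis tool accepts
    two-dimensional array types on ports.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical package holding the block type.
    package block_pkg is
      subtype pixel_t is std_logic_vector(7 downto 0);
      type block_t is array (0 to 31, 0 to 31) of pixel_t;  -- (row, column)
    end package block_pkg;

    library ieee;
    use ieee.std_logic_1164.all;
    use work.block_pkg.all;

    entity block_loader is
      port (
        clk       : in  std_logic;
        rst       : in  std_logic;                      -- synchronous reset
        din       : in  std_logic_vector(31 downto 0);  -- 4 pixels per word
        din_valid : in  std_logic;
        blk       : out block_t;
        blk_ready : out std_logic                       -- one-clock pulse when full
      );
    end entity block_loader;

    architecture rtl of block_loader is
      signal store : block_t;
      signal row   : integer range 0 to 31 := 0;
      signal word  : integer range 0 to 7  := 0;  -- 8 words of 4 pixels per row
    begin
      blk <= store;

      process (clk)
      begin
        if rising_edge(clk) then
          blk_ready <= '0';
          if rst = '1' then
            row  <= 0;
            word <= 0;
          elsif din_valid = '1' then
            -- Assumed byte order: the first pixel of a word is in bits 7..0.
            for i in 0 to 3 loop
              store(row, 4*word + i) <= din(8*i + 7 downto 8*i);
            end loop;
            if word = 7 then
              word <= 0;
              if row = 31 then
                row       <= 0;
                blk_ready <= '1';  -- last word of the block was just written
              else
                row <= row + 1;
              end if;
            else
              word <= word + 1;
            end if;
          end if;
        end if;
      end process;
    end architecture rtl;
    ```

    At 4 pixels per clock, one block takes 1024 / 4 = 256 valid input
    clocks to load.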

    > Maybe I still have to go back to a 32-bit bus and reformat the 32x32
    > block inside the modules. Is this the normal way to do it? Is there no
    > way around it?


    The limit is FPGA pins. They are three for a dollar.

    -- Mike Treseler
    Mike Treseler, Oct 24, 2004
    #4
  5. rickman

    rickman Guest

    I have read the replies to this post and I can see that you are still
    thinking in terms of C rather than hardware. VHDL stands for VHSIC
    Hardware Description Language. The key part is HARDWARE. VHDL is used
    for describing hardware, not algorithms. So instead of thinking of this
    as a program that will be turned into hardware by some magical process,
    think of it as a way to describe the hardware you want built. If you
    don't know how to design the hardware, it is unlikely that you will get
    hardware that will be at all efficient.

    VHDL uses modules also known as components. How you transfer the data
    between them does not appreciably matter since the signals are just
    wires and require very little time to transfer a signal. Wires also
    don't use much in the way of resources. The only exception is when you
    are receiving data serially and you want to process data serially. Then
    there is no need to transfer your data in parallel.

    So draw some block diagrams showing your processing and break it down to
    the level of registers. Label all the interfaces with the number of
    wires in each path. Then decide where you want the blocks grouped into
    modules and start "describing" your hardware. It will go a lot easier
    this way.
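
    To make the "signals are just wires" point concrete, here is a minimal
    sketch of a 2-D array type declared in a package and carried between
    two components on an ordinary signal. It is an illustration under
    assumptions, not a recommended architecture: block_pkg, block_t, stage
    and two_stages are invented names, the stage only registers its input
    as a placeholder for real processing, and the synthesis tool is assumed
    to accept two-dimensional array types on ports. The wrapper is meant to
    sit inside the FPGA, not at the pins.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical package: one shared definition of the block type.
    package block_pkg is
      subtype pixel_t is std_logic_vector(7 downto 0);
      -- 32 rows x 32 columns of 8-bit pixels, 8192 bits in total
      type block_t is array (0 to 31, 0 to 31) of pixel_t;
    end package block_pkg;

    library ieee;
    use ieee.std_logic_1164.all;
    use work.block_pkg.all;

    -- A module whose ports carry the whole block.
    entity stage is
      port (
        clk       : in  std_logic;
        block_in  : in  block_t;
        block_out : out block_t
      );
    end entity stage;

    architecture rtl of stage is
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          block_out <= block_in;  -- placeholder for the real operation
        end if;
      end process;
    end architecture rtl;

    library ieee;
    use ieee.std_logic_1164.all;
    use work.block_pkg.all;

    -- Internal wrapper that chains two stages.
    entity two_stages is
      port (
        clk       : in  std_logic;
        block_in  : in  block_t;
        block_out : out block_t
      );
    end entity two_stages;

    architecture structural of two_stages is
      signal s1_to_s2 : block_t;  -- 8192 wires between the two stages
    begin
      u_stage1 : entity work.stage
        port map (clk => clk, block_in => block_in, block_out => s1_to_s2);
      u_stage2 : entity work.stage
        port map (clk => clk, block_in => s1_to_s2, block_out => block_out);
    end architecture structural;
    ```

    The signal itself costs only routing; the 8192 flip-flops inside each
    stage are where the logic elements go.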


    --

    Rick "rickman" Collins


    Ignore the reply address. To email me use the above address with the XY
    removed.

    Arius - A Signal Processing Solutions Company
    Specializing in DSP and FPGA design URL http://www.arius.com
    4 King Ave 301-682-7772 Voice
    Frederick, MD 21701-3110 301-682-7666 FAX
    rickman, Oct 24, 2004
    #5
  6. systolic

    systolic Guest

    Rickman, thanks for your reply.

    This design has been frustrating me for a while. I broke the entire
    design down into several modules and thought about the interfaces among
    the modules and between the compression FPGA unit and the frame-buffer
    unit.

    So, as you said, it is OK to have 1024 wires among modules inside one FPGA.

    The way I need to manipulate the 32x32 pixel block is to perform some
    arithmetic operations on the whole block, then some other operations
    from 4x4 pixel blocks all the way up to the 32x32 pixel block, or from
    the 32x32 pixel block all the way down to 4x4 pixel blocks, in different
    modules. It is a kind of quadtree operation: splitting the 32x32 pixel
    block into 4 16x16 pixel blocks, the 4 16x16 blocks into 16 8x8 blocks,
    and so on.

    This way, I hope to have the 32x32 pixel block ready for each module
    when it needs it, and to take advantage of array indexing.

    So my concerns are:
    1. Whether I can pass a 32x32 pixel-block result among those modules all
    at once. (It looks like the answer is NO.)
    2. If I cannot pass the 32x32 pixel block all at once, which is better:
    buffering the 32x32 pixel block inside each module, or having a
    register file at the top level that is updated after the operations in
    each module?
    3. Or are there better ways? Or am I still on the wrong track?


    OK, thanks a lot for your time and replies.


    systolic, Oct 24, 2004
    #6
  7. rickman

    rickman Guest

    I didn't say that using a lot of wires is ok. Each wire needs a driver,
    so there is cost in the hardware. But if the data is being produced in
    parallel and you already have the drivers, there is no need to reduce
    the size of the interface.

    You seem to be focusing on how you will pass the data between blocks
    rather than how the blocks will work. If you are going to do all your
    math in parallel and *need* to have the data all at once, then you will
    need a wide interface. But if your data is being processed in chunks
    that are less than the size of the entire array, then the chunk size
    would be the best interface size.
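
    If a block only ever works on, say, a 16x16 or 4x4 chunk, the chunk can
    be cut out of the larger array and only that much passed along. A
    sketch of one way to write that, with invented names (quad_pkg,
    pixel_grid_t, sub_block) and assuming the array indices start at 0 and
    that the tool accepts two-dimensional array types:

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical package for cutting square chunks out of a pixel grid.
    package quad_pkg is
      subtype pixel_t is std_logic_vector(7 downto 0);
      type pixel_grid_t is array (natural range <>, natural range <>) of pixel_t;

      -- Return the size x size chunk whose top-left corner is (r0, c0).
      -- Assumes both index ranges of g start at 0.
      function sub_block (g : pixel_grid_t; r0, c0, size : natural)
        return pixel_grid_t;
    end package quad_pkg;

    package body quad_pkg is
      function sub_block (g : pixel_grid_t; r0, c0, size : natural)
        return pixel_grid_t is
        variable result : pixel_grid_t(0 to size - 1, 0 to size - 1);
      begin
        for r in 0 to size - 1 loop
          for c in 0 to size - 1 loop
            result(r, c) := g(r0 + r, c0 + c);
          end loop;
        end loop;
        return result;
      end function sub_block;
    end package body quad_pkg;
    ```

    With signals declared as pixel_grid_t(0 to 31, 0 to 31) and
    pixel_grid_t(0 to 15, 0 to 15), an assignment such as
    "quadrant <= sub_block(whole, 0, 16, 16);" picks the top-right 16x16
    quadrant. With constant arguments this describes wiring only, no extra
    storage.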

    Think of hardware like an assembly line. If 12 items get stuffed into a
    box, they don't move 12 items along the assembly line in parallel. They
    get delivered one at a time so each one can then be put into the box.
    Or maybe three at a time can be put in the box, so they travel three
    wide, maybe. If it takes the same time to deliver three items, one at a
    time, as it does to put all three in the box, then they can still be
    delivered on a one wide belt.

    So do your modules need the data all at once? Or a few items at a
    time? Maybe you should leave the definition of the size of your
    interfaces until you know more about the design of the blocks?
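
    As an example of an operation on the whole block that does not need
    the whole block at once, here is a sketch of a running sum fed 4
    pixels per clock. It is only an illustration: block_sum and its port
    names are invented, pixels are assumed to be unsigned 8-bit values
    packed four to a 32-bit word, and a block is assumed to be exactly 256
    valid words.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity block_sum is
      port (
        clk       : in  std_logic;
        rst       : in  std_logic;                      -- synchronous reset
        din       : in  std_logic_vector(31 downto 0);  -- 4 pixels of 8 bits
        din_valid : in  std_logic;
        sum       : out unsigned(17 downto 0);          -- 1024 * 255 < 2**18
        sum_valid : out std_logic                       -- pulses after 256 words
      );
    end entity block_sum;

    architecture rtl of block_sum is
      signal acc   : unsigned(17 downto 0) := (others => '0');
      signal count : integer range 0 to 255 := 0;
    begin
      process (clk)
        variable partial : unsigned(17 downto 0);
      begin
        if rising_edge(clk) then
          sum_valid <= '0';
          if rst = '1' then
            acc   <= (others => '0');
            count <= 0;
          elsif din_valid = '1' then
            -- Add the four pixels of this word to the running total.
            partial := acc;
            for i in 0 to 3 loop
              partial := partial + unsigned(din(8*i + 7 downto 8*i));
            end loop;
            if count = 255 then
              -- Last word of the block: publish the total and start over.
              sum       <= partial;
              sum_valid <= '1';
              acc       <= (others => '0');
              count     <= 0;
            else
              acc   <= partial;
              count <= count + 1;
            end if;
          end if;
        end if;
      end process;
    end architecture rtl;
    ```

    This needs an 18-bit accumulator and an 8-bit counter rather than 8192
    bits of block storage, and it finishes on the same clock the last word
    arrives.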

    --

    Rick "rickman" Collins


    rickman, Oct 24, 2004
    #7