Help on 2-D array vs. register file

systolic

Again, I have some questions:

Inside my top-level design, I have a 32x32 8-bit data block flowing
through several modules, some modules are in sequence, some in parallel.
Inside each module, I need to process the data block as a 2-D array,
like 4x4 block-based operations, etc.

How could I pass the 32x32 data block very efficiently among those
modules in terms of system speed and logical element utilization?

Is it possible and efficient to define a 2-D array in the top-level
design and pass that 2-D array among those modules? If it is
possible, how do I do it? And will it consume too many resources?

Or do I need a small piece of memory or a register file using
lpm_ram, and then let each module access the memory through a bus? Then
how will I process the data as a 2-D array inside each module? Do I need
to buffer the data inside each module for array-wise operations? Will
that be slow and also consume extra resources?

Maybe I am on the wrong track. I am not very familiar with VHDL. I am
still something of a C programmer. :( :p

Please help me with this. Thanks a lot. :)
 
Mike Treseler

systolic said:
Inside my top-level design, I have a 32x32 8-bit data block flowing
through several modules, some modules are in sequence, some in parallel.
Inside each module, I need to process the data block as a 2-D array,
like 4x4 block-based operations, etc.

Write your top level entity before you start slicing.
I expect that there are no 1024 bit interfaces at the top.
Maybe a dot clock and video data in and out?
Next, work out the top architecture signals.
Do you need to count out rows and columns?
Are you processing everything live?
Line buffers? Frame buffers?

-- Mike Treseler
 
systolic

Mike, thank you for the reply.

Yes, I assume there is a frame buffer, which feeds data into my top
level design over a 32-bit interface (4 pixels at a time).
Then I need to perform 32x32 block-based operations inside the top level
design among several modules. In total, I have 4 modules in three levels.
The last one needs to perform the block-based operations from the 32x32
block all the way down to 4x4 blocks.

I think I could pass everything among those modules on a 32-bit bus,
then re-format data into a 32x32 block inside each module. But it would
consume more memory and impact the system speed.

I was hoping it would be possible to pass the whole 32x32 block between
the modules, but I am not sure whether I can do that, or how. I guess
such a huge interface between modules may not be worth it even if it
is possible.

I would like to have some suggestions or hints.

Maybe I still have to go back to a 32-bit bus and reformat the 32x32
block inside the modules. Is this the normal way to do it? Is there no
way around this?
 
Mike Treseler

systolic said:
Mike, thank you for the reply.

Yes, I assume there is a frame buffer, which feeds data into my top
level design over a 32-bit interface (4 pixels at a time).

Consider verifying this before you proceed.

systolic said:
Then I need to perform 32x32 block-based operations inside the top level
design among several modules. In total, I have 4 modules in three levels.
The last one needs to perform the block-based operations from the 32x32
block all the way down to 4x4 blocks.

Are those bit blocks or pixel blocks?

systolic said:
I think I could pass everything among those modules on a 32-bit bus,
then re-format data into a 32x32 block inside each module. But it would
consume more memory and impact the system speed.

What is the speed requirement?
Do you have to keep up with each frame,
or are you post-processing a single frame?
If you are planning to put this in an FPGA,
a 1024 bit input bus is unrealistic.

systolic said:
I was hoping it would be possible to pass the whole 32x32 block between
the modules, but I am not sure whether I can do that, or how. I guess
such a huge interface between modules may not be worth it even if it
is possible.

Once you have shifted in the data block, processing 1024 bits in
parallel is possible.

systolic said:
Maybe I still have to go back to a 32-bit bus and reformat the 32x32
block inside the modules. Is this the normal way to do it? Is there no
way around this?

The limit is FPGA pins. They are three for a dollar.

-- Mike Treseler
 
rickman

I have read the replies to this post and I can see that you are still
thinking in terms of C rather than hardware. VHDL stands for VHSIC
Hardware Description Language. The key part is HARDWARE. VHDL is used
for describing hardware, not algorithms. So instead of thinking of this
as a program that will be turned into hardware by some magical process,
think of it as a way to describe the hardware you want built. If you
don't know how to design the hardware, it is unlikely that you will get
hardware that will be at all efficient.

VHDL uses modules also known as components. How you transfer the data
between them does not appreciably matter since the signals are just
wires and require very little time to transfer a signal. Wires also
don't use much in the way of resources. The only exception is when you
are receiving data serially and you want to process data serially. Then
there is no need to transfer your data in parallel.

So draw some block diagrams showing your processing and break it down to
the level of registers. Label all the interfaces with the number of
wires in each path. Then decide where you want the blocks grouped into
modules and start "describing" your hardware. It will go a lot easier
this way.
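
On the "is it possible, and how" part of the original question: a 2-D array can be declared as a type in a package and then used directly as a port type. The following is only a minimal sketch, assuming 8-bit pixels as stated in the question. The names `block_pkg`, `block_t`, and `block_stage`, and the "+ 1" placeholder operation, are illustrative assumptions and do not come from this thread. With 8-bit pixels, each such port is 32 x 32 x 8 = 8192 signals wide, which is the number to write on the block diagram.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: all names here are illustrative.
package block_pkg is
  constant BLOCK_DIM : natural := 32;            -- 32x32 block, per the question
  subtype pixel_t is unsigned(7 downto 0);       -- one 8-bit pixel
  type block_t is array (0 to BLOCK_DIM - 1, 0 to BLOCK_DIM - 1) of pixel_t;
end package block_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.block_pkg.all;

-- Hypothetical processing stage with a full-block (8192-signal) interface.
entity block_stage is
  port (
    clk       : in  std_logic;
    block_in  : in  block_t;
    block_out : out block_t
  );
end entity block_stage;

architecture rtl of block_stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Every pixel gets its own register and its own adder: 1024 of each.
      for r in 0 to BLOCK_DIM - 1 loop
        for c in 0 to BLOCK_DIM - 1 loop
          block_out(r, c) <= block_in(r, c) + 1;  -- placeholder operation
        end loop;
      end loop;
    end if;
  end process;
end architecture rtl;
```

The loops here do not run over time as they would in C. They are unrolled into parallel hardware, so the cost of the wide interface shows up as 8192 output registers plus whatever logic the real operation needs per pixel.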


--

Rick "rickman" Collins

(e-mail address removed)
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
systolic

Rickman, thx for your reply.

This design has been frustrating for a while. I broke down the entire
design to several modules and thought about the interface among modules
and between the compression FPGA unit and the frame-buffer unit.

So as you said, it is ok to have 1024 wires among modules inside one FPGA.

The way I need to manipulate the 32x32 pixel block is to perform some
arithmetic operations on the whole block, then other operations from
4x4 pixel blocks all the way up to the 32x32 pixel block, or from the
32x32 pixel block all the way down to 4x4 pixel blocks, in different
modules. It is a kind of quadtree operation: splitting a 32x32 pixel block
into 4 16x16 pixel blocks, the 4 16x16 blocks into 16 8x8 blocks, and so on.
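
The quadtree split described above comes down to index arithmetic on the block. Here is a rough sketch of one way to express it in VHDL, assuming 8-bit pixels and square blocks with an even side length. The package `quadtree_pkg`, the type `pixel_array_t`, and the function `quadrant` are hypothetical names made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: names are illustrative, not from the thread.
package quadtree_pkg is
  subtype pixel_t is unsigned(7 downto 0);
  type pixel_array_t is array (natural range <>, natural range <>) of pixel_t;

  -- Returns quadrant q of a square block with an even side length:
  -- 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
  function quadrant (blk : pixel_array_t; q : natural range 0 to 3)
    return pixel_array_t;
end package quadtree_pkg;

package body quadtree_pkg is
  function quadrant (blk : pixel_array_t; q : natural range 0 to 3)
    return pixel_array_t is
    constant HALF : natural := blk'length(1) / 2;                -- side of the sub-block
    constant R0   : natural := blk'low(1) + (q / 2) * HALF;      -- first source row
    constant C0   : natural := blk'low(2) + (q mod 2) * HALF;    -- first source column
    variable sub  : pixel_array_t(0 to HALF - 1, 0 to HALF - 1);
  begin
    for r in 0 to HALF - 1 loop
      for c in 0 to HALF - 1 loop
        sub(r, c) := blk(R0 + r, C0 + c);
      end loop;
    end loop;
    return sub;
  end function quadrant;
end package body quadtree_pkg;
```

When `q` and the block size are fixed at elaboration time, a call such as `quadrant(blk, 1)` is pure wiring: it selects which existing signals feed the next stage and adds no logic of its own.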

In this way, I hope to have the 32x32 pixel-block ready for each module
when they need it and take advantage of the array index operations.

So my concerns are:
1. Whether I can pass a 32x32 pixel block result among those modules all
at once. (It looks like the answer is NO.)
2. If I cannot pass the 32x32 pixel block all at once, which is better:
buffering the 32x32 pixel block inside each module, or having a
register file at the top level that is updated after the operations in
each module?
3. Or are there better ways? Or am I still on the wrong track?


OK, thanks a lot for your time and replies.
 
rickman

I didn't say that using a lot of wires is ok. Each wire needs a driver,
so there is cost in the hardware. But if the data is being produced in
parallel and you already have the drivers, there is no need to reduce
the size of the interface.

You seem to be focusing on how you will pass the data between blocks
rather than how the blocks will work. If you are going to do all your
math in parallel and *need* to have the data all at once, then you will
need a wide interface. But if your data is being processed in chunks
that are less than the size of the entire array, then the chunk size
would be the best interface size.

Think of hardware like an assembly line. If 12 items get stuffed into a
box, they don't move 12 items along the assembly line in parallel. They
get delivered one at a time so each one can then be put into the box.
Or maybe three at a time can be put in the box, so they travel three
wide, maybe. If it takes the same time to deliver three items, one at a
time, as it does to put all three in the box, then they can still be
delivered on a one wide belt.

So do your modules need the data all at once? Or a few items at a
time? Maybe you should leave the definition of the size of your
interfaces until you know more about the design of the blocks?
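
To make the "chunk size" idea concrete, here is a minimal sketch of a module that takes data at the width the frame buffer delivers it (32 bits, 4 pixels per word, as described earlier in the thread) and hands a 4x4 pixel block to the logic behind it. It assumes, purely for illustration, that the four rows of each 4x4 block arrive as four consecutive words. The entity name `block4x4_collector` and all of its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical collector: 32-bit words in, one 4x4 block of 8-bit pixels out.
entity block4x4_collector is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                       -- synchronous reset
    word_in   : in  std_logic_vector(31 downto 0);   -- one row of 4 pixels
    word_en   : in  std_logic;                       -- word_in is valid this cycle
    block_out : out std_logic_vector(127 downto 0);  -- 16 pixels, row 0 in bits 31..0
    block_vld : out std_logic                        -- high for one cycle per block
  );
end entity block4x4_collector;

architecture rtl of block4x4_collector is
  signal shreg : std_logic_vector(127 downto 0) := (others => '0');
  signal row   : unsigned(1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      block_vld <= '0';                    -- default: no new block this cycle
      if rst = '1' then
        row <= (others => '0');
      elsif word_en = '1' then
        -- Shift the new row in at the top; after four rows,
        -- the first row received sits in bits 31..0.
        shreg <= word_in & shreg(127 downto 32);
        row   <= row + 1;                  -- 2-bit counter, wraps from 3 to 0
        if row = 3 then
          block_vld <= '1';                -- fourth row has just been captured
        end if;
      end if;
    end if;
  end process;

  block_out <= shreg;
end architecture rtl;
```

The interface between modules stays 32 bits wide, and only the module that needs 16 pixels at once pays for the 128 register bits. The same pattern scales to larger blocks, at which point a RAM is usually cheaper than registers.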

--

Rick "rickman" Collins

 
