Smart coding for big multiplexer

Massi · Apr 17, 2009

Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
my design I have to instantiate 128 RAM blocks, each of which is
1024 bytes wide. The output of my device depends on only one RAM
block at a time, so I have to multiplex them. What is the
smartest way to implement such a huge multiplexer?
Thanks a lot for your help.

Massi · Apr 17, 2009

Do you really mean 1024 bytes WIDE? That's way scary:
an 8192-bit data path! I guess you mean that each
RAM block is in fact 1024 locations, each 8 bits wide.
That's normally known as a "depth" of 1024.

Silly me... of course I meant depth; my bad English is at fault.

Jonathan said:
We've found that XST does a better job of optimizing
wide MUXes if you code them as an explicit AND-OR
structure. I don't know why this is, and I don't
know if it will always be true; you could imagine,
for example, that a synthesis tool might be able
to exploit carry chains to build the big OR gates.
Anyway, here's a sketch of the code:

-- useful declarations
subtype byte is std_logic_vector(7 downto 0);
type byte_array is array(natural range <>) of byte;

-- one result from each of your 128 RAM blocks
signal RAM_read_data: byte_array(0 to 127);

-- final output
signal mux_data: byte;

-- memory selector, chooses one from 128
signal which_RAM: integer range RAM_read_data'range;

...
process (RAM_read_data, which_RAM)
  variable mux_result: byte;
begin
  mux_result := (others => '0');
  for i in RAM_read_data'range loop
    if i = which_RAM then
      -- index with the loop constant i, not with which_RAM
      mux_result := mux_result OR RAM_read_data(i);
    end if;
  end loop;
  mux_data <= mux_result;
end process;

If this trick doesn't provide the improvement you need,
the next step is to consider pipelining. It won't reduce
the area, but will give you better Fmax.
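
The pipelining step might look something like the sketch below. It is only one possible arrangement, not something taken from the thread: it assumes a clock called clk, splits the 128-to-1 selection into eight registered 16-to-1 groups followed by a registered 8-to-1 stage, and the names mux_types, pipelined_mux, stage1_data and sel_hi are made up for illustration. The package just repeats the declarations above so the file compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package mux_types is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.mux_types.all;

entity pipelined_mux is  -- hypothetical entity name
  port (
    clk           : in  std_logic;  -- assumed clock, not in the original sketch
    RAM_read_data : in  byte_array(0 to 127);
    which_RAM     : in  integer range 0 to 127;
    mux_data      : out byte
  );
end entity;

architecture rtl of pipelined_mux is
  signal stage1_data : byte_array(0 to 7);   -- one registered result per group of 16
  signal sel_hi      : integer range 0 to 7; -- group number, delayed by one clock
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: eight 16-to-1 selections in parallel.
      -- After unrolling, g * 16 + j is a constant index.
      for g in stage1_data'range loop
        for j in 0 to 15 loop
          if j = which_RAM mod 16 then
            stage1_data(g) <= RAM_read_data(g * 16 + j);
          end if;
        end loop;
      end loop;
      -- Delay the group number so it lines up with the stage 1 registers.
      sel_hi <= which_RAM / 16;
      -- Stage 2: 8-to-1 selection among the registered group results.
      mux_data <= stage1_data(sel_hi);
    end if;
  end process;
end architecture;
```

With this arrangement mux_data follows which_RAM and RAM_read_data by two clock edges, so anything downstream has to tolerate that extra latency. Whether the split is 16 x 8 or something else is a tuning choice to check against the timing report.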

I'm sure other folk will have more, better ideas.

I really appreciate your help, I'll immediately try to integrate your
code into my design... thank you!

Chris Maryan · Apr 17, 2009

The difference is that, when the loop is unrolled, you
are subscripting the array with a CONSTANT (i) rather
than with a variable. It can be important for optimization,
even though the two are functionally identical.

Yes, that's VERY important. I ran into something like this a while ago
with Synplify, where the constant version properly instantiated a mux
and the variable version implemented some sort of variable shift
widget that was about an order of magnitude larger.
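
To make the point about constant subscripts concrete, here is a sketch of what the unrolled loop amounts to when the AND-OR structure is written out explicitly. It is an illustration under assumptions, not code from the thread: the names mux_types, and_or_mux and enable are hypothetical, and the package repeats Jonathan's declarations only so the file compiles on its own. Every array subscript is the loop constant i; which_RAM appears only in the comparison.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package mux_types is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.mux_types.all;

entity and_or_mux is  -- hypothetical entity name
  port (
    RAM_read_data : in  byte_array(0 to 127);
    which_RAM     : in  integer range 0 to 127;
    mux_data      : out byte
  );
end entity;

architecture rtl of and_or_mux is
begin
  process (RAM_read_data, which_RAM)
    variable mux_result : byte;
    variable enable     : std_logic;
  begin
    mux_result := (others => '0');
    for i in RAM_read_data'range loop
      -- i is a constant in each unrolled iteration, so this comparison
      -- is a fixed decode of which_RAM.
      if i = which_RAM then
        enable := '1';
      else
        enable := '0';
      end if;
      -- AND every bit of input i with its enable, then OR into the result.
      mux_result := mux_result or (RAM_read_data(i) and byte'(others => enable));
    end loop;
    mux_data <= mux_result;
  end process;
end architecture;
```

Writing RAM_read_data(which_RAM) inside the loop instead would simulate identically, but it hands the tool a variable subscript in every iteration, which is the case described above as producing much larger logic.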

Chris

Andy · Apr 17, 2009

Which is the
smartest way to implement such a huge multiplexer?

The smartest way is to let the synthesis tool do as much of the work
as possible. Don't try to outsmart it unless you have to. If the
simplest, easiest to read, understand or write description will work
(i.e. meet timing, area, etc. requirements), then use that.

Borrowing Jonathan's definitions:

-- 128-to-1, byte wide multiplexer:
mux_data <= RAM_read_data(which_RAM);

If you don't know your requirements, then you won't know whether the
implementation you used is good enough, no matter how fast/small/cool/
elegant it is.

Andy

Mike Treseler · Apr 18, 2009

Massi said:
Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
my design I have to instantiate 128 RAM blocks, each of which is
1024 bytes wide. The output of my device depends on only one RAM
block at a time, so I have to multiplex them. What is the
smartest way to implement such a huge multiplexer?
Thanks a lot for your help.

I agree with Andy.
I don't solve a synthesis problem until I have one.
The cleanest mux description is an array selection.
Give ISE a crack at it and have a look at
the RTL viewer and static timing.

I also agree with Jonathan.
Declare register/port dimensions first.
VHDL gives us an unfair advantage here.

-- Mike

Dal · Apr 20, 2009

If you only need one RAM at a time could you merge the rams into a
smaller number of larger ones? This would require that you only write
to one ram at a time too.
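
A rough sketch of the merging idea, assuming a single clock, a single write port and 128 banks of 1024 bytes as described earlier in the thread. The entity name merged_ram and the ports clk, we, addr, din and dout are hypothetical; which_RAM simply becomes the upper seven address bits of one large memory, so the bank selection is folded into the address decoding.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity merged_ram is  -- hypothetical entity name
  port (
    clk       : in  std_logic;
    we        : in  std_logic;                    -- write enable
    which_RAM : in  unsigned(6 downto 0);         -- selects one of 128 banks
    addr      : in  unsigned(9 downto 0);         -- location within the bank
    din       : in  std_logic_vector(7 downto 0);
    dout      : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of merged_ram is
  -- 128 banks of 1024 bytes stored as one 131072-byte memory
  type mem_t is array (0 to 128 * 1024 - 1) of std_logic_vector(7 downto 0);
  signal mem       : mem_t;
  signal full_addr : unsigned(16 downto 0);
begin
  -- The bank number forms the upper address bits.
  full_addr <= which_RAM & addr;

  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(full_addr)) <= din;
      end if;
      -- Synchronous read; on a simultaneous write this returns the old data.
      dout <= mem(to_integer(full_addr));
    end if;
  end process;
end architecture;
```

A memory of this size still spans many block RAMs, so the tools have to build some output selection of their own. Whether that ends up smaller or faster than an explicit multiplexer is something to confirm in the synthesis and timing reports.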

Also, I have used tbufs in the past to do this, however it appears
that V5's don't have these.

Darrin

Andy · Apr 20, 2009

Tri-state bus code is translated by the synthesis tool into equivalent
multiplexer-type circuits. The tri-state enables are assumed to be mutually
exclusive for the multiplexer implementation. This actually comes in handy in
some applications where it is difficult to convince the synthesis tool
that separate inputs are mutually exclusive.
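
For illustration, a tri-state description of the same 128-to-1 selection could be sketched as follows. The names mux_types, tristate_mux, bus_data and drivers are hypothetical, and the package repeats Jonathan's declarations only so the file compiles on its own. Each RAM gets its own driver onto a shared resolved signal and drives 'Z' when it is not selected.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package mux_types is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.mux_types.all;

entity tristate_mux is  -- hypothetical entity name
  port (
    RAM_read_data : in  byte_array(0 to 127);
    which_RAM     : in  integer range 0 to 127;
    mux_data      : out byte
  );
end entity;

architecture rtl of tristate_mux is
  signal bus_data : byte;  -- shared bus with 128 drivers
begin
  -- One driver per RAM: it drives its data when selected, 'Z' otherwise.
  drivers : for i in RAM_read_data'range generate
    bus_data <= RAM_read_data(i) when which_RAM = i else (others => 'Z');
  end generate;

  mux_data <= bus_data;
end architecture;
```

Because which_RAM can match only one value of i at a time, the enables here are mutually exclusive by construction. How a particular tool version maps this onto logic is best checked in its synthesis report.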

Andy
