Smart coding for big multiplexer



Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
my design I have to instantiate 128 RAM blocks, each of which is
1024 bytes wide. The output of my device depends on only one RAM
block at a time, therefore I have to multiplex them. What is the
smartest way to implement such a huge multiplexer?
Thanks a lot for your help.


Do you really mean 1024 bytes WIDE? That's way scary -
an 8192-bit data path :) I guess you mean that each
RAM block is in fact 1024 locations, each 8 bits wide.
That's normally known as a "depth" of 1024.

Silly me... of course I meant depth. Blame my bad English.

We've found that XST does a better job of optimizing
wide MUXes if you code them as an explicit AND-OR
structure.  I don't know why this is, and I don't
know if it will always be true; you could imagine,
for example, that a synthesis tool might be able
to exploit carry chains to build the big OR gates.
Anyway, here's a sketch of the code:

  -- useful declarations
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;

  -- one result from each of your 128 RAM blocks
  signal RAM_read_data: byte_array(0 to 127);

  -- final output
  signal mux_data: byte;

  -- memory selector, chooses one from 128
  signal which_RAM: integer range RAM_read_data'range;

  process (RAM_read_data, which_RAM)
    variable mux_result: byte;
  begin
    mux_result := (others => '0');
    for i in RAM_read_data'range loop
      if i = which_RAM then
        -- index with the loop constant i, not with which_RAM
        mux_result := mux_result OR RAM_read_data(i);
      end if;
    end loop;
    mux_data <= mux_result;
  end process;

If this trick doesn't provide the improvement you need,
the next step is to consider pipelining.  It won't reduce
the area, but will give you better Fmax.
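
As a rough illustration of the pipelining idea, here is a minimal sketch
that splits the 128-to-1 mux into two registered stages: eight 16-to-1
AND-OR muxes, then one 8-to-1 mux. The package name pipe_mux_pkg, the
entity name pipelined_mux, the clk port and the 8 x 16 split are all
assumptions made for the example, not something from the original design.
The output arrives two clocks after the inputs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the declarations from the earlier post
package pipe_mux_pkg is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package pipe_mux_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.pipe_mux_pkg.all;

entity pipelined_mux is
  port (
    clk           : in  std_logic;
    RAM_read_data : in  byte_array(0 to 127);
    which_RAM     : in  integer range 0 to 127;
    mux_data      : out byte
  );
end entity pipelined_mux;

architecture rtl of pipelined_mux is
  -- Assumed split: 8 groups of 16 inputs each
  constant GROUPS     : natural := 8;
  constant GROUP_SIZE : natural := 16;
  signal stage1    : byte_array(0 to GROUPS - 1);
  signal group_sel : integer range 0 to GROUPS - 1;
begin
  process (clk)
    variable partial : byte;
  begin
    if rising_edge(clk) then
      -- Stage 1: one registered 16-to-1 AND-OR mux per group
      for g in 0 to GROUPS - 1 loop
        partial := (others => '0');
        for j in 0 to GROUP_SIZE - 1 loop
          if j = which_RAM mod GROUP_SIZE then
            partial := partial or RAM_read_data(g * GROUP_SIZE + j);
          end if;
        end loop;
        stage1(g) <= partial;
      end loop;
      -- Delay the group number so it lines up with stage1
      group_sel <= which_RAM / GROUP_SIZE;

      -- Stage 2: registered 8-to-1 AND-OR mux over the group results
      partial := (others => '0');
      for g in 0 to GROUPS - 1 loop
        if g = group_sel then
          partial := partial or stage1(g);
        end if;
      end loop;
      mux_data <= partial;
    end if;
  end process;
end architecture rtl;
```

Whether this particular split is the best one depends on the device and
on the timing report; treat it as a starting point to experiment with.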

I'm sure other folk will have more, better ideas.

I really appreciate your help, I'll immediately try to integrate your
code in my design... thank you!

Chris Maryan

The difference is that, when the loop is unrolled, you
are subscripting the array with a CONSTANT (i) rather
than with a variable.  It can be important for optimization,
even though the two are functionally identical.

Yes, that's VERY important. I ran into something like this a while ago
with Synplify, where the constant version properly instantiated a mux
and the variable version implemented some sort of variable shift
widget that was about an order of magnitude larger.



Which is the
smartest way to implement such a huge multiplexer?

The smartest way is to let the synthesis tool do as much of the work
as possible. Don't try to outsmart it unless you have to. If the
simplest, easiest to read, understand or write description will work
(i.e. meet timing, area, etc. requirements), then use that.

Borrowing Jonathan's definitions:

-- 128-to-1, byte wide multiplexer:
mux_data <= RAM_read_data(which_RAM);

If you don't know your requirements, then you won't know whether the
implementation you used is good enough, no matter how fast/small/cool/
elegant it is.


Mike Treseler

Massi said:
Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
my design I have to instantiate 128 RAM blocks, each of which is
1024 bytes wide. The output of my device depends on only one RAM
block at a time, therefore I have to multiplex them. What is the
smartest way to implement such a huge multiplexer?
Thanks a lot for your help.

I agree with Andy.
I don't solve a synthesis problem until I have one.
The cleanest mux description is an array selection.
Give ISE a crack at it and have a look at
the RTL viewer and static timing.

I also agree with Jonathan.
Declare register/port dimensions first.
VHDL gives us an unfair advantage here.

-- Mike


If you only need one RAM at a time, could you merge the RAMs into a
smaller number of larger ones? This would require that you only write
to one RAM at a time too.

Also, I have used TBUFs in the past to do this; however, it appears
that V5s don't have these.
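
To make the merging idea concrete, here is a minimal sketch that folds
all 128 blocks into one inferred 128K x 8 memory, with the block number
used as the upper address bits. The entity name merged_ram and the ports
clk, we, addr, wr_data and rd_data are assumptions for the example; it
also assumes a single read/write port is enough. The tool will still
spread a memory of this size over several block RAMs, so some output
multiplexing probably remains, but it should be much narrower than
128-to-1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity merged_ram is
  port (
    clk       : in  std_logic;
    we        : in  std_logic;
    which_RAM : in  unsigned(6 downto 0);  -- one of 128 logical blocks
    addr      : in  unsigned(9 downto 0);  -- location within that block
    wr_data   : in  std_logic_vector(7 downto 0);
    rd_data   : out std_logic_vector(7 downto 0)
  );
end entity merged_ram;

architecture rtl of merged_ram is
  -- 128 blocks x 1024 locations, each 8 bits wide
  type mem_t is array (0 to 128 * 1024 - 1) of std_logic_vector(7 downto 0);
  signal mem       : mem_t;
  signal full_addr : unsigned(16 downto 0);
begin
  -- The block number becomes the upper 7 address bits
  full_addr <= which_RAM & addr;

  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(full_addr)) <= wr_data;
      end if;
      -- Synchronous read; returns the old contents on a write cycle
      rd_data <= mem(to_integer(full_addr));
    end if;
  end process;
end architecture rtl;
```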



Tri-state bus code is translated into equivalent multiplexer-type
circuits. The tri-state enables are assumed to be mutually exclusive
for the multiplexer implementation. This actually comes in handy in
some applications where it is difficult to convince the synthesis tool
that separate inputs are mutually exclusive.
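
For reference, a tri-state bus description of the same 128-to-1 mux
might look like the sketch below. The package name tri_mux_pkg, the
entity name tristate_mux and the internal signal tri_bus are assumptions
for the example. Each RAM output gets its own driver, which is enabled
only when its index is selected. How the tool maps this on a device
without internal tri-state buffers depends on the tool and its settings,
so check the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the declarations from the earlier post
package tri_mux_pkg is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package tri_mux_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.tri_mux_pkg.all;

entity tristate_mux is
  port (
    RAM_read_data : in  byte_array(0 to 127);
    which_RAM     : in  integer range 0 to 127;
    mux_data      : out byte
  );
end entity tristate_mux;

architecture rtl of tristate_mux is
  -- Resolved signal, so many drivers may share it
  signal tri_bus : byte;
begin
  -- One tri-state driver per RAM; exactly one is enabled at any time
  gen_drivers : for i in RAM_read_data'range generate
    tri_bus <= RAM_read_data(i) when which_RAM = i else (others => 'Z');
  end generate gen_drivers;

  mux_data <= tri_bus;
end architecture rtl;
```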


