How best do I implement routing boxes in RTL?

news reader · Mar 8, 2007

In the design I have 256 3-bit registers. Every time, I need to read or
write 16 of them (data_o0, data_o1, ..., data_o15).
The read/write addresses are not completely random.

For example, if I arrange the registers as a 16x16 matrix,
data_o0 only accesses registers in row 0 or column 0.
data_o1 may access only 20 of the registers rather than all 256,
data_o2 only 30,
and so on.
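To make the row-0/column-0 case concrete: assuming the address is {row, col} (register r*16 + c sits at row r, column c), a port that only ever reads row 0 or column 0 needs two 16:1 muxes and one 2:1 mux instead of a 256:1 mux. This is a minimal sketch; the module name `port0_row0_col0` and the flattened `regs_flat` bus are hypothetical.

```verilog
// Hypothetical read port restricted to row 0 or column 0 of a 16x16 register
// array. Assumes address = {row[3:0], col[3:0]} and that the address always
// satisfies row == 0 or col == 0.
module port0_row0_col0 (
    input  wire [256*3-1:0] regs_flat, // register k at bits [3*k +: 3]
    input  wire [7:0]       addr,      // {row, col}
    output wire [2:0]       data
);
    wire [3:0] row = addr[7:4];
    wire [3:0] col = addr[3:0];

    // Row 0 holds registers 0..15: a 16:1 mux indexed by col.
    wire [2:0] from_row0 = regs_flat[3*col +: 3];
    // Column 0 holds registers 0, 16, 32, ..., 240 (register 16*row),
    // which start at bit 3*16*row = 48*row: another 16:1 mux.
    wire [2:0] from_col0 = regs_flat[48*row +: 3];

    // A single 2:1 mux picks between the two 16:1 muxes.
    assign data = (row == 4'd0) ? from_row0 : from_col0;
endmodule
```

The same idea applies to any port whose reachable set has structure: decode the structure instead of listing every address.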

If I code every output so that it can read from all 256 registers, the resulting
logic will be overkill and highly redundant.

If I use case statements to list each of the scenarios, the RTL code may grow
to about 500 KB.
Will Design Compiler synthesize a 500 KB design efficiently? Will NCVerilog
compile and simulate it efficiently?

Are there any neater techniques to attack this problem?

Utku Özcan · Mar 8, 2007

Hi "news reader", my humble pearls in between...

news reader said:
In the design I have 256 3-bit registers. Every time, I need to read or
write 16 of them (data_o0, data_o1, ..., data_o15).
The read/write addresses are not completely random.

It seems that your algorithm accesses the values in a deterministic
pattern. Therefore you think you can implement it with logic only.

I assume you are modeling an algorithm for a special matrix operation.

For example, if I arrange the registers as a 16x16 matrix,
data_o0 only accesses registers in row 0 or column 0. data_o1 may access only 20 of the
registers rather than all 256, data_o2 only 30, and so on.

These numbers do not tell us much. Which elements does each data_ox
(x = 1, 2, ...) access, and in what pattern?

If I code every output so that it can read from all 256 registers, the resulting
logic will be overkill and highly redundant.

You think that the elements can be accessed with pure logic.
Therefore you have tried (or want) to model your logic to cover
every case.

If I use case statements to list each of the scenarios, the RTL code may grow
to about 500 KB.

This is reasonable then.

Will Design Compiler synthesize a 500 KB design efficiently?

What does "efficiency" mean for you: speed or minimum logic?
If minimum logic, then please share the algorithm you are
trying to implement.

Will NCVerilog compile and simulate it efficiently?

NCVerilog does not care about the logic implementation. It simulates the
behaviour your code describes, no matter how the objects are linked.

Are there any neater techniques to attack this problem?

Since you have not given much detail, I think you could implement this
with a RAM.
Why don't you use a RAM? You can then map your matrix onto RAM
addresses and generate the addresses of the matrix positions so that
they mimic your algorithm.
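A minimal sketch of that suggestion, assuming a row-major mapping in which matrix element (row, col) is stored at RAM address {row, col}; the module and port names are hypothetical. A single-port RAM like this serves one access per clock, so it only fits if the algorithm does not need many simultaneous accesses.

```verilog
// Hypothetical single-port synchronous RAM holding a 16x16 matrix of 3-bit
// values, element (row, col) at address {row, col} (row-major).
module matrix_ram (
    input  wire       clk,
    input  wire       we,     // write enable
    input  wire [3:0] row,
    input  wire [3:0] col,
    input  wire [2:0] wdata,
    output reg  [2:0] rdata
);
    reg [2:0] mem [0:255];

    // Address generation: row-major mapping of the matrix onto the RAM.
    wire [7:0] addr = {row, col};

    always @(posedge clk) begin
        if (we)
            mem[addr] <= wdata;
        rdata <= mem[addr];  // read-first: returns the old word on a write
    end
endmodule
```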

Utku.

news reader · Mar 9, 2007

Utku Özcan said:
Hi "news reader", my humble pearls in between...

It seems that your algorithm accesses the values in a deterministic
pattern. Therefore you think you can implement it with logic only.

I assume you are modeling an algorithm for a special matrix operation.

It's not a matrix operation, but it is memory-access intensive: the reads/writes
must complete in a single clock cycle, so I use registers instead of memory.

These numbers do not tell us much. Which elements does each data_ox
(x = 1, 2, ...) access, and in what pattern?

In each clock cycle, 16 addresses are generated and 16 data values are
read/written. However, each of the 16 data ports only accesses n of the
256 addresses (0 < n < 255), and n varies from port to port.
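If each port only needs a known list of addresses, that list can be kept as a parameter and the reduced mux generated from it, instead of writing every case statement by hand. This is a minimal sketch under these assumptions: the registers arrive as one flattened `regs_flat` bus, `sparse_read_port` and `ALLOWED` are hypothetical names, the default list holds a few addresses taken from the `addr_i0` case list in implementation B, and an address outside the list returns register 0, like the `default` branch there.

```verilog
// Hypothetical read port that can only reach N specific registers out of 256.
// ALLOWED packs the N reachable 8-bit addresses, entry i at bits [8*i +: 8].
module sparse_read_port #(
    parameter integer   N       = 4,
    parameter [8*N-1:0] ALLOWED = {8'd233, 8'd166, 8'd54, 8'd5}
) (
    input  wire [256*3-1:0] regs_flat,  // register k at bits [3*k +: 3]
    input  wire [7:0]       addr,
    output wire [2:0]       data
);
    wire [N-1:0]       hit;  // hit[i]: addr equals the i-th reachable address
    wire [3*(N+1)-1:0] acc;  // running OR of the selected register values

    assign acc[2:0] = 3'b000;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_sel
            // Constant address, so only one comparator and one fixed
            // register tap are built per list entry.
            localparam [7:0] A = ALLOWED[8*i +: 8];
            assign hit[i] = (addr == A);
            // AND-OR mux: with a duplicate-free list at most one hit is set.
            assign acc[3*(i+1) +: 3] = acc[3*i +: 3]
                                     | ({3{hit[i]}} & regs_flat[3*A +: 3]);
        end
    endgenerate

    // Outside the list, fall back to register 0 (like the case default).
    assign data = (|hit) ? acc[3*N +: 3] : regs_flat[2:0];
endmodule
```

One instance per port, each with its own `N` and `ALLOWED`, would replace the hand-written case statements; the address lists could come from the same simulations used to build them.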

You think that the elements can be accessed with pure logic.
Therefore you have tried (or want) to model your logic to cover
every case.

This is reasonable then.

With the case-statement approach I use 32 case statements, each with
fewer than 256 choices. Some have only 20 or 30.

What does "efficiency" mean for you: speed or minimum logic?
If minimum logic, then please share the algorithm you are
trying to implement.

NCVerilog does not care about the logic implementation. It simulates the
behaviour your code describes, no matter how the objects are linked.

For example, for the read operation:
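--------------------- implementation A------------------
input        clk;  // assumed: nonblocking assignments need a clocked always block
input  [7:0] addr_i0, addr_i1, /* ... */ addr_i15;
output [2:0] dat_o0, dat_o1, /* ... */ dat_o15;
reg    [2:0] dat_o0, dat_o1, /* ... */ dat_o15;

reg [2:0] mymemory [0:255]; // Main memory: 256 3-bit registers

// Every output can read any of the 256 registers,
// i.e. a full 256:1 mux per output.
always @(posedge clk) begin
  dat_o0  <= mymemory[addr_i0];
  dat_o1  <= mymemory[addr_i1];
  // ... dat_o2 to dat_o14 ...
  dat_o15 <= mymemory[addr_i15];
end
--------------------- End A------------------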

--------------------- implementation B------------------

// Each output only lists the addresses it can actually use.
always @(posedge clk) begin
  case (addr_i0) // I can calculate these options through simulations.
    8'd0   : dat_o0 <= mymemory[0];
    8'd5   : dat_o0 <= mymemory[5];
    8'd54  : dat_o0 <= mymemory[54];
    8'd122 : dat_o0 <= mymemory[122];
    8'd125 : dat_o0 <= mymemory[125];
    // ...
    8'd166 : dat_o0 <= mymemory[166];
    8'd233 : dat_o0 <= mymemory[233];
    default: dat_o0 <= mymemory[0];
  endcase

  case (addr_i1)
    8'd0   : dat_o1 <= mymemory[0];
    8'd7   : dat_o1 <= mymemory[7];
    8'd9   : dat_o1 <= mymemory[9];
    8'd13  : dat_o1 <= mymemory[13];
    8'd25  : dat_o1 <= mymemory[25];
    8'd57  : dat_o1 <= mymemory[57];
    8'd124 : dat_o1 <= mymemory[124];
    // ...
    8'd133 : dat_o1 <= mymemory[133];
    8'd155 : dat_o1 <= mymemory[155];
    // 8'd277 : dat_o1 <= mymemory[277]; // invalid: an 8-bit address cannot exceed 255
    default: dat_o1 <= mymemory[0];
  endcase

  // ...
  case (addr_i15)
    // ...
  endcase
end
--------------------- End B------------------

In terms of hardware, is it certain that implementation B
saves logic compared to A? Will such large chunks of RTL code cause
DC or NCVerilog to choke?

Since you have not given much detail, I think you could implement this
with a RAM.
Why don't you use a RAM? You can then map your matrix onto RAM
addresses and generate the addresses of the matrix positions so that
they mimic your algorithm.

I used registers instead of RAM because of the memory throughput required.
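With registers, the write side can be pruned the same way as the read side, since each register only needs comparators for the ports that can actually reach it. A minimal sketch, assuming a per-port 256-bit reachability mask `REACH` (hypothetical, as are the module and port names), that a write to an unreachable address is dropped, and that the lowest-numbered port wins when two ports write the same register in one clock (the thread does not specify a priority rule).

```verilog
// Hypothetical register-file write side with P ports. Port p can only write
// registers whose bit is set in REACH[256*p +: 256]; because REACH is a
// constant, synthesis should be able to remove the unused comparators.
module sparse_regfile_write #(
    parameter integer     P     = 2,
    parameter [256*P-1:0] REACH = {256*P{1'b1}}
) (
    input  wire             clk,
    input  wire [P-1:0]     wen,         // write enable per port
    input  wire [8*P-1:0]   waddr_flat,  // port p's address at bits [8*p +: 8]
    input  wire [3*P-1:0]   wdata_flat,  // port p's data at bits [3*p +: 3]
    output reg  [256*3-1:0] regs_flat    // register k at bits [3*k +: 3]
);
    wire [256*P-1:0] reach = REACH;
    integer k, p;

    always @(posedge clk) begin
        for (k = 0; k < 256; k = k + 1)
            // Walk ports from highest to lowest: the last nonblocking
            // assignment in a time step wins, so port 0 has priority.
            for (p = P - 1; p >= 0; p = p - 1)
                if (reach[256*p + k] && wen[p] && waddr_flat[8*p +: 8] == k)
                    regs_flat[3*k +: 3] <= wdata_flat[3*p +: 3];
    end
endmodule
```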

jtw · Mar 11, 2007

I have had similar requirements (updating state variables, or some such)
where I used dual-port RAM: I use one port for the read and the other,
delayed by one clock, for the modify-write.

The pipeline needs to be managed carefully, but this can save a tremendous
number of registers, assuming only one index needs to be updated at a time.
If all entries need concurrent access, a memory won't cut it. For my
applications, typically TDM processing of multiple channels, it works
well.
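A minimal sketch of that read-modify-write scheme, assuming a simple dual-port RAM with one read port and one write port, 256 entries of 3 bits, and an example "modify" step that adds `inc_in` to the stored value (module and signal names are hypothetical). The forwarding mux is one way to manage the pipeline: when two back-to-back requests hit the same entry, the second read would otherwise return the value from before the first write.

```verilog
// Hypothetical read-modify-write pipeline on a dual-port RAM: the read port
// is used in cycle t and the write port writes the modified value back one
// clock later. The modify step (add, wrapping modulo 2**DW) is an example.
module rmw_dual_port #(
    parameter integer AW = 8,  // 256 entries
    parameter integer DW = 3   // 3-bit state per entry
) (
    input  wire          clk,
    input  wire          valid_in,  // one request per clock
    input  wire [AW-1:0] idx_in,    // entry to update
    input  wire [DW-1:0] inc_in,    // amount to add (example modify)
    output reg  [DW-1:0] rd_data    // current value for the stage-1 request
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    // Stage-1 registers: the request whose read data is in rd_data.
    reg          s1_valid = 1'b0;
    reg [AW-1:0] s1_idx;
    reg [DW-1:0] s1_inc;

    wire [DW-1:0] new_val = rd_data + s1_inc;  // the "modify" step

    always @(posedge clk) begin
        // Write port: store the modified value one clock after the read.
        if (s1_valid)
            mem[s1_idx] <= new_val;

        // Read port. If the previous request targets the same entry, its
        // write lands on this same edge, so forward new_val instead.
        if (s1_valid && s1_idx == idx_in)
            rd_data <= new_val;
        else
            rd_data <= mem[idx_in];

        s1_valid <= valid_in;
        s1_idx   <= idx_in;
        s1_inc   <= inc_in;
    end
endmodule
```

Only one level of forwarding is needed here, because a request issued two or more clocks later reads the RAM after the earlier write has completed.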

JTW

