Storing many 32-bit "parameters"?

Frédéric Lochon · Nov 29, 2008

Hello,

I have a VHDL project which is implemented in a Virtex5. This Virtex5 is
connected to a LAN chipset.
I use this system to communicate with a PC, and more precisely this
system receives "many" 32-bit parameters.

And here is the problem: such a high number of 32-bit parameters (more
than 100) increases the synthesis and implementation time quite
significantly because of the number of registers inferred.

So, I'm looking for a way of optimizing such a "high" number of
registers. Because of how the parameters are acquired in the FPGA
(through the "slow" LAN compared to an FPGA), I can accept that the
storage of these parameters (which are used in several different places
in the design) is not "fast", i.e. if there is a high propagation time
through the whole design, it's ok.

In other words, is there a way of storing this many parameters that
would not degrade the synthesis and implementation times, given
that I have no performance requirements?

I can very easily tolerate up to 1 ms (which is *huge* compared to the
50MHz clock) between when the parameters get into the FPGA and when I
can use them in different VHDL modules (possibly at the same time), but,
once they are usable, several parameters may be accessed at the same time.

My first thought was about using some Coregen IP which would reduce (at
least) synthesis time, for example distributed RAM. But I don't think it
would be that efficient (I would use something like 4 times more bits
than necessary).
I also thought of using a RAM with different input and output widths, but
the ratio is quite small.

In the past, I tried using "partitions", but ISE (at least 9.1) didn't
handle partitions very well, and was even buggier with them.

If someone has any idea, I'm interested.

Thanks in advance.

Mike Treseler · Nov 29, 2008

Frédéric Lochon said:
And here is the problem: such a high number of 32-bit parameters (more
than 100) increases the synthesis and implementation time quite
significantly because of the number of registers inferred.

This is the downside to debugging
by trial and error synthesis.
It may work fine in small cases,
but it falls apart in large ones.
A Virtex5 design might take an hour
to re-image and test on the bench.
If I am lucky enough to find the problem
this way, it will take me another hour
to test the solution even if the fix
is one line of code.

By writing my own synthesis and testbench code,
I can code and sim a small change in a minute.
I might run a synthesis once a week, but
to check Fmax and utilization, not functionality.

So, I'm looking for a way of optimizing such a "high" number of
registers. Because of how the parameters are acquired in the FPGA
(through the "slow" LAN compared to an FPGA), I can accept that the
storage of these parameters (which are used in several different places
in the design) is not "fast", i.e. if there is a high propagation time
through the whole design, it's ok.

The only thing that synthesizes faster than a register is a wire.
Changing the design to a ram or shift registers will
not make much difference to synthesis time.

In the past, I tried using "partitions", but ISE (at least 9.1) didn't
handle partitions very well, and was even buggier with them.

Yes, partitions are rarely worth the pain.
ISE might have synthesis options
like Quartus smart-compile to trade
some disk space for synthesis time.

Good luck on your design.

-- Mike Treseler

Eli Bendersky · Nov 30, 2008

Hello,

I have a VHDL project which is implemented in a Virtex5. This Virtex5 is
connected to a LAN chipset.
I use this system to communicate with a PC, and more precisely this
system receives "many" 32-bit parameters.

And here is the problem: such a high number of 32-bit parameters (more
than 100) increases the synthesis and implementation time quite
significantly because of the number of registers inferred.

Perhaps I'm misunderstanding you, but you need to receive ~100 32-bit
parameters, which is 400 bytes. What is wrong with instantiating a RAM
block of the FPGA and filling the parameters into it from the LAN
chip?
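
A RAM of this size can usually be inferred from plain VHDL rather than instantiated. The following is only a minimal sketch, not code from this thread: the entity name `param_ram`, the port names, and the depth of 128 words (the next power of two above ~100 parameters) are assumptions made for illustration. Whether a tool maps it to block RAM or distributed RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port parameter RAM: one write port (LAN side),
-- one read port (user side). All names and sizes are illustrative.
entity param_ram is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;                      -- write strobe from the LAN side
    waddr : in  unsigned(6 downto 0);           -- 7 bits address 128 words
    wdata : in  std_logic_vector(31 downto 0);
    raddr : in  unsigned(6 downto 0);
    rdata : out std_logic_vector(31 downto 0)   -- valid one clock after raddr
  );
end entity param_ram;

architecture rtl of param_ram is
  type ram_t is array (0 to 127) of std_logic_vector(31 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(waddr)) <= wdata;
      end if;
      -- Registered (synchronous) read: the style synthesis tools
      -- generally need in order to map the array onto a block RAM.
      rdata <= ram(to_integer(raddr));
    end if;
  end process;
end architecture rtl;
```

Note that a memory like this delivers only one word per clock per read port, so it does not by itself give parallel access to several parameters.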

Eli

Frédéric Lochon · Nov 30, 2008

Eli Bendersky wrote:

Perhaps I'm misunderstanding you, but you need to receive ~100 32-bit
parameters, which is 400 bytes. What is wrong with instantiating a RAM
block of the FPGA and filling the parameters into it from the LAN
chip?

There's nothing wrong with that. In fact, that is what I do as a
first step.
But, since different LAN packets concern different modules, I have to load
the parameters from the packet RAM and store them in registers.
This way I can use them at any clock edge without losing time accessing
them (which would be a real problem), and I can access all parameters at
the same time.
If I had used another RAM to store module-specific parameters, there
would have been access time considerations and it would have been
difficult (if not impossible) to access several parameters at the same time.

I think that a good trick would have been to have some kind of N-port
(read-only ports) RAM (most probably distributed-RAM) which, afaik,
doesn't exist for Virtex5.
I believe that such an "IP" would most certainly reduce synthesis time
and probably implementation time.

KJ · Nov 30, 2008

Frédéric Lochon said:
Eli Bendersky wrote:

There's nothing wrong with that. In fact, that is what I do as a
first step.
But, since different LAN packets concern different modules, I have to load
the parameters from the packet RAM and store them in registers.
This way I can use them at any clock edge without losing time accessing
them (which would be a real problem), and I can access all parameters at
the same time.
If I had used another RAM to store module-specific parameters, there would
have been access time considerations and it would have been difficult (if
not impossible) to access several parameters at the same time.

Since the parameters eventually need to be stored in registers in order to
use them for their function you don't gain any inherent functionality by
storing them in some RAM. What the RAM does buy you is decoupling of the
parameter load from the usage (i.e. new parameters can get loaded while
still using the old ones). You haven't stated, though, whether that is a
requirement for your application. If it's not a requirement, then
adding the RAM only makes your life more difficult. If it is a requirement,
and the RAM therefore is needed to hold the parameters until they can be
used later, then one useful trick is to make the 100 or so parameter
registers be in a large shift chain. The RAM transfers the data into the
start of the chain which then transfers it to the next, etc. You still have
the individual registers and the RAM, but now the fanout is about as low as
you can get. It's easiest if you can load all of the parameters in the
chain, a bit more difficult if the loading needs to be selective.
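
The shift-chain idea can be sketched in VHDL roughly as follows. This is a minimal sketch under stated assumptions, not code from this thread: the package `param_chain_pkg`, the entity `param_chain`, the `shift_en` strobe, and the count of 100 parameters are all names and values chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: names and sizes are illustrative only.
package param_chain_pkg is
  constant N_PARAMS : positive := 100;
  subtype param_t is std_logic_vector(31 downto 0);
  type param_array_t is array (0 to N_PARAMS - 1) of param_t;
end package param_chain_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.param_chain_pkg.all;

entity param_chain is
  port (
    clk      : in  std_logic;
    shift_en : in  std_logic;      -- assert for one clock per word read from the RAM
    din      : in  param_t;        -- word coming from the packet RAM
    params   : out param_array_t   -- all parameters, readable in parallel
  );
end entity param_chain;

architecture rtl of param_chain is
  signal chain : param_array_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if shift_en = '1' then
        -- Each register loads only from its neighbour, so the RAM output
        -- drives a single register instead of all N_PARAMS of them.
        chain(0) <= din;
        for i in 1 to N_PARAMS - 1 loop
          chain(i) <= chain(i - 1);
        end loop;
      end if;
    end if;
  end process;

  params <= chain;
end architecture rtl;
```

After `N_PARAMS` enabled clock cycles, the first word shifted in sits in `params(N_PARAMS - 1)` and the last one in `params(0)`. At 50 MHz, shifting 100 words takes 2 µs. While the chain is shifting, the outputs hold transient values, so the consuming modules would have to ignore them during a load.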

I think that a good trick would have been to have some kind of N-port
(read-only ports) RAM (most probably distributed-RAM) which, afaik,
doesn't exist for Virtex5.

They're called flip flops.

I believe that such an "IP" would most certainly reduce synthesis time

I doubt it.

and probably implementation time.

As Mike mentioned in his post, simulation using testbenches is the best
method to date for reducing the amount of time needed to get to a working
design (along with static timing analysis). Long build times are only a
problem when you try to debug on the bench/system.

Kevin Jennings

Frédéric Lochon · Nov 30, 2008

Mike Treseler wrote:

Frédéric Lochon wrote:

The only thing that synthesizes faster than a register is a wire.
Changing the design to a ram or shift registers will
not make much difference to synthesis time.

I once noticed a clear difference in synthesis and
implementation time when I made a small modification to VHDL code that
made XST unable to recognize that a signal should be implemented as
a block RAM. I know there is a big difference between block RAM and
registers, and this is why there is such a difference in synthesis time.

Yes, partitions are rarely worth the pain.
ISE might have synthesis options
like quartus smart-compile to trade
some disk space for synthesis time.

I think that in the future I will partition my design physically using two
FPGAs, because my design is part of a versatile test system which has one
part that is quite "constant" and one part that is "variable", and the
"constant" part is the most important part and thus the part which has the
biggest impact on synthesis time.

You're right when you say that simulation is the best way to get a
working design, but because my design is 90% the same whatever the
final application, the synthesis time is quite annoying when the
application is "simple", and this is why I'm interested in reducing the
synthesis and implementation time.

Mike Treseler · Dec 1, 2008

Frédéric Lochon said:
I once noticed a clear difference in synthesis and
implementation time when I made a small modification to VHDL code that
made XST unable to recognize that a signal should be implemented as
a block RAM. I know there is a big difference between block RAM and
registers, and this is why there is such a difference in synthesis time.

Yes, a block ram synthesizes faster than the
same number of registers. However, a block
ram by itself, does not solve your problem.

I think that in the future I will partition my design physically using two
FPGAs, because my design is part of a versatile test system which has one
part that is quite "constant" and one part that is "variable", and the
"constant" part is the most important part and thus the part which has the
biggest impact on synthesis time.

Good luck.
Note that any change to the "constant" part
is a start-over on partitioning.

You're right when you say that simulation is the best way to get a
working design, but because my design is 90% the same whatever the
final application, the synthesis time is quite annoying when the
application is "simple", and this is why I'm interested in reducing the
synthesis and implementation time.

Everyone is entitled to their own preferences.
I find bouncing bits on the simulator to
be an amusing pastime.

-- Mike Treseler
