'Better' is always in the eye of the beholder, though.
And there is the first bit of a rub from the standpoint of straight code
re-use: the package. If you keep the package, entity and architecture all
together and source control them, then each new design, since it will have
a different address map, will need its own package. But updating the source
file for the new design now breaks things for the old design, which still
needs to be maintained. As the creator of such source, you'd want to
provide the 'package' in the form of commented-out sample package code that
the user would need to create on their own. That way the code that is
invariant across all designs will not need any modification and could simply
be used.
The second rub, which you're already aware of, is that the package
effectively prevents you from using a second instance with different data
widths in the same design. Most of the time this is not a problem since
many designs simply have one processor type of bus so only one instance is
needed, but it does indicate where the design doesn't scale. Maybe that's
what you meant when you mentioned something about using it for a
medium-sized design.
Ummm....the reason one has read/write ports in a design in the first place
is because you need direct access to all of the bits in the ports at all
times. If you only need access to one port at any given time then what you
need is simple dual port memory. So the entity also needs an additional
output that is an array of T_reg_readback.
I use a two dimensional array of std_ulogic, that way data width and the
register list length are not limited by something in the package. The
package contains functions to convert to/from the 2d array to get a specific
register element. These functions are invariant across all designs. Within
each design there may also be an enumeration list of ports, which then
require conversion between the enumeration and an integer using 'val and
'pos attributes. Also, I don't actually implement the read/write ports in
that entity for a couple of reasons:
- Many designs have a bit or two in some port(s) that need to reset with
application of a reset signal. By implementing the ports themselves, you
either need to make a global decision to reset everything for every design
(which wastes routing resources, but some probably think is OK) or bring in
some generic parameters to specify which bits in which ports need to be
affected by reset and which do not (which makes the entity somewhat more
difficult to use). Neither problem is a *big* issue to deal with though.
- Sometimes the 'port select' is really a decode to some other external hunk
of code. In other words, the whole idea of implementing a global design
read/write port entity is not what you want in the first place. By simply
implementing the decoder/encoder to route data between the master and the
various slave devices (or ports) you get something that has no
package-created restrictions and can scale as needed. For example, long
lists of registers running at a high clock speed will likely require
multiple clocks to do the muxing back to read data (which is going to be
the critical timing path element for this entity). Having a generic to
control how many ticks of delay there are gives the user control over the
clock cycle performance of the module.
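To make the two dimensional array idea above concrete, here is a
minimal sketch of what such a design-invariant package might look
like. The names reg_array_pkg, t_reg_array, get_reg and set_reg are
made up for illustration and are not taken from anyone's actual code;
set_reg assumes the value passed in is exactly as wide as one register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package reg_array_pkg is
  -- First index: register number. Second index: bit position.
  -- Both dimensions are unconstrained, so neither the data width nor
  -- the register count is fixed by the package.
  type t_reg_array is
    array (natural range <>, natural range <>) of std_ulogic;

  -- Extract one register as a vector (width-1 downto 0).
  function get_reg (regs : t_reg_array;
                    idx  : natural) return std_ulogic_vector;

  -- Return a copy of regs with register idx replaced by value.
  function set_reg (regs  : t_reg_array;
                    idx   : natural;
                    value : std_ulogic_vector) return t_reg_array;
end package reg_array_pkg;

package body reg_array_pkg is

  function get_reg (regs : t_reg_array;
                    idx  : natural) return std_ulogic_vector is
    variable v : std_ulogic_vector(regs'length(2) - 1 downto 0);
  begin
    for i in v'range loop
      v(i) := regs(idx, regs'low(2) + i);
    end loop;
    return v;
  end function get_reg;

  function set_reg (regs  : t_reg_array;
                    idx   : natural;
                    value : std_ulogic_vector) return t_reg_array is
    variable result : t_reg_array(regs'range(1), regs'range(2)) := regs;
    -- Normalise the index range of value to (n-1 downto 0).
    variable v : std_ulogic_vector(value'length - 1 downto 0) := value;
  begin
    assert value'length = regs'length(2)
      report "set_reg: value width does not match register width"
      severity failure;
    for i in v'range loop
      result(idx, regs'low(2) + i) := v(i);
    end loop;
    return result;
  end function set_reg;

end package body reg_array_pkg;
```

A design with an enumerated port list, say a hypothetical
type t_port_name is (CTRL, STATUS, IRQ_MASK), would then index the
array with something like get_reg(regs, t_port_name'pos(STATUS)).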
Lastly, a minor nit is the defaulting of the register read back data to '0'.
This is not needed and just adds additional routing and can slow down
performance in the (likely) critical path for muxing the data.
Thanks for the posting, nice to see higher level things than "my design
doesn't work...help"
Kevin Jennings
Thanks to both KJ and Jonathan for the good ideas. Agreed - some
high-level design is nice to see! I like some of the aspects that
Jonathan's approach brings. Having wrestled the register problem
myself, and in the spirit of really diving in, I'll throw the
following into the ring.
It seems to me that the main aspects his structure brings to the table
are
1) embedding of the register names into enum literals, adding to the
readability
2) a centralized and reusable decoder/mux through which all register
accesses are funnelled
3) a nice way of parameterizing and describing the registers to allow
(2) to be created
As has also been discussed, using a higher-level tool like Denali's
Blueprint, or other open source tools, can somewhat assist with point
(1) by automatically creating VHDL code which defines the constants
for you, (or with some code hacking, could probably build an enum).
I've seen an open-source tool in use (the name eludes me) which has a
nice HTML output so you can print out a nicely formatted 20 page
register spec as soon as your VHDL is compiled... it's very nice. You
can also get "C" language header files with all the right #define's to
make the software guys happy.
The centralized mux (item 2) along with the descriptors (3) are good
for describing *where* the registers are, but not so much for
describing *what* the registers are. Clearly, they will be domain-
specific, but may have some common characteristics.
In a lot of my designs, I have a clear delineation of register
classes: (a) simple read/write, like for static mode bits etc, (b)
read-only (like status and packet counters), (c) sticky alarm bits
with uniform "write-a-'1'-back-to-clear-it" semantics. I implemented
a framework which addresses aspect (3) above as well as the register
classifications. It's (yet again) partly orthogonal and partly
overlapping to both the above schemes.
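As an illustration of class (c), a single sticky alarm bit with
write-a-'1'-to-clear semantics might be sketched as below. This is not
code from my framework: the entity name and ports are invented for the
example, wr_one stands for "a write to this register is in progress and
the data bit for this position is '1'", and the choice that a new alarm
wins over a simultaneous clear is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sticky_alarm_bit is
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;   -- synchronous, active high
    alarm  : in  std_logic;   -- event to be latched
    wr_one : in  std_logic;   -- software is writing a '1' to this bit
    sticky : out std_logic);  -- latched alarm, readable by software
end entity sticky_alarm_bit;

architecture rtl of sticky_alarm_bit is
  signal sticky_i : std_logic := '0';
begin

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        sticky_i <= '0';
      elsif alarm = '1' then
        sticky_i <= '1';      -- set wins over clear (assumption)
      elsif wr_one = '1' then
        sticky_i <= '0';      -- writing a '1' clears the bit
      end if;                 -- otherwise hold the current value
    end if;
  end process;

  sticky <= sticky_i;

end architecture rtl;
```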
I "crack" the incoming bus transaction into a single record type as
soon as the external bus hits the pins. There's always someone who
forgets whether CS is active H or L, and forgets whether or not to
gate WE with CS!
    -- instead of separate cs, re, we flags we will use this
    type TRANSACTION_KIND is ( T_IDLE, T_READ, T_WRITE);

    -- this TYPE is used for interfaces from cracked external bus to
    -- sub-blocks and direct registers. 'data' ignored for reads.
    type TRANSACTION
    is record
      addr   : std_logic_vector(ADDRWIDTH-1 downto 1);
      data   : std_logic_vector(BITS_PER_WORD - 1 downto 0);
      action : TRANSACTION_KIND;
    end record;
Then I set up a similar range descriptor, but used a bitmask to define
which bits are ignored when decoding the address.
    type MEM_RANGE_DESCRIPTOR
    is record
      mask  : std_logic_vector(ADDRWIDTH-1 downto 0);
      match : std_logic_vector(ADDRWIDTH-1 downto 0);
    end record;
Some simple predicates on transactions (is_in_range, is_read,
is_write, etc) make the later code fairly readable.
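For anyone who wants to try this, here is one way the types and
predicates could be packaged up. Treat it as a sketch under
assumptions, not as my exact code: the package name, the values of
ADDRWIDTH and BITS_PER_WORD, and the function bodies are all filled in
for illustration. In particular it assumes that a '1' in the mask
means "compare this address bit" and a '0' means "ignore it", and that
descriptor bit 0 is simply dropped because addr has no bit 0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package bus_txn_pkg is
  -- Assumed sizes; in practice these come from a per-FPGA package.
  constant ADDRWIDTH     : integer := 21;
  constant BITS_PER_WORD : integer := 16;

  type TRANSACTION_KIND is (T_IDLE, T_READ, T_WRITE);

  type TRANSACTION is record
    addr   : std_logic_vector(ADDRWIDTH-1 downto 1);
    data   : std_logic_vector(BITS_PER_WORD-1 downto 0);
    action : TRANSACTION_KIND;
  end record;

  type MEM_RANGE_DESCRIPTOR is record
    mask  : std_logic_vector(ADDRWIDTH-1 downto 0);
    match : std_logic_vector(ADDRWIDTH-1 downto 0);
  end record;

  function is_read  (trans : TRANSACTION) return boolean;
  function is_write (trans : TRANSACTION) return boolean;
  function is_in_range (trans : TRANSACTION;
                        desc  : MEM_RANGE_DESCRIPTOR) return boolean;
end package bus_txn_pkg;

package body bus_txn_pkg is

  function is_read (trans : TRANSACTION) return boolean is
  begin
    return trans.action = T_READ;
  end function is_read;

  function is_write (trans : TRANSACTION) return boolean is
  begin
    return trans.action = T_WRITE;
  end function is_write;

  function is_in_range (trans : TRANSACTION;
                        desc  : MEM_RANGE_DESCRIPTOR) return boolean is
    -- Slice the descriptor down to the bits that exist in addr.
    constant m : std_logic_vector(trans.addr'range) :=
      desc.mask(trans.addr'range);
    constant v : std_logic_vector(trans.addr'range) :=
      desc.match(trans.addr'range);
  begin
    -- Only bits with a '1' in the mask take part in the comparison.
    return (trans.addr and m) = (v and m);
  end function is_in_range;

end package body bus_txn_pkg;
```

Note that is_in_range says nothing about the kind of transaction, so a
register still has to combine it with is_read or is_write as needed.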
Now, if I want to build a simple read/write register I can use a
common procedure such as this:
    -- ------------------------------------------------------------
    --
    -- procedure to create a regular read/write micro register
    --
    -- the register is a vector of D flops: loaded with value on write,
    --
    -- register is automatically decoded from address & bit index given
    -- ------------------------------------------------------------
    procedure micro_rw_reg (
      constant reg_desc : in MEM_RANGE_DESCRIPTOR;
      constant trans    : in TRANSACTION;
      signal   reg      : inout std_logic_vector(BITS_PER_WORD-1 downto 0)
    )
    is begin
      if (is_in_range(trans, reg_desc)) and is_write(trans) then
        reg <= trans.data;
      end if;
    end micro_rw_reg;
Or if I want to create only one bit of a register (say a register with
16 individual control bits) I can call the following procedure sixteen
times:
    procedure micro_rw_bit (
      constant reg_desc : in MEM_RANGE_DESCRIPTOR;
      constant trans    : in TRANSACTION;
      signal   reg      : inout std_logic_vector(BITS_PER_WORD-1 downto 0);
      constant index    : in integer
    )
    is begin
      if (is_in_range(trans, reg_desc)) and is_write(trans) then
        reg(index) <= trans.data(index);
      end if;
    end micro_rw_bit;
Then the actual creation of, say, an interrupt source control
register could be built like this:
    -- ------------------------------------------------------------
    --
    -- irq source register
    --
    -- ------------------------------------------------------------
    src_reg_proc: process (clk)
    begin
      if rising_edge(clk) then
        irq_src_reg(15 downto 2) <= (others => '0');
        if reset = '1' then
          irq_src_reg <= (others => '0');
        else
          micro_rw_bit( IRQ_SRC_REG_DESC, w_trans, irq_src_reg, 1);
          micro_rw_bit( IRQ_SRC_REG_DESC, w_trans, irq_src_reg, 0);
        end if;
      end if; -- rising edge
    end process src_reg_proc;
Where I had earlier defined a signal for irq_src_reg as well as a
descriptor:
    constant IRQ_SRC_REG_DESC : MEM_RANGE_DESCRIPTOR :=
      ( mask => '0' & x"0FFFF", match => '0' & x"00018");
Whereas I could build the three word-wide foobar, fiddle, and faddle
registers by writing this:
    constant FOOBAR_REG_DESC : MEM_RANGE_DESCRIPTOR :=
      ( mask => '0' & x"0FFFF", match => '0' & x"00000");
    constant FIDDLE_REG_DESC : MEM_RANGE_DESCRIPTOR :=
      ( mask => '0' & x"0FFFF", match => '0' & x"00004");
    constant FADDLE_REG_DESC : MEM_RANGE_DESCRIPTOR :=
      ( mask => '0' & x"0FFFF", match => '0' & x"00008");

    reg_trinity_proc: process (clk)
    begin
      if rising_edge(clk) then
        if reset = '1' then
          foobar_reg <= (others => '0');
          fiddle_reg <= (others => '0');
          faddle_reg <= (others => '0');
        else
          micro_rw_reg( FOOBAR_REG_DESC, w_trans, foobar_reg);
          micro_rw_reg( FIDDLE_REG_DESC, w_trans, fiddle_reg);
          micro_rw_reg( FADDLE_REG_DESC, w_trans, faddle_reg);
        end if;
      end if; -- rising edge
    end process reg_trinity_proc;
The approach to building the central decoder/read mux is similar to
Jonathan's but instead of decoding directly from the descriptors, I
decoded the main bus into fixed-size address segments (say on 64K
boundaries, which matches the mask of "ffff" above). Each fixed-size
segment gets routed to one "submodule" in the design; say one
submodule has a set of registers for an interrupt controller, one
submodule has registers for a video decoder, one submodule has the
chip common version, ID, and master reset registers, etc. The main
bus interface has N transaction output ports and N data readback
ports:
    entity main_bus_interface is
      generic ( ADDRESS_CHUNK_SIZE_IN_BITS : integer;
                NPORTS                     : integer );
      port ( -- ... external bus pins ...
             txn   : out TRANSACTION_VECTOR(0 to NPORTS-1); -- e.g. 64K per port
             rdata : in  std_logic_vector(BITS_PER_WORD-1 downto 0) );
    end entity main_bus_interface;
I can build a subfunction (like a video processor, a packet interface,
or FIFO portal) in an entity which talks to the main_bus_interface
through a standard transaction bus. I can put down individual
registers fairly simply within each submodule using the procedures
described above. And furthermore, I can put down multiple subfunctions
(say three interrupt controllers and four video processors) by
instantiating multiple entities attached to the main_bus_interface;
each identical submodule then gets an identical set of registers
created at (for example) 64K offsets from each other by virtue of the
decoding performed in the main decoder. On the main decoder txn
ports, there would only be one port with a non-IDLE transaction
occurring at any given time.
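To show the shape of the fixed-segment decode, here is a rough,
purely combinational sketch. It is not the real main_bus_interface:
the names bus_decode_pkg, txn_fanout, WORD_VECTOR, cracked and
sub_rdata are invented, the sizes are assumed, the external pin
cracking is left out, and no pipelining is shown. The upper address
bits pick the one port that sees the transaction; every other port
sees T_IDLE. Driving don't-care on an unmapped read is a choice made
for this example only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package bus_decode_pkg is
  constant ADDRWIDTH     : integer := 21;  -- assumed
  constant BITS_PER_WORD : integer := 16;  -- assumed

  type TRANSACTION_KIND is (T_IDLE, T_READ, T_WRITE);
  type TRANSACTION is record
    addr   : std_logic_vector(ADDRWIDTH-1 downto 1);
    data   : std_logic_vector(BITS_PER_WORD-1 downto 0);
    action : TRANSACTION_KIND;
  end record;
  type TRANSACTION_VECTOR is array (natural range <>) of TRANSACTION;
  type WORD_VECTOR is array (natural range <>)
    of std_logic_vector(BITS_PER_WORD-1 downto 0);
end package bus_decode_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.bus_decode_pkg.all;

entity txn_fanout is
  generic ( ADDRESS_CHUNK_SIZE_IN_BITS : integer := 16;  -- 64K chunks
            NPORTS                     : integer := 4 );
  port ( cracked   : in  TRANSACTION;  -- already cracked from the pins
         txn       : out TRANSACTION_VECTOR(0 to NPORTS-1);
         sub_rdata : in  WORD_VECTOR(0 to NPORTS-1);
         rdata     : out std_logic_vector(BITS_PER_WORD-1 downto 0) );
end entity txn_fanout;

architecture rtl of txn_fanout is
begin
  process (cracked, sub_rdata)
    variable sel : natural;
  begin
    -- The address bits above the chunk size select the submodule.
    sel := to_integer(unsigned(
             cracked.addr(ADDRWIDTH-1 downto ADDRESS_CHUNK_SIZE_IN_BITS)));

    for i in txn'range loop
      txn(i) <= cracked;             -- addr and data go to everyone
      if i /= sel then
        txn(i).action <= T_IDLE;     -- only the selected port is active
      end if;
    end loop;

    -- Read mux: return the word from the selected submodule.
    if sel < NPORTS then
      rdata <= sub_rdata(sel);
    else
      rdata <= (others => '-');      -- unmapped chunk
    end if;
  end process;
end architecture rtl;
```

Each submodule then only needs to decode the low address bits, which
is why identical submodules end up with identical register sets at
different chunk offsets.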
So the clear drawbacks of this are:
- hard coding of ADDRWIDTH and BITS_PER_WORD - but they can be defined
in a package per FPGA
- lacking the elegance of identifying registers with enum literals
- the registers per se are not created automagically; they still have
to be written with procedures
- pipelining and delay through the central entity have to be managed
somehow - but that's true always
But it kind of bridges the gap: I can simplify the act of defining a
register to some fairly well-named procedures, and force the designer
to declare some nice descriptors which help the VHDL documentation.
Plus I have a fairly reusable central bus entity.
My two bits.
- Kenn