The best way to implement this non-power-of-two modulo-like function on a limited subtype?

Discussion in 'VHDL' started by Murph, Dec 4, 2006.

  1. Murph

    Murph Guest

    Hello!
    I've got two a subtypes that represents a piece on a gameboard made
    from 20 rows.

    subtype row is interger range 0 to 19;
    subtype boardSize is integer range 0 to 199;

    thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.

    Is there any efficient way to get the related "row" from a "boardSize"?
    I know I can't do non power-of-two modulo operations. I need to do this
    a couple dozen times within my program.

    I'd love to just write a function getRow that I could call, but I don't
    want it to synthesize into something impossibly complex (Latest Xilinx
    Webpack --> Spartan 3).

    If i wrote the function with < operators, for example, would it have to
    synthesize twenty sets of comparators for each time I called it, or
    would it reuse them? (example function below)

    Or would it be a better idea to create a temporary variable and then
    use repeated subtractions in a for loop?

    Or just to write a huge if statement that would choose the right return
    values based on all 200 possible inputs?

    I don't really know which method would be the most "managable" when
    synthesized.

    Thanks for your time,
    --Murph


    function getRow(input : boardSize) return row is
    variable result : row := 0;
    begin
    if (input < 10) then
    result := 0;
    elsif (input <20) then
    result := 1;
    -- ...
    else
    result := 19;
    end if;
    return result;
    end function;
     
    Murph, Dec 4, 2006
    #1
    1. Advertising

  2. On 4 Dec 2006 15:06:50 -0800, "Murph" <> wrote:

    >Hello!
    >I've got two a subtypes that represents a piece on a gameboard made
    >from 20 rows.
    >
    >subtype row is interger range 0 to 19;
    >subtype boardSize is integer range 0 to 199;
    >
    >thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.
    >
    >Is there any efficient way to get the related "row" from a "boardSize"?
    >I know I can't do non power-of-two modulo operations. I need to do this
    >a couple dozen times within my program.


    Is there anything stopping you from implementing the board with rows 32
    bits wide but only using the first 20 locations?

    That would allow the obvious trivial implementation of "mod" and
    simplify addressing considerably.

    If only the first 20 locations are accessed, the synthesis tools should
    optimise away resources related to the remaining 12. If "board" storage
    is in block RAM, the excess capacity is probably going to waste anyway.
    Whether it uses less resources than the "straight" implementation can
    probably only be determined by experiment.

    Alternatively, remember that division can also be accomplished by
    multiplication by the reciprocal, or an approximation to the reciprocal.
    Given a limited range of inputs, the question is, how coarse an
    approximation can you get away with? (Or in FPGA, are the multiplier
    blocks good enough?)

    Given only 200 inputs, you can simulate this, even in a spreadsheet if
    you have to, and verify correct output (or otherwise) for any
    approximation you wish to try.

    - Brian
     
    Brian Drummond, Dec 5, 2006
    #2
    1. Advertising

  3. Murph

    Ben Jones Guest

    "Brian Drummond" <> wrote in message
    news:...
    > On 4 Dec 2006 15:06:50 -0800, "Murph" <> wrote:
    >
    >>Hello!
    >>I've got two a subtypes that represents a piece on a gameboard made
    >>from 20 rows.
    >>
    >>subtype row is interger range 0 to 19;
    >>subtype boardSize is integer range 0 to 199;
    >>
    >>thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.
    >>
    >>Is there any efficient way to get the related "row" from a "boardSize"?
    >>I know I can't do non power-of-two modulo operations. I need to do this
    >>a couple dozen times within my program.

    >
    > Is there anything stopping you from implementing the board with rows 32
    > bits wide but only using the first 20 locations?
    >
    > That would allow the obvious trivial implementation of "mod" and
    > simplify addressing considerably.


    This was my first thought too. Looks like your gameboard is 10x20 for a
    total of 200 squares. You can store that in 8 bits, but then you run into
    the problem of having to do a modulo-10 or modulo-20 operation.

    If instead you use a 4-bit field to hold the column number (0-9) and 5 bits
    for the row (0-19), you only have one extra bit to store but the row/column
    addressing is simplified. If you subsequently need the "boardsize" value
    (0-199) you can simply do 20*C + R or 10*R + C, depending on whether you're
    a row-major or column-major person. Those constant-coefficient multiplies
    are *really* cheap (20*C == (C + C<<2) << 2) - in fact just a single
    addition!

    Of course it *can* be done the other way, but it's almost certainly going to
    be a bigger and slower circuit. If you need a single-cycle implementation,
    my initial idea would be just to work out the equations for each binary
    digit of the result. The MSBs are much simpler than the LSBs. For example:

    result(4) = (input > 160)
    result(3) = (input > 80) && !(input > 160);
    result(2) = ((input > 40) && !(input > 80)) || ((input > 120) && !(input >
    160));
    result(1) = ((input > 20) && !(input > 40)) || ...

    Continue that pattern and you've got yourself a circuit. Depending on your
    synthesis tool, the original description you gave with an if...elsif...elsif
    might produce similar result. Try it and see!

    Good luck,

    -Ben-
     
    Ben Jones, Dec 5, 2006
    #3
  4. Murph

    Murph Guest

    Thanks for all the help.

    Before I go through and switch to a row/column storage method (which
    sounds like a good idea, but would result in some more work for other
    things), I was considering just implementing the whole function into a
    look-up-table that could be stored in a bram.

    I tried to create a constant array that'd represent this look-up-table:

    constant getRow : rowArray (boardSize) := (
    0,0,0,0,0,0,0,0,0,0,
    1,1,1,1,1,1,1,1,1,1,
    ....
    19,19,19,19,19,19,19,19,19,19);

    However, I get this info message when it synthesizes:

    INFO:Xst:2504 - HDL ADVISOR - Initial contents of this register
    prevents it from being combined with the ROM for implementation as
    read-only block RAM.
    INFO:Xst:1651 - Address input of ROM <Mrom__mux0026> is tied to
    register <pieceLocations_0>.

    Is there some way I can modify how I am declaring/using the "initial
    contents" of the currentPiece register so that it will be implemented
    as a bram? or is there some way to really know if that's talking about
    the same ROM as I just created? :)

    --Murph
     
    Murph, Dec 6, 2006
    #4
  5. Murph

    Ben Jones Guest

    Hi Murph,

    "Murph" <> wrote in message
    news:...
    > Thanks for all the help.
    >
    > Before I go through and switch to a row/column storage method (which
    > sounds like a good idea, but would result in some more work for other
    > things), I was considering just implementing the whole function into a
    > look-up-table that could be stored in a bram.
    >
    > I tried to create a constant array that'd represent this look-up-table:
    >
    > constant getRow : rowArray (boardSize) := (
    > 0,0,0,0,0,0,0,0,0,0,
    > 1,1,1,1,1,1,1,1,1,1,
    > ...
    > 19,19,19,19,19,19,19,19,19,19);
    >
    > However, I get this info message when it synthesizes:
    >
    > INFO:Xst:2504 - HDL ADVISOR - Initial contents of this register
    > prevents it from being combined with the ROM for implementation as
    > read-only block RAM.
    > INFO:Xst:1651 - Address input of ROM <Mrom__mux0026> is tied to
    > register <pieceLocations_0>.
    >
    > Is there some way I can modify how I am declaring/using the "initial
    > contents" of the currentPiece register so that it will be implemented
    > as a bram? or is there some way to really know if that's talking about
    > the same ROM as I just created? :)


    I think it probably *is* talking about that same ROM. As I recall, the
    address registers embedded in the BRAM cannot be initialized; well, they can
    be initialized with all-zeros, but that's it.

    Without seeing the rest of the code it's hard to say, but I expect somewhere
    you have a line in a clocked process more or less like

    somethingOrOther <= getRow(currentPiece);

    If the default value of currentPiece is non-zero, then the rowArray ROM
    cannot be packed into a read-only BRAM.

    I actually think a distributed ROM implementation is better than a BRAM for
    this table. It only requires 5x200 = 1kbit, whereas you are targeting
    Spartan-3, which has 18kbit BRAM resources. So rather than waste a BRAM, you
    could just leave it to be implemented in the fabric. This also allows XST to
    make optimizations based on the (simple) pattern of the contents.

    So I just ran this table through XST targeting a LUT ROM, and it comes out
    as just 10 slices.... that's probably going to be hard to beat.

    Cheers,

    -Ben-
     
    Ben Jones, Dec 6, 2006
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Patrick Kowalzick
    Replies:
    5
    Views:
    485
    Patrick Kowalzick
    Mar 14, 2006
  2. Replies:
    8
    Views:
    379
    Mark Dickinson
    Apr 17, 2008
  3. qharz

    modulo function

    qharz, May 18, 2009, in forum: VHDL
    Replies:
    3
    Views:
    583
  4. dino d.
    Replies:
    1
    Views:
    165
    Gregor Kofler
    Aug 13, 2010
  5. ast
    Replies:
    8
    Views:
    219
    Evertjan.
    Feb 23, 2011
Loading...

Share This Page