Re: Count bits in VHDL, with loop and unrolled loop produces different results

Discussion in 'VHDL' started by a s, Mar 2, 2011.

  1. a s

    a s Guest

    On Mar 2, 5:52 pm, Gabor <> wrote:
    > I didn't catch which device you are targeting, but I
    > decided to try this myself with XST and Spartan 3A,
    > using Verilog to see if there are any significant
    > differences in synthesis performance.


    I am targeting Virtex4FX.

    > Here's the code:
    > module count_bits
    > #(
    >   parameter IN_WIDTH = 32,
    >   parameter OUT_WIDTH = 6
    > )
    > (
    >   input wire  [IN_WIDTH-1:0]  data_in,
    >   output reg [OUT_WIDTH-1:0]  data_out
    > );
    >
    > always @*
    > begin : proc
    >   integer i;
    >   integer sum;
    >   sum = 0;
    >   for (i = 0;i < IN_WIDTH;i = i + 1) sum = sum + data_in[i];
    >   data_out = sum;
    > end
    >
    > endmodule
    >
    > And the results for the 32-bit case (XST)
    >
    > Number of Slices:                       41  out of   1792     2%  
    > Number of 4 input LUTs:                 73  out of   3584     2%  
    >
    > which is very close to your original unrolled result.


    I get the same results with XST targeting V4.

    But it's really interesting that XST produces better results
    with Verilog than with VHDL for basically the same input.
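
For readers comparing the two languages, a VHDL counterpart of the quoted Verilog loop might look like the sketch below. This is not the VHDL originally posted in the thread (that code is not shown here); the entity mirrors the Verilog module's names, and the process label and the `natural range` variable are assumptions made for illustration. Whether a given synthesizer produces the same LUT count for it is tool-dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: a loop-based ones counter modeled on the quoted Verilog module.
entity count_bits is
  generic (
    IN_WIDTH  : positive := 32;
    OUT_WIDTH : positive := 6
  );
  port (
    data_in  : in  std_logic_vector(IN_WIDTH - 1 downto 0);
    data_out : out std_logic_vector(OUT_WIDTH - 1 downto 0)
  );
end entity count_bits;

architecture rtl of count_bits is
begin
  proc : process (data_in)
    -- Constrained range (an assumption) so the tool need not infer a full integer.
    variable sum : natural range 0 to IN_WIDTH;
  begin
    sum := 0;
    -- Add one for every input bit that is '1'.
    for i in data_in'range loop
      if data_in(i) = '1' then
        sum := sum + 1;
      end if;
    end loop;
    data_out <= std_logic_vector(to_unsigned(sum, OUT_WIDTH));
  end process proc;
end architecture rtl;
```

OUT_WIDTH must be wide enough to hold IN_WIDTH (6 bits for 32 inputs); otherwise to_unsigned truncates the count.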

    Running your module through Synopsys again results
    in a seemingly "optimum" 57 LUTs and 34 slices.

    I find it pretty amusing how many options we have already come up
    with for such a "basic" problem as counting ones in a word. ;-)

    Regards
     
    a s, Mar 2, 2011
    #1

  2. Re: Count bits in VHDL, with loop and unrolled loop produces different results

    In comp.arch.fpga a s <> wrote:
    (snip)

    > Running your module through Synopsys again results
    > in a seemingly "optimum" 57 LUTs and 34 slices.


    One should probably also compare propagation delay in addition
    to the number of LUTs or slices used. I don't believe it is
    large, but there is some tradeoff between the two. Worst
    delay would be (N-1) consecutive adders, increasing in width
    down the line.
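
To make the delay tradeoff concrete, here is a hedged VHDL sketch contrasting the two extremes: a linear chain, which as written describes up to N-1 adders in series, and a recursive split into halves, which describes a balanced tree about log2(N) adder levels deep. The package and function names are hypothetical, the `natural` return type leaves width trimming to the tool, and a synthesizer is free to restructure either description, so the resulting hardware may differ from what the source suggests.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package for illustration; not code from this thread.
package popcount_pkg is
  function count_ones_chain(v : std_logic_vector) return natural;
  function count_ones_tree(v : std_logic_vector) return natural;
end package popcount_pkg;

package body popcount_pkg is

  -- Linear accumulation: each iteration depends on the previous sum.
  function count_ones_chain(v : std_logic_vector) return natural is
    variable sum : natural := 0;
  begin
    for i in v'range loop
      if v(i) = '1' then
        sum := sum + 1;
      end if;
    end loop;
    return sum;
  end function count_ones_chain;

  -- Recursive halving: the two halves are counted independently, then added.
  function count_ones_tree(v : std_logic_vector) return natural is
    -- Normalize the index range so slicing does not depend on the caller's bounds.
    alias x : std_logic_vector(v'length - 1 downto 0) is v;
    constant half : natural := v'length / 2;
  begin
    if v'length = 0 then
      return 0;
    elsif v'length = 1 then
      if x(0) = '1' then
        return 1;
      else
        return 0;
      end if;
    else
      return count_ones_tree(x(half - 1 downto 0))
           + count_ones_tree(x(v'length - 1 downto half));
    end if;
  end function count_ones_tree;

end package body popcount_pkg;
```

Both functions return the same count for any input; they differ only in the structure they suggest to the tool, which is the kind of difference that shows up in a timing report rather than in the LUT total alone.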

    > I find it pretty amusing how many options we have already come up
    > with for such a "basic" problem as counting ones in a word. ;-)


    -- glen
     
    glen herrmannsfeldt, Mar 2, 2011
    #2

  3. a s

    JustJohn Guest

    On Mar 2, 12:38 pm, a s <> wrote:
    > On Mar 2, 5:52 pm, Gabor <> wrote:
    >
    > > I didn't catch which device you are targeting, but I
    > > decided to try this myself with XST and Spartan 3A,
    > > using Verilog to see if there are any significant
    > > differences in synthesis performance.
    >
    > I am targeting Virtex4FX.
    >
    > > Here's the code:
    > > module count_bits
    > > #(
    > >   parameter IN_WIDTH = 32,
    > >   parameter OUT_WIDTH = 6
    > > )
    > > (
    > >   input wire  [IN_WIDTH-1:0]  data_in,
    > >   output reg [OUT_WIDTH-1:0]  data_out
    > > );
    > >
    > > always @*
    > > begin : proc
    > >   integer i;
    > >   integer sum;
    > >   sum = 0;
    > >   for (i = 0;i < IN_WIDTH;i = i + 1) sum = sum + data_in[i];
    > >   data_out = sum;
    > > end
    > >
    > > endmodule
    > >
    > > And the results for the 32-bit case (XST)
    > >
    > > Number of Slices:                       41  out of   1792     2%
    > > Number of 4 input LUTs:                 73  out of   3584     2%
    > >
    > > which is very close to your original unrolled result.
    >
    > I get the same results with XST targeting V4.
    >
    > But it's really interesting that XST produces better results
    > with Verilog than with VHDL for basically the same input.
    >
    > Running your module through Synopsys again results
    > in a seemingly "optimum" 57 LUTs and 34 slices.
    >
    > I find it pretty amusing how many options we have already come up
    > with for such a "basic" problem as counting ones in a word. ;-)
    >
    > Regards


    Eight years ago (Sept/Oct 2003), we went through this exercise in the
    thread "Counting Ones" (I was posting as JustJohn back then, not
    John_H). See that thread for some ASCII art of the trees. I ended up
    with the following VHDL function that produces "optimum" 55 4-input
    LUTs for 32-bit vector input. I haven't seen anything better yet. I
    liked Andy's recursion suggestion; it'll take some thought to figure
    out how to auto-distribute the carry-in bits to the adders.

    Yesterday, Gabor posted 35 6-input LUTs.
    Gabor, what code did you use?
    I think a nice challenge to the C.A.F. group mind is to beat that.

    John L. Smith

    -- This function counts bits = '1' in a 32-bit word, using a tree
    -- structure with Full Adders at leafs for "minimum" logic utilization.
    function vec32_sum2( in_vec : in UNSIGNED ) return UNSIGNED is
      type FA_Arr_Type is array ( 0 to 9 ) of UNSIGNED( 1 downto 0 );
      variable FA_Array  : FA_Arr_Type;
      variable result    : UNSIGNED( 5 downto 0 );
      variable Leaf_Bits : UNSIGNED( 2 downto 0 );
      variable Sum3_1    : UNSIGNED( 2 downto 0 );
      variable Sum3_2    : UNSIGNED( 2 downto 0 );
      variable Sum3_3    : UNSIGNED( 2 downto 0 );
      variable Sum3_4    : UNSIGNED( 2 downto 0 );
      variable Sum3_5    : UNSIGNED( 2 downto 0 );
      variable Sum4_1    : UNSIGNED( 3 downto 0 );
      variable Sum4_2    : UNSIGNED( 3 downto 0 );
      variable Sum5_1    : UNSIGNED( 4 downto 0 );
    begin
      for i in 0 to 9 loop
        Leaf_Bits := in_vec( 3 * i + 2 downto 3 * i );
        case Leaf_Bits is
          when "000"  => FA_Array( i ) := "00";
          when "001"  => FA_Array( i ) := "01";
          when "010"  => FA_Array( i ) := "01";
          when "011"  => FA_Array( i ) := "10";
          when "100"  => FA_Array( i ) := "01";
          when "101"  => FA_Array( i ) := "10";
          when "110"  => FA_Array( i ) := "10";
          when others => FA_Array( i ) := "11";
        end case;
      end loop;
      Sum3_1 := ( "0" & FA_Array( 0 ) ) + ( "0" & FA_Array( 1 ) );
      Sum3_2 := ( "0" & FA_Array( 2 ) ) + ( "0" & FA_Array( 3 ) );
      Sum3_3 := ( "0" & FA_Array( 4 ) ) + ( "0" & FA_Array( 5 ) );
      Sum3_4 := ( "0" & FA_Array( 6 ) ) + ( "0" & FA_Array( 7 ) )
                + ( "00" & in_vec( 30 ) );
      Sum3_5 := ( "0" & FA_Array( 8 ) ) + ( "0" & FA_Array( 9 ) )
                + ( "00" & in_vec( 31 ) );
      Sum4_1 := ( "0" & Sum3_1 ) + ( "0" & Sum3_2 );
      Sum4_2 := ( "0" & Sum3_3 ) + ( "0" & Sum3_4 );
      Sum5_1 := ( "0" & Sum4_1 ) + ( "0" & Sum4_2 );
      result := ( "0" & Sum5_1 )
                + ( "000" & Sum3_5 );
      return result;
    end vec32_sum2;
     
    JustJohn, Mar 4, 2011
    #3
