This question seems simpler than it actually is...

Discussion in 'VHDL' started by arubin, Sep 25, 2006.

  1. arubin

    arubin Guest

    Hello All,

    Let's start with a simple problem: Given a buffer of "Buf_Depth"
    elements (let's say they are unsigned vectors with "Data_Width" number
    of bits), find the minimum element in the buffer.

    Our first solution is straightforward:

    -----------------------
    PROCESS
    VARIABLE min : UNSIGNED(Data_Width-1 DOWNTO 0);
    BEGIN

    min := Data_Buf(0);

    FOR i IN 1 TO Buf_Depth-1 LOOP
    IF (Data_Buf(i) < min) THEN
    min := Data_Buf(i)
    END IF;
    END LOOP;

    FinalResult <= min;

    END PROCESS;
    --------------------------

    The nice thing about this solution is that it will work for any value
    "Buf_Depth". Everything seems great until we look at the timing
    analysis (after synthesis). This design is painfully slow.

    To get better performance, we can speed up the design with some
    pipelining. Setting "Buf_Depth" = 4, we get:

    -----------------------

    SIGNAL pipeA, pipeB : UNSIGNED(Data_Width-1 DOWNTO 0);

    PROCESS(Clk)
    BEGIN
    IF (rising_edge(Clk)) THEN

    IF (Data_Buf(0) < Data_Buf(1)) THEN
    pipeA <= Data_Buf(0);
    ELSE
    pipeA <= Data_Buf(1);
    END IF;

    IF (Data_Buf(2) < Data_Buf(3)) THEN
    pipeB <= Data_Buf(2);
    ELSE
    pipeB <= Data_Buf(3);
    END IF;

    IF (pipeA < pipeB) THEN
    FinalResult <= pipeA;
    ELSE
    FinalResult <= pipeB;
    END IF;

    END IF;
    END PROCESS;

    -----------------------

    This solution (after synthesis) is very fast, but Buf_Depth is hard
    coded to '4' - ie it is not a generic solution.

    So here is my question: How can I code up a VHDL block similar to
    the pipelined solution that can be instantiated for any value of
    Buff_Depth?

    Thanks,
    Amir
     
    arubin, Sep 25, 2006
    #1
    1. Advertising

  2. arubin

    David Ashley Guest

    arubin wrote:
    > Hello All,
    >
    > Let's start with a simple problem: Given a buffer of "Buf_Depth"
    > elements (let's say they are unsigned vectors with "Data_Width" number
    > of bits), find the minimum element in the buffer.
    >
    > Our first solution is straightforward:
    >
    > -----------------------
    > PROCESS
    > VARIABLE min : UNSIGNED(Data_Width-1 DOWNTO 0);
    > BEGIN
    >
    > min := Data_Buf(0);
    >
    > FOR i IN 1 TO Buf_Depth-1 LOOP
    > IF (Data_Buf(i) < min) THEN
    > min := Data_Buf(i)
    > END IF;
    > END LOOP;
    >
    > FinalResult <= min;
    >
    > END PROCESS;
    > --------------------------
    >
    > The nice thing about this solution is that it will work for any value
    > "Buf_Depth". Everything seems great until we look at the timing
    > analysis (after synthesis). This design is painfully slow.
    >
    > To get better performance, we can speed up the design with some
    > pipelining. Setting "Buf_Depth" = 4, we get:
    >
    > -----------------------
    >
    > SIGNAL pipeA, pipeB : UNSIGNED(Data_Width-1 DOWNTO 0);
    >
    > PROCESS(Clk)
    > BEGIN
    > IF (rising_edge(Clk)) THEN
    >
    > IF (Data_Buf(0) < Data_Buf(1)) THEN
    > pipeA <= Data_Buf(0);
    > ELSE
    > pipeA <= Data_Buf(1);
    > END IF;
    >
    > IF (Data_Buf(2) < Data_Buf(3)) THEN
    > pipeB <= Data_Buf(2);
    > ELSE
    > pipeB <= Data_Buf(3);
    > END IF;
    >
    > IF (pipeA < pipeB) THEN
    > FinalResult <= pipeA;
    > ELSE
    > FinalResult <= pipeB;
    > END IF;
    >
    > END IF;
    > END PROCESS;
    >
    > -----------------------
    >
    > This solution (after synthesis) is very fast, but Buf_Depth is hard
    > coded to '4' - ie it is not a generic solution.
    >
    > So here is my question: How can I code up a VHDL block similar to
    > the pipelined solution that can be instantiated for any value of
    > Buff_Depth?
    >
    > Thanks,
    > Amir
    >


    This sounds like a good candidate for recursion. I don't know
    how you'd code it in VHDL, but the theory would be
    to implement a function that would return the minimum
    element of an array. Here's some 'c' code that works:

    int min(int *v, int n)
    {
    int l, r, n2;
    if(n==1) return *v;
    n2 = n/2;
    l=min(v, n2);
    r=min(v+n2, n-n2);
    return l<r ? l : r;
    }

    Just convert this to an equivalent VHDL function
    declaration.

    -Dave

    --
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 25, 2006
    #2
    1. Advertising

  3. arubin

    KJ Guest

    To refine David's suggestion a bit and to incorporate the pipelining
    you want to improve performance you'll probably want to make a
    procedure that takes as input both the array that you want to get the
    minimum value of along with another integer parameter (a constant, call
    it 'N' for now) that defines how many clock cycles you're willing to
    wait for this to be calculated.

    This procedure would then have 'N' calls to David's minimum value
    function each function call working on BufDepth / N elements of the
    array. After one clock cycle that procedure would then have an array
    of 'N' values which you would then find the minimum of by repeating
    this process (as David mentions recursion would be handy here).

    Obviously a few of those sticky details would need to be worked out but
    that would be the basic idea.

    KJ
     
    KJ, Sep 25, 2006
    #3
  4. arubin

    rickman Guest

    KJ wrote:
    > To refine David's suggestion a bit and to incorporate the pipelining
    > you want to improve performance you'll probably want to make a
    > procedure that takes as input both the array that you want to get the
    > minimum value of along with another integer parameter (a constant, call
    > it 'N' for now) that defines how many clock cycles you're willing to
    > wait for this to be calculated.
    >
    > This procedure would then have 'N' calls to David's minimum value
    > function each function call working on BufDepth / N elements of the
    > array. After one clock cycle that procedure would then have an array
    > of 'N' values which you would then find the minimum of by repeating
    > this process (as David mentions recursion would be handy here).
    >
    > Obviously a few of those sticky details would need to be worked out but
    > that would be the basic idea.


    I think all of you are going about this the wrong way (IMHO). First, I
    find a lot of logic designers forget what HDL stands for... Hardware
    Description Language. Instead of designing their logic to suit both
    the behavior required as well as the practical requirements of the
    problem, they often start coding the problem as if they were writing
    software. The result is... well I don't know what the result is and
    that is the point, neither do they! Of course if you have coded a
    problem in the past in a similar style you can anticipate what the
    compiler will do with your code. But if you have not written this sort
    of code before, you won't know!

    I design my logic without writing a line of HDL. Then once I am happy
    with the approach, I start writing HDL to "describe" the logic I want.
    Every HDL tool I have seen in the last 10 years has provided templates
    for HDL that the tool understands and explains how it will map to
    hardware. If you use those templates to describe your hardware, you
    will get what you have designed and you will understand your result
    completely.

    The difference between the two approaches above is that one chains N-1
    compares in a serial fashion and the other seems to be oriented in a
    tree structure. I actually have no idea exactly how the logic will
    work out for the serial coding because the compares are all done
    against a variable rather than between the input elements. I have
    never coded in this style so I can't say how the synthesis will
    translate this.

    The tree solution can be designed using Generate statements and a few
    basic calculations to figure out the number of levels (log of Buf_Depth
    with upward rounding) and the number of compares in the first level
    (difference between 2 to the power of the number of levels and
    Buf_Depth). If this is hard to think of just consider that you are
    describing a logarithmic triangle, so to speak. This should be a
    simple geometry problem.

    BTW, this type of solution is not likely to be very practical for a
    buffer of any size since it will consume O(N) hardware resources.
     
    rickman, Sep 25, 2006
    #4
  5. arubin

    KJ Guest

    rickman wrote:
    > KJ wrote:
    > > To refine David's suggestion a bit and to incorporate the pipelining
    > > you want to improve performance you'll probably want to make a
    > > procedure that takes as input both the array that you want to get the
    > > minimum value of along with another integer parameter (a constant, call
    > > it 'N' for now) that defines how many clock cycles you're willing to
    > > wait for this to be calculated.
    > >
    > > This procedure would then have 'N' calls to David's minimum value
    > > function each function call working on BufDepth / N elements of the
    > > array. After one clock cycle that procedure would then have an array
    > > of 'N' values which you would then find the minimum of by repeating
    > > this process (as David mentions recursion would be handy here).
    > >
    > > Obviously a few of those sticky details would need to be worked out but
    > > that would be the basic idea.

    >
    > I think all of you are going about this the wrong way (IMHO).

    I don't...in my opinion ;)

    > First, I
    > find a lot of logic designers forget what HDL stands for... Hardware
    > Description Language.

    Yep...and the 'V' in VHDL stands for 'VHSIC' which stands for Very High
    Speed Integrated Circuit....not sure what you were getting at, but this
    just as relevant.

    > Instead of designing their logic to suit both
    > the behavior required as well as the practical requirements of the
    > problem, they often start coding the problem as if they were writing
    > software.

    Again, kind of missing the relevance to the original post?

    >The result is... well I don't know what the result is and
    > that is the point, neither do they! Of course if you have coded a
    > problem in the past in a similar style you can anticipate what the
    > compiler will do with your code. But if you have not written this sort
    > of code before, you won't know!

    I do. I've written min/max functions before. Many times it is simply
    to search through some set of constants/generics and find the min/max
    and return a constant....and that's exactly what is returned and
    synthesized...a constant. I've also written much more complicated
    functions to do all kinds of things and again, what got synthesized is
    right along the lines of what I wanted and expected to pop out.

    Granted the poster is looking to do min on signals and shouldn't get a
    constant as a result but if one can succintly code what he is looking
    to do and parameterize it so as to allow for multiple clock cycle for
    pipelining roughly along the lines of how I suggested and have it work
    out just fine.

    >
    > I design my logic without writing a line of HDL. Then once I am happy
    > with the approach, I start writing HDL to "describe" the logic I want.
    > Every HDL tool I have seen in the last 10 years has provided templates
    > for HDL that the tool understands and explains how it will map to
    > hardware. If you use those templates to describe your hardware, you
    > will get what you have designed and you will understand your result
    > completely.

    The only thing in the original post was a 'min' function which should
    translate into a subtractor to implement each (if X < Y)....not much of
    a stretch there in my opinion.

    >
    > The difference between the two approaches above is that one chains N-1
    > compares in a serial fashion and the other seems to be oriented in a
    > tree structure.

    Serial versus pipelined is what I would call it but OK.

    > I actually have no idea exactly how the logic will
    > work out for the serial coding because the compares are all done
    > against a variable rather than between the input elements. I have
    > never coded in this style so I can't say how the synthesis will
    > translate this.

    As with all other code...it will translate it exactly as written....by
    that I mean the synthesis output will be functionally equivalent to the
    source code.....if you're not always getting that all the time then you
    have a tool problem.

    >
    > The tree solution can be designed using Generate statements and a few
    > basic calculations to figure out the number of levels (log of Buf_Depth
    > with upward rounding) and the number of compares in the first level
    > (difference between 2 to the power of the number of levels and
    > Buf_Depth). If this is hard to think of just consider that you are
    > describing a logarithmic triangle, so to speak. This should be a
    > simple geometry problem.

    Like I said, I did leave some of the nasty details to the original
    poster...I just gave an outline of how I would approach it.

    >
    > BTW, this type of solution is not likely to be very practical for a
    > buffer of any size since it will consume O(N) hardware resources.

    Not true. 'N' will be bounded and will be synthesizable into 'some'
    number of logic cells. Presumably the poster already has a family of
    parts in mind that he/she is targetting and will either get it to fit
    in the design or say...DANG....I can't do it this way and see about if
    it would be possible to stream the data in and calculate the minimum on
    the fly....that way only needing one comparator for the entire design.
    But that's not what they were asking for, maybe there is some reason
    that the data can not be streamed in but is just 'there' and now he
    needs to find the minimum of it.

    My 2 cents.
    KJ
     
    KJ, Sep 25, 2006
    #5
  6. arubin

    David Ashley Guest

    rickman wrote:
    > I think all of you are going about this the wrong way (IMHO). First, I
    > find a lot of logic designers forget what HDL stands for... Hardware
    > Description Language. Instead of designing their logic to suit both
    > the behavior required as well as the practical requirements of the
    > problem, they often start coding the problem as if they were writing
    > software. The result is... well I don't know what the result is and
    > that is the point, neither do they! Of course if you have coded a


    I think I understand the point you're trying to make but that's not
    where the approach is coming from. I'm not trying to shoehorn
    'c' into VHDL, far from it. What I'm trying to do is express the
    concept of automatically building up the comparator tree, yet
    have the code be readable + intuitive.

    For the exercise I went ahead and did this with a 6 element array.
    It simulates fine, and then I synthesized it. The synthesis report
    indicates 5 comparators were used, which is exactly what would
    be required:

    HDL Synthesis Report

    Macro Statistics
    # Registers : 3
    8-bit register : 3
    # Comparators : 5
    8-bit comparator greater : 2
    8-bit comparator less : 3
    # Multiplexers : 3
    8-bit 4-to-1 multiplexer : 2
    8-bit 8-to-1 multiplexer : 1

    Below is the code for reference.

    Now, I look at the function "min" and it seems perfectly
    intuitive what it is doing. And it maps perfectly into VHDL,
    and it seems to get the job done.

    There is an accepted principle that any algorithm implemented
    using recursion can be implemented without recursion. However
    I can't imagine a non-recursive form being this nifty.

    Now, in practice, I can't see much use for this sort of thing
    in the first place. I think in actual digital circuit design one
    wouldn't have all the elements of an array built up with their
    values. You'd have to have logic to initialize the array
    elements -- I doubt if in one clock cycle you'd fill out the whole
    array. So if you're going to spread out the initialization of the
    array across multiple clock cycles, you can take advantage
    of that and spread out the mechanism to determine the
    minimum value over those very same clock cycles.

    I look at this as just an interesting exercise. A possible
    application might be if you have X devices trying to control
    a resource, and each has a priority associated with it, you
    want to pick the device with the highest priority.

    -Dave

    library IEEE;
    use ieee.numeric_std.all;
    use ieee.std_logic_1164.all;

    entity system is
    port (
    clk, rst_p : inout std_logic;
    min_out : out std_logic_vector(7 downto 0));
    end system;

    architecture behav of system is


    subtype entry is unsigned(7 downto 0);
    type invect is array (natural range <>) of entry;
    signal vec : invect(0 to 5);
    signal minimum : entry;

    type state is (c0, c1, c2, c3);
    signal mystate : state := c0;

    function min(iv : invect) return entry is
    variable n2 : integer;
    variable l, r : entry;
    begin
    if iv'length = 1 then
    return iv(iv'left);
    end if;
    n2 := (iv'left + iv'right + 1) / 2;
    l := min(iv(iv'left to n2-1));
    r := min(iv(n2 to iv'right));
    if l < r then
    return l;
    else
    return r;
    end if;
    end;


    begin


    -- pragma translate_off
    process
    begin
    clk <= '0';
    for i in 0 to 100 loop
    wait for 5 ns;
    clk <= not clk;
    end loop;
    wait;
    end process;
    process
    begin
    rst_p <= '1';
    wait for 20 ns;
    rst_p <= '0';
    wait;
    end process;
    -- pragma translate_on




    process(clk)
    begin

    if clk'event and clk='1' then
    if rst_p = '1' then
    mystate <= c0;
    end if;
    case mystate is
    when c0 =>
    vec(0) <= X"22";
    vec(1) <= X"11";
    vec(2) <= X"55";
    vec(3) <= X"77";
    vec(4) <= X"33";
    vec(5) <= X"99";
    mystate <= c1;
    when c1 =>
    vec(3) <= X"05";
    mystate <= c2;
    when c2 =>
    vec(3) <= X"77";
    vec(1) <= X"aa";
    mystate <= c3;
    when c3 =>
    vec(5) <= X"02";
    mystate <= c0;
    end case;
    end if;

    end process;

    minimum <= min(vec);
    min_out <= std_logic_vector(minimum);

    end behav;


    --
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 25, 2006
    #6
  7. arubin

    KJ Guest

    "David Ashley" <> wrote in message
    news:...

    I agree with everything you said and pass kudos for posting code and
    results...as well as for the DDR thing you had posted a while ago too.

    > Now, I look at the function "min" and it seems perfectly
    > intuitive what it is doing. And it maps perfectly into VHDL,
    > and it seems to get the job done.

    And that's what makes it a 'good' implementation in my book. It maps well
    to the hardware and is painfully obvious to understand.

    >
    > There is an accepted principle that any algorithm implemented
    > using recursion can be implemented without recursion. However
    > I can't imagine a non-recursive form being this nifty.
    >
    > Now, in practice, I can't see much use for this sort of thing
    > in the first place. I think in actual digital circuit design one
    > wouldn't have all the elements of an array built up with their
    > values. You'd have to have logic to initialize the array
    > elements -- I doubt if in one clock cycle you'd fill out the whole
    > array.

    What about if on every clock cycle you're getting some sensor inputs and you
    need something to find the minimum for whatever reason as part of the signal
    processing. Those sensor inputs would all be in parallel.

    > So if you're going to spread out the initialization of the
    > array across multiple clock cycles, you can take advantage
    > of that and spread out the mechanism to determine the
    > minimum value over those very same clock cycles.

    Probably true in many cases but the original poster may have something else
    in mind that would preclude this....like the above mentioned sensor inputs.
    Although if he hadn't considered this approach the mention that had now been
    made a couple times in this thread may tickle a brain cell on his side to
    question if the serial approach might be better....since there is a logic
    resource issue to contend with doing it in parallel.

    > I look at this as just an interesting exercise.

    Yep, and the good thing is that sometimes you get new insights when
    pondering these exercises. Even if not directly applicable now, you've
    tickled a brain cell or two that might remember it when you need it.

    KJ
     
    KJ, Sep 26, 2006
    #7
  8. arubin

    arubin Guest


    > What about if on every clock cycle you're getting some sensor inputs and you
    > need something to find the minimum for whatever reason as part of the signal
    > processing. Those sensor inputs would all be in parallel.


    This is exactly my scenario. Every clock cycle I grab data from my
    sensors and shift the new data into the data buffer (oldest data gets
    shifted out). Part of my signal processing involves getting the
    minimum value of this data buffer each cycle.

    > > So if you're going to spread out the initialization of the
    > > array across multiple clock cycles, you can take advantage
    > > of that and spread out the mechanism to determine the
    > > minimum value over those very same clock cycles.


    I actually tried a few approaches similar to this and couldn't come up
    with anything I liked.

    > > I look at this as just an interesting exercise.


    I do to. I could have just 'hard-coded' the data buffer size, but I
    want a solution that is flexible enough so that I can re-use it with
    different buffer sizes (and maybe eventually be able to use different
    buffer sizes on the fly). I also wanted to explore what I could do
    with VHDL a little bit more.

    > Yep, and the good thing is that sometimes you get new insights when
    > pondering these exercises. Even if not directly applicable now, you've
    > tickled a brain cell or two that might remember it when you need it.


    Yes - big thanks to everyone's insightful comments
     
    arubin, Sep 26, 2006
    #8
  9. arubin

    jacko Guest

    for generate?
     
    jacko, Sep 26, 2006
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    14
    Views:
    790
    clintonG
    Oct 12, 2005
  2. Mark P

    a simpler "new" question

    Mark P, Apr 26, 2005, in forum: C++
    Replies:
    5
    Views:
    342
    James Dennett
    Apr 27, 2005
  3. Replies:
    9
    Views:
    354
    Douglas Alan
    Mar 8, 2006
  4. webEater
    Replies:
    3
    Views:
    113
    Thomas 'PointedEars' Lahn
    Jun 15, 2010
  5. Kevin Walzer
    Replies:
    5
    Views:
    242
    Rainer Weikusat
    Jan 28, 2013
Loading...

Share This Page