improving code

Discussion in 'VHDL' started by mysticlol, Sep 15, 2006.

  1. mysticlol

    mysticlol Guest

    Hi,

    I am having trouble with this code. It is reducing the design max clock
    frequency considerably. How can I improve this?

    architecture rtl of clock_gen is
    constant DEFAULT_TIME : std_logic_vector := "1111011010";
    signal ontime : std_logic_vector( 10 downto 0);
    begin

    process(clock, reset)
    begin
    if reset='1' then
    ontime <= DEFAULT_TIME;
    elsif rising_edge(clock) then
    if go='1' then
    ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    end if;
    end if;
    end process;
    ..
    ..
    ..
    end architecture rtl;
     
    mysticlol, Sep 15, 2006
    #1
    1. Advertising

  2. mysticlol

    David Ashley Guest

    mysticlol wrote:
    > Hi,
    >
    > I am having trouble with this code. It is reducing the design max clock
    > frequency considerably. How can I improve this?
    >
    > architecture rtl of clock_gen is
    > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > signal ontime : std_logic_vector( 10 downto 0);
    > begin
    >
    > process(clock, reset)
    > begin
    > if reset='1' then
    > ontime <= DEFAULT_TIME;
    > elsif rising_edge(clock) then
    > if go='1' then
    > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > end if;
    > end if;
    > end process;
    > .
    > .
    > .
    > end architecture rtl;
    >


    Probably everyone will say go with a synchronous clock approach. Also
    you can compute the next data word in combinatorial logic:

    architecture rtl of clock_gen is
    constant DEFAULT_TIME : std_logic_vector := "1111011010";
    signal ontime : std_logic_vector( 10 downto 0);
    signal next_dataWord : std_logic_vector (10 downto 0);

    begin
    next_dataWord <= dataWord(11 downto 1) - '1';

    process(clock)
    begin
    if rising_edge(clock) then
    if reset='1' then
    ontime <= DEFAULT_TIME;
    elsif go='1' then
    ontime <= next_dataWord;
    end if;
    end if;
    end process;

    end architecture rtl;


    --
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 15, 2006
    #2
    1. Advertising

  3. mysticlol

    mysticlol Guest

    Hi David,

    I already tried that. It is not making any difference..

    Actually I have a 100MHz clock. dataWord is user input.
    100 MHz/dataWord(11 downto 0) = required Clock Period.

    I am taking dataWord(11 downto 1) - so that I can get ontime which is
    half of this dataWord. As I am counting from 0, I am substracting 1
    from this ontime.

    regards,
    mysticlol


    David Ashley wrote:
    > mysticlol wrote:
    > > Hi,
    > >
    > > I am having trouble with this code. It is reducing the design max clock
    > > frequency considerably. How can I improve this?
    > >
    > > architecture rtl of clock_gen is
    > > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > > signal ontime : std_logic_vector( 10 downto 0);
    > > begin
    > >
    > > process(clock, reset)
    > > begin
    > > if reset='1' then
    > > ontime <= DEFAULT_TIME;
    > > elsif rising_edge(clock) then
    > > if go='1' then
    > > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > > end if;
    > > end if;
    > > end process;
    > > .
    > > .
    > > .
    > > end architecture rtl;
    > >

    >
    > Probably everyone will say go with a synchronous clock approach. Also
    > you can compute the next data word in combinatorial logic:
    >
    > architecture rtl of clock_gen is
    > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > signal ontime : std_logic_vector( 10 downto 0);
    > signal next_dataWord : std_logic_vector (10 downto 0);
    >
    > begin
    > next_dataWord <= dataWord(11 downto 1) - '1';
    >
    > process(clock)
    > begin
    > if rising_edge(clock) then
    > if reset='1' then
    > ontime <= DEFAULT_TIME;
    > elsif go='1' then
    > ontime <= next_dataWord;
    > end if;
    > end if;
    > end process;
    >
    > end architecture rtl;
    >
    >
    > --
    > David Ashley http://www.xdr.com/dash
    > Embedded linux, device drivers, system architecture
     
    mysticlol, Sep 15, 2006
    #3
  4. mysticlol

    David Ashley Guest

    mysticlol wrote:
    > Hi David,
    >
    > I already tried that. It is not making any difference..
    >
    > Actually I have a 100MHz clock. dataWord is user input.
    > 100 MHz/dataWord(11 downto 0) = required Clock Period.
    >
    > I am taking dataWord(11 downto 1) - so that I can get ontime which is
    > half of this dataWord. As I am counting from 0, I am substracting 1
    > from this ontime.
    >
    > regards,
    > mysticlol


    If you take out the subtracting 1 and cause the user input to
    just be one less than it otherwise would be, does it meet timing?

    How about an intermediate register that already has the -1 applied --
    so ontime gets reloaded from that?

    Maybe add 1 to the DEFAULT_TIME value and change the down
    counter to stop at one more than it currently stops at? Then you
    don't need the minus 1.

    -Dave




    --
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 15, 2006
    #4
  5. mysticlol

    KJ Guest

    mysticlol wrote:
    > Hi,
    >
    > I am having trouble with this code. It is reducing the design max clock
    > frequency considerably. How can I improve this?
    >
    > architecture rtl of clock_gen is
    > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > signal ontime : std_logic_vector( 10 downto 0);
    > begin
    >
    > process(clock, reset)
    > begin
    > if reset='1' then
    > ontime <= DEFAULT_TIME;
    > elsif rising_edge(clock) then
    > if go='1' then
    > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > end if;
    > end if;
    > end process;
    > .
    > .
    > .
    > end architecture rtl;


    First take a look at the critical timing path that is being reported by
    the fitter and try to understand what is going on in that path.
    Understanding what is going on in that path is the key to improving
    performance.

    The synthesis/fitter tools are all pretty smart and have probably
    already performed the optomization that Dave suggested about moving the
    calculation of 'ontime' outside the process. That's why when you tried
    it, it didn't help...it probably synthesized to the exact same thing.
    For that reason, you probably shouldn't write your code in that manner,
    but keep it inside the logic block where it belongs. Sometimes there
    are exceptions to this general guideline but in this case there isn't.

    I'm guessing that the worst case path is the setup time from the input
    'dataword' to the flip flops that compute 'ontime'. Since setup times
    that start from external pins are almost always longer than ones that
    start from an internal flip flop the way to improve clock cycle
    performance is to simply register 'dataword' right at the input and
    then feed the 'dataword_delayed' into your computation of 'ontime'.
    The disadvantage is that you have added one clock cycle latency but in
    this case it looks like you can probably change your code to compensate
    for that.

    KJ
     
    KJ, Sep 15, 2006
    #5
  6. mysticlol

    Eric Guest

    Strange. Your critical path is only the carry propagation along the
    substraction and your target clock frequency is easy to reach for the
    current ASIC/FPGA technology (100MHz according to your reply?). Could
    you give some more infos about

    - Which technology do you target?
    - Which frequency would you like to reach?
    - What is the dataWord input delay wrt clock?

    If your technology really prevent you to compute the 11-bit
    substraction using a basic ripple carry substractor, then you'll have
    to use some carry anticipation technics (such as carry-save adders and
    so on). Try google and "arithmetic operator implementation".

    Eric
     
    Eric, Sep 15, 2006
    #6
  7. mysticlol

    David Ashley Guest

    KJ wrote:
    > The synthesis/fitter tools are all pretty smart and have probably
    > already performed the optomization that Dave suggested about moving the
    > calculation of 'ontime' outside the process. That's why when you tried
    > it, it didn't help...it probably synthesized to the exact same thing.
    > For that reason, you probably shouldn't write your code in that manner,
    > but keep it inside the logic block where it belongs. Sometimes there
    > are exceptions to this general guideline but in this case there isn't.


    KJ,

    Can you be specific about:
    > For that reason, you probably shouldn't write your code in that manner,
    > but keep it inside the logic block where it belongs.


    I want to make sure I'm following. Is it a question of

    #1:

    process(clk)
    begin
    if clk'event and clk = '1' then
    state <= next_state;
    end if;
    end process;

    process(state, ...)
    begin
    case state is
    when state1 => next_state <= state2;blah;
    when state2 => next_state <= state3;blah;
    when others => next_state <= state1;blah;
    end case;
    end process;

    VS
    #2:

    process(clk)
    begin
    if clk'event and clk = '1' then
    case state is
    when state1 => state <= state2;blah;
    when state2 => state <= state3;blah;
    when state3 => state <= state1;blah;
    end case;
    end if;
    end process;

    I just moved a block of code that had been doing #2 to
    #1, but are you saying #2 is the way to go? I'm trying
    to develop the right habits...

    -Dave

    -
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 15, 2006
    #7
  8. mysticlol

    KJ Guest

    David Ashley wrote:
    > KJ wrote:
    > > The synthesis/fitter tools are all pretty smart and have probably
    > > already performed the optomization that Dave suggested about moving the
    > > calculation of 'ontime' outside the process. That's why when you tried
    > > it, it didn't help...it probably synthesized to the exact same thing.
    > > For that reason, you probably shouldn't write your code in that manner,
    > > but keep it inside the logic block where it belongs. Sometimes there
    > > are exceptions to this general guideline but in this case there isn't.

    >
    > KJ,
    >
    > Can you be specific about:
    > > For that reason, you probably shouldn't write your code in that manner,
    > > but keep it inside the logic block where it belongs.

    >
    > I want to make sure I'm following. Is it a question of
    >
    > #1:
    >
    > process(clk)
    > begin
    > if clk'event and clk = '1' then
    > state <= next_state;
    > end if;
    > end process;
    >
    > process(state, ...)
    > begin
    > case state is
    > when state1 => next_state <= state2;blah;
    > when state2 => next_state <= state3;blah;
    > when others => next_state <= state1;blah;
    > end case;
    > end process;
    >
    > VS
    > #2:
    >
    > process(clk)
    > begin
    > if clk'event and clk = '1' then
    > case state is
    > when state1 => state <= state2;blah;
    > when state2 => state <= state3;blah;
    > when state3 => state <= state1;blah;
    > end case;
    > end if;
    > end process;
    >
    > I just moved a block of code that had been doing #2 to
    > #1, but are you saying #2 is the way to go? I'm trying
    > to develop the right habits...
    >


    This point almost always ignites a ton of responses from people with
    some taking one side, others taking the other....so we best be careful
    here lest wind get out of this :-0

    The example you called #1 would be considered by many to be an example
    of 'two process' coding style where you have one clocked process and
    another combinatorial process and where some or all of the outputs of
    the combinatorial process feeds into the clocked process. What you
    called #2 is an example of what many would call 'single process' coding
    style where there is simply one clocked process. The point that people
    argue about is whether or not the 'one process' approach is better than
    the 'two process'...and like I said, if word of this conversation gets
    around there will no doubt be a number of people on both sides of the
    fence stating their opinion....so I'll go first ;)

    - The 'one process' coding style will always take fewer lines of code.
    Even your simple example takes 8 lines for the two process version
    versus 7 for the one process (not counting the begin/end
    process...which really should be counted also but just trying to strip
    down to the absolute indisputable basics here. Typically what you'll
    find is that the two process style bloats it up a bit more....the
    number of lines of code difference between the two typically grows a
    bit for each additional output signal of the process that is added.
    - The 'two process' coding style is prone to two types of errors
    * Incomplete sensitivity list (you had "process(state, ...)" ).
    Maintenance of that "..." is not trivial in a real design, will
    simulate just fine with no errors or warnings but synthesize to
    something functionally different. There is no shortcut that I know of
    to aid in that maintenance other than to run the code through a
    synthesis engine and see where it warns you about an incomplete
    sensitivity list and then fix it...and re-simulate....wasted time if
    you ask me.
    * Default values for signals inside the combinatorial process. Your
    example covers every possible path assigning the one signal that is
    inside the process but let's say that process assigned to 4 different
    signals. You'd need to make sure that every possible path always got
    assigned regardless of the logic. Typically what is done is right at
    the begining of the combinatorial process you would have 'default'
    assignments to those 4 signals and then head down into the meat of what
    the process is attempting to do. If every possible path does not
    assign a value then you've created a combinatorial latch...definite
    'no,no' in FPGA/CPLDs. The method of always having a 'default'
    assignment right at the begining of the process is at least a way to
    insure that you cover this....and is also a contributor to why I
    mentioned that the difference between the number of lines of code
    written using the two process approach versus using the one process
    approach grows with the number of signals being assigned within that
    process. Using the one process coding style these additional
    statements just aren't necessary.
    - Whether one uses the 'one process' coding style or the 'two process'
    coding style you will get the exact same synthesis results.
    - More lines of code translates into more possible bugs.

    Those are pretty much the indisputable facts about 'one process' versus
    'two process' coding style that even the two process people will
    begrudingly acknowledge. Pick up a textbook and you'll often see the
    two process being waxed poetically and how it has some esoteric
    advantage but when you cut through the cr-p and dig down to how it is
    in any way better you'll come up empty. A lot of times the style that
    people seem to use has more to do with
    - How they were first taught it.
    - How their first 'mentor' taught them.
    - How the company they work for demands it.

    >From that they now develop a personal preference and tend to stick with

    it.

    So take a look at the above mentioned disadvantages and I'll bet that
    they are probably fairly closely aligned to how your personal
    productivity will be measured in your job and decide for yourself.
    Better yet, take some stripped down but somewhat realistic code and
    implement it both ways and see firsthand for yourself.

    As for the specific code of the original post, I simply think it
    belongs inside the clocked process since ultimately this is a clocked
    output and having two lines of code that are physically separated in
    the file to implement what can be done with one is not a method I would
    encourage.

    The optomization that you mentioned as a suggestion would have been a
    good one if the synthesis tool hadn't (most likely) already implemented
    it. Based on the original poster's comment that he had tried your
    suggestion and got the same performance would suggest that the
    synthesis process had indeed sniffed this one out. It can be
    enlightening sometimes to take a look at the synthesized output code
    and see how it has implemented functionally the same thing but in a
    much better (space, performance) way than an inspection of your source
    code would seem to suggest that it was intended to be implemented.

    KJ
     
    KJ, Sep 15, 2006
    #8
  9. mysticlol

    David Ashley Guest

    KJ wrote:
    >...
    > - Whether one uses the 'one process' coding style or the 'two process'
    > coding style you will get the exact same synthesis results.
    > - More lines of code translates into more possible bugs.
    >
    > Those are pretty much the indisputable facts about 'one process' versus
    > 'two process' coding style that even the two process people will
    > begrudingly acknowledge. Pick up a textbook and you'll often see the
    > two process being waxed poetically and how it has some esoteric
    > advantage but when you cut through the cr-p and dig down to how it is
    > in any way better you'll come up empty. A lot of times the style that
    > people seem to use has more to do with
    > - How they were first taught it.
    > - How their first 'mentor' taught them.
    > - How the company they work for demands it.
    >...
    >
    > KJ


    KJ,

    I think I'm convinced. I had a module that was using the single clocked
    process style, and I converted it to the 2 process style. So for every
    clocked signal I had to declare a next_<whatever> signal. During
    simulation I wasted time because my non-clocked process didn't have
    all the sensitivities listed. Then during synthesis, I got bit because
    I missed one or to next_<whatever> signals that didn't have a default
    setting (as in starting out with next_cmd <= cmd;) So in this one
    exercise I experienced pretty much everything you listed. Especially:
    works fine in simulation, fails on actual hardware -- what a headache
    to track down.

    Now for the 2 coding style I had been attracted because it eliminates
    one level of if -- the non-clocked region doesn't need to be inside
    a "if clk'event and clk = '1' then". This makes for less indented code.

    Another possible benefit to clocked/non-clocked processes: the non-clocked
    can be divided up into different processes which can be kept small.
    However that craps out when you want 2 processes to define the same
    signal depending on state -- which means your state dependent code
    has to be in one process anyway.

    Trying both on a real world, but non-complex circuit, was a very good
    exercise. Actually the design was for a simple memory test unit using
    the open cores ddr controller -- it fills memory up with a non-repetitive
    pattern, then reads it out and verifies it. If all is ok an LED stays lit.

    All in all I think I lean towards the single clocked process. Thanks for
    taking the time to elaborate all that!

    -Dave

    --
    David Ashley http://www.xdr.com/dash
    Embedded linux, device drivers, system architecture
     
    David Ashley, Sep 16, 2006
    #9
  10. mysticlol

    mysticlol Guest

    Hi KJ,

    The critical path is between 1. ontime FF to Count FF (6.36ns) and 2.
    dataword_2 to ontime_8 (5.08 ns)

    Actually it is meeting 100 MHz clock requirement. But in overall
    project I need this module should reach 200 MHz. Now it is giving upto
    150 MHz.

    I know this might be annoying like asking questions without full code.
    So let me give code....


    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;
    use IEEE.STD_LOGIC_UNSIGNED.ALL;
    use IEEE.NUMERIC_STD.ALL;

    ---- Uncomment the following library declaration if instantiating
    ---- any Xilinx primitives in this code.
    --library UNISIM;
    --use UNISIM.VComponents.all;

    entity clock_gen is
    generic (default_div : std_logic_vector(10 downto 0) :=
    "00000110001");
    Port (
    clk100 : in std_logic;
    reset : in std_logic;
    enable : in std_logic;
    go : in std_logic;
    dataWord : in std_logic_vector(11 downto 0);
    clock_out : out std_logic;
    done : out std_logic
    );
    end entity clock_gen;

    architecture Behavioral of clock_gen is
    signal count : std_logic_vector(10 downto 0);
    signal ontime : std_logic_vector(10 downto 0);
    signal clkout : std_logic;
    signal go_en : std_logic;
    begin

    go_en <= go and enable;

    process(clk100, reset)
    begin
    if reset='1' then
    ontime <= default_div;
    elsif rising_edge(clk100) then
    if go_en='1' then
    ontime <= dataWord(11 downto 1)-'1';
    end if;
    end if;
    end process;

    process(clk100, reset)
    begin
    if reset='1' then
    done <= '1';
    elsif rising_edge(clk100) then
    if go_en='1' then
    done <= '0';
    else
    done <= '1';
    end if;
    end if;
    end process;

    process(clk100, reset)
    begin
    if reset='1' then
    count <= (others => '0');
    elsif rising_edge(clk100) then
    if go_en='1' or count=ontime then
    count <= (others => '0');
    else
    count <= count + '1';
    end if;
    end if;
    end process;

    process(clk100, reset)
    begin
    if reset='1' then
    clkout <= '0';
    elsif rising_edge(clk100) then
    if count=ontime then
    clkout <= not clkout;
    end if;
    end if;
    end process;

    clock_out <= clkout;

    end architecture Behavioral;


    KJ, I am speachless .. what a guess.
    You are an inspiration to programmers like me..


    Regards,
    Krishna




    KJ wrote:
    > mysticlol wrote:
    > > Hi,
    > >
    > > I am having trouble with this code. It is reducing the design max clock
    > > frequency considerably. How can I improve this?
    > >
    > > architecture rtl of clock_gen is
    > > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > > signal ontime : std_logic_vector( 10 downto 0);
    > > begin
    > >
    > > process(clock, reset)
    > > begin
    > > if reset='1' then
    > > ontime <= DEFAULT_TIME;
    > > elsif rising_edge(clock) then
    > > if go='1' then
    > > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > > end if;
    > > end if;
    > > end process;
    > > .
    > > .
    > > .
    > > end architecture rtl;

    >
    > First take a look at the critical timing path that is being reported by
    > the fitter and try to understand what is going on in that path.
    > Understanding what is going on in that path is the key to improving
    > performance.
    >
    > The synthesis/fitter tools are all pretty smart and have probably
    > already performed the optomization that Dave suggested about moving the
    > calculation of 'ontime' outside the process. That's why when you tried
    > it, it didn't help...it probably synthesized to the exact same thing.
    > For that reason, you probably shouldn't write your code in that manner,
    > but keep it inside the logic block where it belongs. Sometimes there
    > are exceptions to this general guideline but in this case there isn't.
    >
    > I'm guessing that the worst case path is the setup time from the input
    > 'dataword' to the flip flops that compute 'ontime'. Since setup times
    > that start from external pins are almost always longer than ones that
    > start from an internal flip flop the way to improve clock cycle
    > performance is to simply register 'dataword' right at the input and
    > then feed the 'dataword_delayed' into your computation of 'ontime'.
    > The disadvantage is that you have added one clock cycle latency but in
    > this case it looks like you can probably change your code to compensate
    > for that.
    >
    > KJ
     
    mysticlol, Sep 18, 2006
    #10
  11. Hi,

    David Ashley schrieb:
    > mysticlol wrote:
    > > process(clock, reset)
    > > begin
    > > if reset='1' then
    > > ontime <= DEFAULT_TIME;
    > > elsif rising_edge(clock) then
    > > if go='1' then
    > > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > > end if;
    > > end if;
    > > end process;

    [..]
    > Probably everyone will say go with a synchronous clock approach. Also
    > you can compute the next data word in combinatorial logic:


    You can anyway calculate the data only combinatorical and store the
    result sequentiell.

    > architecture rtl of clock_gen is
    > constant DEFAULT_TIME : std_logic_vector := "1111011010";
    > signal ontime : std_logic_vector( 10 downto 0);
    > signal next_dataWord : std_logic_vector (10 downto 0);
    >
    > begin
    > next_dataWord <= dataWord(11 downto 1) - '1';
    >
    > process(clock)
    > begin
    > if rising_edge(clock) then
    > if reset='1' then
    > ontime <= DEFAULT_TIME;
    > elsif go='1' then
    > ontime <= next_dataWord;
    > end if;
    > end if;
    > end process;


    The most interessting difference I see, is the change of async reset to
    a sync reset.
    Your code will be better for real stupid tools, but I think there is no
    commercial tool making any difference between
    <clocked>
    A=B+1;
    </clocked>
    and
    temp=B+1;
    <clocked>
    A=temp;
    </clocked>

    What the OP wants is a faster '+ (-1)' than the one he gets with using
    just '-1'.
    Fist step is to investigate long paths with respect to layout and
    netload.
    Then have a look at the available technology, maybe you have a fast
    carry chain outrunning even CLA (Carry Lookahead Adder) with a RCA
    (Ripple-Carry Adder).

    Last chance would be the use of pipeline stages. Or use double HW, if
    your adder runs 100 MHz but not 200 MHz you could possibly use two
    adders parallel running at 100 MHz to get the needed number of results
    per second

    bye Thomas
     
    Thomas Stanka, Sep 18, 2006
    #11
  12. mysticlol

    KJ Guest

    mysticlol wrote:
    > Hi KJ,
    >
    > The critical path is between 1. ontime FF to Count FF (6.36ns) and 2.
    > dataword_2 to ontime_8 (5.08 ns)
    >
    > Actually it is meeting 100 MHz clock requirement. But in overall
    > project I need this module should reach 200 MHz. Now it is giving upto
    > 150 MHz.
    >
    > So let me give code....
    >
    >
    > library IEEE;
    > use IEEE.STD_LOGIC_1164.ALL;
    > use IEEE.STD_LOGIC_ARITH.ALL;
    > use IEEE.STD_LOGIC_UNSIGNED.ALL;
    > use IEEE.NUMERIC_STD.ALL;


    Get rid of the std_logic_arith and std_logic_unsigned....numeric_std is
    the preferred approach and is a standard. The other two (despite being
    in the 'ieee' library) are not standards. This won't affect your
    timing, just a comment.

    The path from dataword_2 to ontime_8 (5.08 ns) might improve by adding
    the previously mentioned flip flop delay. Many parts have I/O buffers
    that include flip flops. Making use of those will get the best
    possible timing, whether it will get you up to 150-200 MHz depends on
    the specific part.

    In order to 'make use of those flip flops' you need to make sure that
    the code you write get implemented in the way that you want it to.
    Some parts with the flip flops in the I/O buffer do not have any sort
    of clock enable or reset....they are just flip flops. So if the code
    that you write infers something more than that (as your does) than it
    won't be able to make use of those flops. It might cost you one clock
    cycle of latency but if clock frequency is the issue, then this will
    help.

    The path from ontime FF to Count FF is caused by your statements "if
    go_en='1' or count=ontime then". In order to see if 'count = ontime'
    logic needs to be implemented to compare two 11 bit numbers. There's
    no magic, a compare of two 11 bit numbers is a function of 22 inputs
    producing a single output. The way to improve performance here is to
    try to recast the problem into one that compares an 11 bit number to a
    constant. Now logic needs to be implemented to take 11 bits of input
    and produce a single output. Again, it might not end up being fast
    enough to hit 150-200 MHz but it should get you a lot closer.

    It appears on the surface that instead of resetting count to 0 and then
    incrementing it up until you get to 'ontime' you could instead
    initialize a counter to something dependent on dataword (since that is
    where 'ontime' originates from) and then have that counter decrement
    until it hits 0. That way the equivalent to comparing 'count=ontime'
    will be to compare 'count=0' which, as I said, is comparing an 11 bit
    number to a constant which should speed up that critical path.

    You'll need to work through the detailed changes so that you keep the
    function that you're trying to implement but that should be enough to
    get you going.

    As an aside, always keep in mind whenever you use any sort of multi-bit
    operator ('=', '>' or '<', '+', '-', '*', '/', etc.) that these things
    can be costly in terms of logic and timing. If performance is an issue
    you'll almost always want to see if there are ways to write the code so
    that one of the operands is a constant since that cuts the number of
    inputs to that function in half. It can't always be done, and
    depending on the hardware resources in the part it might not be
    desirable, but it's good to keep this sort of transformation in the
    back of your head for those times when you need it.

    KJ
     
    KJ, Sep 18, 2006
    #12
  13. mysticlol

    Andy Peters Guest

    David Ashley wrote:
    > mysticlol wrote:


    This:

    > > process(clock, reset)
    > > begin
    > > if reset='1' then
    > > ontime <= DEFAULT_TIME;
    > > elsif rising_edge(clock) then
    > > if go='1' then
    > > ontime <= dataWord(11 downto 1)-'1'; -- dataWord - 12 bit i/p
    > > end if;
    > > end if;
    > > end process;


    and this:

    > next_dataWord <= dataWord(11 downto 1) - '1';
    >
    > process(clock)
    > begin
    > if rising_edge(clock) then
    > if reset='1' then
    > ontime <= DEFAULT_TIME;
    > elsif go='1' then
    > ontime <= next_dataWord;
    > end if;
    > end if;
    > end process;


    are equivalent. Mostly; you've used a synchronous reset and
    "mysticlol" has used an async reset. But the "go" signal assignment is
    otherwise identical.
    -a
     
    Andy Peters, Sep 18, 2006
    #13
  14. mysticlol

    mysticlol Guest

    OK.
    I will make these changes to my code.
    Thank you all for the timely suggestions.

    Regards,
    Krishna Janumanchi
     
    mysticlol, Sep 19, 2006
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ruds
    Replies:
    13
    Views:
    589
    squirrel
    Apr 8, 2007
  2. David Lees
    Replies:
    2
    Views:
    396
    Robert Kern
    Feb 25, 2008
  3. mo reina
    Replies:
    5
    Views:
    241
    John Bokma
    Jul 27, 2010
  4. Joe Van Dyk

    Improving code...

    Joe Van Dyk, Nov 17, 2005, in forum: Ruby
    Replies:
    9
    Views:
    133
    J. Merrill
    Nov 17, 2005
  5. Aditya Mahajan

    Suggestions for improving code

    Aditya Mahajan, Mar 11, 2006, in forum: Ruby
    Replies:
    2
    Views:
    116
    Aditya Mahajan
    Mar 14, 2006
Loading...

Share This Page