[cross-post]path verification

Discussion in 'VHDL' started by alb, Mar 17, 2014.

  1. alb

    alb Guest

    Dear all,

    I have a microcontroller with an FPU which is delivered as an IP (I mean
    the FPU). In order to run at a decent frequency, some of the operations
    are allowed to complete within a certain number of cycles, but the
    main problem is that we do not know how many.

    That said, if we run the synthesis tool without timing constraints on
    those paths, we have a design that is much slower than it could be.
    Multicycle constraints are out of the question because they are hard to
    verify and maintain, so we decided to set false paths and perform
    post-layout sims to extract those values to be used in the RTL in a
    second iteration.

    There are several reasons why I do not particularly like this approach:

    1. it relies on post-layout sims which are resource consuming
    2. if we change technology we will likely need to do the whole process
    3. we are obliged to perform incremental place&route since an optimized
    implementation (maybe done automatically) may have an impact on our

    So far we have not come up with an alternative solution that is not
    going to imply redesign (like pipelining, c-slowing, retiming, ...).

    Any ideas/suggestions?

    alb, Mar 17, 2014

  2. glen herrmannsfeldt

    So you paid someone for this?

    I am not sure what you mean by "a certain number of clock cycles"
    and "do not know how many".

    If it is all combinatorial, it will complete with some delay, not
    in some number of clock cycles. That is, the delay will not depend
    on any clock you supply. You then have to either be able to run
    the design through timing analysis and see how long that is, or the
    ones you bought it from should tell you.

    Though more usual, the logic should have a signal indicating when
    the result is valid.

    You could run the FPU in the timing tools with a variety of (random)
    inputs and find out how long it takes. Then find the distribution
    of delays, and find a reasonable maximum. It might be data dependent
    and have a long tail. (A post-normalize shifter might depend on the
    number of digits being shifted, and the rare long shifts would have
    to be accounted for.)
    The FPUs that I know of should be pipelined. (Is there a clock input?)
    You shouldn't have to do the pipelining, but you do need to know the
    number of clock cycles (and clock rate) for each operation.

    If the design is encrypted, such that you can't look at it, they
    need to give you enough information to be able to use it.

    -- glen
    glen herrmannsfeldt, Mar 17, 2014

  3. alb

    alb Guest

    Hi Glen,

    That is correct. Well, it was a development on a European project where
    several universities did something and then we tried to stitch it
    together... The aim was to have a small footprint embedded
    microcontroller capable of floating-point calculations.
    I admit I was not too clear, let's try again. The IP is the FPU and it
    came fully verified but never validated on the hardware (i.e. no P&R, no
    STA, no backannotate sim). We built around it a microcontroller and now
    it is time to target the technology for the specific project.

    So at this stage we do not know, considering the target logic, what is
    the logic depth for each operation of the FPU and we do not know how
    many clock cycles we need to wait in order to get the value out at the
    given target frequency.
    As you correctly pointed out the delay does not depend on the clock
    frequency, but it depends on the target technology and final routing. In
    order for the microcontroller to work correctly I need to 'wait' for
    each specific operation a certain amount of clock cycles in order to be
    able to sample correctly the result.

    I already know that, at the target frequency, it will take more than one
    cycle to complete most of the operations, therefore my timing analysis
    will fail miserably. Not only that, without releasing some timing
    constraints on some specific path, the synthesis tool will take the
    worst path and extract the max frequency from that [1].

    Regarding the possibility to ask the developer(s), the team has fallen
    apart in the meanwhile and now we need to chase people around the globe
    to get some info (not easy).
    That is a valid point indeed. I implicitly assumed there's no such
    signal, but it is equally possible that we haven't 'seen it'. In order to
    find something you should look for it...
    I'm not sure I'm following. If by timing tools you mean STA, then
    there's no such thing as a 'variety of inputs'; the tool is static and
    is only calculating delays associated with paths in a graph. What you
    suggest seems more like a post-layout simulation... did I get it wrong?

    Running the STA without constraining the synthesis tool might be
    suboptimal since the synthesis tool did not come out with an optimized
    The FPU is not pipelined otherwise I would have known the amount of
    clock cycles simply from the depth of the pipe. Am I wrong? Why it is not
    pipelined is a different topic.
    The design is not encrypted but nobody really wanted to dig into those
    details so far. I agree with you that it is probably worth spending some
    effort to understand how it works in detail and come up with a
    solution that is suited to fix the root of the problem rather than come
    up with ad-hoc solutions.
    alb, Mar 18, 2014
  4. alb

    alb Guest

    forgot to add a note that I refer to in my previous article...


    [1] I actually do not know the way the synthesis tool works, but it
    seems my simple model pretty much matches what is happening
    alb, Mar 18, 2014
  5. HT-Lab

    HT-Lab Guest

    I would suggest you speak to your boss to see if you can spend some
    money on getting a Fishtail Focus license. This tool will automatically
    extract multicycle and false paths from your design. The output is a bunch
    of SDC constraints and assertions (PSL/SVA) for verifications.


    HT-Lab, Mar 19, 2014
  6. alb

    alb Guest

    Hi Glen, sorry for the delayed reply...been quite busy lately.
    I have dug a little in the code and found a signal called /ready/ and
    thought I solved my issues, but then wait a minute, how can you
    implement a signal ready that takes into account a combinatorial path?
    And even if, I need to inform my synthesis tool about those paths being
    either multicycle paths or false paths, otherwise it'll try to make them
    fit in a single clock cycle.

    For a pipelined FPU the signal /ready/ makes much more sense (at least
    that's the only sense I see), but that not being the case here I'll have to
    find a different way to verify the design.
    I might have found a different approach. Since the FPU is part of an
    embedded microprocessor, I can take advantage of being able to run
    a program on it and perform the verification with it.
    My testbench would not generate any particular signal, just the ones
    enough for the embedded processor to run, but the program loaded into it
    will perform the FPU operations and check they are indeed correct. If
    not I'll need to incrementally add a clock cycle delay before fetching
    the result into the output register.
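The loop described above could be sketched roughly as follows, in Python purely for illustration. `fake_simulation` is a hypothetical stand-in for launching the back-annotated simulation with a given wait count and reporting whether the self-checking program passed; the 37 ns path and 10 ns clock are assumed numbers, not values from this design.

```python
from typing import Callable

def find_min_wait_cycles(passes: Callable[[int], bool], max_cycles: int = 64) -> int:
    """Return the smallest wait-cycle count for which the self-checking run passes."""
    for wait in range(1, max_cycles + 1):
        if passes(wait):
            return wait
    raise RuntimeError(f"no passing wait count up to {max_cycles} cycles")

# Hypothetical stand-in for one simulation run: the result is only correct
# once the wait covers an assumed 37 ns path with an assumed 10 ns clock.
def fake_simulation(wait_cycles: int, path_ns: float = 37.0, period_ns: float = 10.0) -> bool:
    return wait_cycles * period_ns > path_ns

min_wait = find_min_wait_cycles(fake_simulation)
print(min_wait)
```

A pass at a given wait count only says that the tested operands and delay corner worked, so the loop would presumably have to run against worst-case back-annotated delays to mean anything.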

    This approach might be very time consuming, but I see two main advantages:

    1. it's totally agnostic w.r.t. the implementation. I do not need to
    know the details and I can run it for any technology, without the need
    to update my multicycle paths (I still need to keep the false path in
    place though).

    2. it's the simulator that works, not me. Considering how much I'm paid
    per hour, I think it is much less expensive if a stupid machine does the
    job instead of me.

    I have not yet run a full-fledged program within modelsim, but I managed
    to run a simple 'hello world' program in no time.

    alb, Mar 24, 2014
  7. rickman

    rickman Guest

    If I understand you correctly, you have a piece of combinatorial logic
    and you need to know how fast it will run in your design. This will
    then let your surrounding circuitry wait some number of clock cycles to
    read the result, which gives you a longer delay than the delay through the

    I think your starting premise that multi-cycle constraints are "out of
    the question" is where you have erred. Multi-cycle constraints are
    exactly what are required and if you don't understand how to use them
    you are not likely to get a good result.

    Post P&R simulation is not a good way to validate timing because it is
    so hard to cover every path through the logic. Static timing analysis
    is the right way to do this and you need to learn to use it properly.
    rickman, Apr 2, 2014
  8. alb

    alb Guest

    Hi Rick,

    There are two aspects here to consider:

    1. multicycle constraints need not only a /from/ and /to/ parameter, they also
    need a /through/ parameter. When you have a logic depth of 111 gates you start to
    understand why a multicycle constraint cannot be a sustainable solution.

    2. My experience in setting up multicycle constraints is nearly zero and starting
    off with such an approach on this type of project would be begging for troubles.
    I've read several times on this group the skepticism behind static timing analysis
    when multicycle constraints are in place. I have to search back in the archives to
    really understand the technical motivations, but the bottom line is:

    a. it is difficult to maintain them; if the logic path has been optimized the
    constraint does not work anymore
    b. it is difficult to verify them; if the path *is not* multicycle you may wrongly
    relax the timing too much and never realize until another optimization takes place
    and your circuit does not work any more.

    If anyone sees a flaw in my points above I'd be glad to be corrected.

    alb, Apr 3, 2014
  9. HT-Lab

    HT-Lab Guest

    On 03/04/2014 08:05, alb wrote:

    Hi Al,

    I think you are confusing propagation (or false path) delay with
    multicycle path delay. A multicycle delay is a synchronous "number of
    clock cycle" based delay, it does not depend on the clock frequency. You
    use this delay if you know your circuit takes n clock cycles to
    propagate the result to the destination register.
    You can easily verify them using assertions, see end of the pdf below:


    As I mentioned in another thread, learn PSL, it is a real eye opener for

    if the path *is not* multicycle you may wrongly
    Not exactly, you will simply not get timing closure and you will
    probably end up using more resources than necessary.

    www ht-lab.com
    HT-Lab, Apr 3, 2014
  10. alb

    alb Guest

    Hi Hans,

    IMHO a multicycle path delay is a propagation delay specified as
    relative to the clock period. Hence it *does* depend on the clock
    frequency, while the propagation through your gates does not (it depends
    on the technology).

    If your path takes 12.3 ns you would have to set a multicycle constraint
    of 2 with a 100MHz clock, but 3 with a 200MHz one.
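The arithmetic behind those two numbers can be written out as a small Python sketch. The function name and the `setup_ns` parameter are illustrative assumptions; setup time defaults to zero here, which is what the figures above implicitly assume.

```python
import math

def min_multicycle(path_ns: float, clock_mhz: float, setup_ns: float = 0.0) -> int:
    """Smallest N such that path_ns < N * period - setup_ns."""
    period_ns = 1000.0 / clock_mhz
    return math.floor((path_ns + setup_ns) / period_ns) + 1

print(min_multicycle(12.3, 100.0))  # 2
print(min_multicycle(12.3, 200.0))  # 3
```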

    A false path is a different story. You want to inform your synthesis
    tool that a certain path is never going to be used so do not bother
    optimizing it.
    If I know when I will be reading the result on the destination register
    I may relax the time it will take to propagate the result, but on the
    contrary if I want to know when is the earliest moment to go and get the
    result, the multicycle path is of no use.
    It's in the pipe...a very long one unfortunately ;-) But thanks for the

    Can you verify if a certain path is not violating the setup time of your
    register? Can you verify what is the delay it takes to go from
    register A to register B through some logic?
    Assume a single cycle path that you set to be multicycle because of
    a mistake in your analysis. The synthesis tool will not know if your
    multicycle path is correct or wrong, therefore it will relax the timing
    between the selected end points and you will sample the result at the
    wrong time.

    The STA will correctly report the path is indeed fulfilling the
    constraint, but the logic will take the result too early. If you decided
    not to roll your postlayout sim because you relied on your STA, then you
    are set to find nasty surprises on the bench.
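A toy Python model of that mismatch, with assumed numbers: STA only checks the path against the cycle count the constraint declares, while the hardware samples whenever the enable logic actually fires. Both function names are hypothetical.

```python
def sta_passes(path_ns, period_ns, declared_cycles, setup_ns=0.0):
    """What STA checks: the path fits in the cycles the constraint declares."""
    return path_ns < declared_cycles * period_ns - setup_ns

def hardware_samples_correctly(path_ns, period_ns, enable_cycles, setup_ns=0.0):
    """What matters on the bench: the path fits in the cycles the enable really waits."""
    return path_ns < enable_cycles * period_ns - setup_ns

path_ns, period_ns = 12.3, 10.0   # assumed path delay and clock period
declared, actual = 2, 1           # constraint says 2 cycles, RTL enable fires after 1

print(sta_passes(path_ns, period_ns, declared))                # True
print(hardware_samples_correctly(path_ns, period_ns, actual))  # False
```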

    alb, Apr 3, 2014
  11. HT-Lab

    HT-Lab Guest

    Hi Al,

    You still have your terminology wrong, here is an SDC example of a
    typical MCP constraint:

    set_multicycle_path 2 -from reg_alu* -to reg_mult*

    Notice there is no time, just a natural number of clock cycles.
    You are mixing your constraints. If your combinational path takes 12.3
    ns you set a clock constraint of 81MHz. If you have a MCP in your design
    you are most likely controlling the output register with an enable pin.
    You do not use an MCP to constrain a propagation delay.
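For reference, the 81 MHz figure is just the reciprocal of the path delay. A short Python check, with setup time assumed to be zero as in the text (the function name is illustrative):

```python
def max_single_cycle_mhz(path_ns: float, setup_ns: float = 0.0) -> float:
    """Highest clock frequency at which the path still fits in one period."""
    return 1000.0 / (path_ns + setup_ns)

print(round(max_single_cycle_mhz(12.3), 1))  # 81.3
```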

    Not with assertions,

    HT-Lab, Apr 3, 2014
  12. KJ

    KJ Guest

    The value of '2' though is computed based on the clock period. Alb already pointed that out earlier in the thread "If your path takes 12.3 ns you would have to set a multicycle constraint of 2 with a 100MHz clock, but 3 with a 200MHz one."

    Kevin Jennings
    KJ, Apr 3, 2014
  13. alb

    alb Guest

    Hi Hans,

    I apologize but I did not understand from this example what is wrong in
    my terminology.
    reading out loud your MCP constraint:

    'the propagation delay from reg_alu* to reg_mult* has to be smaller than
    2 clock cycles (minus setup time)'

    Notion of time is automatically inferred by your tool since it knows
    what is the clock period for those particular registers. If the two
    registers are in two different clock domains I doubt you can really set
    a multicycle path constraint (at least it does not make sense to me).
    I have to find out how much time I need to wait before sampling the
    logic with my output enable. There are several (in the 100s) paths
    between input and output (it's an fpu), therefore I can die under a pile
    of multicycle path constraints.
    IMHO yes you do. You are telling the synthesis tool that a particular
    path (or branch of a graph) can have a propagation delay:

    Tp < N * clock_period - Tsetup

    rather than the usual:

    Tp < clock_period - Tsetup

    Why would you think the MCP does not constrain the propagation delay?
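The two inequalities can be turned into a slack figure, as a sketch in Python. The 0.5 ns setup time is an assumed value chosen only to make the example concrete.

```python
def setup_slack_ns(path_ns, period_ns, cycles=1, setup_ns=0.0):
    """Positive slack means Tp < cycles * clock_period - Tsetup holds."""
    return cycles * period_ns - setup_ns - path_ns

# Assumed numbers: 12.3 ns path, 10 ns clock, 0.5 ns setup time.
single = setup_slack_ns(12.3, 10.0, cycles=1, setup_ns=0.5)
multi = setup_slack_ns(12.3, 10.0, cycles=2, setup_ns=0.5)
print(single > 0)  # False: fails as a single-cycle path
print(multi > 0)   # True: passes with N = 2
```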

    alb, Apr 3, 2014
  14. HT-Lab

    HT-Lab Guest

    We are talking about different issues here. My argument is that you
    should not exchange a clock constraint for an MCP one,

    HT-Lab, Apr 3, 2014
  15. HT-Lab

    HT-Lab Guest

    On 03/04/2014 14:48, alb wrote:

    Hi Al,
    Poor choice of words on my part, I should have said you don't use an MCP
    constraint as a clock constraint.

    HT-Lab, Apr 3, 2014
  16. KJ

    KJ Guest

    You should be able to wild card the path sources inside your block and specify exactly the output enable signal. There should be no need to specify each path source explicitly.

    Kevin Jennings
    KJ, Apr 3, 2014
  17. rickman

    rickman Guest

    On 4/3/2014 3:05 AM, alb wrote:
    I can't say I follow that. I have only ever specified a from and to
    parameter for a timing constraint. I have never needed to indicate a
    "through" parameter. If you have special sections of the logic that
    need a shorter timing constraint than others, I would expect that to be
    a subset of the from and to, not a special "through" path. Is there
    something unique about your design that a simple from and to spec
    doesn't capture the nuance?

    for troubles.

    How much experience do you have with any of the other approaches you are
    trying? I mean, you are here asking for advice. So clearly there are
    things about each of these approaches you are not familiar with.

    I don't follow that either. It is seldom that any from/to path would be
    optimized away. If it is, it is likely due to an error in your code
    which you will need to fix anyway.

    ALL timing constraints are difficult to verify... no, make that
    impossible. That has always been one of my complaints about static
    timing analysis, there is no way to verify the constraints other than
    the coverage number which is just a pass/fail sort of thing.

    Perhaps I am missing something. ???
    rickman, Apr 3, 2014
  18. rickman

    rickman Guest

    I think you are misreading what is intended. It is assumed there is
    already a clock timing constraint of 100 MHz. That is for the general
    logic in this clock domain. But for a certain section of logic the
    output of the logic is not used for some number of clock cycles that
    will be determined by the delay through the logic which is expected to
    be longer than one clock cycle.

    The OP wants to set this number of clock cycles in the timing
    constraints of that special path to verify that the P&R output will work
    with the timing he has picked. If the timing fails he has the options
    of working to improve the timing in the P&R or changing the logic of the
    register enable to allow more clock cycles for this path.

    In no case would he want to change the timing constraint on the clock
    since that constraint is set by other aspects of his design goals.

    Do I misunderstand what you are trying to say?
    rickman, Apr 3, 2014
  19. alb

    alb Guest

    Hi Kevin,

    Imagine an fpu, with two input registers for the operands, one for the
    operator and an output register for the result. The result register is
    the one that will receive the output enable.

    Depending on the operator I will have a different path. If I wildcard
    the path sources then I'm overly constraining and a 'nop' operation will
    take as much as a division operation, which is not what we want.
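To make the cost of wildcarding concrete, here is a Python sketch with made-up per-operation delays (the real values would have to come from STA on the target technology). A single wildcarded constraint forces every operation to the worst-case cycle count.

```python
import math

# Hypothetical per-operation path delays in ns, for illustration only.
PATH_NS = {"nop": 3.0, "add": 14.0, "mul": 27.0, "div": 58.0}

def cycles_needed(path_ns: float, period_ns: float) -> int:
    """Smallest N with path_ns < N * period_ns (setup time ignored)."""
    return math.floor(path_ns / period_ns) + 1

period_ns = 10.0  # assumed 100 MHz clock
per_operation = {op: cycles_needed(d, period_ns) for op, d in PATH_NS.items()}
wildcarded = max(per_operation.values())  # one constraint covering every source

print(per_operation)
print(wildcarded)
```

With these assumed delays a 'nop' needs one cycle on its own but would wait six under the wildcarded constraint.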

    Since most of the combinatorial functions are reused several times in
    each operation, the datapath starts to be painfully complicated. That is
    the main reason why I discarded the option to setup multicycle

    The alternative, though, is not very palatable either. We decided to set
    false paths between the above mentioned registers and let post-par sim
    figure out whether we are in or out with our output enable. The problem
    is that post-par simulation may not cover the whole set of timing
    scenarios the logic will encounter.

    For instance I do not know if a backannotated simulation includes clock
    skew, while AFAIK it should be taken into account in STA. The described
    approach tries to verify timing, but I'm not sure this is really going
    to be risk free.

    Certainly I can add some jitter to my clock within the simulation itself
    to make it more /realistic/ , but I will certainly not cover all the

    Considering the target FPGA is an RTAX2000 (~20'000$), we are kind of
    uncomfortable to proceed without a fully consistent picture.

    alb, Apr 4, 2014
  20. alb

    alb Guest

    Hi Rick,

    Imagine your path between two registers (A and B) is set by another
    register C. The resulting operation is to be stored in register D. If
    you do not set a /through/ clause you will constrain each path with the
    maximum delay, which is not desirable.
    I've often done post-par sims, but it was combined with an STA,
    therefore I've always been sure the design was correct as long as STA
    did not report anything fishy *and* post-par sim succeeded.

    Recently I started to look at post-par sims as an additional step which
    is not necessarily required for synchronous logic as long as your input
    constraints are well defined.

    In this case we cannot use STA to do time analysis and I'm
    I certainly was talking about the /through/ clause I mentioned earlier.
    The synthesis tool might optimize away (or maybe rename) certain nets
    and your constraint will not be applicable anymore.
    That is why you'd be better off if you didn't have them! :)
    alb, Apr 4, 2014