Discrepancy between functional simulation and post-synthesis simulation when inferring a block-RAM

Discussion in 'VHDL' started by Cesar, Jun 11, 2010.

  1. Cesar

    Hello:

    I've always coded VHDL in the 'data-flow' way. For my last module, I
    tried to code it in the 'one-process' style. Functional simulation was OK,
    but post-synthesis simulation gives different results. I use XST and
    ModelSim, and my device is a Spartan-3.
    I've used the RTL viewer from Xilinx ISE 11.4 to check the
    synthesis results, and I've discovered that there is a problem
    inferring a block-RAM.

    I only read the block-RAM (ROM). When reading, a block-RAM should
    latch in the address on the active clock edge and, after Tco, the data
    should appear at DO. Synchronously speaking, reading the block-RAM
    should imply a one-clock-period delay.
    When inferring the block-RAM in the 'one-process' style, XST automatically
    and always adds a register for the address input and a register for the
    data output (regardless of the VHDL code you write).
    This means that the post-synthesis simulation does not match the
    functional one: it adds an additional clock period of delay when
    reading the block-RAM.
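    For reference, a minimal sketch of the read behavior described above, assuming VHDL-93 and ieee.numeric_std: the address is sampled on the rising edge and the data appears after that edge, so there is exactly one clock of read latency. The entity name sync_rom_sketch, its generics and the placeholder contents are hypothetical; whether XST maps this particular description onto a Spartan-3 block RAM should be checked in the synthesis report rather than assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example (not from the thread): a ROM read synchronously,
-- so the word on "data" belongs to the address sampled at the previous
-- rising edge of clk, i.e. one clock of read latency.
entity sync_rom_sketch is
  generic (
    ADDR_BITS : positive := 4;  -- assumed address width
    DATA_BITS : positive := 8   -- assumed data width
  );
  port (
    clk  : in  std_logic;
    addr : in  unsigned(ADDR_BITS - 1 downto 0);
    data : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity sync_rom_sketch;

architecture rtl of sync_rom_sketch is
  type rom_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);

  -- Placeholder contents (assumption): word i holds the value i.
  -- With the default widths every index fits in DATA_BITS.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in r'range loop
      r(i) := std_logic_vector(to_unsigned(i, DATA_BITS));
    end loop;
    return r;
  end function init_rom;

  constant ROM : rom_t := init_rom;
begin
  -- The only register is the one implied by the clocked read, matching
  -- a single clock of read latency.
  read_p : process (clk)
  begin
    if rising_edge(clk) then
      data <= ROM(to_integer(addr));
    end if;
  end process read_p;
end architecture rtl;
```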

    Has anybody had a similar problem? Any solution?
    Unfortunately, I think I'll have to recode my module in the old style :-(

    Regards,
    César
     
    Cesar, Jun 11, 2010
    #1

  2. KJ

    On Jun 11, 12:11 pm, Cesar wrote:
    > Hello:
    >
    > I've always coded VHDL in the 'data-flow' way. For my last module, I
    > tried to code it in the 'one-process' style. Functional simulation was OK,
    > but post-synthesis simulation gives different results.


    And is the post-synthesis sim result that an output pin (or pins)
    changes exactly one clock cycle late, as you later suggest? Are the
    setup and hold times of all of the input pins of the device in the
    simulation meeting the requirements that pop out of the timing
    report? A testbench such as the following will likely cause a timing
    problem and therefore a sim mismatch:

    process
    begin
      wait until rising_edge(Clock);
      ...
      Some_Input <= '1';
      ...
    end process;

    Instead, the signal assignment should be delayed past the clock edge
    (and the hold time):
      Some_Input <= '1' after 5 ns; -- As an example

    > I use XST and
    > ModelSim, and my device is a Spartan-3.
    > I've used the RTL viewer from Xilinx ISE 11.4 to check the
    > synthesis results, and I've discovered that there is a problem
    > inferring a block-RAM.
    >
    > I only read the block-RAM (ROM). When reading, a block-RAM should
    > latch in the address on the active clock edge and, after Tco, the data
    > should appear at DO. Synchronously speaking, reading the block-RAM
    > should imply a one-clock-period delay.
    > When inferring the block-RAM in the 'one-process' style, XST automatically
    > and always adds a register for the address input and a register for the
    > data output (regardless of the VHDL code you write).


    You have to be a little careful here because the logic that generates
    those address signals, as well as the logic that uses the data
    outputs, could be influencing what is going on, and you may be
    misinterpreting what you're seeing in the viewer. Synthesis is only
    supposed to generate a cycle-for-cycle match at the device pin
    boundary. There is no guarantee that internal signals will match cycle
    for cycle. Not saying that what you're suggesting is wrong, just
    suggesting something else to check first.

    > This means that the post-synthesis simulation does not match the
    > functional one: it adds an additional clock period of delay when
    > reading the block-RAM.
    >


    Only if the setup and hold times on every input pin are met. Some
    people think that they can take their testbench, simply plop down
    the post-synthesis model, and expect it to functionally work in the
    same manner as the original code. This is only true if you've met
    each of the following conditions:
    - Setup and hold times of all input pins meet the requirements
    specified by the timing report
    - Sampling of all output pins only occurs after the clock-to-output
    times specified by the timing report have elapsed (or the propagation
    delay, for combinatorial outputs)
    - No synthesis warnings about things that cannot be synthesized
    exactly. Some examples are:
    * x <= y after 10 ns; -- Can't synthesize delays
    * Incomplete sensitivity lists
    * Combinatorial loops
    * Other things that I may have forgotten

    > Has anybody had a similar problem? Any solution?
    > Unfortunately, I think I'll have to recode my module in the old style :-(
    >


    Before you go down that path, I'd suggest you take the time to
    investigate this a bit further. It's quite possible that when you
    recode to your 'old style' you might also introduce a subtle
    difference that accounts for the discrepancy, and you'll go off on the
    mistaken belief that it has something to do with your coding style.

    The thing to do is, as Brian suggested, create a testbench that
    instantiates both parts and compares the two outputs. I'd add the
    above-mentioned cautions about how to generate inputs and when to
    sample outputs. It also might be easier to simply make the block RAM
    I/O the device I/O in order to remove all of the clutter and see if
    it really does have something to do with the coding style.
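    A minimal sketch of such a comparison testbench, assuming VHDL-93. Both instances share one stimulus process; inputs change T_DRIVE after the clock edge (never at the edge), and outputs are compared T_SAMPLE after the edge, standing in for the hold and clock-to-output requirements. The entity pipe_dut, its LATENCY generic and the 20/5/15 ns values are hypothetical placeholders; in real use u_gate would instantiate the post-synthesis simulation model and the delays would come from the timing report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-in for the design under test: a register pipeline
-- with LATENCY clock stages between din and dout.
entity pipe_dut is
  generic (LATENCY : positive := 1);
  port (
    clk  : in  std_logic;
    din  : in  unsigned(7 downto 0);
    dout : out unsigned(7 downto 0)
  );
end entity pipe_dut;

architecture rtl of pipe_dut is
  type stages_t is array (1 to LATENCY) of unsigned(7 downto 0);
  signal stages : stages_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stages(1) <= din;
      for i in 2 to LATENCY loop
        stages(i) <= stages(i - 1);
      end loop;
    end if;
  end process;
  dout <= stages(LATENCY);
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_compare is
end entity tb_compare;

architecture sim of tb_compare is
  constant T_CLK    : time := 20 ns;  -- assumed clock period
  constant T_DRIVE  : time := 5 ns;   -- inputs change this long after the edge (assumed > hold)
  constant T_SAMPLE : time := 15 ns;  -- outputs checked this long after the edge (assumed > Tco)
  signal clk       : std_logic := '0';
  signal done      : boolean := false;
  signal din       : unsigned(7 downto 0) := (others => '0');
  signal dout_rtl  : unsigned(7 downto 0);
  signal dout_gate : unsigned(7 downto 0);
begin
  -- Free-running clock that stops once the stimulus is finished.
  clk <= not clk after T_CLK / 2 when not done;

  -- Reference: the RTL description.
  u_rtl : entity work.pipe_dut
    generic map (LATENCY => 1)
    port map (clk => clk, din => din, dout => dout_rtl);

  -- Placeholder for the post-synthesis simulation model; the same
  -- stand-in is used here so the sketch runs on its own.
  u_gate : entity work.pipe_dut
    generic map (LATENCY => 1)
    port map (clk => clk, din => din, dout => dout_gate);

  stimulus : process
  begin
    for i in 0 to 15 loop
      wait until rising_edge(clk);
      -- Never change an input exactly at the clock edge.
      din <= to_unsigned(i, din'length) after T_DRIVE;
    end loop;
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    done <= true;
    wait;
  end process stimulus;

  checker : process
  begin
    wait until rising_edge(clk);
    wait for T_SAMPLE;  -- after clock-to-output, before the next edge
    assert dout_rtl = dout_gate
      report "RTL and post-synthesis outputs differ"
      severity error;
  end process checker;
end architecture sim;
```

    Both instances use LATENCY => 1 here so the sketch runs on its own; changing u_gate to LATENCY => 2 produces the kind of one-clock mismatch discussed in this thread, and the checker then reports errors.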

    Kevin Jennings
     
    KJ, Jun 12, 2010
    #2

  3. Mike Treseler

    Cesar wrote:

    > I've always coded VHDL in the 'data-flow' way. For my last module, I
    > tried to code it in the 'one-process' style. Functional simulation was OK,
    > but post-synthesis simulation gives different results. I use XST and
    > ModelSim, and my device is a Spartan-3.
    > I've used the RTL viewer from Xilinx ISE 11.4 to check the
    > synthesis results, and I've discovered that there is a problem
    > inferring a block-RAM.
    > I only read the block-RAM (ROM).


    If all you need is a ROM, consider something like this.
    http://mysite.verizon.net/miketreseler/sync_rom.vhd

    > When reading, a block-RAM should
    > latch in the address on the active clock edge and, after Tco, the data
    > should appear at DO. Synchronously speaking, reading the block-RAM
    > should imply a one-clock-period delay.
    > When inferring the block-RAM in the 'one-process' style, XST automatically
    > and always adds a register for the address input and a register for the
    > data output (regardless of the VHDL code you write).


    Here's a template for a block ram that works for brand A.
    It might work for X also.

    http://mysite.verizon.net/miketreseler/block_ram.vhd

    > Has anybody had a similar problem? Any solution?
    > Unfortunately, I think I'll have to recode my module in the old style :-(


    If I'm in a hurry I don't try synthesis experiments,
    especially with templates for vendor specific features.
    If I am writing rtl code, single process entities work well.

    -- Mike Treseler
     
    Mike Treseler, Jun 13, 2010
    #3
  4. Cesar

    On Jun 12, 4:05 am, KJ wrote:
    > On Jun 11, 12:11 pm, Cesar wrote:
    > > I've always coded VHDL in the 'data-flow' way. For my last module, I
    > > tried to code it in the 'one-process' style. Functional simulation was OK,
    > > but post-synthesis simulation gives different results.

    >
    > And is the post-synthesis sim result that an output pin (or pins)
    > changes exactly one clock cycle late, as you later suggest? Are the
    > setup and hold times of all of the input pins of the device in the
    > simulation meeting the requirements that pop out of the timing
    > report? A testbench such as the following will likely cause a timing
    > problem and therefore a sim mismatch:
    >
    > process
    > begin
    >   wait until rising_edge(Clock);
    >   ...
    >   Some_Input <= '1';
    >   ...
    > end process;
    >
    > Instead, the signal assignment should be delayed past the clock edge
    > (and the hold time):
    >   Some_Input <= '1' after 5 ns; -- As an example

    I don't think it is a setup or hold time issue. I've recoded my
    module in the 'data-flow' way, and the post-synthesis simulation is OK.
    I'll look into it further and let you know.

    > Before you go down that path, I'd suggest you take the time to
    > investigate this a bit further. It's quite possible that when you
    > recode to your 'old style' you might also introduce a subtle
    > difference that accounts for the discrepancy, and you'll go off on the
    > mistaken belief that it has something to do with your coding style.


    I believe it is something I've done wrong, since it's my first try
    with this coding style.


    > The thing to do is, as Brian suggested, create a testbench that
    > instantiates both parts and compares the two outputs. I'd add the
    > above-mentioned cautions about how to generate inputs and when to
    > sample outputs. It also might be easier to simply make the block RAM
    > I/O the device I/O in order to remove all of the clutter and see if
    > it really does have something to do with the coding style.
    > Kevin Jennings


    I've created that testbench, and the outputs differ. As I said,
    I'm going to spend more time thinking about it, and I'll let you know.

    Thanks,
    César
     
    Cesar, Jun 14, 2010
    #4
  5. Cesar

    On Jun 13, 11:45 pm, Mike Treseler wrote:
    > Cesar wrote:
    > > I've always coded VHDL in the 'data-flow' way. For my last module, I
    > > tried to code it in the 'one-process' style. Functional simulation was OK,
    > > but post-synthesis simulation gives different results. I use XST and
    > > ModelSim, and my device is a Spartan-3.
    > > I've used the RTL viewer from Xilinx ISE 11.4 to check the
    > > synthesis results, and I've discovered that there is a problem
    > > inferring a block-RAM.
    > > I only read the block-RAM (ROM).

    >
    > If all you need is a ROM, consider something like this.
    > http://mysite.verizon.net/miketreseler/sync_rom.vhd


    Hello Mike:

    In fact, I've been using the templates from your site for this
    module.
    Since I was in a hurry, I had to code my module in the 'data-flow' way,
    and it worked.
    But I want to understand why the 'one-process' module does not.

    I made a testbench instantiating my module and the post-synthesis
    module, as suggested by KJ. The outputs are different.
    The problem is when I try to look into the module to understand where
    the problem is. I'm not used to working with variables (instead of
    signals), and watching the variables in ModelSim is a little messy for
    me. I don't know whether they represent the registered value or not.
    When I attach variables to test-point ports, they always get
    registered at the port. As a result, it is impossible to see the
    combinational value in the current cycle. When you have many variables,
    it is hard to remember which test point is being used combinationally
    within the module and which one is used registered.

    Anyway, now I have a 'data-flow' module that gets synthesized as I
    expected. Next, I'll compare the RTL views of both modules to
    understand what's going on, and I'll post the results.

    Regards,
    César
     
    Cesar, Jun 14, 2010
    #5
  6. Mike Treseler

    Cesar wrote:

    > I made a testbench instantiating my module and the post-synthesis
    > module, as suggested by KJ. The outputs are different.
    > The problem is when I try to look into the module to understand where
    > the problem is. I'm not used to working with variables (instead of
    > signals), and watching the variables in ModelSim is a little messy for
    > me. I don't know whether they represent the registered value or not.


    I don't know that with signals either.
    The advantage to using abstractions such as variables or
    signals is that I shouldn't have to care as long as
    synthesis works. Unfortunately, synthesis is weak
    on fixed vendor logic like block_rams.

    > When I attach variables to test-point ports, they always get
    > registered at the port.


    That is a result of using a synchronous process.
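    A minimal sketch of where the registers come from in a single clocked process, assuming VHDL-93 (entity and port names are hypothetical). A variable that is read before it is written on each edge keeps its value between edges, so it becomes a register. Inside the clocked branch every signal assignment is also registered, and whether it comes before or after the variable update decides whether the port carries the old or the new value, a one-clock difference. A port assigned after the clocked branch, as an update_ports-style procedure would do, is wired to the variable's register with no extra stage.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one clocked process, one variable, three ports.
entity var_order_sketch is
  port (
    clk       : in  std_logic;
    addr_lag  : out unsigned(3 downto 0);  -- assigned before the update
    addr_now  : out unsigned(3 downto 0);  -- assigned after the update
    addr_wire : out unsigned(3 downto 0)   -- assigned outside the clocked "if"
  );
end entity var_order_sketch;

architecture rtl of var_order_sketch is
begin
  main : process (clk)
    -- addr_v is read before it is written on each clock edge, so it must
    -- hold its value between edges: synthesis builds a register for it.
    variable addr_v : unsigned(3 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      addr_lag <= addr_v;        -- old value: one clock behind addr_v
      addr_v   := addr_v + 1;    -- the register update
      addr_now <= addr_v;        -- new value: equals addr_v after the edge
    end if;
    -- Outside the clocked branch: a plain wire from addr_v's register,
    -- with no extra register stage.
    addr_wire <= addr_v;
  end process main;
end architecture rtl;
```

    After three rising edges, addr_now and addr_wire both read 3 while addr_lag reads 2.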

    > As a result, it is impossible to see the
    > combinational value in the current cycle. When you have many variables,
    > it is hard to remember which test point is being used combinationally
    > within the module and which one is used registered.


    With variables, I trace code using the step command.


    > Anyway, now I have a 'data-flow' module that gets synthesized as I
    > expected. Next, I'll compare the RTL views of both modules to
    > understand what's going on, and I'll post the results.


    Thanks.

    -- Mike Treseler
     
    Mike Treseler, Jun 14, 2010
    #6
  7. Cesar

    On Jun 13, 11:45 pm, Mike Treseler wrote:
    > Cesar wrote:
    > > I've used the RTL viewer from Xilinx ISE 11.4 to check the
    > > synthesis results, and I've discovered that there is a problem
    > > inferring a block-RAM.
    > > I only read the block-RAM (ROM).

    >
    > If all you need is a ROM, consider something like this.
    > http://mysite.verizon.net/miketreseler/sync_rom.vhd
    >
    > > When reading, a block-RAM should
    > > latch in the address on the active clock edge and, after Tco, the data
    > > should appear at DO. Synchronously speaking, reading the block-RAM
    > > should imply a one-clock-period delay.
    > > When inferring the block-RAM in the 'one-process' style, XST automatically
    > > and always adds a register for the address input and a register for the
    > > data output (regardless of the VHDL code you write).

    >
    > Here's a template for a block ram that works for brand A.
    > It might work for X also.
    >
    > http://mysite.verizon.net/miketreseler/block_ram.vhd


    I finally made it work properly. At first, I was trying to infer the
    block RAM as a ROM from a single VHDL module (the single-process
    module). I did it like this:

    -- Fragment from the post: init_regs, update_ports, rst and the NR_BITS_*
    -- constants are not shown. Assumes ieee.numeric_std.
    p_main: process(clk)
      type rom_t is array (0 to 2**NR_BITS_ADDR - 1) of
        std_logic_vector(NR_BITS_DATA - 1 downto 0);
      constant rom_c : rom_t := ...; -- (read from a file)
      variable data_read_v : unsigned(NR_BITS_DATA - 1 downto 0);
      -- unsigned rather than std_logic_vector, so that "+ 1" and
      -- to_integer() are defined by ieee.numeric_std
      variable addr_read_v : unsigned(NR_BITS_ADDR - 1 downto 0);

      procedure update_regs is
      begin
        -- ...
        -- USE addr_read_v before updating it (since the block RAM is registered)
        data_read_v := unsigned(rom_c(to_integer(addr_read_v)));
        -- ...
        -- UPDATE addr_read_v after using it
        addr_read_v := addr_read_v + 1;
        -- ...
      end procedure update_regs;

    begin
      if rising_edge(clk) then
        if rst = '1' then
          init_regs;
        else
          update_regs;
        end if;
      end if;
      update_ports;
    end process p_main;

    Logic simulation with ModelSim was OK, but the post-synthesis simulation
    (ISE) added an additional register stage.

    Then, I tried to infer the block RAM (ROM) from an independent VHDL
    module following the template suggested by Mike. At first it did not
    work, through my own fault: I did not take into account the
    additional register stage added when addr_read_v goes out to a
    port from the 'one-process'-style module towards the block-RAM module.
    After debugging that, it worked OK.

    Thank you all for your help,
    César
     
    Cesar, Jun 27, 2010
    #7
  8. Mike Treseler

    Cesar wrote:
    > ...
    > Then, I tried to infer the block RAM (ROM) from an independent VHDL
    > module following the template suggested by Mike. At first it did not
    > work, through my own fault: I did not take into account the
    > additional register stage added when addr_read_v goes out to a
    > port from the 'one-process'-style module towards the block-RAM module.
    > After debugging that, it worked OK.


    Thanks for reporting your results.

    It is good to simulate everything when learning variables
    (or any other time, actually ;)
    For vendor-specific structures like this, trial synthesis is needed also.

    -- Mike Treseler
     
    Mike Treseler, Jun 30, 2010
    #8
