Benchmark results unrealistic?

Discussion in 'C++' started by Hans Mull, Feb 11, 2008.

  1. Hans Mull


    Hi!
    I've created a benchmark tool which uses Agner Fog's asmlib to count the
    clock cycles a function takes. I'm using it to measure the speed of
    MersenneTwister.h.
    The source code is here:
    http://code.google.com/p/multirng/source/browse/trunk/benchmarks/benchmarks.h
    (the main function just calls these functions).
    When I run it on a P4 Prescott (MinGW with GCC 4, Win XP MediaCenter) with
    -O3 -fexpensive-optimizations and Prescott-specific optimizations, it
    shows me that e.g. mtr.rand() takes ~1200 clock cycles. I think this is
    realistic.
    But when I write something like

    time[0] = ReadTSC();
    for(int i = 0;i < NUMTESTS;i++) rand();
    time[1] = ReadTSC();

    and

    cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;

    it shows me that it (and the other functions too) takes only 20
    clock cycles. Is this realistic? I think it's OK for a single call to
    take more clock cycles than the average, but 20 clock cycles to
    generate a random number? Even if I set NUMTESTS to higher or lower
    values, the result stays the same (apart from a difference of about
    3 or 4 clock cycles).

    Thanks in advance, Hans
    Hans Mull, Feb 11, 2008

  2. Hans Mull wrote:
    > I've created a benchmark tool which uses Agner Fog's asmlib to count
    > the clockcycles a function takes. [..]
    > when I write something like
    >
    > time[0] = ReadTSC();
    > for(int i = 0;i < NUMTESTS;i++) rand();
    > time[1] = ReadTSC();
    >
    > and
    >
    > cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;
    >
    > it shows me that it (and the other functions too) takes only 20
    > clockcycles. Is this realistic? I think it's OK that when you call the
    > function it takes more clockcycles than in the average, but 20
    > clockcycles for creating a random number? However, even if I set
    > NUMTESTS to higher or lower values, the result remains the same
    > (except of a difference of about 3 or 4 clockcycles)


    Your code between the assignments to time[0] and time[1] most likely
    has no side effects. It is entirely conceivable that the compiler
    knows that and optimizes the loop, with its nonsensical body, away
    entirely. You need to look at the assembly output (disassembly in a
    debugger would do) to understand what exactly you're measuring.

    V
    --
    Victor Bazarov, Feb 11, 2008

  3. James Kanze

    On Feb 11, 3:30 pm, Hans Mull wrote:

    > I've created a benchmark tool which uses Agner Fog's asmlib to count the
    > clockcycles a function takes. I 'm using it to measure the
    > MersenneTwister.h speed.
    > Sourcecode is here:http://code.google.com/p/multirng/source/browse/trunk/benchmarks/benc...
    > (the main function just calls this functions)
    > When I run on a P4 Prescott (MinGW with GCC4, Win XP MediaCenter) with
    > -O3 -fexpensive-optimizations and Prescott-specific optimizations, it
    > shows me that e.g. mtr.rand() takes ~1200 clockcycles. I think this is
    > realistic.
    > But when I write something like


    > time[0] = ReadTSC();
    > for(int i = 0;i < NUMTESTS;i++) rand();
    > time[1] = ReadTSC();


    > and


    > cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;


    > it shows me that it (and the other functions too) takes only
    > 20 clockcycles. Is this realistic? I think it's OK that when
    > you call the function it takes more clockcycles than in the
    > average, but 20 clockcycles for creating a random number?


    That sounds a bit high for the usual implementations of rand(),
    yes. But maybe your platform uses something better than the
    usual implementations, which aren't always that good. On a 64-bit
    machine, you can implement a reasonably good RNG with only
    2 cycles of computation. And of course, since rand() is a
    function, you have the overhead of a function call in there; on
    some machines, that can be several clock cycles in itself. Plus
    the stores to memory, etc.
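As an illustration of how little work a reasonable generator can need on a 64-bit machine, here is a minimal sketch of a 64-bit linear congruential generator: one multiply and one add per number. Which generator is meant above is not stated; the LCG and its constants (Knuth's MMIX multiplier and increment) are assumptions, and the actual cycle count depends on the processor's multiply latency. Lcg64 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit LCG: state = state * a + c (mod 2^64).
// Constants are Knuth's MMIX ones (c odd, a - 1 divisible by 4, so full period).
struct Lcg64 {
    std::uint64_t state;

    std::uint64_t next() {
        // Unsigned overflow wraps, which performs the mod 2^64 reduction for free.
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return state;
    }

    // The low-order bits of a power-of-two-modulus LCG have short periods,
    // so hand out the high 32 bits instead.
    std::uint32_t next32() { return static_cast<std::uint32_t>(next() >> 32); }
};
```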

    Of course, clock cycles don't really mean much on a modern
    machine anyway. Most modern machines can execute several
    instructions in parallel in a single clock if there are no
    dependencies between them, whereas a rapid sequence of memory
    accesses may saturate the memory pipeline, leaving several
    clocks in which no instructions can be executed. The time it
    takes to execute rand() in a loop like this is probably not
    typical of the time it would take in normal program flow.
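The effect of instruction dependencies can be sketched by performing the same number of LCG updates twice: once as a single chain where every update waits for the previous result, and once split across two independent chains that an out-of-order CPU may overlap. This is a hypothetical illustration (lcg_step, ChainTiming and time_chains are invented names, and the LCG constants are Knuth's MMIX ones); it makes no promise about which variant is faster on a given machine, and a compiler is in principle free to move pure arithmetic across the clock calls, so the assembly should be checked before trusting the numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// One update of a 64-bit LCG, used here only as a cheap unit of dependent work.
inline std::uint64_t lcg_step(std::uint64_t x) {
    return x * 6364136223846793005ULL + 1442695040888963407ULL;
}

struct ChainTiming {
    double one_chain_ns;     // ns per update when each update waits on the previous one
    double two_chains_ns;    // ns per update when two independent chains are interleaved
    std::uint64_t checksum;  // combined final states, returned so the work is used
};

// Hypothetical helper: the same number of updates, first serial, then split in two.
ChainTiming time_chains(int updates) {
    using steady = std::chrono::steady_clock;
    const int half = updates / 2;
    const int total = 2 * half;  // both variants perform exactly `total` updates
    std::uint64_t x = 1, y = 2, z = 3;

    auto t0 = steady::now();
    for (int i = 0; i < total; ++i) x = lcg_step(x);  // each step needs the last result
    auto t1 = steady::now();
    for (int i = 0; i < half; ++i) {
        y = lcg_step(y);  // y and z do not depend on each other, so the
        z = lcg_step(z);  // CPU may execute their updates in parallel
    }
    auto t2 = steady::now();

    std::chrono::duration<double, std::nano> serial = t1 - t0;
    std::chrono::duration<double, std::nano> paired = t2 - t1;
    double n = total > 0 ? static_cast<double>(total) : 1.0;
    return {serial.count() / n, paired.count() / n, x ^ y ^ z};
}
```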

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Feb 12, 2008
  4. James Kanze

    On Feb 11, 3:39 pm, "Victor Bazarov" wrote:
    > Hans Mull wrote:
    > > I've created a benchmark tool which uses Agner Fog's asmlib to count
    > > the clockcycles a function takes. [..]
    > > when I write something like


    > > time[0] = ReadTSC();
    > > for(int i = 0;i < NUMTESTS;i++) rand();
    > > time[1] = ReadTSC();


    > > and


    > > cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;


    > > it shows me that it (and the other functions too) takes only 20
    > > clockcycles. Is this realistic? I think it's OK that when you call the
    > > function it takes more clockcycles than in the average, but 20
    > > clockcycles for creating a random number? However, even if I set
    > > NUMTESTS to higher or lower values, the result remains the same
    > > (except of a difference of about 3 or 4 clockcycles)


    > Your code between assigning to time[0] and time[1] have no side
    > effects, most likely.


    The function rand() has side effects; otherwise, it would
    always return the same value. (That would technically conform,
    but it is not what we'd expect from quality-of-implementation
    (QoI) considerations.)

    As I said in my answer, 20 clocks sounds a bit high, assuming a
    linear congruential generator (by far the most common
    implementation), but it's not out of line, especially on a 32-bit
    machine.
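The C standard's description of rand() and srand() includes a sample portable implementation: a linear congruential generator whose state lives in a static variable. Every call writes that variable, which is the side effect that keeps a loop of rand() calls from being optimized away. A sketch modeled on that sample, renamed sample_rand and sample_srand here so it does not clash with std::rand; actual library implementations differ.

```cpp
#include <cassert>

// State carried between calls, as in the C standard's sample implementation.
static unsigned long sample_next = 1;

// Linear congruential step; the write to sample_next is the side effect.
int sample_rand() {
    sample_next = sample_next * 1103515245 + 12345;
    // Bits 16..30 of the state, giving a value in [0, 32767].
    return static_cast<int>(static_cast<unsigned>(sample_next / 65536) % 32768);
}

// Reseeding simply resets the shared state.
void sample_srand(unsigned seed) {
    sample_next = seed;
}
```

Seeding with 1 and calling sample_rand() yields 16838 first; reseeding with the same value replays the same sequence, because the only thing carried between calls is the static state.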

    James Kanze, Feb 12, 2008
