Benchmark results unrealistic?

Hans Mull

Hi!
I've created a benchmark tool which uses Agner Fog's asmlib to count the
clock cycles a function takes. I'm using it to measure the speed of
MersenneTwister.h.
The source code is here:
http://code.google.com/p/multirng/source/browse/trunk/benchmarks/benchmarks.h
(the main function just calls these functions)
When I run it on a P4 Prescott (MinGW with GCC4, Win XP MediaCenter) with
-O3 -fexpensive-optimizations and Prescott-specific optimizations, it
shows me that e.g. mtr.rand() takes ~1200 clock cycles. I think this is
realistic.
But when I write something like

time[0] = ReadTSC();
for(int i = 0;i < NUMTESTS;i++) rand();
time[1] = ReadTSC();

and

cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;

it shows me that it (and the other functions too) takes only 20
clock cycles. Is this realistic? I understand that a single call can
take more clock cycles than the average, but 20 clock cycles for
creating a random number? Even if I set NUMTESTS to higher or lower
values, the result stays the same (apart from a difference of about
3 or 4 clock cycles).

Thanks in advance, Hans
 
Victor Bazarov

Hans said:
I've created a benchmark tool which uses Agner Fog's asmlib to count
the clockcycles a function takes. [..]
when I write something like

time[0] = ReadTSC();
for(int i = 0;i < NUMTESTS;i++) rand();
time[1] = ReadTSC();

and

cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;

it shows me that it (and the other functions too) takes only 20
clockcycles. Is this realistic? I think it's OK that when you call the
function it takes more clockcycles than in the average, but 20
clockcycles for creating a random number? However, even if I set
NUMTESTS to higher or lower values, the result remains the same
(except of a difference of about 3 or 4 clockcycles)

The code between the assignments to time[0] and time[1] most likely
has no side effects. It is entirely conceivable that the compiler
knows that and optimizes the loop, with its nonsensical body, away
to nothing. You need to look at the assembly output (disassembly in a
debugger would do) in order to understand what exactly you're measuring.
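
One way to rule out the "loop optimized to nil" case, as a minimal sketch: use every result and store the total in a volatile object, which the compiler is not allowed to drop. This matters most for inline functions such as mtr.rand(). The names time_rand_loop and sink are made up for this illustration, and std::chrono stands in for asmlib's ReadTSC() so the snippet is self-contained, which means it reports nanoseconds per call rather than clock cycles.

```cpp
#include <chrono>
#include <cstdlib>

// Writes to a volatile object are observable behaviour, so the compiler
// must keep the store and the computation that feeds it.
volatile unsigned sink = 0;

// Hypothetical helper: average wall-clock time per rand() call, in
// nanoseconds. std::chrono is an assumption here; the original uses ReadTSC().
double time_rand_loop(int num_tests)
{
    using clock = std::chrono::steady_clock;
    unsigned acc = 0;
    const auto start = clock::now();
    for (int i = 0; i < num_tests; ++i)
        acc += static_cast<unsigned>(std::rand());  // use each result
    sink = acc;                                     // publish the total
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / num_tests;
}
```

Calling time_rand_loop(NUMTESTS) and comparing the disassembly with and without the accumulation shows whether the loop body survives optimization.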

V
 
James Kanze

Hans said:
I've created a benchmark tool which uses Agner Fog's asmlib to count the
clockcycles a function takes. I 'm using it to measure the
MersenneTwister.h speed.
Sourcecode is here:http://code.google.com/p/multirng/source/browse/trunk/benchmarks/benc...
(the main function just calls this functions)
When I run on a P4 Prescott (MinGW with GCC4, Win XP MediaCenter) with
-O3 -fexpensive-optimizations and Prescott-specific optimizations, it
shows me that e.g. mtr.rand() takes ~1200 clockcycles. I think this is
realistic.
But when I write something like
time[0] = ReadTSC();
for(int i = 0;i < NUMTESTS;i++) rand();
time[1] = ReadTSC();

cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;
it shows me that it (and the other functions too) takes only
20 clockcycles. Is this realistic? I think it's OK that when
you call the function it takes more clockcycles than in the
average, but 20 clockcycles for creating a random number?

That sounds a bit high for the usual implementations of rand(),
yes. But maybe your platform uses something better than the
usual implementations, which aren't always that good, although
on a 64-bit machine you can implement a reasonably good RNG
with only 2 cycles of computation. And of course, since it is a
function, you have the overhead of a function call in there. On
some machines, that can be several clock cycles in itself. Plus
the stores to memory, etc.

Of course, clock cycles don't really mean much on a modern
machine anyway. Most modern machines are capable of executing
several instructions in parallel, in a single clock, if there
are no dependencies, whereas a rapid sequence of memory
accesses may saturate the memory pipeline, leading to
several clocks in which no instructions can be executed. The
time it takes to execute rand() in a loop like this is probably
not typical of the time it would take to execute it in normal
program flow.
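
To see how far a single call and a tight loop can diverge, here is a rough sketch that times the first call on its own and then the average over many back-to-back calls. Timing and compare_first_and_average are hypothetical names, and std::chrono replaces ReadTSC() as an assumption to keep the code portable. The single-call figure includes the cost of reading the clock itself, so treat it as an upper bound only.

```cpp
#include <chrono>
#include <cstdlib>

struct Timing {
    double first_call_ns;    // one isolated call, including timer overhead
    double loop_average_ns;  // mean over many back-to-back calls
};

// Hypothetical helper: times one rand() call, then a loop of num_tests calls.
Timing compare_first_and_average(int num_tests)
{
    using clock = std::chrono::steady_clock;
    using ns = std::chrono::duration<double, std::nano>;
    volatile int sink;  // volatile stores keep the calls' results in use

    const auto t0 = clock::now();
    sink = std::rand();
    const auto t1 = clock::now();
    for (int i = 0; i < num_tests; ++i)
        sink = std::rand();
    const auto t2 = clock::now();
    (void)sink;

    return { ns(t1 - t0).count(), ns(t2 - t1).count() / num_tests };
}
```

Neither number says what rand() costs in the middle of a real program, where caches and the pipeline are in a different state.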
 
James Kanze

Hans said:
I've created a benchmark tool which uses Agner Fog's asmlib to count
the clockcycles a function takes. [..]
when I write something like
time[0] = ReadTSC();
for(int i = 0;i < NUMTESTS;i++) rand();
time[1] = ReadTSC();
and
cout << "rand() time:" << (time[1]-time[0])/NUMTESTS << endl;
it shows me that it (and the other functions too) takes only 20
clockcycles. Is this realistic? I think it's OK that when you call the
function it takes more clockcycles than in the average, but 20
clockcycles for creating a random number? However, even if I set
NUMTESTS to higher or lower values, the result remains the same
(except of a difference of about 3 or 4 clockcycles)

Victor Bazarov said:
Your code between assigning to time[0] and time[1] have no side
effects, most likely.

The function rand() has side effects. Otherwise, it would
always return the same value. (That would be technically conforming,
but not what we'd expect from QoI considerations.)

As I said in my answer, 20 clocks sounds a bit high, supposing a
linear congruential generator (which is by far the most common
implementation), but it is not unreasonable, especially on a 32-bit
machine.
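
For illustration, a minimal linear congruential generator looks roughly like this. The struct name Lcg is made up, and the constants are the ones from the sample rand() in the C standard, not necessarily what any particular library uses. The update of state is the side effect in question: without it, every call would return the same value.

```cpp
#include <cstdint>

// Hypothetical minimal LCG. Constants taken from the example rand()
// implementation in the C standard; real libraries may differ.
struct Lcg {
    std::uint32_t state = 1;  // same default seed as the standard's example

    // Each call overwrites `state`; this is the side effect that makes
    // successive calls return different values.
    unsigned next()
    {
        state = state * 1103515245u + 12345u;  // one multiply, one add
        return (state >> 16) & 0x7FFFu;        // 15-bit result, 0..32767
    }
};
```

The arithmetic is one multiply, one add, a shift and a mask per call; the rest of what a benchmark sees is call overhead and the load and store of the state.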
 
