Benchmarking multiplications and additions

Michel Rouzic

I need to determine how long an addition takes and how long a
multiplication takes. So far I've been trying to use the clock()
function in my programs to find out how long it took the CPU to
compute it, the only problem being that I get fairly inconsistent
results, inconsistent enough that I can't tell which of two pieces of
code runs faster without running each about ten times and averaging
the reported CPU times.

Right now my main problem is to determine whether 3 float additions
are slower or faster than one float multiplication. I tried making
small programs that loop over some additions or some multiplications
to find out, but not only do I still get inconsistent results, I'm
also not sure the test is even right (I'm not sure that performing
3*5 about one billion times is relevant).
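Here's roughly the kind of thing I've been trying, in case it helps
(just a sketch; the iteration count and the 3 and 5 are arbitrary):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        volatile float a = 3.0f, b = 5.0f, r;
        long i;
        clock_t t0, t1;

        t0 = clock();
        for (i = 0; i < 100000000L; i++)
            r = a * b;        /* or r = a + a + a; for the addition case */
        t1 = clock();

        printf("%f seconds\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        return 0;
    }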

So how do I determine reliably how long it takes for my CPU to compute
an addition or a multiplication (or even other operations)?
 
Walter Roberson

I need to determine how long an addition takes and how long a
multiplication takes.

The C language standards have no specification about timings of
operations, nor about methods to measure those timings. Thus,
comp.lang.c is not the best newsgroup to discuss such matters in.

Right now my main problem is to determine whether 3 float additions
are slower or faster than one float multiplication. I tried making
small programs that loop over some additions or some multiplications
to find out, but not only do I still get inconsistent results, I'm
also not sure the test is even right (I'm not sure that performing
3*5 about one billion times is relevant).

There are a lot of factors that could be affecting the timing,
including what else the CPU is working on, cache effects, branch
timing (and, on some CPUs, branch prediction), and how your compiler
handles loop unrolling.

A benchmarks newsgroup could likely be of assistance in writing
more accurate measuring code.
 
Vladimir S. Oka

Michel Rouzic opined:
I need to determine how long an addition takes and how long a
multiplication takes. So far I've been trying to use the clock()
function in my programs to find out how long it took the CPU to
compute it, the only problem being that I get fairly inconsistent
results, inconsistent enough that I can't tell which of two pieces
of code runs faster without running each about ten times and
averaging the reported CPU times.

So how do I determine reliably how long it takes for my CPU to
compute an addition or a multiplication (or even other
operations)?

As was noted by others, this is off topic here.

The ever so slightly off-topic reply is: `clock()` may not provide the
granularity you require. You are probably much better off using an
execution profiler. Ask in a group covering your toolchain and/or OS
for a good one (e.g. `gprof` is available on Linux).
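(Further off topic, but as a hint: on POSIX systems `clock_gettime()`
offers much finer granularity than `clock()`. A rough sketch, assuming
CLOCK_MONOTONIC is supported -- check your platform's documentation:)

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec t0, t1;
        double elapsed;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* ... code being measured ... */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.9f seconds\n", elapsed);
        return 0;     /* older systems may need -lrt when linking */
    }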
 
Remco Rijnders

Michel said:
I need to determine how long an addition takes and how long a
multiplication takes. So far I've been trying to use the clock()
function in my programs to find out how long it took the CPU to
compute it, the only problem being that I get fairly inconsistent
results, inconsistent enough that I can't tell which of two pieces of
code runs faster without running each about ten times and averaging
the reported CPU times.

If the result of doing this test once is too irregular to draw
conclusions from, why not do it a million times and time that instead?
Make sure to use random numbers so that the compiler won't optimise
the computation away by replacing it with a constant value.
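For example, something roughly like this (only a sketch; the point is
the random inputs and the volatile sink, not the exact numbers):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 1000000

    int main(void)
    {
        static float x[N];
        volatile float sink = 0.0f;
        clock_t t0, t1;
        long i, rep;

        srand((unsigned)time(NULL));
        for (i = 0; i < N; i++)
            x[i] = (float)rand() / RAND_MAX;   /* random operands */

        t0 = clock();
        for (rep = 0; rep < 100; rep++)
            for (i = 0; i < N; i++)
                sink = sink + x[i] * 1.0001f;  /* swap in additions to compare */
        t1 = clock();

        printf("%f seconds\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        return 0;
    }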

Remco
 
Michel Rouzic

Remco said:
If the result of doing this test once is too irregular to draw
conclusions from, why not do it a million times and time that instead?
Make sure to use random numbers so that the compiler won't optimise
the computation away by replacing it with a constant value.

Yeah, but making the test a million times is pretty much what I'm
trying to avoid. That's OK if it's about multiplications and additions,
but when it comes to testing a program or a piece of a program, it
means the thing has to run for several minutes, whereas if the
measurement of the computation time were reliable, it would only have
to run for a very few seconds. (OK, this is slightly beyond my original
multiplication vs. addition problem, but it's still pretty much the
same problem: it's all about measuring how efficient one thing is
compared to another.)
 
Walter Roberson

Yeah, but making the test a million times is pretty much what I'm
trying to avoid. That's OK if it's about multiplications and additions,
but when it comes to testing a program or a piece of a program, it
means the thing has to run for several minutes, whereas if the
measurement of the computation time were reliable, it would only have
to run for a very few seconds. (OK, this is slightly beyond my original
multiplication vs. addition problem, but it's still pretty much the
same problem: it's all about measuring how efficient one thing is
compared to another.)

If it isn't worth running a test for several minutes, then that
suggests that the difference in timing between the two variants is not
expected to total more than a few seconds per run. In such a case, I
would have to wonder whether the optimization is worth considering at
all.

Like people say around here: don't bother optimizing until you
have good reason to believe that the section is a bottleneck
(or there are -very- good reasons why performance is critical -- and
if it -is-, then you need someone on your team who understands
performance analysis very well!)
 
Eric Sosman

Michel said:
Yeah, but making the test a million times is pretty much what I'm
trying to avoid. That's OK if it's about multiplications and additions,
but when it comes to testing a program or a piece of a program, it
means the thing has to run for several minutes, whereas if the
measurement of the computation time were reliable, it would only have
to run for a very few seconds. (OK, this is slightly beyond my original
multiplication vs. addition problem, but it's still pretty much the
same problem: it's all about measuring how efficient one thing is
compared to another.)

Question: How does the "several minutes" of testing time
compare to the amount of time you've already spent reading
and writing messages in this thread? To put it another way,
how many test runs must you avoid in the future to recoup the
time you've already spent?
 
Michel Rouzic

Walter said:
If it isn't worth running a test for several minutes, then that
suggests that the difference in timing between the two variants is not
expected to total more than a few seconds per run. In such a case, I
would have to wonder whether the optimization is worth considering at
all.

The problem is that I'm not just comparing two clock_t values; I'm
comparing them inside a loop and summing the differences into some
variable. I tried doing that around a bit of code in a loop that I
knew wasn't taking any time, and according to clock() it appeared to
take many times longer than I knew it actually did.
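What I mean is roughly this pattern (just a sketch; I suspect the cost
of the two clock() calls themselves ends up inside the sum, which
would explain the inflated figures):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        volatile double x = 0.0;
        clock_t t0, t1, total = 0;
        long i, n = 1000000L;

        for (i = 0; i < n; i++)
        {
            t0 = clock();
            x = x + 1.0;        /* stands in for the cheap piece of code */
            t1 = clock();
            total += t1 - t0;   /* the clock() calls get counted too */
        }

        printf("%f seconds\n", (double)total / CLOCKS_PER_SEC);
        return 0;
    }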
Like people say around here: don't bother optimizing until you
have good reason to believe that the section is a bottleneck
(or there are -very- good reasons why performance is critical -- and
if it -is-, then you need someone on your team who understands
performance analysis very well!)

Yeah, well, the whole thing about the 3 additions vs. 1 multiplication
wasn't so important for the problem I'm trying to solve right now
(although I'd be glad to know how the two compare), but you know, my
goal is to implement real-time convolution of sound on Windows, so
working out how to make it use 1.3% of the CPU instead of 15.4% is
quite important. I guess we can say performance is critical, and I'm
the only one on my team, so I've got to deal with this performance
analysis thing myself.

So far, making a set of clock_t variables and using them properly has
proven useful for telling which parts are the most critical, but when
it comes to telling which settings make it run the fastest, it's
somewhat harder.

Is there some library out there that can be used in C programs to
measure performance accurately?
 
Michel Rouzic

Eric said:
Question: How does the "several minutes" of testing time
compare to the amount of time you've already spent reading
and writing messages in this thread? To put it another way,
how many test runs must you avoid in the future to recoup the
time you've already spent?

Two or three, alright, maybe four. I think convolving an
882-million-sample signal compares well with reading a message and
replying to it. Anyway, it's not as if this is the last time I'll ever
meet this performance-measuring problem. I've met it before (though
for programs where performance was more about comfort than anything
critical), I'm dealing with it now, and I'll undoubtedly meet it again
many times in the future.
 
Eric Sosman

Michel Rouzic wrote on 03/20/06 13:06:
Two or three, alright, maybe four. I think convolving an
882-million-sample signal compares well with reading a message and
replying to it. Anyway, it's not as if this is the last time I'll ever
meet this performance-measuring problem. I've met it before (though
for programs where performance was more about comfort than anything
critical), I'm dealing with it now, and I'll undoubtedly meet it again
many times in the future.

Then your worry wasn't about the "several minutes"
after all, was it?

<off-topic>

You're going to have a difficult time getting a useful
answer out of a micro-benchmark. CPUs these days are rather
more complex than they were in the time of Babbage, and
notions like "the time required for a multiplication" are
too simplistic to describe their behavior. For starters,
most CPUs built in the past decade or so can crunch the data
faster than memory can supply operands or consume results.
(Compare your current machine's CPU clock frequency to the
speed of the memory chips, and you'll see what I mean -- for
example, the reciprocal of 50 ns is 0.02 GHz.)

The speed disparity means that memory effects dominate
the performance of large-scale numerical calculations. The
question isn't how fast the CPU runs, but how to keep it
busy as opposed to twiddling its thumbs waiting for memory.
The performance of the various caches that stand between the
CPU and the memory will have more to do with overall throughput
than will the choice of what operation to perform on the data
once you've got hold of it. If you want to keep the operands
flowing in and the answers flowing out, you need to pay close
attention to organizing your data in cache-friendly ways --
all of which are system-specific and outside the scope of the
C language, of course. The only semi-portable hint I can
offer is that you might try to do as much work as possible
with each value once you've fetched it to the CPU; pushing an
intermediate result back to main memory and then re-fetching
it to complete the computation later is likely to slow you down.
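A trivial illustration of that last point (no more than a sketch; the
function names and arrays are made up):

    #include <stddef.h>

    /* Two passes: the intermediate product is pushed out to memory
       and then fetched back again. */
    void combine_two_pass(float *out, float *tmp,
                          const float *a, const float *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            tmp[i] = a[i] * b[i];
        for (i = 0; i < n; i++)
            out[i] += tmp[i];
    }

    /* One pass: each pair of operands is used as soon as it has been
       fetched; the intermediate product never leaves the CPU. */
    void combine_one_pass(float *out,
                          const float *a, const float *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            out[i] += a[i] * b[i];
    }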

This isn't to say that the question of multiplication
speed vs. addition speed is wholly without meaning, just that
in the context of your larger problem it'll be at best a
second-order consideration.

</off-topic>
 
Michel Rouzic

Eric said:
Michel Rouzic wrote on 03/20/06 13:06:

Then your worry wasn't about the "several minutes"
after all, was it?

Well, yes, it was, in a way; in another way it's more about accuracy.
See, if I had a reliable way of measuring the performance of my code,
I would only need to perform a short computation to get reliable and
consistent figures. Instead I get meaningless numbers when performing
a short computation, and reliable numbers only when the computation
lasts, on average, about 25 minutes. You know, when you're looking for
the settings that offer the best performance, you don't even want to
wait 5 minutes between tests.
<off-topic>

You're going to have a difficult time getting a useful
answer out of a micro-benchmark. CPUs these days are rather
more complex than they were in the time of Babbage, and
notions like "the time required for a multiplication" are
too simplistic to describe their behavior. For starters,
most CPUs built in the past decade or so can crunch the data
faster than memory can supply operands or consume results.
(Compare your current machine's CPU clock frequency to the
speed of the memory chips, and you'll see what I mean -- for
example, the reciprocal of 50 ns is 0.02 GHz.)

Yeah, I understand that, but the way my problem has evolved since I
posted it, it's now less about something as abstract as multiplication
vs. addition without a clearly defined context, and more about
reliably determining the performance of an algorithm, as it will
actually be used, depending on the settings. The nature of the problem
remains pretty much the same, though.
The speed disparity means that memory effects dominate
the performance of large-scale numerical calculations. The
question isn't how fast the CPU runs, but how to keep it
busy as opposed to twiddling its thumbs waiting for memory.
The performance of the various caches that stand between the
CPU and the memory will have more to do with overall throughput
than will the choice of what operation to perform on the data
once you've got hold of it. If you want to keep the operands
flowing in and the answers flowing out, you need to pay close
attention to organizing your data in cache-friendly ways --
all of which are system-specific and outside the scope of the
C language, of course. The only semi-portable hint I can
offer is that you might try to do as much work as possible
with each value once you've fetched it to the CPU; pushing an
intermediate result back to main memory and then re-fetching
it to complete the computation later is likely to slow you down.

Well, I don't have the feeling I can do much about organizing my data,
since I'm dealing with one-dimensional arrays (I don't know how I
could organize that differently), and mostly because the most
CPU-hungry part of the whole thing is based on a library.
This isn't to say that the question of multiplication
speed vs. addition speed is wholly without meaning, just that
in the context of your larger problem it'll be at best a
second-order consideration.

</off-topic>

Do you have an idea of where I should ask about how to measure
performance reliably? (Given that the answer can be Windows-specific.)
 
