Square a float: pow or f*f?

chrisstankevitz

Is there any consensus on which of these is faster?

inline float Square1(float f) { return std::pow(f, 2.0f); }
inline float Square2(float f) { return f*f; }

Chris
 
chrisstankevitz

Alf said:
For optimizations, follow the accepted principles: (1) don't do it, (2) don't do
it yet, and (3) if you still feel an irresistible urge, then measure, measure,
measure, and finally, don't do it.

Alf,

Thanks for the response. I am reorganizing a function that, according
to my profiler, is the bottleneck. I suppose the answer is I need to
try both methods and see which is faster according to the profiler. I
was hoping someone here would know for the special case of squaring.

Thanks again for your help,

Chris
 
James Kanze

Chris said:
Thanks for the response. I am reorganizing a function that,
according to my profiler, is the bottleneck. I suppose the
answer is I need to try both methods and see which is faster
according to the profiler. I was hoping someone here would
know for the special case of squaring.

The answer is that it will depend on the machine and the
compiler (although typically, I would expect a*a to be faster).
The answer is also that even in a tight loop which does nothing
else, the difference is likely to be insignificant.
 
Juha Nieminen

James said:
The answer is that it will depend on the machine and the
compiler (although typically, I would expect a*a to be faster).
The answer is also that even in a tight loop which does nothing
else, the difference is likely to be insignificant.

At least on Intel processors, a pow() call will be inherently slower than a
multiplication. However, many compilers are able to optimize a
"std::pow(d, 2.0)" call into "d*d".

I tested this on my computer with the program below, compiled with gcc 4.1.2
using the options "-O3 -march=pentium4 -s", and got these results:

Time: 1.69 s, result = 2.66667e+18
Time: 1.69 s, result = 2.66667e+18
Time: 47.72 s, result = 2.69852e+18

The first and second tests show no difference, so clearly gcc is
optimizing the pow() call away. The third version forces gcc to perform
a true pow() call, and it's a lot slower.


#include <cmath>
#include <ctime>
#include <iostream>

inline double square1(double d) { return d*d; }
inline double square2(double d) { return std::pow(d, 2.0); }
// An exponent of 2.001 cannot be rewritten as a single multiplication.
inline double square3(double d) { return std::pow(d, 2.001); }

// Calls f 200 million times and prints the elapsed CPU time and the sum.
template<typename F>
void test(F f)
{
    std::clock_t t1 = std::clock();
    double res = 0, d = .001;
    for(int i = 0; i < 200000000; ++i)
    {
        res += f(d);
        d += .001;
    }
    std::clock_t t2 = std::clock();

    // Truncate the elapsed time to two decimal places.
    std::cout << "Time: " << int((t2-t1)*100.0/CLOCKS_PER_SEC)/100.0
              << " s, result = " << res << std::endl;
}

int main()
{
    test(square1);
    test(square2);
    test(square3);
}
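
One limitation of the third test is that the exponent 2.001 also changes the result, so the sums are not directly comparable. A possible variation, sketched here under my own assumptions, is to keep the exponent at exactly 2.0 but read it through a volatile object, so the compiler cannot see the constant and should have to make a real pow() call. The names square_mul, square_pow_opaque and sum_squares are made up for this sketch, and it uses std::chrono::steady_clock (wall-clock time) where the program above uses std::clock (CPU time).

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper names; none of these come from the program above.
inline double square_mul(double d) { return d * d; }

// The exponent is read through a volatile object on every call, so the
// compiler cannot treat it as the constant 2.0 and rewrite the call as d*d.
inline double square_pow_opaque(double d)
{
    static volatile double exponent = 2.0;
    return std::pow(d, exponent);
}

// Sums f(d) over n steps of 0.001, like the loop in the program above.
// The elapsed wall-clock time in seconds is written to `seconds`.
template<typename F>
double sum_squares(F f, int n, double& seconds)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    double res = 0.0, d = 0.001;
    for (int i = 0; i < n; ++i)
    {
        res += f(d);
        d += 0.001;
    }
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return res;
}
```

Calling sum_squares(square_mul, n, t) and sum_squares(square_pow_opaque, n, t) with the same n should give sums that agree to within rounding, so any difference in the measured times comes from the pow() call itself. I have not run this on the gcc 4.1.2 setup above, so treat it as a starting point for your own measurements.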
 
chrisstankevitz

Juha said:
The first and second tests show no difference, so clearly gcc is
optimizing the pow() call away. The third version forces gcc to perform
a true pow() call, and it's a lot slower.

Juha,

Thanks for your help and test results!

Chris
 
