Square a float: pow or f*f?

chrisstankevitz · Jun 22, 2008

Is there any consensus on which of these is faster?

inline float Square1(float f) { return std::pow(f, 2.0f); }
inline float Square2(float f) { return f*f; }

Chris

chrisstankevitz · Jun 22, 2008

Alf said:
For optimizations, follow the accepted principles: (1) don't do it, (2) don't do
it yet, and (3) if you still feel an irresistible urge, then measure, measure,
measure, and finally, don't do it.

Alf,

Thanks for the response. I am reorganizing a function that, according
to my profiler, is the bottleneck. I suppose the answer is I need to
try both methods and see which is faster according to the profiler. I
was hoping someone here would know for the special case of squaring.

Thanks again for your help,

Chris

James Kanze · Jun 22, 2008

Chris said:
Thanks for the response. I am reorganizing a function that,
according to my profiler, is the bottleneck. I suppose the
answer is I need to try both methods and see which is faster
according to the profiler. I was hoping someone here would
know for the special case of squaring.

The answer is that it will depend on the machine and the
compiler (although typically, I would expect a*a to be faster).
The answer is also that even in a tight loop which does nothing
else, the difference is likely to be insignificant.

Juha Nieminen · Jun 22, 2008

James said:
The answer is that it will depend on the machine and the
compiler (although typically, I would expect a*a to be faster).
The answer is also that even in a tight loop which does nothing
else, the difference is likely to be insignificant.

At least on Intel processors, a pow() call will be inherently slower than a
multiplication. However, many compilers are able to optimize a
"std::pow(d, 2.0)" call into "d*d".

I tested this on my computer, using the program below, using gcc 4.1.2
with the compiler options "-O3 -march=pentium4 -s" and I got these results:

Time: 1.69 s, result = 2.66667e+18
Time: 1.69 s, result = 2.66667e+18
Time: 47.72 s, result = 2.69852e+18

The first and second tests show no difference, so clearly gcc is
optimizing the pow() call away. The third version forces gcc to perform
a true pow() call, and it's a lot slower.

#include <cmath>
#include <ctime>
#include <iostream>

inline double square1(double d) { return d*d; }
inline double square2(double d) { return std::pow(d, 2.0); }
inline double square3(double d) { return std::pow(d, 2.001); }

template<typename F>
void test(F f)
{
    clock_t t1 = std::clock();
    double res = 0, d = .001;
    for(int i = 0; i < 200000000; ++i)
    {
        res += f(d);
        d += .001;
    }
    clock_t t2 = std::clock();

    std::cout << "Time: " << int((t2-t1)*100.0/CLOCKS_PER_SEC)/100.0
              << " s, result = " << res << std::endl;
}

int main()
{
    test(square1);
    test(square2);
    test(square3);
}

chrisstankevitz · Jun 22, 2008

Juha said:
The first and second tests show no difference, so clearly gcc is
optimizing the pow() call away. The third version forces gcc to perform
a true pow() call, and it's a lot slower.

Juha,

Thanks for your help and test results!

Chris
