fast power function

Chris Forone

hello group,

is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?

thanks & hand, chris
 
Tomás Ó hÉilidhe

Chris Forone:
hello group,

is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?

thanks & hand, chris


Very probably the fastest method of calculating powers for that platform.
 
Juha Nieminen

Tomás Ó hÉilidhe said:
Very probably the fastest method of calculating powers for that platform.

Not necessarily in all cases.

For example, on Intel architectures pow() is rather slow because
there's no such FPU opcode in the 387. We are probably talking about
many hundreds, if not over a thousand, clock cycles even on a Pentium.
While compilers may try to optimize the pow() call away when they can,
they often can't.

In some cases it may be faster to "open up" a calculation than
calling pow(). For example, in many cases it may be slower to perform a
"pow(x, 1.5)" than a "x*sqrt(x)" (many compilers are unable to optimize
the former into the latter).

But of course this is more related to architectures and compilers than
to C++, and thus slightly off-topic.
 
Tomás Ó hÉilidhe

Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

unsigned pow_int(unsigned const x, unsigned exp)
{
    unsigned retval = 1;

    for ( ; exp; --exp) retval *= x;

    return retval;
}
 
Pete Becker

Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

It has three: pow(float, int), pow(double, int), and pow(long double, int).
 
Juha Nieminen

Tomás Ó hÉilidhe said:
Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

Compilers do have specialized pow() functions when the exponent is
integer.
unsigned pow_int(unsigned const x, unsigned exp)
{
    unsigned retval = 1;

    for ( ; exp; --exp) retval *= x;

    return retval;
}

That's *not* the fastest way of calculating pow(x, integer).
 
Jerry Coffin

[ ... ]
For example, on Intel architectures pow() is rather slow because
there's no such FPU opcode in the 387. We are probably talking about
many hundreds, if not over a thousand, clock cycles even on a Pentium.
While compilers may try to optimize the pow() call away when they can,
they often can't.

I don't know of any processor that would let you implement pow in a
single instruction -- but Intel floating point includes instructions for
logarithm and inverse logarithm, which make pow pretty easy to
implement.

As far as speed goes, you should be looking at about 150-190 CPU cycles
on a reasonably modern CPU (depending somewhat on data). I haven't found
any data for which it takes 200 cycles on my machine, and it's a few
years old -- I believe current CPUs are typically at least 20-30% faster
at the same clock speed.
 
Jerry Coffin

hello group,

is there some reference implementation of a fast power function? how
fast is the power function in <cmath>?

The implementation in the library is likely to be oriented more toward
being general purpose than the fastest for a given situation. You can
probably do substantially better if (and only if) you know a fair amount
about the data you're working with, so you can write something more
specialized.
 
Juha Nieminen

Jerry said:
I don't know of any processor that would let you implement pow in a
single instruction -- but Intel floating point includes instructions for
logarithm and inverse logarithm, which make pow pretty easy to
implement.

Actually it's not "pretty easy", as there's a twist.

pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
opcode for calculating y*log2(x), fyl2x, and one for calculating
pow(2, x)-1, f2xm1. The twist is, however, that the argument to f2xm1
must be inside the range from -1.0 to +1.0. This means that the
intermediate value y*log2(x) cannot be fed to f2xm1 as it is, but must
first be split into integer and fractional parts using the property
2^(a+b) = 2^a*2^b.

The complete asm code for calculating pow(x, y), assuming x and y have
already been loaded into the FPU, is something like (with reported clock
cycles for the pentium):

fyl2x ; 22-111
fld st ; 1
frndint ; 9-20
fsub st(1), st ; 3
fxch st(2) ; 0-1
f2xm1 ; 13-57
fld1 ; 2
fadd ; 3
fscale ; 20-31
fstp st(1) ; 1

I suppose the absolute best case scenario is a total of 74 clock
cycles, and the worst is 230 clock cycles, with possible stalls.

I suppose I overestimated the required clock cycles.
 
Jerry Coffin

Actually it's not "pretty easy", as there's a twist.

Actually, I'm well aware of the "twists". At least by my figuring, it
still qualifies as "pretty easy".
pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
opcode for calculating y*log2(x), fyl2x, and one for calculating
pow(2, x)-1, f2xm1. The twist is, however, that the argument to f2xm1
must be inside the range from -1.0 to +1.0. This means that the
intermediate value y*log2(x) cannot be fed to f2xm1 as it is, but must
first be split into integer and fractional parts using the property
2^(a+b) = 2^a*2^b.

Yes -- and that doesn't stop it from being pretty easy.
The complete asm code for calculating pow(x, y), assuming x and y have
already been loaded into the FPU, is something like (with reported clock
cycles for the pentium):

fyl2x ; 22-111
fld st ; 1
frndint ; 9-20
fsub st(1), st ; 3
fxch st(2) ; 0-1
f2xm1 ; 13-57
fld1 ; 2
fadd ; 3
fscale ; 20-31
fstp st(1) ; 1

That's certainly similar to the code I've always used:

fyl2x
fld st(0)
frndint
fld st(1)
fsub st(0),st(1)
f2xm1
fld1
faddp st(1),st(0)
fxch st(1)
fld1
fscale
fxch st(1)
fstp st(0)
fmulp st(1),st(0)
I suppose the absolute best case scenario is a total of 74 clock
cycles, and the worst is 230 clock cycles, with possible stalls.

I suppose I overestimated the required clock cycles.

Especially since you seem to be basing your clock counts on the Pentium,
which is an _ancient_ processor. The Pentium was discontinued more than
a decade ago. Even if you obtain your computers from other people's
trash, you can get something a lot newer by now.

To put things in perspective, using the code above, it took about half
again longer to load X and Y from memory into the FPU than it did to
compute X^Y after they were loaded.
 
Jerry Coffin

Would it not make sense for the standard library to have a pow function
that deals only with integer exponents?

It already has such overloads, though there's no guarantee that they
work any differently than for floating point exponents.
unsigned pow_int(unsigned const x, unsigned exp)
{
    unsigned retval = 1;

    for ( ; exp; --exp) retval *= x;

    return retval;
}

This isn't a particularly good algorithm unless exp is _quite_ small
(low double digits at most). For an N-bit exponent, this takes a maximum
of 2^N-1 multiplications. Assuming values of exp were randomly
distributed, it would take about 2^(N-1) multiplications on average.

You can do a lot better than that, taking roughly 2N multiplications in
the worst case, and something like 1.5N multiplications in the typical
case (actually, it should be somewhat less than that, but I haven't
bothered to figure out the exact number). In any case, it's linear in
the number of bits instead of exponential.

Here's an implementation with some test code and instrumentation:

#include <vector>
#include <numeric>
#include <climits>   // for CHAR_BIT

unsigned mults;

template <class T>
T mult(T a, T b) {
    mults++;
    return a * b;
}

#define bits(object) (sizeof(object)*CHAR_BIT)

template <class T>
T my_pow(T X, unsigned Y) {
    std::vector<T> sq;

    sq.reserve(bits(Y));

    T current_square(X);
    mults = 0;

    while (Y) {
        if (Y & 1)
            sq.push_back(current_square);
        Y >>= 1;
        current_square *= current_square;
        mults++;
    }
    T result = std::accumulate(sq.begin(), sq.end(), T(1), mult<T>);

    return result;
}

#ifdef TEST

#include <iostream>
#include <cmath>

int main() {
    unsigned max_mults = 0;
    unsigned total_mults = 0;
    unsigned numbers = 0;

    for (double i = 0; i < 10; i++)
        for (int j = 0; j < 300; j++) {
            std::cout << i << "^" << j << "=" << my_pow(i, j);
            std::cout << "\t" << mults << " multiplications\n";
            if (std::pow(i, j) != my_pow(i, j))
                std::cout << "^Error!\n";
            if (mults > max_mults)
                max_mults = mults;
            numbers++;
            total_mults += mults;
        }
    std::cout << "Maximum Multiplications: " << max_mults;
    std::cout << "\nAverage Multiplications: "
              << (double)total_mults / numbers;
    return 0;
}

#endif

When I run this, I get a worst-case of 16 and an average of 11.23
multiplications per call. In every case, the result matches the standard
library exactly -- which gives a strong indication that the standard
library is implementing the code the same way. I'd expect to see at
least a few minor differences if the standard library were doing the job
by taking the logarithm and multiplying.
 
James Kanze

It may or may not be...

It's not, by far.
Surely it's guaranteed 100% accurate - it's integer arithmetic.

Oops. I missed that; I read the line "Compilers do have
specialized pow() functions when the exponent is integer", and
didn't look at the actual function parameters.

Using this function if the base is a floating point type, of
course, will result in very poor accuracy.
 
