fast power function

Discussion in 'C++' started by Chris Forone, Jan 21, 2008.

  1. Chris Forone

    Chris Forone Guest

    hello group,

    is there some reference implementation of a fast power function? how
    fast is the power function in <cmath>?

    thanks & hand, chris
     
    Chris Forone, Jan 21, 2008
    #1
    1. Advertising

  2. Chris Forone:

    > hello group,
    >
    > is there some reference implementation of a fast power function? how
    > fast is the power function in <cmath>?
    >
    > thanks & hand, chris



    Very probably the fastest method of calculating powers for that platform.


    --
    Tomás Ó hÉilidhe
     
    Tomás Ó hÉilidhe, Jan 21, 2008
    #2
    1. Advertising

  3. Tomás Ó hÉilidhe wrote:
    > Very probably the fastest method of calculating powers for that platform.


    Not necessarily in all cases.

    For example in intel architectures pow() is rather slow because
    there's no such FPU opcode in 387. We are probably talking about many
    hundreds, if not even over a thousand clock cycles even on a Pentium.
    While compilers may try to optimize the pow() call away if they can,
    they often can't.

    In some cases it may be faster to "open up" a calculation than
    calling pow(). For example, in many cases it may be slower to perform a
    "pow(x, 1.5)" than a "x*sqrt(x)" (many compilers are unable to optimize
    the former into the latter).

    But of course this is more related to architectures and compilers than
    to C++, and thus slightly off-topic.
     
    Juha Nieminen, Jan 21, 2008
    #3
  4. Would it not make sense for the standard library to have a pow function
    that deals only with integer exponents?

    unsigned pow_int(unsigned const x,unsigned exp)
    {
    unsigned retval = 1;

    while ( ; exp; --exp) retval *= x;

    return retval;
    }

    --
    Tomás Ó hÉilidhe
     
    Tomás Ó hÉilidhe, Jan 21, 2008
    #4
  5. Chris Forone

    Pete Becker Guest

    On 2008-01-21 11:25:39 -0500, "Tomás Ó hÉilidhe" <> said:

    >
    > Would it not make sense for the standard library to have a pow function
    > that deals only with integer exponents?
    >


    It has three: pow(float, int), pow(double, int), and pow(long double, int).

    --
    Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
    Standard C++ Library Extensions: a Tutorial and Reference
    (www.petebecker.com/tr1book)
     
    Pete Becker, Jan 21, 2008
    #5
  6. Tomás Ó hÉilidhe wrote:
    > Would it not make sense for the standard library to have a pow function
    > that deals only with integer exponents?


    Compilers do have specialized pow() functions when the exponent is
    integer.

    > unsigned pow_int(unsigned const x,unsigned exp)
    > {
    > unsigned retval = 1;
    >
    > while ( ; exp; --exp) retval *= x;
    >
    > return retval;
    > }


    That's *not* the fastest way of calculating pow(x, integer).
     
    Juha Nieminen, Jan 21, 2008
    #6
  7. Chris Forone

    Jerry Coffin Guest

    In article <4794bdad$0$23838$>,
    lid says...

    [ ... ]

    > For example in intel architectures pow() is rather slow because
    > there's no such FPU opcode in 387. We are probably talking about many
    > hundreds, if not even over a thousand clock cycles even on a Pentium.
    > While compilers may try to optimize the pow() call away if they can,
    > they often can't.


    I don't know of any processor that would let you implement pow in a
    single instruction -- but Intel floating point includes instructions for
    logarithm and inverse logarithm, which make pow pretty easy to
    implement.

    As far as speed goes, you should be looking at about 150-190 CPU cycles
    on a reasonably modern CPU (depending somewhat on data). I haven't found
    any data for which it takes 200 cycles on my machine, and it's a few
    years old -- I believe current CPUs are typically at least 20-30% faster
    at the same clock speed.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 22, 2008
    #7
  8. Chris Forone

    Jerry Coffin Guest

    In article <fn1ra4$4qp$>, says...
    > hello group,
    >
    > is there some reference implementation of a fast power function? how
    > fast is the power function in <cmath>?


    The implementation in the library is likely to be oriented more toward
    being general purpose than the fastest for a given situation. You can
    probably do substantially better if (and only if) you know a fair amount
    about the data you're working with, so you can write something more
    specialized.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 22, 2008
    #8
  9. Jerry Coffin wrote:
    > I don't know of any processor that would let you implement pow in a
    > single instruction -- but Intel floating point includes instructions for
    > logarithm and inverse logarithm, which make pow pretty easy to
    > implement.


    Actually it's not "pretty easy", as there's a twist.

    pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
    opcode for calculating y*log2(x), fyl2x, and one for calculating
    pow(2, x)-1, f2xm1. The twist is, however, that with the latter x must
    be inside the range from -1.0 to +1.0. This means that the x in the
    original pow(x, y) cannot be used as it is with f2xm1, but it must be
    split using the property 2^(a+b) = 2^a*2^b.

    The complete asm code for calculating pow(x, y), assuming x and y have
    already been loaded into the FPU, is something like (with reported clock
    cycles for the pentium):

    fyl2x ; 22-111
    fld st ; 1
    frndint ; 9-20
    fsub st(1), st ; 3
    fxch st(2) ; 0-1
    f2xm1 ; 13-57
    fld1 ; 2
    fadd ; 3
    fscale ; 20-31
    fstp st(1) ; 1

    I suppose the absolute best case scenario is a total of 74 clock
    cycles, and the worst is 230 clock cycles, with possible stalls.

    I suppose I overestimated the required clock cycles.
     
    Juha Nieminen, Jan 22, 2008
    #9
  10. Chris Forone

    James Kanze Guest

    On Jan 21, 11:09 pm, Juha Nieminen <> wrote:
    > Tomás Ó hÉilidhe wrote:
    > > Would it not make sense for the standard library to have a pow function
    > > that deals only with integer exponents?


    > Compilers do have specialized pow() functions when the exponent is
    > integer.


    > > unsigned pow_int(unsigned const x,unsigned exp)
    > > {
    > > unsigned retval = 1;


    > > while ( ; exp; --exp) retval *= x;
    > > return retval;
    > > }


    > That's *not* the fastest way of calculating pow(x, integer).


    Nor the most accurate.

    --
    James Kanze (GABI Software) mailto:
    Conseils en informatique orient�e objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place S�mard, 78210 St.-Cyr-l'�cole, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jan 22, 2008
    #10
  11. Chris Forone

    Lionel B Guest

    On Tue, 22 Jan 2008 05:50:47 -0800, James Kanze wrote:

    > On Jan 21, 11:09 pm, Juha Nieminen <> wrote:
    >> Tomás Ó hÉilidhe wrote:
    >> > Would it not make sense for the standard library to have a pow
    >> > function
    >> > that deals only with integer exponents?

    >
    >> Compilers do have specialized pow() functions when the exponent is
    >> integer.

    >
    >> > unsigned pow_int(unsigned const x,unsigned exp) {
    >> > unsigned retval = 1;

    >
    >> > while ( ; exp; --exp) retval *= x;
    >> > return retval;
    >> > }

    >
    >> That's *not* the fastest way of calculating pow(x, integer).


    It may or may not be...

    > Nor the most accurate.


    Surely it's guaranteed 100% accurate* - it's integer arithmetic.

    *provided retval doesn't overflow.

    --
    Lionel B
     
    Lionel B, Jan 22, 2008
    #11
  12. Chris Forone

    Jerry Coffin Guest

    In article <4795e4d9$0$14991$>,
    lid says...
    > Jerry Coffin wrote:
    > > I don't know of any processor that would let you implement pow in a
    > > single instruction -- but Intel floating point includes instructions for
    > > logarithm and inverse logarithm, which make pow pretty easy to
    > > implement.

    >
    > Actually it's not "pretty easy", as there's a twist.


    Actually, I'm well aware of the "twists". At least by my figuring, it
    still qualifies as "pretty easy".

    > pow(x, y) = pow(2, y*log2(x)), and the 387 happens to have both an
    > opcode for calculating y*log2(x), fyl2x, and one for calculating
    > pow(2, x)-1, f2xm1. The twist is, however, that with the latter x must
    > be inside the range from -1.0 to +1.0. This means that the x in the
    > original pow(x, y) cannot be used as it is with f2xm1, but it must be
    > split using the property 2^(a+b) = 2^a*2^b.


    Yes -- and that doesn't stop it from being pretty easy.

    > The complete asm code for calculating pow(x, y), assuming x and y have
    > already been loaded into the FPU, is something like (with reported clock
    > cycles for the pentium):
    >
    > fyl2x ; 22-111
    > fld st ; 1
    > frndint ; 9-20
    > fsub st(1), st ; 3
    > fxch st(2) ; 0-1
    > f2xm1 ; 13-57
    > fld1 ; 2
    > fadd ; 3
    > fscale ; 20-31
    > fstp st(1) ; 1


    That's certainly similar to the code I've always used:

    fyl2x
    fld st(0)
    frndint
    fld st(1)
    fsub st(0),st(1)
    f2xm1
    fld1
    faddp st(1),st(0)
    fxch st(1)
    fld1
    fscale
    fxch st(1)
    fstp st(0)
    fmulp st(1),st(0)

    > I suppose the absolute best case scenario is a total of 74 clock
    > cycles, and the worst is 230 clock cycles, with possible stalls.
    >
    > I suppose I overestimated the required clock cycles.


    Especially since you seem to be basing your clock counts on the Pentium,
    which is an _ancient_ processor. The Pentium was discontinued more than
    a decade ago. Even if you obtain your computers from other people's
    trash, you can something a lot newer by now.

    To put things in perspective, using the code above, it took about half
    again longer to load X and Y from memory into the FPU than it did to
    compute X^Y after they were loaded.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 22, 2008
    #12
  13. Chris Forone

    Jerry Coffin Guest

    In article <Xns9A2CA71CBAD72toelavabitcom@194.125.133.14>,
    says...
    >
    > Would it not make sense for the standard library to have a pow function
    > that deals only with integer exponents?


    It already has such overloads, though there's no guarantee that they
    work any differently than for floating point exponents.

    > unsigned pow_int(unsigned const x,unsigned exp)
    > {
    > unsigned retval = 1;
    >
    > while ( ; exp; --exp) retval *= x;
    >
    > return retval;
    > }


    This isn't a particularly good algorithm unless exp is _quite_ small
    (low double digits at most). For an N-bit exponent, this takes a maximum
    of 2^N-1 multiplications. Assuming values of exp were randomly
    distributed, it would take about 2^(N-1) multiplications on average.

    You can do a lot better than that, taking roughly 2N multiplications in
    the worst case, and something like 1.5N multiplications in the typical
    case (actually, it should be somewhat less than that, but I haven't
    bothered to figure up the exact number -- in any case, it's linear on
    the number of bits instead of exponential.

    Here's an implementation with some test code and instrumentation:

    #include <vector>
    #include <numeric>

    unsigned mults;

    template <class T>
    T mult(T a, T b) {
    mults++;
    return a * b;
    }

    #define bits(object) (sizeof(object)*CHAR_BIT)

    template <class T>
    T my_pow(T X, unsigned Y) {
    std::vector<T> sq;

    sq.reserve(bits(Y));

    T current_square(X);
    mults = 0;

    while (Y) {
    if (Y&1)
    sq.push_back(current_square);
    Y>>=1;
    current_square *= current_square;
    mults++;
    }
    T result = std::accumulate(sq.begin(), sq.end(), T(1), mult<T>);

    return result;
    }

    #ifdef TEST

    #include <iostream>
    #include <cmath>

    int main() {
    unsigned max_mults = 0;
    unsigned total_mults = 0;
    unsigned numbers = 0;

    for (double i=0; i<10; i++)
    for (int j=0; j<300; j++) {
    std::cout << i << "^" << j << "=" << my_pow(i, j);
    std::cout << "\t" << mults << " multiplications\n";
    if (std::pow(i,j) != my_pow(i,j))
    std::cout << "^Error!\n";
    if (mults > max_mults)
    max_mults = mults;
    numbers ++;
    total_mults += mults;
    }
    std::cout << "Maximum Multiplications: " << max_mults;
    std::cout << "\nAverage Multipliations: " << (double)total_mults /
    numbers;
    return 0;
    }

    #endif

    When I run this, I get a worst-case of 16 and an average of 11.23
    multiplications per call. In every case, the result matches the standard
    library exactly -- which gives a strong indication that the standard
    library is implementing the code the same way. I'd expect to see at
    least a few minor differences if the standard library were doing the job
    by taking the logarithm and multiplying.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 22, 2008
    #13
  14. Chris Forone

    Chris Forone Guest

    Thaks a lot!

    yours sincerely, chris
     
    Chris Forone, Jan 23, 2008
    #14
  15. Chris Forone

    James Kanze Guest

    On Jan 22, 3:49 pm, Lionel B <> wrote:
    > On Tue, 22 Jan 2008 05:50:47 -0800, James Kanze wrote:
    > > On Jan 21, 11:09 pm, Juha Nieminen <> wrote:
    > >> Tomás Ó hÉilidhe wrote:
    > >> > Would it not make sense for the standard library to have a pow
    > >> > function
    > >> > that deals only with integer exponents?


    > >> Compilers do have specialized pow() functions when the exponent is
    > >> integer.


    > >> > unsigned pow_int(unsigned const x,unsigned exp) {
    > >> > unsigned retval = 1;


    > >> > while ( ; exp; --exp) retval *= x;
    > >> > return retval;
    > >> > }


    > >> That's *not* the fastest way of calculating pow(x, integer).


    > It may or may not be...


    It's not, by far.

    > > Nor the most accurate.


    > Surely it's guaranteed 100% accurate* - it's integer arithmetic.


    Oops. I missed that; I read the line "Compilers do have
    specialized pow() functions when the exponent is integer", and
    didn't look at the actual function parameters.

    Using this function if the base is a floating point type, of
    course, will result in very poor accuracy.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jan 23, 2008
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    669
  2. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    567
  3. Juha Nieminen
    Replies:
    22
    Views:
    1,033
    Kai-Uwe Bux
    Oct 12, 2007
  4. Replies:
    8
    Views:
    371
    Mark Dickinson
    Apr 17, 2008
  5. Sean
    Replies:
    8
    Views:
    1,788
    Antoninus Twink
    Dec 9, 2009
Loading...

Share This Page