more hand written integer pow() functions (LONG POST)

Discussion in 'C Programming' started by silly, Nov 2, 2003.

  1. silly

    silly Guest

    /* hello, I have some fairly naive queries here related to optimising code!
    I know the first answer is 'don't' but leave that to one side for the
    moment.

    1) I'm looking for constructive comments on the mul_a() and pow_a() below
    comments on style/clarity/portability/obvious efficiency issues are welcome
    - any better ways to write them without changing the algorithm.

    2) Comments on mul_b() and pow_b() below which attempt to optimise them.

    3) any better algorithms - code examples preferred,
    - I don't have access to knuth at the moment!

    4) I have assumed x and y are non negative integers. If x were allowed to be
    negative, should be no portability issues as there are no right shifts on x,
    right?

    Obviously for the multiply routines only, assume there isn't an in-built
    multiply
    but you can double, halve and use mod (%) with a 2. A bit artificial perhaps
    but
    I'm trying to understand when and where you might use shifts and ANDs, etc!

    Also assume pow(0,0)=1.

    Thanks very much in advance
    ---
    BTW its not a homework problem, if it were I'd go for a variation of

    double ipow(double x, int n)
    {
    return (!n)?1:(n<0)?ipow(1/x,-n):(n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
    }

    from Mark Stephen on comp.lang.c.moderated a few years back!

    which would give:

    int kpow(int x, int n)
    {
    return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
    }

    Presumably with tail recursion this version is quite efficient?
    */

    /*--------------------start of code---------------*/
    /*Tested using Borland C++ 5.6.*/

    #include <stdio.h>
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* basic versions */
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* peasant multiply */
    int mul_a(int x, int y)
    {
    int t=0;

    for(;;)
    {
    if (y%2) t+=x; /* if y is odd... */
    if (!(y/=2))break; /* halve y and if result is zero... */
    x*=2; /* double x */
    }

    return t;
    }
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* peasant power */
    int pow_a(int x, int y)
    {
    int t=1;

    for(;;)
    {
    if (y%2) t*=x; /* if y is odd... */
    if (!(y/=2))break; /* halve y and if result is zero... */
    x*=x; /* square x */
    }

    return t;
    }
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* better(?) versions */
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* peasant multiply - attempt to optimise */
    int mul_b(int x, int y)
    {
    int t=0;

    for(;;)
    {
    if (y&1) t+=x; /* if y is odd... */
    if (!(y>>=1))break; /* halve y and if result is zero... */
    x<<=1; /* double x */
    }

    return t;
    }
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /* peasant power - attempt to optimise */
    int pow_b(int x, int y)
    {
    int t=1;

    for(;;)
    {
    if (y&1) t*=x; /* if y is odd... */
    if (!(y>>=1))break; /* halve y and if result is zero... */
    x*=x; /* square x */
    }

    return t;
    }
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    void main() {
    int n;

    putchar('\n');

    for( n =0; n<=10; n++)
    printf("%5d, %5d, %5d, %5d\n", n, n<<1, n>>1, n&1);

    putchar('\n');

    for( n =0; n<=5; n++)
    printf("%5d, %5d, %5d\n", n, mul_a(n,n), pow_a(n,n));

    putchar('\n');

    for( n =0; n<=5; n++)
    printf("%5d, %5d, %5d\n", n, mul_b(n,n), pow_b(n,n));

    putchar('\n');
    }
    /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
    /*--------------------end of code-----------------*/
    silly, Nov 2, 2003
    #1
    1. Advertising

  2. Greetings.

    In article <>, silly wrote:
    > comments on style/clarity/portability/obvious efficiency issues are
    > welcome


    > void main() {

    ....
    > putchar('\n');
    > }


    For a start, the definition should be int main(void), and the function must
    return a value (e.g., return(EXIT_SUCCESS);).

    --
    _
    _V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
    / |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
    (7_\\ http://www.nothingisreal.com/ >< To finish what you
    Tristan Miller, Nov 2, 2003
    #2
    1. Advertising

  3. On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:

    > hello, I have some fairly naive queries here related to optimising code!


    I'm going to pay attention to the pow() functions and ignore the mul()
    functions.

    > /* peasant power */
    > int pow_a(int x, int y)
    > {
    > int t=1;
    >
    > for(;;)
    > {
    > if (y%2) t*=x; /* if y is odd... */
    > if (!(y/=2))break; /* halve y and if result is zero... */
    > x*=x; /* square x */
    > }
    >
    > return t;
    > }
    >
    > /* peasant power - attempt to optimise */
    > int pow_b(int x, int y)
    > {
    > int t=1;
    >
    > for(;;)
    > {
    > if (y&1) t*=x; /* if y is odd... */
    > if (!(y>>=1))break; /* halve y and if result is zero... */
    > x*=x; /* square x */
    > }
    >
    > return t;
    > }


    This code is the C equivalent of some IBM OS/360 code that Glen
    Herrmannsfeldt posted here last week.

    PLUS LD FACTOR,ONE LOAD FACTOR OF ONE IN FACTOR REG
    LOOP SRDL EXPN,1 SHIFT LOW BIT EXPN REG INTO ADDR REG
    LTR ADDR,ADDR TEST SIGN POS ADDR REG FOR MINUS BIT
    BC 10,JUMP IF SIGN BIT NOT MINUS,BRANCH TO JUMP
    MDR FACTOR,BASE MULTIPLY FACTOR REG BY BASE NO REG
    JUMP LTR EXPN,EXPN CHECK IF EXPONENT PLUS,MINUS,OR ZERO
    BC 8,NEXT IF EXPONENT NOW ZERO, BRANCH TO NEXT
    MDR BASE,BASE MULTIPLY BASE NO BY DOUBLING ITSELF
    BC 15,LOOP BRANCH TO LOOP TO TEST NEXT EXPN BIT

    In any case, you haven't done much optimization here. You have
    simply replaced arithmetic operators with equivalent (assuming
    positive values) bitwise operators. This sort of optimization is
    usually done by the compiler anyway and there's not much point
    worrying about it yourself.

    In this case, the replacement of y/=2 with y>>=1 really does make
    the function faster. Because the compiler can't assume that y > 0,
    it has to generate code for y/=2 that works properly for negative
    numbers, which adds a few instructions in the loop.

    If you add a check for negative y at the beginning of both
    functions:

    if (y < 0) return 0;

    then the compiler has enough information to optimize the y/=2 into
    the exact same code as y>>=1, theoretically at least. In practice
    the compiler I'm using doesn't perform the optimization even with
    that extra information.

    It is worthwhile to put the check in anyway so that the function
    doesn't return the wrong answer for negative exponents.

    I do have an optimization for this code. It's not guaranteed to
    make the code faster, but it does make it faster on my machine.

    > BTW its not a homework problem, if it were I'd go for a variation of
    >
    > double ipow(double x, int n)
    > {
    > return (!n)?1:(n<0)?ipow(1/x,-n):(n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
    > }
    >
    > from Mark Stephen on comp.lang.c.moderated a few years back!
    > which would give:
    >
    > int kpow(int x, int n)
    > {
    > return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
    > }
    >
    > Presumably with tail recursion this version is quite efficient?


    You can't rely on tail recursion being optimized in C compilers,
    although it's not uncommon for there to be a switch that enables
    it.

    Did you try timing kpow() and your functions and see what is
    faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
    functions are twice as fast as kpow(). In fact, pow_b() is
    slightly faster on my system than James Hu's approach that
    unrolled the inner loop.

    I came up with code pretty much exactly like your pow_b() a few
    days ago by translating the OS/360 assembler above into C. One
    small change then made it a bit faster. You'll be able to see the
    change easily in the code below.

    Here are several different functions and their execution times.
    The main() used for testing was:

    static volatile int dest;
    int main (void)
    {
    int y;

    for (y = 0; y < 16; ++y)
    {
    int x = 10000000; /* ten million */
    while (--x)
    dest = powfunction(3,y);
    }

    return 0;
    }

    The testing program, including the various pow functions was
    compiled with the following command line in gcc 3.3.1:

    $ gcc -Wall -W -O2 -fomit-frame-pointer -march=athlon -o ipows ipows.c -lm

    Now for the various functions and the results, starting with the slowest
    and moving to the fastest. The program was compiled replacing "powfunction"
    with each of the different contenders and then was run 3 times. The
    best time of the 3 trials is given below:

    1) Compact recursive version (ala Mark Stephen)

    static int kpow(int x, int n)
    {
    if (n < 0) return 0;
    return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
    }

    $ time ./ipows

    real 0m7.403s
    user 0m6.630s
    sys 0m0.030s

    2) Your pow_a()

    static int pow_a(int x, int y)
    {
    int t=1;

    if (y < 0) return 0;
    for(;;)
    {
    if (y%2) t*=x; /* if y is odd... */
    if (!(y/=2))break; /* halve y and if result is zero... */
    x*=x; /* square x */
    }

    return t;
    }

    $ time ./ipows

    real 0m4.395s
    user 0m3.920s
    sys 0m0.020s

    3) Unrolled version (ala James Hu)

    static signed char hbit[32] = {
    -1,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
    4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
    };

    static int hupow (int x, int n)
    {
    int t = 1;

    if (n < 0) return 0;
    if (n == 0) return 1;
    if (n == 1) return x;
    if (x == 0) return 0;

    switch (hbit[n]) {
    case 4: if (n & 1) t *= x; n >>= 1; x *= x;
    case 3: if (n & 1) t *= x; n >>= 1; x *= x;
    case 2: if (n & 1) t *= x; n >>= 1; x *= x;
    case 1: if (n & 1) t *= x; n >>= 1; x *= x;
    default: t *= x;
    }
    return t;
    }

    $ time ./ipows

    real 0m3.332s
    user 0m2.990s
    sys 0m0.010s

    4) Your pow_b()

    static int pow_b(int x, int y)
    {
    int t=1;

    if (y < 0) return 0;
    for(;;)
    {
    if (y&1) t*=x; /* if y is odd... */
    if (!(y>>=1))break; /* halve y and if result is zero... */
    x*=x; /* square x */
    }

    return t;
    }

    $ time ./ipows

    real 0m3.208s
    user 0m2.870s
    sys 0m0.020s

    5) My translation of the OS/360 code

    static int ibmpow (int x, int y)
    {
    int factor = 1;

    if (y < 0) return 0;

    if (y & 1) factor *= x;
    y >>= 1;

    while (y)
    {
    x *= x;
    if (y & 1) factor *= x;
    y >>= 1;
    }

    return factor;
    }

    $ time ./ipows

    real 0m2.845s
    user 0m2.480s
    sys 0m0.080s

    -Sheldon
    Sheldon Simms, Nov 2, 2003
    #3
  4. silly

    James Hu Guest

    Sheldon Simms <> wrote in message news:<>...
    > On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
    > Did you try timing kpow() and your functions and see what is
    > faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
    > functions are twice as fast as kpow(). In fact, pow_b() is
    > slightly faster on my system than James Hu's approach that
    > unrolled the inner loop.


    kpow() is not really tail recursion, so there is no way for
    the compiler to optimize it into a loop.

    My unrolled implementation would probably need a computed
    goto for the array to improve the time. But realize that
    the version you used as the basis for you experiment was
    trying to avoid implementation defined behavior.

    > I came up with code pretty much exactly like your pow_b() a few
    > days ago by translating the OS/360 assembler above into C. One
    > small change then made it a bit faster. You'll be able to see the
    > change easily in the code below.


    It was unclear to me why your small change should make any
    difference. Do you have a theory?

    Please compare your routine with the routine below based on the
    refinement suggested by Tim Woodall and the implementation by
    Eric Sosman:

    int tim_eric_pow(int x, unsigned n)
    {
    int t = 1;
    if (n == 0) return 1;
    while (n ^ 1) {
    n >>= 1; x *= x;
    }
    t = x;
    while (n >>= 1) {
    x *= x;
    if (n & 1) t *= x;
    }

    return t;
    }

    > 5) My translation of the OS/360 code
    >
    > static int ibmpow (int x, int y)
    > {
    > int factor = 1;
    >
    > if (y < 0) return 0;
    >
    > if (y & 1) factor *= x;
    > y >>= 1;
    >
    > while (y)
    > {
    > x *= x;
    > if (y & 1) factor *= x;
    > y >>= 1;
    > }
    >
    > return factor;
    > }


    The tail recursive form for this would be:

    static int r_ibmpow2(int x, int y, int factor)
    {
    if (y == 0) return factor;
    x *= x;
    if (y & 1) factor *= x;
    return r_ibmpow2(x, y >> 1, factor);
    }

    static int ibmpow2(int x, int y)
    {
    if (y < 0) return 0;
    if (y & 1) return r_ibmpow2(x, y, x);
    return r_ibmpow2(x, y, 1);
    }

    But, you would have to use -O3 to get gcc to remove the tail recursion.

    -- James
    James Hu, Nov 3, 2003
    #4
  5. silly

    Tom St Denis Guest

    "James Hu" <> wrote in message
    news:...
    > Sheldon Simms <> wrote in message

    news:<>...
    > > On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
    > > Did you try timing kpow() and your functions and see what is
    > > faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
    > > functions are twice as fast as kpow(). In fact, pow_b() is
    > > slightly faster on my system than James Hu's approach that
    > > unrolled the inner loop.

    >
    > kpow() is not really tail recursion, so there is no way for
    > the compiler to optimize it into a loop.


    Just inserting my two-bits...

    You guys may also want to try a sliding window approach. You're using a
    binary exponentiation which is fast but sliding windows are cooler. :)

    Tom
    Tom St Denis, Nov 3, 2003
    #5
  6. On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:

    > Sheldon Simms <> wrote in message news:<>...
    >
    > My unrolled implementation would probably need a computed
    > goto for the array to improve the time. But realize that
    > the version you used as the basis for you experiment was
    > trying to avoid implementation defined behavior.
    >
    >> I came up with code pretty much exactly like your pow_b() a few
    >> days ago by translating the OS/360 assembler above into C. One
    >> small change then made it a bit faster. You'll be able to see the
    >> change easily in the code below.

    >
    > It was unclear to me why your small change should make any
    > difference. Do you have a theory?


    I know why it's faster, but that comes from looking at the
    assembly code generated. I actually only made the change to
    avoid the infinite-loop-with-break construct that comes
    out of the direct translation of the OS/360 code. Here's the
    code again:

    static int ibmpow (int x, int y)
    {
    int factor = 1;
    if (y < 0) return 0;

    if (y & 1) factor *= x;
    y >>= 1;

    while (y)
    {
    x *= x;
    if (y & 1) factor *= x;
    y >>= 1;
    }

    return factor;
    }

    One reason it is faster is that the generated code avoids the
    first multiplication for odd y by replacing the if-statement
    before the loop with this:

    if (y & 1) factor = x;
    y >>= 1;

    But that's not everything. I tested by replacing the test loop
    with

    for (y = 1; y < 32; y <<= 1)

    And the ibmpow() version is still faster than pow_b().

    I can see that the generated code for the loop in ibmpow() is
    a bit better than that for pow_b(). That's interesting since
    the statements are the same, only rearranged.

    > Please compare your routine with the routine below based on the
    > refinement suggested by Tim Woodall and the implementation by
    > Eric Sosman:
    >
    > int tim_eric_pow(int x, unsigned n)
    > {
    > int t = 1;
    > if (n == 0) return 1;
    > while (n ^ 1) {
    > n >>= 1; x *= x;
    > }
    > t = x;
    > while (n >>= 1) {
    > x *= x;
    > if (n & 1) t *= x;
    > }
    >
    > return t;
    > }


    I must have missed that one in the thread. After small changes so
    that it conforms to the same interface as the others (signed n, check
    for n < 0), it's the fastest so far:

    $ time ./ipows

    real 0m2.032s
    user 0m1.980s
    sys 0m0.000s

    > The tail recursive form for this would be:

    ....
    > But, you would have to use -O3 to get gcc to remove the tail recursion.


    I avoided O3 because it inlined the power function, sometimes.

    -Sheldon
    Sheldon Simms, Nov 3, 2003
    #6
  7. silly

    James Hu Guest

    On 2003-11-03, Sheldon Simms <> wrote:
    > On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
    >
    >> int tim_eric_pow(int x, unsigned n)
    >> {
    >> int t = 1;
    >> if (n == 0) return 1;
    >> while (n ^ 1) {
    >> n >>= 1; x *= x;
    >> }
    >> t = x;
    >> while (n >>= 1) {
    >> x *= x;
    >> if (n & 1) t *= x;
    >> }
    >>
    >> return t;
    >> }


    Doh! That's what I get for composing from memory. The first while loop
    needs to test for a leading 0 bit. The original by Eric was right:

    while (!(n&1)) {
    ...

    But I guess you can compare that with:

    while ((n&1)^1) {
    ...

    -- James
    James Hu, Nov 3, 2003
    #7
  8. silly

    Tim Woodall Guest

    (James Hu) wrote in message >
    > int tim_eric_pow(int x, unsigned n)
    > {
    > int t = 1;
    > if (n == 0) return 1;
    > while (n ^ 1) {

    ^^^^^
    What is this? Did I really write something like that? Apart from the
    fact that it is wrong, I HATE clever tricks like that even when they
    are right! Using xor to perform an XOR is fine, using it to test a bit
    is HORRIBLE! (YMMV :)

    > n >>= 1; x *= x;
    > }
    > t = x;
    > while (n >>= 1) {
    > x *= x;
    > if (n & 1) t *= x;
    > }
    >
    > return t;
    > }
    >


    Tim.
    Tim Woodall, Nov 3, 2003
    #8
  9. On Mon, 03 Nov 2003 11:31:21 -0600, James Hu wrote:

    > On 2003-11-03, Sheldon Simms <> wrote:
    >> On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
    >>
    >>> int tim_eric_pow(int x, unsigned n)
    >>> {
    >>> int t = 1;
    >>> if (n == 0) return 1;
    >>> while (n ^ 1) {
    >>> n >>= 1; x *= x;
    >>> }
    >>> t = x;
    >>> while (n >>= 1) {
    >>> x *= x;
    >>> if (n & 1) t *= x;
    >>> }
    >>>
    >>> return t;
    >>> }

    >
    > Doh! That's what I get for composing from memory. The first while loop
    > needs to test for a leading 0 bit. The original by Eric was right:


    Here I am, a trusting soul, not bothering to make sure that the
    function actually works. So anyway the time I gave earlier is wrong.
    This (corrected) function is still the fastest, but just barely.

    > while (!(n&1)) {
    > ...


    $ time ./ipows

    real 0m2.431s
    user 0m2.400s
    sys 0m0.010s


    > But I guess you can compare that with:
    >
    > while ((n&1)^1) {
    > ...


    $ time ./ipows

    real 0m2.435s
    user 0m2.410s
    sys 0m0.000s
    Sheldon Simms, Nov 3, 2003
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Dan Pop
    Replies:
    0
    Views:
    1,169
    Dan Pop
    Jun 24, 2003
  2. Clueless Moron

    math.pow vs pow

    Clueless Moron, Nov 27, 2003, in forum: Python
    Replies:
    5
    Views:
    911
    John J. Lee
    Nov 28, 2003
  3. Michel Rouzic

    pow(2, 1/2) != pow(2, 0.5) problem

    Michel Rouzic, Jun 15, 2005, in forum: C Programming
    Replies:
    52
    Views:
    1,630
    Alan Balmer
    Jun 20, 2005
  4. Replies:
    22
    Views:
    3,207
    JosephKK
    Mar 23, 2009
  5. Suresh V
    Replies:
    5
    Views:
    3,685
    SaticCaster
    Jul 5, 2010
Loading...

Share This Page