# more hand written integer pow() functions (LONG POST)

Discussion in 'C Programming' started by silly, Nov 2, 2003.

1. ### sillyGuest

/* hello, I have some fairly naive queries here related to optimising code!
I know the first answer is 'don't' but leave that to one side for the
moment.

1) I'm looking for constructive comments on the mul_a() and pow_a() below
comments on style/clarity/portability/obvious efficiency issues are welcome
- any better ways to write them without changing the algorithm.

2) Comments on mul_b() and pow_b() below which attempt to optimise them.

3) any better algorithms - code examples preferred,

4) I have assumed x and y are non negative integers. If x were allowed to be
negative, should be no portability issues as there are no right shifts on x,
right?

Obviously for the multiply routines only, assume there isn't an in-built
multiply
but you can double, halve and use mod (%) with a 2. A bit artificial perhaps
but
I'm trying to understand when and where you might use shifts and ANDs, etc!

Also assume pow(0,0)=1.

---
BTW its not a homework problem, if it were I'd go for a variation of

double ipow(double x, int n)
{
return (!n)?1n<0)?ipow(1/x,-n)n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
}

from Mark Stephen on comp.lang.c.moderated a few years back!

which would give:

int kpow(int x, int n)
{
return (!n)?1n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
}

Presumably with tail recursion this version is quite efficient?
*/

/*--------------------start of code---------------*/
/*Tested using Borland C++ 5.6.*/

#include <stdio.h>
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* basic versions */
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant multiply */
int mul_a(int x, int y)
{
int t=0;

for(;
{
if (y%2) t+=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=2; /* double x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant power */
int pow_a(int x, int y)
{
int t=1;

for(;
{
if (y%2) t*=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* better(?) versions */
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant multiply - attempt to optimise */
int mul_b(int x, int y)
{
int t=0;

for(;
{
if (y&1) t+=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x<<=1; /* double x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant power - attempt to optimise */
int pow_b(int x, int y)
{
int t=1;

for(;
{
if (y&1) t*=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
void main() {
int n;

putchar('\n');

for( n =0; n<=10; n++)
printf("%5d, %5d, %5d, %5d\n", n, n<<1, n>>1, n&1);

putchar('\n');

for( n =0; n<=5; n++)
printf("%5d, %5d, %5d\n", n, mul_a(n,n), pow_a(n,n));

putchar('\n');

for( n =0; n<=5; n++)
printf("%5d, %5d, %5d\n", n, mul_b(n,n), pow_b(n,n));

putchar('\n');
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*--------------------end of code-----------------*/

silly, Nov 2, 2003

2. ### Tristan MillerGuest

Greetings.

In article <>, silly wrote:
> comments on style/clarity/portability/obvious efficiency issues are
> welcome

> void main() {

....
> putchar('\n');
> }

For a start, the definition should be int main(void), and the function must
return a value (e.g., return(EXIT_SUCCESS).

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you

Tristan Miller, Nov 2, 2003

3. ### Sheldon SimmsGuest

On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:

> hello, I have some fairly naive queries here related to optimising code!

I'm going to pay attention to the pow() functions and ignore the mul()
functions.

> /* peasant power */
> int pow_a(int x, int y)
> {
> int t=1;
>
> for(;
> {
> if (y%2) t*=x; /* if y is odd... */
> if (!(y/=2))break; /* halve y and if result is zero... */
> x*=x; /* square x */
> }
>
> return t;
> }
>
> /* peasant power - attempt to optimise */
> int pow_b(int x, int y)
> {
> int t=1;
>
> for(;
> {
> if (y&1) t*=x; /* if y is odd... */
> if (!(y>>=1))break; /* halve y and if result is zero... */
> x*=x; /* square x */
> }
>
> return t;
> }

This code is the C equivalent of some IBM OS/360 code that Glen
Herrmannsfeldt posted here last week.

PLUS LD FACTOR,ONE LOAD FACTOR OF ONE IN FACTOR REG
LOOP SRDL EXPN,1 SHIFT LOW BIT EXPN REG INTO ADDR REG
BC 10,JUMP IF SIGN BIT NOT MINUS,BRANCH TO JUMP
MDR FACTOR,BASE MULTIPLY FACTOR REG BY BASE NO REG
JUMP LTR EXPN,EXPN CHECK IF EXPONENT PLUS,MINUS,OR ZERO
BC 8,NEXT IF EXPONENT NOW ZERO, BRANCH TO NEXT
MDR BASE,BASE MULTIPLY BASE NO BY DOUBLING ITSELF
BC 15,LOOP BRANCH TO LOOP TO TEST NEXT EXPN BIT

In any case, you haven't done much optimization here. You have
simply replaced arithmetic operators with equivalent (assuming
positive values) bitwise operators. This sort of optimization is
usually done by the compiler anyway and there's not much point

In this case, the replacement of y/=2 with y>>=1 really does make
the function faster. Because the compiler can't assume that y > 0,
it has to generate code for y/=2 that works properly for negative
numbers, which adds a few instructions in the loop.

If you add a check for negative y at the beginning of both
functions:

if (y < 0) return 0;

then the compiler has enough information to optimize the y/=2 into
the exact same code as y>>=1, theoretically at least. In practice
the compiler I'm using doesn't perform the optimization even with
that extra information.

It is worthwhile to put the check in anyway so that the function
doesn't return the wrong answer for negative exponents.

I do have an optimization for this code. It's not guaranteed to
make the code faster, but it does make it faster on my machine.

> BTW its not a homework problem, if it were I'd go for a variation of
>
> double ipow(double x, int n)
> {
> return (!n)?1n<0)?ipow(1/x,-n)n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
> }
>
> from Mark Stephen on comp.lang.c.moderated a few years back!
> which would give:
>
> int kpow(int x, int n)
> {
> return (!n)?1n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
> }
>
> Presumably with tail recursion this version is quite efficient?

You can't rely on tail recursion being optimized in C compilers,
although it's not uncommon for there to be a switch that enables
it.

Did you try timing kpow() and your functions and see what is
faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
functions are twice as fast as kpow(). In fact, pow_b() is
slightly faster on my system than James Hu's approach that
unrolled the inner loop.

I came up with code pretty much exactly like your pow_b() a few
days ago by translating the OS/360 assembler above into C. One
small change then made it a bit faster. You'll be able to see the
change easily in the code below.

Here are several different functions and their execution times.
The main() used for testing was:

static volatile int dest;
int main (void)
{
int y;

for (y = 0; y < 16; ++y)
{
int x = 10000000; /* ten million */
while (--x)
dest = powfunction(3,y);
}

return 0;
}

The testing program, including the various pow functions was
compiled with the following command line in gcc 3.3.1:

\$ gcc -Wall -W -O2 -fomit-frame-pointer -march=athlon -o ipows ipows.c -lm

Now for the various functions and the results, starting with the slowest
and moving to the fastest. The program was compiled replacing "powfunction"
with each of the different contenders and then was run 3 times. The
best time of the 3 trials is given below:

1) Compact recursive version (ala Mark Stephen)

static int kpow(int x, int n)
{
if (n < 0) return 0;
return (!n)?1n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
}

\$ time ./ipows

real 0m7.403s
user 0m6.630s
sys 0m0.030s

static int pow_a(int x, int y)
{
int t=1;

if (y < 0) return 0;
for(;
{
if (y%2) t*=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}

\$ time ./ipows

real 0m4.395s
user 0m3.920s
sys 0m0.020s

3) Unrolled version (ala James Hu)

static signed char hbit[32] = {
-1,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
};

static int hupow (int x, int n)
{
int t = 1;

if (n < 0) return 0;
if (n == 0) return 1;
if (n == 1) return x;
if (x == 0) return 0;

switch (hbit[n]) {
case 4: if (n & 1) t *= x; n >>= 1; x *= x;
case 3: if (n & 1) t *= x; n >>= 1; x *= x;
case 2: if (n & 1) t *= x; n >>= 1; x *= x;
case 1: if (n & 1) t *= x; n >>= 1; x *= x;
default: t *= x;
}
return t;
}

\$ time ./ipows

real 0m3.332s
user 0m2.990s
sys 0m0.010s

static int pow_b(int x, int y)
{
int t=1;

if (y < 0) return 0;
for(;
{
if (y&1) t*=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}

\$ time ./ipows

real 0m3.208s
user 0m2.870s
sys 0m0.020s

5) My translation of the OS/360 code

static int ibmpow (int x, int y)
{
int factor = 1;

if (y < 0) return 0;

if (y & 1) factor *= x;
y >>= 1;

while (y)
{
x *= x;
if (y & 1) factor *= x;
y >>= 1;
}

return factor;
}

\$ time ./ipows

real 0m2.845s
user 0m2.480s
sys 0m0.080s

-Sheldon

Sheldon Simms, Nov 2, 2003
4. ### James HuGuest

Sheldon Simms <> wrote in message news:<>...
> On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
> Did you try timing kpow() and your functions and see what is
> faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
> functions are twice as fast as kpow(). In fact, pow_b() is
> slightly faster on my system than James Hu's approach that
> unrolled the inner loop.

kpow() is not really tail recursion, so there is no way for
the compiler to optimize it into a loop.

My unrolled implementation would probably need a computed
goto for the array to improve the time. But realize that
the version you used as the basis for you experiment was
trying to avoid implementation defined behavior.

> I came up with code pretty much exactly like your pow_b() a few
> days ago by translating the OS/360 assembler above into C. One
> small change then made it a bit faster. You'll be able to see the
> change easily in the code below.

It was unclear to me why your small change should make any
difference. Do you have a theory?

refinement suggested by Tim Woodall and the implementation by
Eric Sosman:

int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) {
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}

> 5) My translation of the OS/360 code
>
> static int ibmpow (int x, int y)
> {
> int factor = 1;
>
> if (y < 0) return 0;
>
> if (y & 1) factor *= x;
> y >>= 1;
>
> while (y)
> {
> x *= x;
> if (y & 1) factor *= x;
> y >>= 1;
> }
>
> return factor;
> }

The tail recursive form for this would be:

static int r_ibmpow2(int x, int y, int factor)
{
if (y == 0) return factor;
x *= x;
if (y & 1) factor *= x;
return r_ibmpow2(x, y >> 1, factor);
}

static int ibmpow2(int x, int y)
{
if (y < 0) return 0;
if (y & 1) return r_ibmpow2(x, y, x);
return r_ibmpow2(x, y, 1);
}

But, you would have to use -O3 to get gcc to remove the tail recursion.

-- James

James Hu, Nov 3, 2003
5. ### Tom St DenisGuest

"James Hu" <> wrote in message
news:...
> Sheldon Simms <> wrote in message

news:<>...
> > On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
> > Did you try timing kpow() and your functions and see what is
> > faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
> > functions are twice as fast as kpow(). In fact, pow_b() is
> > slightly faster on my system than James Hu's approach that
> > unrolled the inner loop.

>
> kpow() is not really tail recursion, so there is no way for
> the compiler to optimize it into a loop.

Just inserting my two-bits...

You guys may also want to try a sliding window approach. You're using a
binary exponentiation which is fast but sliding windows are cooler.

Tom

Tom St Denis, Nov 3, 2003
6. ### Sheldon SimmsGuest

On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:

> Sheldon Simms <> wrote in message news:<>...
>
> My unrolled implementation would probably need a computed
> goto for the array to improve the time. But realize that
> the version you used as the basis for you experiment was
> trying to avoid implementation defined behavior.
>
>> I came up with code pretty much exactly like your pow_b() a few
>> days ago by translating the OS/360 assembler above into C. One
>> small change then made it a bit faster. You'll be able to see the
>> change easily in the code below.

>
> It was unclear to me why your small change should make any
> difference. Do you have a theory?

I know why it's faster, but that comes from looking at the
assembly code generated. I actually only made the change to
avoid the infinite-loop-with-break construct that comes
out of the direct translation of the OS/360 code. Here's the
code again:

static int ibmpow (int x, int y)
{
int factor = 1;
if (y < 0) return 0;

if (y & 1) factor *= x;
y >>= 1;

while (y)
{
x *= x;
if (y & 1) factor *= x;
y >>= 1;
}

return factor;
}

One reason it is faster is that the generated code avoids the
first multiplication for odd y by replacing the if-statement
before the loop with this:

if (y & 1) factor = x;
y >>= 1;

But that's not everything. I tested by replacing the test loop
with

for (y = 1; y < 32; y <<= 1)

And the ibmpow() version is still faster than pow_b().

I can see that the generated code for the loop in ibmpow() is
a bit better than that for pow_b(). That's interesting since
the statements are the same, only rearranged.

> Please compare your routine with the routine below based on the
> refinement suggested by Tim Woodall and the implementation by
> Eric Sosman:
>
> int tim_eric_pow(int x, unsigned n)
> {
> int t = 1;
> if (n == 0) return 1;
> while (n ^ 1) {
> n >>= 1; x *= x;
> }
> t = x;
> while (n >>= 1) {
> x *= x;
> if (n & 1) t *= x;
> }
>
> return t;
> }

I must have missed that one in the thread. After small changes so
that it conforms to the same interface as the others (signed n, check
for n < 0), it's the fastest so far:

\$ time ./ipows

real 0m2.032s
user 0m1.980s
sys 0m0.000s

> The tail recursive form for this would be:

....
> But, you would have to use -O3 to get gcc to remove the tail recursion.

I avoided O3 because it inlined the power function, sometimes.

-Sheldon

Sheldon Simms, Nov 3, 2003
7. ### James HuGuest

On 2003-11-03, Sheldon Simms <> wrote:
> On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
>
>> int tim_eric_pow(int x, unsigned n)
>> {
>> int t = 1;
>> if (n == 0) return 1;
>> while (n ^ 1) {
>> n >>= 1; x *= x;
>> }
>> t = x;
>> while (n >>= 1) {
>> x *= x;
>> if (n & 1) t *= x;
>> }
>>
>> return t;
>> }

Doh! That's what I get for composing from memory. The first while loop
needs to test for a leading 0 bit. The original by Eric was right:

while (!(n&1)) {
...

But I guess you can compare that with:

while ((n&1)^1) {
...

-- James

James Hu, Nov 3, 2003
8. ### Tim WoodallGuest

(James Hu) wrote in message >
> int tim_eric_pow(int x, unsigned n)
> {
> int t = 1;
> if (n == 0) return 1;
> while (n ^ 1) {

^^^^^
What is this? Did I really write something like that? Apart from the
fact that it is wrong, I HATE clever tricks like that even when they
are right! Using xor to perform an XOR is fine, using it to test a bit
is HORRIBLE! (YMMV

> n >>= 1; x *= x;
> }
> t = x;
> while (n >>= 1) {
> x *= x;
> if (n & 1) t *= x;
> }
>
> return t;
> }
>

Tim.

Tim Woodall, Nov 3, 2003
9. ### Sheldon SimmsGuest

On Mon, 03 Nov 2003 11:31:21 -0600, James Hu wrote:

> On 2003-11-03, Sheldon Simms <> wrote:
>> On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
>>
>>> int tim_eric_pow(int x, unsigned n)
>>> {
>>> int t = 1;
>>> if (n == 0) return 1;
>>> while (n ^ 1) {
>>> n >>= 1; x *= x;
>>> }
>>> t = x;
>>> while (n >>= 1) {
>>> x *= x;
>>> if (n & 1) t *= x;
>>> }
>>>
>>> return t;
>>> }

>
> Doh! That's what I get for composing from memory. The first while loop
> needs to test for a leading 0 bit. The original by Eric was right:

Here I am, a trusting soul, not bothering to make sure that the
function actually works. So anyway the time I gave earlier is wrong.
This (corrected) function is still the fastest, but just barely.

> while (!(n&1)) {
> ...

\$ time ./ipows

real 0m2.431s
user 0m2.400s
sys 0m0.010s

> But I guess you can compare that with:
>
> while ((n&1)^1) {
> ...

\$ time ./ipows

real 0m2.435s
user 0m2.410s
sys 0m0.000s

Sheldon Simms, Nov 3, 2003