# RE: strange exponentiation performance

Discussion in 'Python' started by Tim Peters, Nov 23, 2003.

1. ### Tim PetersGuest

[Jeff Davis]
> I was doing some thinking about exponentiation algorithms along with a
> friend, and I happened upon some interesting results. Particularly, I
> was able to outperform the ** operator in at least one case, with a
> recursive algorithm. This leads me to believe that perhaps the **
> operator should tune it's algorithm based on inputs or some such
> thing. Here is my data:

I'm taking the liberty of rewriting your functions to stop trying to squash
as much as they can on each line. It's important to do that if you want to
modify the code quickly when experimenting.

First you rediscovered the "left to right" binary exponentiation algorithm:

def h(b, e):
if e == 0:
return 1
if e == 1:
return b
t = h(b, e >> 1)
return t*t * h(b, e & 1)

When you tried to remove recursion from it, you rediscovered the "right to
left" binary exponentiation algorithm:

def f(b, e):
n = 1
while e > 0:
if e & 1:
n *= b
e >>= 1
b *= b
return n

That's close to what Python's builtin ** does, but is missing an important
speed trick (explained later). Note that h and f aren't really the *same*
algorithm! f squares the current value of b on every trip through the loop.
h does not. If you print out all the inputs to all the multiplications,
you'll see that they're not at all the same.

Finally (I'll skip g(), as it isn't interesting):

def o(b, e):
return b**e

> ...
> now, g() was blown out of the water, as expected, but the others were
> close enough for another test at a higher "e" value.
>
> >>> test(f,19,500000)

> 8.02366995811
> >>> test(h,19,500000)

> 3.66968798637
> >>> test(o,19,500000)

> 5.29517292976
> >>>

Your box is faster than mine. Here under 2.3.2 on Windows:

f 14.6648669941
h 6.53576912117
o 9.59630399437

> Now, that is the interesting part. How did ** not measure up to h()?

The left-to-right exponentiation algorithm can be faster than the
right-to-left one. The former is clumsier to code in C, though, and nobody
ever bothered to endure that pain in Python's implementation. If you search
Christian Tismer.

Neither algorithm is optimal for all inputs, BTW. Turns out optimal
exponentiation is a very hard problem; Knuth Volume 2 has an extensive
discussion of this ("Evaluation of Powers"); another algorithm based on
first factoring the exponent (expressing it as a product of primes) is
better "on average" than either binary method, but sometimes loses to them.

> It's also interesting that f(), which is supposed to be a more
> efficient version of h(), is lagging behind.

The biggest trick it's missing is that it *always* does

b *= b

even on the last trip through the loop. The value of b it computes on the
last trip is never used, and b has grown very large by that time so
computing b*b is *very* expensive then. Change f like so:

def f(b, e):
n = 1
while e > 0:
if e & 1:
n *= b
e >>= 1
if e: # new guard: only square b if the result will be used
b *= b
return n

and then it's essentially identical to the builtin **:

f 9.5959586986
h 6.54231046447
o 9.59677081413

> I would like help explaining the following:
> (1) How did my less-than-perfectly-optimized recursive algorithm win
> against the ** operator?

It used left-to-right instead of right-to-left, which on this pair of inputs
is a better approach.

> (2) How can I unwrap and optimize h()?

Make it work left-to-right instead of right-to-left. That's tedious to do.
You can't get the same sequence of multiplication inputs by going
right-to-left, so the code will have to be more complicated. Start by
finding the highest bit set in e. For example,

def q(b, e):
if e == 0:
return 1
if e == 1:
return b
e2, numbits = e, 0
while e2:
e2 >>= 1
numbits += 1
assert numbits >= 2
result = b
mask = 1L << (numbits - 2)
for dummy in range(numbits - 1):
result *= result
if e & mask:
result *= b
return result

That's bound to be a little faster than h on most inputs, because it also
optimizes away multiplications by 1 (well, unless b happens to be 1). Let's
see:

f 9.53526793946
o 9.53408287098
h 6.54592768903
q 6.11897031462

Yup, those matter, but not a whale of a lot. The savings in skipping
multiplications by 1 is proportional to the number of 0 bits in the
exponent. What *didn't* matter is whether it's recursive or iterative.

> From what I understand,recursion is never supposed to be the most
> efficient. I suspect there are some hidden inefficiencies in using
> while(), but I'd like to know the specifics.

Na, none of that matters. All these algorithms do a number of
multiplications proportional to log(e, 2), which is insignificantly tiny
compared to the magnitude of the result. Long-int multiplication is the
only thing that consumes significant time here, so the only thing that
matters to the time is the exact sequence of inputs fed to *. Even whether
you write the *driver* in Python or C or assembly language is insignificant
compared to that.

> If my algorithm h() is better, why can't ** use a quick test to change
> algorithms based on inputs? Or is mine better in all cases?

See Knuth <wink>.

> ...
> Also note that I'm not exactly an algorithms expert, I just happened
> upon these results while chatting with a friend.

You did good! The first couple things you tried only nick the surface of
what's known about this problem. You *could* spend a year learning
everything that's known about it. In the end, if you implemented all that,
you could easily end up with 1000x more code, which sometimes ran faster.
To become an algorithm expert, you should do that <wink>. In real life, a
crucial complementary skill is learning when to say "good enough", and move
on. That's indeed why Python's implementation settled for the
easy-to-code-in-C right-to-left binary exponentiation method. If that part
of Python had been written in Python instead, I would have used the
easy-to-code-in-Python recursive left-to-right method instead. Sticking to
easy-to-code methods also has good influence on minimizing the # of bugs, of
course.

Tim Peters, Nov 23, 2003

2. ### Jeff DavisGuest

> I'm taking the liberty of rewriting your functions to stop trying to squash
> as much as they can on each line. It's important to do that if you want to
> modify the code quickly when experimenting.

Fair enough.

>
> First you rediscovered the "left to right" binary exponentiation algorithm:
>

Interesting.

<snip>

> When you tried to remove recursion from it, you rediscovered the "right to
> left" binary exponentiation algorithm:

Interesting. I suspected it wasn't a perfect unwrapping of the recursion,
because I didn't take the time to examine it carefully. Classic case of a
wonderful idea I had at 3:30 in the morning

<snip>

> That's close to what Python's builtin ** does, but is missing an important
> speed trick (explained later). Note that h and f aren't really the *same*
> algorithm! f squares the current value of b on every trip through the loop.
> h does not. If you print out all the inputs to all the multiplications,
> you'll see that they're not at all the same.
>

Also interesting.

<snip>

>
> Your box is faster than mine. Here under 2.3.2 on Windows:
>

That means I win, right

<snip>

> exponentiation is a very hard problem; Knuth Volume 2 has an extensive
> discussion of this ("Evaluation of Powers"); another algorithm based on
> first factoring the exponent (expressing it as a product of primes) is
> better "on average" than either binary method, but sometimes loses to
> them.

Ahh... I'm on vol. 1. I saw that it would be a while before I made it to
page 100, so I actually loaned out vol. 2. I'll be getting that back now I
guess

<snip>

>
> def q(b, e):
> if e == 0:
> return 1
> if e == 1:
> return b
> e2, numbits = e, 0
> while e2:
> e2 >>= 1
> numbits += 1
> assert numbits >= 2
> result = b
> mask = 1L << (numbits - 2)
> for dummy in range(numbits - 1):
> result *= result
> if e & mask:
> result *= b
> mask >>= 1
> return result
>

Now I understand what you mean about the right-left and left-right
versions. I only had a vague understanding before.

> That's bound to be a little faster than h on most inputs, because it
> also optimizes away multiplications by 1 (well, unless b happens to be
> 1). Let's see:

Might as well eliminate the multiplication by 1, for some reason I
assumed that could be optimized out somehow by the computer.

<snip>
> Yup, those matter, but not a whale of a lot. The savings in

skipping
> multiplications by 1 is proportional to the number of 0 bits in the
> exponent. What *didn't* matter is whether it's recursive or iterative.

Yeah, I guess it's always the case: algorithm first, then optimize. I know
the rule, and I follow it when being paid (so as not to waste anyone's
money). When doing something more academic I always feel like I need to
break it down to the most simple operations so I understand what's going
on. To me, recursion hides what's going on to an extent, and I felt like I
didn't entirely understand the algorithm.

<snip>

>> If my algorithm h() is better, why can't ** use a quick test to change
>> algorithms based on inputs? Or is mine better in all cases?

>
> See Knuth <wink>.
>

Uh-oh, another problem that's more difficult than meets the eye. I'll be
back with more questions in a few years I guess

Thanks,
Jeff

Jeff Davis, Nov 24, 2003