Confusing math problem


Schizoid Man

Hi there,

I run the following code in Python 3.3.0 (on a Windows 7 machine) and Python
2.7.3 on a Mac and I get two different results:

import math

result1 = []
result2 = []
for a in range(2, 101):
    for b in range(2, 101):
        result1.append(math.pow(a, b))
        result2.append(a**b)
result1 = list(set(result1))
result2 = list(set(result2))
print(len(result1))
print(len(result2))

On the Windows box, I get 9183 on both lines. However, on the Mac I get
9220 and 9183. Why this difference? Is there some sort of precision subtlety
I'm missing between ** and math.pow()?

Thank you.
 

Dave Angel

Hi there,

I run the following code in Python 3.3.0 (on a Windows 7 machine) and
Python 2.7.3 on a Mac and I get two different results:

result1 = []
result2 = []
for a in range(2, 101):
    for b in range(2, 101):
        result1.append(math.pow(a, b))
        result2.append(a**b)
result1 = list(set(result1))
result2 = list(set(result2))
print(len(result1))
print(len(result2))

On the Windows box, I get 9183 on both lines. However, on the Mac I
get 9220 and 9183. Why this difference? Is there some sort of precision
subtlety I'm missing between ** and math.pow()?

Thank you.

I assumed this was some difference between Python 2.x and 3.x.
However, on my 2.7.3 on Linux, I also get 9183 both times.

However, there is an important inaccuracy in math.pow, because it uses
floats to do the work. If you have very large integers, that means some
of them won't be correct. The following are some examples for 2.7.3 on
Linux:

 a   b   math.pow(a,b)        a**b
 3  34   1.66771816997e+16    16677181699666569
 3  35   5.0031545099e+16     50031545098999707
 ...
 5  23   1.19209289551e+16    11920928955078125

The built-in pow, on the other hand, seems to get identical answers for
all these cases. So use pow() instead of math.pow().
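
For instance, a minimal sketch of that comparison (the (a, b) pairs are just the
rows from the table above; the math.pow values may differ from one platform's C
library to another):

import math

for a, b in [(3, 34), (3, 35), (5, 23)]:
    exact = a ** b                 # exact integer exponentiation
    rounded = int(math.pow(a, b))  # goes through a C double on the way
    print("%d**%d: exact %d, via math.pow %d" % (a, b, exact, rounded))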

One other test:

diff = set(map(int, result1)).symmetric_difference(set(result2))
if diff:
    print diff
    print len(diff)

shows me a diff set of 15656 members. One such member:

13552527156068805425093160010874271392822265625000000000000000000000000000000000000000000000000000000000000000000L

Notice how using floats truncated lots of the digits in the value?
 

Ian Kelly

Hi there,

I run the following code in Python 3.3.0 (on a Windows 7 machine) and Python
2.7.3 on a Mac and I get two different results:

result1 = []
result2 = []
for a in range(2, 101):
    for b in range(2, 101):
        result1.append(math.pow(a, b))
        result2.append(a**b)
result1 = list(set(result1))
result2 = list(set(result2))
print(len(result1))
print(len(result2))

On the Windows box, I get 9183 on both lines. However, on the Mac I get
9220 and 9183. Why this difference? Is there some sort of precision subtlety
I'm missing between ** and math.pow()?

math.pow is basically a wrapper for the C standard pow function, which
operates on doubles. The difference you're seeing is probably a
difference in implementation in the platform's C library. The **
operator on the other hand is implemented separately as a Python
built-in and operates on any numeric data type.
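
A quick way to see that difference (a small sketch; whether the last comparison
is True or False for large values depends on the platform's pow):

import math

a, b = 3, 40
print(type(a ** b))                   # int (long on Python 2): exact, arbitrary precision
print(type(math.pow(a, b)))           # float: a C double with ~53 significant bits
print(a ** b == int(math.pow(a, b)))  # often False once a**b exceeds 2**53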
 

Dave Angel

 a   b   math.pow(a,b)        a**b
 3  34   1.66771816997e+16    16677181699666569
 3  35   5.0031545099e+16     50031545098999707
 ...
 5  23   1.19209289551e+16    11920928955078125

The built-in pow, on the other hand, seems to get identical answers for
all these cases. So use pow() instead of math.pow().

One other test:

diff = set(map(int, result1)).symmetric_difference(set(result2))
if diff:
    print diff
    print len(diff)

shows me a diff set of 15656 members. One such member:

13552527156068805425093160010874271392822265625000000000000000000000000000000000000000000000000000000000000000000L


Notice how using floats truncated lots of the digits in the value?

Sorry, I just rechecked, and that value is correct; it's 50**66.

However, if I do:

print 3**60, "\n", int(math.pow(3,60)), "\n", pow(3,60)


I get:

42391158275216203514294433201
42391158275216203520420085760
42391158275216203514294433201


and the middle one is the one that's wrong. You can tell by casting out
9's. The middle one gets 1 instead of zero, showing that it's NOT
divisible by 3.
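
A small sketch of that casting-out-nines check (the second result is what Dave
saw; a different platform's pow could give a different wrong value):

import math

def digit_sum_mod_9(n):
    # Casting out nines: the digit sum mod 9 equals n mod 9.
    return sum(int(d) for d in str(n)) % 9

print(digit_sum_mod_9(3 ** 60))               # 0: divisible by 9, hence by 3
print(digit_sum_mod_9(int(math.pow(3, 60))))  # 1 on Dave's machine, so not divisible by 3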
 

Chris Angelico

However, if I do:

print 3**60, "\n", int(math.pow(3,60)), "\n", pow(3,60)


I get:

42391158275216203514294433201
42391158275216203520420085760
42391158275216203514294433201


and the middle one is the one that's wrong.

In theory, a float should hold the nearest representable value to the
exact result. Considering that only one operation is being performed,
there should be no accumulation of error. The integer results show a
small number (618) of collisions, eg 2**16 and 4**8; why should some
of those NOT collide when done with floating point? My initial thought
was "Oh, this is comparing floats for equality", but after one single
operation, that should not be a problem.
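
The integer collision count, at least, is easy to reproduce exactly, since no
floats are involved:

powers = [a ** b for a in range(2, 101) for b in range(2, 101)]
print(len(powers))       # 9801 calculations
print(len(set(powers)))  # 9183 distinct values, i.e. 618 collisions such as 2**16 == 4**8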
You can tell by casting out 9's. The middle one gets 1
instead of zero, showing that it's NOT divisible by 3.

Which I thought so cool and magical and awesome, until I started
exploring other bases and found that you could cast out F's in hex and
7's in octal... and you can cast out 1's in binary to find out if it's
a multiple of 1, too.

ChrisA
 

Peter Pearson

In theory, a float should hold the nearest representable value to the
exact result. Considering that only one operation is being performed,
there should be no accumulation of error. The integer results show a
small number (618) of collisions, eg 2**16 and 4**8; why should some
of those NOT collide when done with floating point? My initial thought
was "Oh, this is comparing floats for equality", but after one single
operation, that should not be a problem.

Does this help explain it?
0x88f924eeceeda80000000000L
0x88f924eeceeda7fe92e1f5b1L
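
Those two hex strings can be reproduced along these lines (the second value
depends on the platform's pow, but it will always end in a long run of zero
bits, because a double carries only 53 significant bits):

import math

print(hex(3 ** 60))               # the exact value
print(hex(int(math.pow(3, 60))))  # the rounded double: note the trailing zeros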
 

Chris Angelico

Does this help explain it?

0x88f924eeceeda80000000000L
0x88f924eeceeda7fe92e1f5b1L

I understand how the inaccuracy works, but I'm expecting it to be as
consistent as Mr Grossmith's entertainments. It doesn't matter that
math.pow(3,60) != 3**60, but the number of collisions is different
when done with floats on the OP's Mac. Here's what I'm talking about:
{4.23911582752162e+28}

Note how, in each case, calculating three powers that have the same
real-number result gives a one-element set. Three to the sixtieth
power can't be perfectly rendered with a 53-bit mantissa, but it's
rendered the same way whichever route is used to calculate it.
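
A sketch of the kind of check being described, using three routes to the same
real number (whether each set really ends up with one element depends on the
platform's pow; on the OP's Mac, presumably, some would not):

import math

print({math.pow(3, 60), math.pow(9, 30), math.pow(27, 20)})
print({math.pow(3, 60), float(3 ** 60)})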

ChrisA
 

Dave Angel

Note how, in each case, calculating three powers that have the same
real-number result gives a one-element set. Three to the sixtieth
power can't be perfectly rendered with a 53-bit mantissa, but it's
rendered the same way whichever route is used to calculate it.

But you don't know how the floating point math library (note: it's the
machine's C library that's used, not Python's) actually calculates that.

For example, if they were to calculate 2**64 by squaring the number 6
times, that's likely to give a different answer than multiplying by 2,
63 times. And you don't know how the library does it. For any integer
power up to 128, you can do a combination of square and multiply so that
the total operations are never more than 13, more or less. But if you
then figure a = a*a and b = b/2, and do the same optimization, you
might not do them exactly in the same order, and therefore might not get
exactly the same answer.

Even if it's being done in the coprocessor inside the Pentium, we don't
have a documented algorithm for it. Professor Kahan helped with the
8087, but I know they've tweaked their algorithms over the years (as
well as repairing bugs). So it might not be a difference between Python
versions, nor between OS's, but between processor chips.
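
To illustrate, here is a sketch of the two orderings, both done deliberately in
float arithmetic (the helper names are made up; whether the two agree with each
other, or with the correctly rounded float(3**60), is exactly the
platform-dependent part):

def pow_by_squaring(base, exp):
    # Classic square-and-multiply, forced to stay in floats.
    result, base = 1.0, float(base)
    while exp:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
    return result

def pow_by_repeated_multiply(base, exp):
    # Multiply by the base exp times, again in floats.
    result = 1.0
    for _ in range(exp):
        result *= float(base)
    return result

print(pow_by_squaring(3, 60))
print(pow_by_repeated_multiply(3, 60))
print(float(3 ** 60))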
 

Schizoid Man

Dave Angel said:
On 02/21/2013 02:33 PM, Schizoid Man wrote:
However, there is an important inaccuracy in math.pow, because it uses
floats to do the work. If you have very large integers, that means some
of them won't be correct. The following are some examples for 2.7.3 on
Linux:

 a   b   math.pow(a,b)        a**b
 3  34   1.66771816997e+16    16677181699666569
 3  35   5.0031545099e+16     50031545098999707
 ...
 5  23   1.19209289551e+16    11920928955078125

The built-in pow, on the other hand, seems to get identical answers for
all these cases. So use pow() instead of math.pow().

I see. I thought using ** was shorthand for math.pow() and didn't realize
that one would be an integer operation and the other a float operation. I'm
performing some large integer arithmetic operations. I would normally do this
by writing my own multiplication class and storing results as strings, but a
friend suggested that I look at Python.

I ran this one example and was quite surprised at the difference, since 9183
is the correct answer.
One other test:

diff = set(map(int, result1)).symmetric_difference(set(result2))
if diff:
    print diff
    print len(diff)

shows me a diff set of 15656 members. One such member:

13552527156068805425093160010874271392822265625000000000000000000000000000000000000000000000000000000000000000000L

Notice how using floats truncated lots of the digits in the value?

I'm running this test now, but the Mac's fan has kicked in (it's a slightly
older machine), so I might let it run through the night.

I appreciate the help.
 

Schizoid Man

First, are you aware that ** will return int (or sometimes long on
2.7.3), while math.pow() will return a float? That may tell you why
you're seeing differences. That said, though, I wasn't able to
replicate your result using 2.7.3 and 3.3.0 both on Windows - always
9183, indicating 618 of the powers are considered equal. But in
theory, at least, what you're seeing is that 37 of them compare as
different in floating point on your Mac build. Something to consider:

print(set(result1)-set(result2))

No, I wasn't aware, to be honest. I thought ** was just shorthand for
math.pow(). Since ** is the integer operation, I suppose ^ doesn't work as
an exponent operator in Python?

I compared the difference and got a large blob of numbers. To make a proper
comparison I'll need to compare the base and exponent for which the numbers
are different rather than the numbers themselves. I'm following Dave's
suggestion of determining the symmetric difference of the sets.

Thanks for the help.
 

Oscar Benjamin

I see. I thought using ** was shorthand for math.pow() and didn't realize
that one would be an integer operation and the other a float operation.

Then you want operator.pow:

math.pow is basically the (double)pow(double, double) function from
the underlying C library. operator.pow(a, b) is precisely the same as
a**b.
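
A quick illustration:

import operator

print(operator.pow(3, 40) == 3 ** 40)  # True: exact integer result, no float involved
print(operator.pow(2.0, 0.5))          # same as 2.0 ** 0.5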
I'm performing some large integer arithmetic operations. I would normally
do this by writing my own multiplication class and storing results as
strings, but a friend suggested that I look at Python.

There's no need to use strings if you're working with integers in
Python. The results with int (not float) will be exact and will not
overflow since Python's ints have unlimited range (unless your machine
runs out of memory but that only happens with *really* big integers).

If you want to do computations with non-integers and high precision,
take a look at the decimal and fractions modules.
http://docs.python.org/2/library/decimal.html
http://docs.python.org/2/library/fractions.html
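
A tiny taste of both, just as an illustration:

from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50                  # 50 significant digits
print(Decimal(1) / Decimal(3))          # 0.3333... to 50 places
print(Fraction(1, 3) + Fraction(1, 6))  # 1/2, exact rational arithmetic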

There is also a good third party library, sympy, for more complicated
exact algebra:
http://sympy.org/en/index.html


Oscar
 

Chris Angelico

But you don't know how the floating point math library (note: it's the
machine's C library that's used, not Python's) actually calculates that.

For example, if they were to calculate 2**64 by squaring the number 6 times,
that's likely to give a different answer than multiplying by 2, 63 times.
And you don't know how the library does it. For any integer power up to
128, you can do a combination of square and multiply so that the total
operations are never more than 13, more or less. But if you then figure a =
a*a and b = b/2, and do the same optimization, you might not do them
exactly in the same order, and therefore might not get exactly the same
answer.

Even if it's being done in the coprocessor inside the Pentium, we don't have
a documented algorithm for it. Professor Kahan helped with the 8087, but I
know they've tweaked their algorithms over the years (as well as repairing
bugs). So it might not be a difference between Python versions, nor between
OS's, but between processor chips.

I was under the impression that, on most modern FPUs, calculations
were done inside the FPU with more precision than the 53-bit that gets
stored. But in any case, I'd find it _extremely_ surprising if the
calculation actually resulted in something that wasn't one of the two
nearest possible representable values to the correct result. And I'd
call it a CPU/FPU bug.

Of course, as we know, Intel's *never* had an FPU bug before...

ChrisA
 

Schizoid Man

Dave Angel said:
One other test:

diff = set(map(int, result1)).symmetric_difference(set(result2))
if diff:
    print diff
    print len(diff)

shows me a diff set of 15656 members. One such member:

13552527156068805425093160010874271392822265625000000000000000000000000000000000000000000000000000000000000000000L

These are the results I got for len(diff): Windows machines: 15656, Mac:
15693. Someone mentioned something about processor algorithms on this thread
and I do have an x86 Mac, so in terms of processors it's a like-for-like
comparison.

I have a follow-up question: are ** and pow() identical?
 

Schizoid Man

Oscar Benjamin said:
Then you want operator.pow:


math.pow is basically the (double)pow(double, double) function from
the underlying C library. operator.pow(a, b) is precisely the same as
a**b.

So how is operator.pow() different from just pow()?
There's no need to use strings if you're working with integers in
Python. The results with int (not float) will be exact and will not
overflow since Python's ints have unlimited range (unless your machine
runs out of memory but that only happens with *really* big integers).

Yes, that's the idea. I haven't really used Python before, so was just
kicking the tyres a bit and got some odd results. This forum has been great
though.
If you want to do computations with non-integers and high precision,
take a look at the decimal and fractions modules.
http://docs.python.org/2/library/decimal.html
http://docs.python.org/2/library/fractions.html

There is also a good third party library, sympy, for more complicated
exact algebra:
http://sympy.org/en/index.html

Thanks a lot for the references, I'll definitely check them out.
 

Oscar Benjamin

So how is operator.pow() different from just pow()?

operator.pow(a, b) calls a.__pow__(b). This is also what a**b does. If
a and b are ints then the __pow__ method will create the appropriate
int exactly without overflow and return it. You can make your own
class and give it a __pow__ function that does whatever you like:
>>> class A(object):
...     def __pow__(self, other):
...         print('__pow__ called, returning 3')
...         return 3
...
>>> A() ** 2
__pow__ called, returning 3
3

operator.pow will call the method:
>>> operator.pow(A(), 2)
__pow__ called, returning 3
3

math.pow will not:
>>> math.pow(A(), 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a float is required

The math module essentially exposes the functions that would be
available in C after including math.h. So when you call math.pow(a, b),
a and b are converted to float if possible (a Python float corresponds
to a C double), and then the result is computed and returned as a float.


Oscar
 

Ian Kelly

So how is operator.pow() different from just pow()?

math.pow() is a wrapper around the C library function.

** and operator.pow() are the same thing; the latter is just a
function version of the former.

The built-in pow() is a mutant version that takes either 2 or 3
arguments. With 2 arguments, pow(a, b) is again equivalent to a ** b.
With 3 arguments, pow(a, b, c) is equivalent to but more efficient
than a ** b % c, but a and b are restricted to integers.
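
For example (a quick check):

print(pow(3, 5) == 3 ** 5)             # True: the two-argument form
print(pow(3, 100, 7) == 3 ** 100 % 7)  # True, but pow() never builds the huge intermediate value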
 

Chris Angelico

No, I wasn't aware, to be honest. I thought ** was just shorthand for
math.pow(). Since ** is the integer operation, I suppose ^ doesn't work as
an exponent operator in Python?

^ is bitwise XOR, completely different.
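
Compare:

print(2 ** 3)  # 8: exponentiation
print(2 ^ 3)   # 1: bitwise XOR, since 0b10 ^ 0b11 == 0b01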

ChrisA
 

Dave Angel

<snip>


No, I wasn't aware, to be honest. I thought ** was just shorthand for
math.pow(). Since ** is the integer operation

It's an integer operation because you started with two ints. Unlike
math.pow, which converts to floats, whatever you feed it.
I compared the difference and got a large blob of numbers. To make a
proper comparison I'll need to compare the base and exponent for which
the numbers are different rather than the numbers themselves. I'm
following Dave's suggestion of determining the symmetric difference of
the sets.

But once you have a few that are just plain way off, it doesn't really
matter whether some others differ in the 17th place. All the other
discussion is interesting, but don't forget the main point, that trying
to represent large integers (over 17 or so digits) as floats is going to
lose precision. That can happen a number of ways, and math.pow is just
one of them.

Floats use a finite precision to store their value, and the radix is
binary, not decimal. So figuring where they start to lose precision is
tricky. If you're doing a calculation where all intermediate values are
integers, you're usually better off sticking with int/long.
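
A concrete example of where that starts to bite: beyond 2**53 a double can no
longer represent every integer.

print(float(2 ** 53) == float(2 ** 53 + 1))  # True: both collapse to the same double
print(2 ** 53 == 2 ** 53 + 1)                # False: ints stay exact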

There are many other kinds of constraints that come up in programming,
and Python usually has an answer for each. But in a machine of finite
size, and when we care at least a little about performance, we
frequently have to pick our algorithm, our set of functions, and our
data format carefully.

Someone else has mentioned the decimal and fractions modules. Each of those
has a lot to offer in specific situations, but neither is a panacea.
 

Steven D'Aprano

Hi there,

I run the following code in Python 3.3.0 (on a Windows 7 machine) and
Python 2.7.3 on a Mac and I get two different results:

Others have already explained that math.pow and the ** exponentiation
operator are subtly different. However, I wish to discuss your code:

result1 = []
result2 = []
for a in range(2, 101):
    for b in range(2, 101):
        result1.append(math.pow(a, b))
        result2.append(a**b)
result1 = list(set(result1))
result2 = list(set(result2))
print(len(result1))
print(len(result2))

This is more simply written as:

result1 = set()
result2 = set()
for a in range(2, 101):
    for b in range(2, 101):
        result1.add(math.pow(a, b))
        result2.add(a**b)

print(len(result1))
print(len(result2))

No need for the pointless conversion from list to set to list again, if
all you want is the number of unique values.

More interesting is to gather the pairs of values that differ:

results = []
for a in range(2, 101):
    for b in range(2, 101):
        if a**b != math.pow(a, b):
            results.append((a, b))

You will see that integer exponentiation and floating point
exponentiation are more frequently different than not.

(8105 of the calculations differ, 1696 of them are the same.)
 
