# Monte Carlo Method and pi

Discussion in 'Python' started by Karl Pech, Jul 8, 2004.

1. ### Karl PechGuest

Hi,

I got a task there I have to compute pi using the
Method above.

So I wrote the following program:
---
import random
import math

n = long(raw_input("Please enter the number of iterations: "))
sy = 0 # sum of the function-values

for i in range(0, n):
x = random.random() # coordinates
sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy

print 4*sy/n # compute pi
---

Unfortunately, even for n = 2000000 the result is ~2,13... .
It converges very slow to pi and I don't know why.

Regards
Karl

Karl Pech, Jul 8, 2004

2. ### Nick SmallboneGuest

"Karl Pech" <> wrote in message
news:ccke8p\$c0h\$06\$-online.com...
> Hi,
>
>
> I got a task there I have to compute pi using the
> Method above.
>
> So I wrote the following program:
> ---
> import random
> import math
>
> n = long(raw_input("Please enter the number of iterations: "))
> sy = 0 # sum of the function-values
>
> for i in range(0, n):
> x = random.random() # coordinates
> sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy

This line is wrong. I think it should be sy += math.sqrt(1 - x ** 2). Doing
that gives me ~3.1 for 1000 iterations.

Nick

Nick Smallbone, Jul 8, 2004

3. ### Jeff EplerGuest

Others spotted the problem with your implementation of the approximatoin
method.

You can greatly speed things up with numarray. It gets "3.141..." as
the approximation using n=2000000 in about 4 seconds on this sub-GHz
laptop. Of course, a method with faster convergence would get to 3.141
in a lot less than 4 seconds, even if written in pure Python.

Jeff

import numarray
import numarray.random_array

def approx_pi(n):
a = numarray.random_array.uniform(0, 1, n)
return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n

if __name__ == '__main__':
n = int(raw_input("Please enter the number of iterations: "))
print approx_pi(n)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA7cbRJd01MZaTXX0RAoxMAJ4wCeN/z6C7LY7uXg21Y/TYYkpApACZAfBa
8w3R5vITWyFxLNBKG4dS10E=
=Ch/k
-----END PGP SIGNATURE-----

Jeff Epler, Jul 8, 2004
4. ### Paul RubinGuest

"Karl Pech" <> writes:
> I got a task there I have to compute pi using the
> Method above.

This sounds like a homework problem...
> for i in range(0, n):
> x = random.random() # coordinates
> sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy
>
> print 4*sy/n # compute pi
> ---
>
> Unfortunately, even for n = 2000000 the result is ~2,13... .
> It converges very slow to pi and I don't know why.

Better make sure the formulas are correct, by asking yourself why the
simulation is supposed to work at all.

Paul Rubin, Jul 8, 2004
5. ### Karl PechGuest

Re: Monte Carlo Method and pi (thanks)

Thank you very much! It's working now!

Karl Pech, Jul 9, 2004
6. ### Dan ChristensenGuest

something like:

from random import random
from math import sqrt

def v1(n = 500000):
rand = random
sqr = sqrt
sm = 0
for i in xrange(n):
sm += sqr(1-rand()**2)
return 4*sm/n

This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
about 0.29s with psyco (using Python 2.3.4).

One of the great things about python is that it's possible (and fun)
to move loops into the C level, e.g. with this implementation:

def v2(n = 500000):
a = numarray.random_array.uniform(0, 1, n)
return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n

which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

The problem with the numarray approach is that it consumes large
amounts of memory for no good reason. So, now that generator
expressions have arrived, the next thing to try is:

def v6(n = 500000):
rand = random
sqr = sqrt
sm = sum(sqr(1-rand()**2) for i in xrange(n))
return 4*sm/n

Unfortunately, this takes 0.64s, not much better than the most direct
method. (I don't have psyco installed for 2.4, so I haven't tried
that, but I wouldn't be surprised if it was worse than the direct
method with psyco.)

Maybe the reason is that the loop is still in python? I tried some
awkward ways of eliminating the for loop, e.g.

from itertools import *
from operator import pow, sub, add

def v4(n = 500000):
rand = random
pw = pow
sqr = sqrt
sm = sum(imap(lambda dummy: sqr(1-rand()**2), repeat(None, n)))
return 4*sm/n

def v3(n = 500000):
rand = random
pw = pow
sqr = sqrt
sm = sum(imap(sqr,
imap(sub, repeat(1),
imap(pw,
imap(lambda dummy: rand(), repeat(None, n)),
repeat(2)))))
return 4*sm/n

but they are both slow (0.88s and 0.97s respectively).

Is there a faster way to get Python to compute a sequence of repeated
operations like this and sum them up?

Is there a more direct way to make an iterator like this?

itrand = imap(lambda dummy: rand(), repeat(None, n))

Wouldn't it be great if one could write iterator code as concisely as
the numarray code, e.g.

sm = sum(sqrt(1-itrand**2))

i.e. if operations like +, - and ** were overloaded to handle
iterators producing numbers ("numiterators"?) Could a method like
this be fast?

Dan

Dan Christensen, Jul 11, 2004
7. ### Fernando PerezGuest

Dan Christensen wrote:

> Is there a faster way to get Python to compute a sequence of repeated
> operations like this and sum them up?

Since the killer here is the dynamic type checks on the tight loops, until we
get optional static typing in python you'll need C for this. Fortunately,
weave makes this a breeze. Here's your v1() along with a trivially C-fied
version called v2:

#########################################################################
import random
import math
import weave

def v1(n = 100000):
rand = random.random
sqrt = math.sqrt
sm = 0.0
for i in xrange(n):
sm += sqrt(1.0-rand()**2)
return 4.0*sm/n

def v2(n = 100000):
"""Implement v1 above using weave for the C call"""

support = "#include <stdlib.h>"

code = """
double sm;
float rnd;
srand(1); // seed random number generator
sm = 0.0;
for(int i=0;i<n;++i) {
rnd = rand()/(RAND_MAX+1.0);
sm += sqrt(1.0-rnd*rnd);
}
return_val = 4.0*sm/n;"""
return weave.inline(code,('n'),support_code=support)
#########################################################################

Some timing info:

In [74]: run montecarlo_pi.py

In [75]: genutils.timings 1,v1
-------> genutils.timings(1,v1)
Out[75]: (1.4000000000000057, 1.4000000000000057)

In [76]: genutils.timings 10,v2
-------> genutils.timings(10,v2)
Out[76]: (0.13999999999998636, 0.013999999999998635)

In [77]: __[1]/_[1]
Out[77]: 100.00000000001016

A factor of 100 improvement is not bad for a trivial amount of work weave is
truly a cool piece of machinery. Just so you trust the values are OK:

In [80]: v1()
Out[80]: 3.1409358710711448

In [81]: v2()
Out[81]: 3.141602852608508

In [82]: abs(_-__)
Out[82]: 0.00066698153736322041

In [83]: 1.0/sqrt(100000)
Out[83]: 0.003162277660168379

The difference is within MonteCarlo tolerances (the usual 1/sqrt(N)).

Note that the above isn't 100% fair, since I'm calling the stdlib rand, a
C-coded simple generator, while v1 calls python's random.py, a Python-coded
Wichmann-Hill one. But still, for tight numerical loops this is pretty typical
of weave's results. Combined with the fact that via blitz, weave gives you
full access to Numeric arrays, it's a bit of a dream come true.

Cheers,

f

Fernando Perez, Jul 11, 2004
8. ### Paul RubinGuest

Fernando Perez <> writes:
> Note that the above isn't 100% fair, since I'm calling the stdlib
> rand, a C-coded simple generator, while v1 calls python's random.py,
> a Python-coded Wichmann-Hill one.

random.random in python 2.2 and later calls the Mersenne Twister generator,
a very fast generator implemented in C.

Paul Rubin, Jul 12, 2004
9. ### Fernando PerezGuest

Paul Rubin wrote:

> Fernando Perez <> writes:
>> Note that the above isn't 100% fair, since I'm calling the stdlib
>> rand, a C-coded simple generator, while v1 calls python's random.py,
>> a Python-coded Wichmann-Hill one.

>
> random.random in python 2.2 and later calls the Mersenne Twister generator,
> a very fast generator implemented in C.

Sure?

[python]> ipython
Python 2.2.3 (#1, Oct 15 2003, 23:33:35)

IPython 0.6.1.cvs -- An enhanced Interactive Python.
? -> Introduction to IPython's features.
@magic -> Information about IPython's 'magic' @ functions.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: import random

In [2]: random.random??
Type: instance method
Base Class: <type 'instance method'>
String Form: <bound method Random.random of <random.Random instance at
0xa166834>>
Namespace: Interactive
File: /usr/lib/python2.2/random.py
Definition: random.random(self)
Source:
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""

# Wichman-Hill random number generator.
#
# Wichmann, B. A. & Hill, I. D. (1982)
# Algorithm AS 183:
# An efficient and portable pseudo-random number generator
# Applied Statistics 31 (1982) 188-190
#
# Correction to Algorithm AS 183
# Applied Statistics 33 (1984) 123
#
# McLeod, A. I. (1985)
# A remark on Algorithm AS 183
# Applied Statistics 34 (1985),198-200

# BEGIN CRITICAL SECTION
x, y, z = self._seed
x = (171 * x) % 30269
y = (172 * y) % 30307
z = (170 * z) % 30323
self._seed = x, y, z
# END CRITICAL SECTION

# Note: on a platform using IEEE-754 double arithmetic, this can
# never return 0.0 (asserted by Tim; proof too long for a comment).
return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0

In [3]:

This looks strangely like python to me

You are correct under 2.3:

[python]> python2.3 `which ipython`
Python 2.3.3 (#1, Mar 6 2004, 22:39:33)

IPython 0.6.1.cvs -- An enhanced Interactive Python.
? -> Introduction to IPython's features.
@magic -> Information about IPython's 'magic' @ functions.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: import random

In [2]: random.random??
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <built-in method random of Random object at 0x907614c>
Namespace: Interactive
Docstring:
random() -> x in the interval [0, 1).

In [3]:

The fact that ipython fails to show source under 2.3 is an indication that the
method is loaded from a C extension.

But the tests I posted where done using 2.2.

Cheers,

f

ps. My main point was to illustrate weave's usage, so this doesn't really
matter.

Fernando Perez, Jul 12, 2004
10. ### Paul RubinGuest

Fernando Perez <> writes:
> > random.random in python 2.2 and later calls the Mersenne Twister generator,
> > a very fast generator implemented in C.

>
> Sure?
> Python 2.2.3 (#1, Oct 15 2003, 23:33:35)
> Source:
> def random(self):
> """Get the next random number in the range [0.0, 1.0)."""
>
> # Wichman-Hill random number generator.

Hmm, I guess MT only became the default in 2.3. Thanks.

Paul Rubin, Jul 12, 2004
11. ### Guest

Dan Christensen <> wrote in message news:<>...
> something like:
>
> from random import random
> from math import sqrt
>
> def v1(n = 500000):
> rand = random
> sqr = sqrt
> sm = 0
> for i in xrange(n):
> sm += sqr(1-rand()**2)
> return 4*sm/n
>
> This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
> about 0.29s with psyco (using Python 2.3.4).
>
> One of the great things about python is that it's possible (and fun)
> to move loops into the C level, e.g. with this implementation:
>
> def v2(n = 500000):
> a = numarray.random_array.uniform(0, 1, n)
> return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>
> which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
simulation runs in about 0.09s (measured using timethis.exe on Windows
XP), REGARDLESS of whether the calcuation is done with a loop or an
array. If 5e6 instead of 5e5 random numbers are used, the time taken
is about 0.4s, and the scaling with #iterations is about linear beyond
that. Here are the codes, which were compiled with the -optimize:4
option using Compaq Visual Fortran 6.6c.

program xpi_sim
! compute pi with simulation, using an array
implicit none
integer, parameter :: n = 500000
real (kind=kind(1.0d0)) :: xx(n),pi
call random_seed()
call random_number(xx)
pi = 4*sum(sqrt(1-xx**2))/n
print*,pi
end program xpi_sim

program xpi_sim
! compute pi with simulation, using a loop
implicit none
integer :: i
integer, parameter :: n = 500000
real (kind=kind(1.0d0)) :: xx,pi
call random_seed()
do i=1,n
call random_number(xx)
pi = pi + sqrt(1-xx**2)
end do
pi = 4*pi/n
print*,pi
end program xpi_sim

, Jul 12, 2004
12. ### Tim HochbergGuest

wrote:
> Dan Christensen <> wrote in message news:<>...
>
>>something like:
>>
>>from random import random
>>from math import sqrt
>>
>>def v1(n = 500000):
>> rand = random
>> sqr = sqrt
>> sm = 0
>> for i in xrange(n):
>> sm += sqr(1-rand()**2)
>> return 4*sm/n
>>
>>This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
>>about 0.29s with psyco (using Python 2.3.4).

I can more than double the speed of this under psyco by replacing **2
with x*x. I have inside information that pow is extra slow on psyco(*)
plus, as I understand it, x*x is preferred over **2 anyway.

from math import sqrty
from random import random

def v3(n = 500000):
sm = 0
for i in range(n):
x = random()
sm += sqrt(1-x*x)
return 4*sm/n
psyco.bind(v3)

(*) Psyco's support for floating point operations is considerable slower
than its support for integer operations. The latter are optimized all
the way to assembly when possible, but the latter are, at best,
optimized into C function calls that perform the operation. Thus there
is always some function call overhead. Fixing this would require someone
who groks both the guts of Psyco and the intracies of x86 floating point
machine code to step forward. In the case of pow, I believe that there
is always callback into the python interpreter, so it's extra slow.

>>
>>One of the great things about python is that it's possible (and fun)
>>to move loops into the C level, e.g. with this implementation:
>>
>>def v2(n = 500000):
>> a = numarray.random_array.uniform(0, 1, n)
>> return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>>
>>which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

The same trick (replacing a**2 with a*a) also double the speed of this
version.

def v4(n = 500000):
a = numarray.random_array.uniform(0, 1, n)
return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n

>
> For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
> simulation runs in about 0.09s (measured using timethis.exe on Windows
> XP), REGARDLESS of whether the calcuation is done with a loop or an
> array. If 5e6 instead of 5e5 random numbers are used, the time taken
> is about 0.4s, and the scaling with #iterations is about linear beyond
> that. Here are the codes, which were compiled with the -optimize:4
> option using Compaq Visual Fortran 6.6c.

For yet more comparison, here are times on a 2.4 GHz P4, timed with
timeit (10 loops):

v1 0.36123797857
v2 0.289118589094
v3 0.158556464579
v4 0.148470238536

If one takes into the accout the speed difference of the two CPUs this
puts the both the numarray and psyco solutions within about 50% of the
Fortran solution, which I'm impressed by. Of course, the Fortran code
uses x**2 instead of x*x, so it too, might be sped by some tweaks.

-tim

Tim Hochberg, Jul 12, 2004
13. ### Guest

Tim Hochberg <> wrote:
> wrote:
>> Dan Christensen <> wrote in message news:<>...
>>

is
>>>something like:
>>>
>>>from random import random
>>>from math import sqrt
>>>
>>>def v1(n = 500000):
>>> rand = random
>>> sqr = sqrt
>>> sm = 0
>>> for i in xrange(n):
>>> sm += sqr(1-rand()**2)
>>> return 4*sm/n
>>>
>>>This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
>>>about 0.29s with psyco (using Python 2.3.4).

>
>I can more than double the speed of this under psyco by replacing **2
>with x*x. I have inside information that pow is extra slow on psyco(*)
>plus, as I understand it, x*x is preferred over **2 anyway.
>
>from math import sqrty
>from random import random
>
>def v3(n = 500000):
> sm = 0
> for i in range(n):
> x = random()
> sm += sqrt(1-x*x)
> return 4*sm/n
>psyco.bind(v3)
>
>
>(*) Psyco's support for floating point operations is considerable slower

>than its support for integer operations. The latter are optimized all
>the way to assembly when possible, but the latter are, at best,
>optimized into C function calls that perform the operation. Thus there
>is always some function call overhead. Fixing this would require someone

>who groks both the guts of Psyco and the intracies of x86 floating point

>machine code to step forward. In the case of pow, I believe that there
>is always callback into the python interpreter, so it's extra slow.
>
>>>
>>>One of the great things about python is that it's possible (and fun)
>>>to move loops into the C level, e.g. with this implementation:
>>>
>>>def v2(n = 500000):
>>> a = numarray.random_array.uniform(0, 1, n)
>>> return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>>>
>>>which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

>
>The same trick (replacing a**2 with a*a) also double the speed of this
>version.
>
>def v4(n = 500000):
> a = numarray.random_array.uniform(0, 1, n)
> return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n
>
>>
>> For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
>> simulation runs in about 0.09s (measured using timethis.exe on Windows
>> XP), REGARDLESS of whether the calcuation is done with a loop or an
>> array. If 5e6 instead of 5e5 random numbers are used, the time taken
>> is about 0.4s, and the scaling with #iterations is about linear beyond
>> that. Here are the codes, which were compiled with the -optimize:4
>> option using Compaq Visual Fortran 6.6c.

>
>For yet more comparison, here are times on a 2.4 GHz P4, timed with
>timeit (10 loops):
>
>v1 0.36123797857
>v2 0.289118589094
>v3 0.158556464579
>v4 0.148470238536
>
>If one takes into the accout the speed difference of the two CPUs this
>puts the both the numarray and psyco solutions within about 50% of the
>Fortran solution, which I'm impressed by. Of course, the Fortran code
>uses x**2 instead of x*x, so it too, might be sped by some tweaks.
>

With Compaq Visual Fortran, the speed with x**2 and x*x are the same. Recently
on comp.lang.fortran, a Fortran expert opined that a compiler that did not
optimize the calculation of integer powers to be as fast as the hand-coded
equivalent would not be taken seriously.

Having to write out integer powers in Python is awkward, although the performance
improvement may be worth it. The code x*x*x*x is less legible than x**4,
especially if 'x' is replaced 'longname[1,2,3]'.

Often, what you want to compute powers of is itself an expression. Is Python
with Psyco able to run code such as

z = sin(x)*sin(x)

as fast as

y = sin(x)
z = y*y ?

Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
when not using Psyco.

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

, Jul 13, 2004
14. ### Dan ChristensenGuest

Tim Hochberg <> writes:

> I can more than double the speed of this under psyco by replacing **2
> with x*x. I have inside information that pow is extra slow on psyco(*)

Thanks for pointing this out. This change sped up all of the
methods I've tried. It's too bad that Python does optimize this,
but I case pow could be redefined at runtime.

> (*) Psyco's support for floating point operations is considerable
> slower than its support for integer operations. The latter are
> optimized all the way to assembly when possible, but the latter are,
> at best, optimized into C function calls that perform the
> operation.

That's unfortunate.

> Thus there is always some function call overhead. Fixing
> this would require someone who groks both the guts of Psyco and the
> intracies of x86 floating point machine code to step forward.

If someone volunteers, that would be great. I don't know anything
about x86 floating point; is it a real mess?

> If one takes into the accout the speed difference of the two CPUs this
> puts the both the numarray and psyco solutions within about 50% of the
> Fortran solution, which I'm impressed by.

I've now run a bunch of timings of all the methods that have been
proposed (except that I used C instead of Fortran) on one machine,
a 2.66GHz P4. Here are the results, in seconds:

Python 2.3.4 Python 2.4a1

naive loop: 0.657765 0.589741
naive loop psyco: 0.159085
naive loop weave: 0.084898 *
numarray: 0.115775 **
imap lots: 1.359815 0.994505
imap fn: 0.979589 0.758308
custom gen: 0.841540 0.681277
gen expr: 0.694567

naive loop C: 0.040 *

Only one of the tests used psyco. I did run the others with psyco
too, but the times were about the same or slower.

For me, psyco was about four times slower than C, and numarray was
almost 3 times slower. I'm surprised by how close psyco and numarray
were in your runs, and how slow Fortran was in beliavsky's test.
My C program measures user+sys cpu time for the loop itself, but
not start-up time for the program. In any case, getting within
a factor of four of C, with a random number generator that is probably
better, is pretty good!

One really should compare the random number generators more carefully,
since they could take a significant fraction of the time. The lines
with one * use C's rand(). The line with ** uses numarray's random
array function. Does anyone know what random number generator is used
by numarray?

By the way, note how well Python 2.4 performs compared with 2.3.4.
(Psyco, weave, numarray not shown because I don't have them for 2.4.)

I'm still curious about whether it could be possible to get
really fast loops in Python using iterators and expressions like
sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
numbers. Could Python be clever enough to implement that as a
for loop in C with just two or three C function calls in the loop?

Dan

Here's the code I used:

-----pi.py-----
from random import random
from itertools import *
from math import sqrt
from operator import pow, sub, add

from timeTest import *

try:
import weave
haveWeave = True
except:
haveWeave = False

try:
import numarray
import numarray.random_array
haveNumarray = True
except:
haveNumarray = False

def v1(n = 500000):
"naive loop"
rand = random
sqr = sqrt
sm = 0
for i in range(n):
r = rand()
sm += sqr(1-r*r)
return 4*sm/n

def v1weave(n = 500000):
"naive loop weave"

support = "#include <stdlib.h>"

code = """
double sm;
float rnd;
srand(1); // seed random number generator
sm = 0.0;
for(int i=0;i<n;++i) {
rnd = rand()/(RAND_MAX+1.0);
sm += sqrt(1.0-rnd*rnd);
}
return_val = 4.0*sm/n;"""
return weave.inline(code,('n'),support_code=support)

if(haveNumarray):
def v2(n = 500000):
"numarray"
a = numarray.random_array.uniform(0, 1, n)
return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n

def v3(n = 500000):
"imap lots"
rand = random
pw = pow
sqr = sqrt
sq = lambda x: x*x
sm = sum(imap(sqr,
imap(sub, repeat(1),
imap(sq,
imap(lambda dummy: rand(), repeat(None, n))))))
return 4*sm/n

def v4(n = 500000):
"imap fn"
rand = random
pw = pow
sqr = sqrt

def fn(dummy):
r = rand()
return sqr(1-r*r)

sm = sum(imap(fn, repeat(None, n)))
return 4*sm/n

def custom(n):
rand = random
for i in xrange(n):
yield(sqrt(1-rand()**2))

def v5(n = 500000):
"custom gen"
sm = sum(custom(n))
return 4*sm/n

# The commented out psyco runs don't show significant improvement
# (or are slower).

if haveWeave:
v1weave()

timeTest(v1)
timeTestWithPsyco(v1)
if haveWeave:
timeTest(v1weave)
if haveNumarray:
timeTest(v2)
# timeTestWithPsyco(v2)
timeTest(v3)
#timeTestWithPsyco(v3)
timeTest(v4)
#timeTestWithPsyco(v4)
timeTest(v5)
#psyco.bind(custom)
#timeTestWithPsyco(v5)

-----pi2.py----- (only for Python2.4)
from random import random
from math import sqrt

from timeTest import *

# Even if this is not executed, it is parsed, so Python 2.3 complains.

def v6(n = 500000):
"gen expr"
rand = random
sqr = sqrt
sm = sum(sqr(1-rand()**2) for i in xrange(n))
return 4*sm/n

timeTest(v6)

-----timeTest.py-----
import time

try:
import psyco
havePsyco = True
except:
havePsyco = False

def timeTest(f):
s = time.time()
f()
print "%18s: %f" % (f.__doc__, time.time()-s)

def timeTestWithPsyco(f):
if havePsyco:
g = psyco.proxy(f)
g.__doc__ = f.__doc__ + " psyco"
g()
timeTest(g)

-----pi.c-----
#include <stdlib.h>

#include "/home/spin/C/spin_time.h"

int main()
{
double sm;
float rnd;
int i;
int n = 500000;
spin_time start, end;

spin_time_set(start);
srand(1); // seed random number generator
sm = 0.0;

for(i=0;i<n;++i) {
rnd = rand()/(RAND_MAX+1.0);
sm += sqrt(1.0-rnd*rnd);
}
spin_time_set(end);
printf("%f\n", 4.0*sm/n);
printf("%18s: %.3f\n", "naive loop C", spin_time_both(start, end));
}

-----spin_time.h-----
#include <time.h>
#include <sys/times.h>
#include <stdio.h>

#define FACTOR (1.0/100.0)

typedef struct {
clock_t real;
struct tms t;
} spin_time;

#define spin_time_set(st) st.real = times(&(st.t))

// floating point number of seconds:
#define spin_time_both(st_start, st_end) \
((st_end.t.tms_utime-st_start.t.tms_utime)*FACTOR + \
(st_end.t.tms_stime-st_start.t.tms_stime)*FACTOR)

Dan Christensen, Jul 13, 2004
15. ### Tim HochbergGuest

wrote:
> Tim Hochberg <> wrote:

[HACK]
>>If one takes into the accout the speed difference of the two CPUs this
>>puts the both the numarray and psyco solutions within about 50% of the
>>Fortran solution, which I'm impressed by. Of course, the Fortran code
>>uses x**2 instead of x*x, so it too, might be sped by some tweaks.
>>

>
>
> With Compaq Visual Fortran, the speed with x**2 and x*x are the same. Recently
> on comp.lang.fortran, a Fortran expert opined that a compiler that did not
> optimize the calculation of integer powers to be as fast as the hand-coded
> equivalent would not be taken seriously.

I suppose that's not suprising. In that case, I'm more impressed with
the psyco results than I was. Particularly since Psycos floating point
support is incomplete.

> Having to write out integer powers in Python is awkward, although the performance
> improvement may be worth it. The code x*x*x*x is less legible than x**4,
> especially if 'x' is replaced 'longname[1,2,3]'.

I think that this is something that Psyco could optimize. However,
currently psyco just punts on pow, passing it off to Python.

> Often, what you want to compute powers of is itself an expression. Is Python
> with Psyco able to run code such as
>
> z = sin(x)*sin(x)
>
> as fast as
>
> y = sin(x)
> z = y*y ?

Dunno. I'll check...

> Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
> when not using Psyco.

It appears that it does not make this optimization. I suppose that's not
suprising since my understanding is that Psyco generates really simple
machine code.

-tim

Tim Hochberg, Jul 13, 2004
16. ### Tim HochbergGuest

Dan Christensen wrote:

> Tim Hochberg <> writes:
>
>
>>I can more than double the speed of this under psyco by replacing **2
>>with x*x. I have inside information that pow is extra slow on psyco(*)

>
>
> Thanks for pointing this out. This change sped up all of the
> methods I've tried. It's too bad that Python does optimize this,
> but I case pow could be redefined at runtime.

>>(*) Psyco's support for floating point operations is considerable
>>slower than its support for integer operations. The latter are
>>optimized all the way to assembly when possible, but the latter are,
>>at best, optimized into C function calls that perform the
>>operation.

>
>
> That's unfortunate.

Yes. There's still no psyco support for complex numbers at all, either.
Meaning that they run at standard Python speed. However, I expect that
to change any day now.

>>Thus there is always some function call overhead. Fixing
>>this would require someone who groks both the guts of Psyco and the
>>intracies of x86 floating point machine code to step forward.

>
>
> If someone volunteers, that would be great. I don't know anything
> about x86 floating point; is it a real mess?

From what I understand yes, but I haven't looked at assembly, let alone
machine code for many, many years and I was never particularly adept at
doing anything with it.

>
>>If one takes into the account the speed difference of the two CPUs this
>>puts the both the numarray and psyco solutions within about 50% of the
>>Fortran solution, which I'm impressed by.

>
>
> I've now run a bunch of timings of all the methods that have been
> proposed (except that I used C instead of Fortran) on one machine,
> a 2.66GHz P4. Here are the results, in seconds:
>
> Python 2.3.4 Python 2.4a1
>
> naive loop: 0.657765 0.589741
> naive loop psyco: 0.159085
> naive loop weave: 0.084898 *
> numarray: 0.115775 **
> imap lots: 1.359815 0.994505
> imap fn: 0.979589 0.758308
> custom gen: 0.841540 0.681277
> gen expr: 0.694567
>
> naive loop C: 0.040 *
>
> Only one of the tests used psyco. I did run the others with psyco
> too, but the times were about the same or slower.

FYI, generators are something that is not currently supported by Psyco.
For a more complete list, see
http://psyco.sourceforge.net/psycoguide/unsupported.html. If you
bug^H^H^H, plead with Armin, maybe he'll implement them.

> For me, psyco was about four times slower than C, and numarray was
> almost 3 times slower. I'm surprised by how close psyco and numarray
> were in your runs, and how slow Fortran was in beliavsky's test.
> My C program measures user+sys cpu time for the loop itself, but
> not start-up time for the program. In any case, getting within
> a factor of four of C, with a random number generator that is probably
> better, is pretty good!

I'm using a somewhat old version of numarray, which could be part of it.

>
> One really should compare the random number generators more carefully,
> since they could take a significant fraction of the time. The lines
> with one * use C's rand(). The line with ** uses numarray's random
> array function. Does anyone know what random number generator is used
> by numarray?
>
> By the way, note how well Python 2.4 performs compared with 2.3.4.
> (Psyco, weave, numarray not shown because I don't have them for 2.4.)
>
> I'm still curious about whether it could be possible to get
> really fast loops in Python using iterators and expressions like
> sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
> numbers. Could Python be clever enough to implement that as a
> for loop in C with just two or three C function calls in the loop?

That would require new syntax, would it not? That seems unlikely. It
seems that in 2.4, sum(1+i1-2*i2 for (i1, i2) in izip(i1, i2)) would
work. It also seems that with sufficient work psyco could be made to
optimize that in the manner you suggest. I wouldn't hold your breath though.

-tim

>
> Dan
>
> Here's the code I used:
>
> -----pi.py-----
> from random import random
> from itertools import *
> from math import sqrt
> from operator import pow, sub, add
>
> from timeTest import *
>
> try:
> import weave
> haveWeave = True
> except:
> haveWeave = False
>
> try:
> import numarray
> import numarray.random_array
> haveNumarray = True
> except:
> haveNumarray = False
>
> def v1(n = 500000):
> "naive loop"
> rand = random
> sqr = sqrt
> sm = 0
> for i in range(n):
> r = rand()
> sm += sqr(1-r*r)
> return 4*sm/n
>
> def v1weave(n = 500000):
> "naive loop weave"
>
> support = "#include <stdlib.h>"
>
> code = """
> double sm;
> float rnd;
> srand(1); // seed random number generator
> sm = 0.0;
> for(int i=0;i<n;++i) {
> rnd = rand()/(RAND_MAX+1.0);
> sm += sqrt(1.0-rnd*rnd);
> }
> return_val = 4.0*sm/n;"""
> return weave.inline(code,('n'),support_code=support)
>
> if(haveNumarray):
> def v2(n = 500000):
> "numarray"
> a = numarray.random_array.uniform(0, 1, n)
> return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n
>
> def v3(n = 500000):
> "imap lots"
> rand = random
> pw = pow
> sqr = sqrt
> sq = lambda x: x*x
> sm = sum(imap(sqr,
> imap(sub, repeat(1),
> imap(sq,
> imap(lambda dummy: rand(), repeat(None, n))))))
> return 4*sm/n
>
> def v4(n = 500000):
> "imap fn"
> rand = random
> pw = pow
> sqr = sqrt
>
> def fn(dummy):
> r = rand()
> return sqr(1-r*r)
>
> sm = sum(imap(fn, repeat(None, n)))
> return 4*sm/n
>
> def custom(n):
> rand = random
> for i in xrange(n):
> yield(sqrt(1-rand()**2))
>
> def v5(n = 500000):
> "custom gen"
> sm = sum(custom(n))
> return 4*sm/n
>
> # The commented out psyco runs don't show significant improvement
> # (or are slower).
>
> if haveWeave:
> v1weave()
>
> timeTest(v1)
> timeTestWithPsyco(v1)
> if haveWeave:
> timeTest(v1weave)
> if haveNumarray:
> timeTest(v2)
> # timeTestWithPsyco(v2)
> timeTest(v3)
> #timeTestWithPsyco(v3)
> timeTest(v4)
> #timeTestWithPsyco(v4)
> timeTest(v5)
> #psyco.bind(custom)
> #timeTestWithPsyco(v5)
>
> -----pi2.py----- (only for Python2.4)
> from random import random
> from math import sqrt
>
> from timeTest import *
>
> # Even if this is not executed, it is parsed, so Python 2.3 complains.
>
> def v6(n = 500000):
> "gen expr"
> rand = random
> sqr = sqrt
> sm = sum(sqr(1-rand()**2) for i in xrange(n))
> return 4*sm/n
>
> timeTest(v6)
>
> -----timeTest.py-----
> import time
>
> try:
> import psyco
> havePsyco = True
> except:
> havePsyco = False
>
> def timeTest(f):
> s = time.time()
> f()
> print "%18s: %f" % (f.__doc__, time.time()-s)
>
> def timeTestWithPsyco(f):
> if havePsyco:
> g = psyco.proxy(f)
> g.__doc__ = f.__doc__ + " psyco"
> g()
> timeTest(g)
>
> -----pi.c-----
> #include <stdlib.h>
>
> #include "/home/spin/C/spin_time.h"
>
> int main()
> {
> double sm;
> float rnd;
> int i;
> int n = 500000;
> spin_time start, end;
>
> spin_time_set(start);
> srand(1); // seed random number generator
> sm = 0.0;
>
> for(i=0;i<n;++i) {
> rnd = rand()/(RAND_MAX+1.0);
> sm += sqrt(1.0-rnd*rnd);
> }
> spin_time_set(end);
> printf("%f\n", 4.0*sm/n);
> printf("%18s: %.3f\n", "naive loop C", spin_time_both(start, end));
> }
>
> -----spin_time.h-----
> #include <time.h>
> #include <sys/times.h>
> #include <stdio.h>
>
> #define FACTOR (1.0/100.0)
>
> typedef struct {
> clock_t real;
> struct tms t;
> } spin_time;
>
> #define spin_time_set(st) st.real = times(&(st.t))
>
> // floating point number of seconds:
> #define spin_time_both(st_start, st_end) \
> ((st_end.t.tms_utime-st_start.t.tms_utime)*FACTOR + \
> (st_end.t.tms_stime-st_start.t.tms_stime)*FACTOR)

Tim Hochberg, Jul 13, 2004
17. ### Paul RubinGuest

Tim Hochberg <> writes:
> > If someone volunteers, that would be great. I don't know anything
> > about x86 floating point; is it a real mess?

>
> From what I understand yes, but I haven't looked at assembly, let
> alone machine code for many, many years and I was never particularly
> adept at doing anything with it.

x86 floating point is a simple stack-based instruction set (like an HP
calculator), plus some hacks that let you improve efficiency by
shuffling the elements of the stack around in parallel with doing
other operations (requiring careful instruction scheduling to use
effectively), plus several completely different and incompatible
vector-register oriented instruction sets (SSE2, 3DNow!) that are only
on some processor models and are sort of a bag on the side of the x86.
So x86 floating point is a mess in the sense that generating really
optimal code on every cpu model requires understanding all this crap.
But if you just use the basic stack-based stuff, it's pretty simple to
generate code for, and that code will certainly outperform interpreted
code by a wide margin.

Paul Rubin, Jul 13, 2004
18. ### Peter HansenGuest

Tim Hochberg wrote:

> wrote:
>> Does it make two calls to sin()? There is a performance hit for 'y =
>> sin(x)*sin(x)'
>> when not using Psyco.

>
> It appears that it does not make this optimization. I suppose that's not
> suprising since my understanding is that Psyco generates really simple
> machine code.

I suppose technically, since this is Python, two consecutive calls
to sin(x) could easily return different values... It would be
insane to write code that did this, but the dynamic nature of
Python surely allows it and it's even possible such a dynamic
substitution would be of value (or even improve performance) in
some other situations.

-Peter

Peter Hansen, Jul 13, 2004
19. ### Guest

Dan Christensen <> wrote:

> naive loop C: 0.040 *
>
>Only one of the tests used psyco. I did run the others with psyco
>too, but the times were about the same or slower.
>
>For me, psyco was about four times slower than C, and numarray was
>almost 3 times slower. I'm surprised by how close psyco and numarray
>were in your runs, and how slow Fortran was in beliavsky's test.
>My C program measures user+sys cpu time for the loop itself, but
>not start-up time for the program.

When I insert timing code within the Fortran program, the measured time is
0.04 s, so the C and Fortran speeds are the same, although different random
number generators are used.

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

, Jul 13, 2004