Monte Carlo Method and pi

Discussion in 'Python' started by Karl Pech, Jul 8, 2004.

  1. Karl Pech

    Karl Pech Guest

    Hi,


    I was given a task where I have to compute pi using the
    method above.

    So I wrote the following program:
    ---
    import random
    import math

    n = long(raw_input("Please enter the number of iterations: "))
    sy = 0 # sum of the function-values

    for i in range(0, n):
        x = random.random() # coordinates
        sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy

    print 4*sy/n # compute pi
    ---

    Unfortunately, even for n = 2000000 the result is ~2.13...
    It converges very slowly and I don't know why.

    Please help me!

    Regards
    Karl
    Karl Pech, Jul 8, 2004
    #1

  2. "Karl Pech" <> wrote in message
    news:ccke8p$c0h$06$-online.com...
    > Hi,
    >
    >
    > I got a task there I have to compute pi using the
    > Method above.
    >
    > So I wrote the following program:
    > ---
    > import random
    > import math
    >
    > n = long(raw_input("Please enter the number of iterations: "))
    > sy = 0 # sum of the function-values
    >
    > for i in range(0, n):
    > x = random.random() # coordinates
    > sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy


    This line is wrong. I think it should be sy += math.sqrt(1 - x ** 2). Doing
    that gives me ~3.1 for 1000 iterations.
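
    Putting it together, the full corrected loop looks something like this (a
    minimal sketch, untested beyond a quick run; the idea is that the average of
    sqrt(1 - x**2) for x uniform on [0, 1) equals the area of a quarter of the
    unit circle, pi/4, so multiplying the sample mean by 4 estimates pi):
    ---
    import random
    import math

    n = long(raw_input("Please enter the number of iterations: "))
    sy = 0.0 # running sum of the function values

    for i in xrange(n):
        x = random.random() # x coordinate in [0, 1)
        sy += math.sqrt(1 - x ** 2) # height of the quarter circle at x

    print 4 * sy / n # estimate of pi
    ---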

    Nick
    Nick Smallbone, Jul 8, 2004
    #2

  3. Karl Pech

    Jeff Epler Guest

    Others have spotted the problem with your implementation of the approximation
    method.

    You can greatly speed things up with numarray. It gets "3.141..." as
    the approximation using n=2000000 in about 4 seconds on this sub-GHz
    laptop. Of course, a method with faster convergence would get to 3.141
    in a lot less than 4 seconds, even if written in pure Python.
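
    For instance, here's a quick pure-Python sketch of one such method (Machin's
    formula, pi = 16*atan(1/5) - 4*atan(1/239), with a short arctan series; the
    helper name is just for illustration). It gets pi to about 15 digits
    essentially instantly:

    def arctan_series(x, terms=25):
        # atan(x) = x - x**3/3 + x**5/5 - ...  (converges quickly for small x)
        total, power = 0.0, x
        for k in range(terms):
            total += (-1)**k * power / (2*k + 1)
            power *= x * x
        return total

    print 16*arctan_series(1.0/5) - 4*arctan_series(1.0/239)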

    Jeff

    import numarray
    import numarray.random_array

    def approx_pi(n):
        a = numarray.random_array.uniform(0, 1, n)
        return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n

    if __name__ == '__main__':
        n = int(raw_input("Please enter the number of iterations: "))
        print approx_pi(n)

    Jeff Epler, Jul 8, 2004
    #3
  4. Karl Pech

    Paul Rubin Guest

    "Karl Pech" <> writes:
    > I got a task there I have to compute pi using the
    > Method above.


    This sounds like a homework problem...
    > for i in range(0, n):
    > x = random.random() # coordinates
    > sy += math.sqrt(1-math.sqrt(x)) # calculate y and add to sy
    >
    > print 4*sy/n # compute pi
    > ---
    >
    > Unfortunately, even for n = 2000000 the result is ~2,13... .
    > It converges very slow to pi and I don't know why.


    Better make sure the formulas are correct, by asking yourself why the
    simulation is supposed to work at all.
    Paul Rubin, Jul 8, 2004
    #4
  5. Karl Pech

    Karl Pech Guest

    Re: Monte Carlo Method and pi (thanks)

    Thank you very much! It's working now! :)
    Karl Pech, Jul 9, 2004
    #5
  6. I got to thinking about this problem. The most direct implementation is
    something like:

    from random import random
    from math import sqrt

    def v1(n = 500000):
        rand = random
        sqr = sqrt
        sm = 0
        for i in xrange(n):
            sm += sqr(1-rand()**2)
        return 4*sm/n

    This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
    about 0.29s with psyco (using Python 2.3.4).

    One of the great things about python is that it's possible (and fun)
    to move loops into the C level, e.g. with this implementation:

    def v2(n = 500000):
        a = numarray.random_array.uniform(0, 1, n)
        return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n

    which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)

    The problem with the numarray approach is that it consumes large
    amounts of memory for no good reason. So, now that generator
    expressions have arrived, the next thing to try is:

    def v6(n = 500000):
        rand = random
        sqr = sqrt
        sm = sum(sqr(1-rand()**2) for i in xrange(n))
        return 4*sm/n

    Unfortunately, this takes 0.64s, not much better than the most direct
    method. (I don't have psyco installed for 2.4, so I haven't tried
    that, but I wouldn't be surprised if it was worse than the direct
    method with psyco.)

    Maybe the reason is that the loop is still in python? I tried some
    awkward ways of eliminating the for loop, e.g.

    from itertools import *
    from operator import pow, sub, add

    def v4(n = 500000):
        rand = random
        pw = pow
        sqr = sqrt
        sm = sum(imap(lambda dummy: sqr(1-rand()**2), repeat(None, n)))
        return 4*sm/n

    def v3(n = 500000):
        rand = random
        pw = pow
        sqr = sqrt
        sm = sum(imap(sqr,
                      imap(sub, repeat(1),
                           imap(pw,
                                imap(lambda dummy: rand(), repeat(None, n)),
                                repeat(2)))))
        return 4*sm/n

    but they are both slow (0.88s and 0.97s respectively).

    Is there a faster way to get Python to compute a sequence of repeated
    operations like this and sum them up?

    Is there a more direct way to make an iterator like this?

    itrand = imap(lambda dummy: rand(), repeat(None, n))

    Wouldn't it be great if one could write iterator code as concisely as
    the numarray code, e.g.

    sm = sum(sqrt(1-itrand**2))

    i.e. if operations like +, - and ** were overloaded to handle
    iterators producing numbers ("numiterators"?) Could a method like
    this be fast?
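
    Just to sketch what I mean (a toy wrapper; the names and the whole approach
    are only an illustration, and I haven't timed it seriously), overloading the
    operators on a small iterator class does make that spelling work:

    from itertools import imap, repeat
    from math import sqrt
    from random import random

    class numiter(object):
        """Iterator wrapper whose +, -, * and ** act element-wise."""
        def __init__(self, it):
            self.it = iter(it)
        def __iter__(self):
            return self.it
        def _combine(self, other, op):
            # pair up with another numiter, or broadcast a plain number
            if isinstance(other, numiter):
                return numiter(imap(op, self.it, other.it))
            return numiter(imap(op, self.it, repeat(other)))
        def __add__(self, other):
            return self._combine(other, lambda a, b: a + b)
        def __radd__(self, other):
            return self._combine(other, lambda a, b: b + a)
        def __sub__(self, other):
            return self._combine(other, lambda a, b: a - b)
        def __rsub__(self, other):
            return self._combine(other, lambda a, b: b - a)
        def __mul__(self, other):
            return self._combine(other, lambda a, b: a * b)
        def __pow__(self, other):
            return self._combine(other, lambda a, b: a ** b)

    def itsqrt(it):
        return numiter(imap(sqrt, it))

    n = 500000
    itrand = numiter(imap(lambda dummy: random(), repeat(None, n)))
    print 4 * sum(itsqrt(1 - itrand ** 2)) / n

    It runs, but every element still passes through several Python-level calls,
    so I doubt it would beat the imap versions above without help from something
    like psyco or a C implementation.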

    Dan
    Dan Christensen, Jul 11, 2004
    #6
  7. Dan Christensen wrote:

    > Is there a faster way to get Python to compute a sequence of repeated
    > operations like this and sum them up?


    Since the killer here is the dynamic type checks on the tight loops, until we
    get optional static typing in python you'll need C for this. Fortunately,
    weave makes this a breeze. Here's your v1() along with a trivially C-fied
    version called v2:

    #########################################################################
    import random
    import math
    import weave

    def v1(n = 100000):
        rand = random.random
        sqrt = math.sqrt
        sm = 0.0
        for i in xrange(n):
            sm += sqrt(1.0-rand()**2)
        return 4.0*sm/n

    def v2(n = 100000):
        """Implement v1 above using weave for the C call"""

        support = "#include <stdlib.h>"

        code = """
        double sm;
        float rnd;
        srand(1); // seed random number generator
        sm = 0.0;
        for(int i=0;i<n;++i) {
            rnd = rand()/(RAND_MAX+1.0);
            sm += sqrt(1.0-rnd*rnd);
        }
        return_val = 4.0*sm/n;"""
        return weave.inline(code,('n'),support_code=support)
    #########################################################################

    Some timing info:

    In [74]: run montecarlo_pi.py

    In [75]: genutils.timings 1,v1
    -------> genutils.timings(1,v1)
    Out[75]: (1.4000000000000057, 1.4000000000000057)

    In [76]: genutils.timings 10,v2
    -------> genutils.timings(10,v2)
    Out[76]: (0.13999999999998636, 0.013999999999998635)

    In [77]: __[1]/_[1]
    Out[77]: 100.00000000001016

    A factor of 100 improvement is not bad for a trivial amount of work :) weave is
    truly a cool piece of machinery. Just so you trust the values are OK:

    In [80]: v1()
    Out[80]: 3.1409358710711448

    In [81]: v2()
    Out[81]: 3.141602852608508

    In [82]: abs(_-__)
    Out[82]: 0.00066698153736322041

    In [83]: 1.0/sqrt(100000)
    Out[83]: 0.003162277660168379

    The difference is within Monte Carlo tolerances (the usual 1/sqrt(N)).

    Note that the above isn't 100% fair, since I'm calling the stdlib rand, a
    C-coded simple generator, while v1 calls python's random.py, a Python-coded
    Wichmann-Hill one. But still, for tight numerical loops this is pretty typical
    of weave's results. Combined with the fact that via blitz, weave gives you
    full access to Numeric arrays, it's a bit of a dream come true.

    Cheers,

    f
    Fernando Perez, Jul 11, 2004
    #7
  8. Karl Pech

    Paul Rubin Guest

    Fernando Perez <> writes:
    > Note that the above isn't 100% fair, since I'm calling the stdlib
    > rand, a C-coded simple generator, while v1 calls python's random.py,
    > a Python-coded Wichmann-Hill one.


    random.random in python 2.2 and later calls the Mersenne Twister generator,
    a very fast generator implemented in C.
    Paul Rubin, Jul 12, 2004
    #8
  9. Paul Rubin wrote:

    > Fernando Perez <> writes:
    >> Note that the above isn't 100% fair, since I'm calling the stdlib
    >> rand, a C-coded simple generator, while v1 calls python's random.py,
    >> a Python-coded Wichmann-Hill one.

    >
    > random.random in python 2.2 and later calls the Mersenne Twister generator,
    > a very fast generator implemented in C.


    Sure?

    [python]> ipython
    Python 2.2.3 (#1, Oct 15 2003, 23:33:35)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.6.1.cvs -- An enhanced Interactive Python.
    ? -> Introduction to IPython's features.
    @magic -> Information about IPython's 'magic' @ functions.
    help -> Python's own help system.
    object? -> Details about 'object'. ?object also works, ?? prints more.

    In [1]: import random

    In [2]: random.random??
    Type: instance method
    Base Class: <type 'instance method'>
    String Form: <bound method Random.random of <random.Random instance at
    0xa166834>>
    Namespace: Interactive
    File: /usr/lib/python2.2/random.py
    Definition: random.random(self)
    Source:
    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""

        # Wichman-Hill random number generator.
        #
        # Wichmann, B. A. & Hill, I. D. (1982)
        # Algorithm AS 183:
        # An efficient and portable pseudo-random number generator
        # Applied Statistics 31 (1982) 188-190
        #
        # see also:
        # Correction to Algorithm AS 183
        # Applied Statistics 33 (1984) 123
        #
        # McLeod, A. I. (1985)
        # A remark on Algorithm AS 183
        # Applied Statistics 34 (1985), 198-200

        # This part is thread-unsafe:
        # BEGIN CRITICAL SECTION
        x, y, z = self._seed
        x = (171 * x) % 30269
        y = (172 * y) % 30307
        z = (170 * z) % 30323
        self._seed = x, y, z
        # END CRITICAL SECTION

        # Note: on a platform using IEEE-754 double arithmetic, this can
        # never return 0.0 (asserted by Tim; proof too long for a comment).
        return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0

    In [3]:

    This looks strangely like python to me :)

    You are correct under 2.3:

    [python]> python2.3 `which ipython`
    Python 2.3.3 (#1, Mar 6 2004, 22:39:33)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.6.1.cvs -- An enhanced Interactive Python.
    ? -> Introduction to IPython's features.
    @magic -> Information about IPython's 'magic' @ functions.
    help -> Python's own help system.
    object? -> Details about 'object'. ?object also works, ?? prints more.

    In [1]: import random

    In [2]: random.random??
    Type: builtin_function_or_method
    Base Class: <type 'builtin_function_or_method'>
    String Form: <built-in method random of Random object at 0x907614c>
    Namespace: Interactive
    Docstring:
    random() -> x in the interval [0, 1).


    In [3]:

    The fact that ipython fails to show source under 2.3 is an indication that the
    method is loaded from a C extension.

    But the tests I posted were done using 2.2.

    Cheers,

    f

    ps. My main point was to illustrate weave's usage, so this doesn't really
    matter.
    Fernando Perez, Jul 12, 2004
    #9
  10. Karl Pech

    Paul Rubin Guest

    Fernando Perez <> writes:
    > > random.random in python 2.2 and later calls the Mersenne Twister generator,
    > > a very fast generator implemented in C.

    >
    > Sure?
    > Python 2.2.3 (#1, Oct 15 2003, 23:33:35)
    > Source:
    > def random(self):
    > """Get the next random number in the range [0.0, 1.0)."""
    >
    > # Wichman-Hill random number generator.


    Hmm, I guess MT only became the default in 2.3. Thanks.
    Paul Rubin, Jul 12, 2004
    #10
  11. Karl Pech

    Guest

    Dan Christensen <> wrote in message news:<>...
    > I got to thinking about this problem. The most direct implementation is
    > something like:
    >
    > from random import random
    > from math import sqrt
    >
    > def v1(n = 500000):
    > rand = random
    > sqr = sqrt
    > sm = 0
    > for i in xrange(n):
    > sm += sqr(1-rand()**2)
    > return 4*sm/n
    >
    > This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
    > about 0.29s with psyco (using Python 2.3.4).
    >
    > One of the great things about python is that it's possible (and fun)
    > to move loops into the C level, e.g. with this implementation:
    >
    > def v2(n = 500000):
    > a = numarray.random_array.uniform(0, 1, n)
    > return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
    >
    > which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)


    For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
    simulation runs in about 0.09s (measured using timethis.exe on Windows
    XP), REGARDLESS of whether the calculation is done with a loop or an
    array. If 5e6 instead of 5e5 random numbers are used, the time taken
    is about 0.4s, and the scaling with #iterations is about linear beyond
    that. Here are the codes, which were compiled with the -optimize:4
    option using Compaq Visual Fortran 6.6c.

    program xpi_sim
    ! compute pi with simulation, using an array
    implicit none
    integer, parameter :: n = 500000
    real (kind=kind(1.0d0)) :: xx(n),pi
    call random_seed()
    call random_number(xx)
    pi = 4*sum(sqrt(1-xx**2))/n
    print*,pi
    end program xpi_sim

    program xpi_sim
    ! compute pi with simulation, using a loop
    implicit none
    integer :: i
    integer, parameter :: n = 500000
    real (kind=kind(1.0d0)) :: xx,pi
    pi = 0.0d0 ! initialize the accumulator before the loop
    call random_seed()
    do i=1,n
        call random_number(xx)
        pi = pi + sqrt(1-xx**2)
    end do
    pi = 4*pi/n
    print*,pi
    end program xpi_sim
    , Jul 12, 2004
    #11
  12. Karl Pech

    Tim Hochberg Guest

    wrote:
    > Dan Christensen <> wrote in message news:<>...
    >
    >>I got to thinking about this problem. The most direct implementation is
    >>something like:
    >>
    >>from random import random
    >>from math import sqrt
    >>
    >>def v1(n = 500000):
    >> rand = random
    >> sqr = sqrt
    >> sm = 0
    >> for i in xrange(n):
    >> sm += sqr(1-rand()**2)
    >> return 4*sm/n
    >>
    >>This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
    >>about 0.29s with psyco (using Python 2.3.4).


    I can more than double the speed of this under psyco by replacing **2
    with x*x. I have inside information that pow is extra slow on psyco(*)
    plus, as I understand it, x*x is preferred over **2 anyway.

    import psyco
    from math import sqrt
    from random import random

    def v3(n = 500000):
        sm = 0
        for i in range(n):
            x = random()
            sm += sqrt(1-x*x)
        return 4*sm/n
    psyco.bind(v3)


    (*) Psyco's support for floating point operations is considerably slower
    than its support for integer operations. The latter are optimized all
    the way to assembly when possible, but the former are, at best,
    optimized into C function calls that perform the operation. Thus there
    is always some function call overhead. Fixing this would require someone
    who groks both the guts of Psyco and the intricacies of x86 floating point
    machine code to step forward. In the case of pow, I believe that there
    is always a callback into the Python interpreter, so it's extra slow.

    >>
    >>One of the great things about python is that it's possible (and fun)
    >>to move loops into the C level, e.g. with this implementation:
    >>
    >>def v2(n = 500000):
    >> a = numarray.random_array.uniform(0, 1, n)
    >> return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
    >>
    >>which clocks in at 0.16s with Python 2.4a. (psyco doesn't help.)


    The same trick (replacing a**2 with a*a) also doubles the speed of this
    version.

    def v4(n = 500000):
        a = numarray.random_array.uniform(0, 1, n)
        return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n

    >
    > For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
    > simulation runs in about 0.09s (measured using timethis.exe on Windows
    > XP), REGARDLESS of whether the calcuation is done with a loop or an
    > array. If 5e6 instead of 5e5 random numbers are used, the time taken
    > is about 0.4s, and the scaling with #iterations is about linear beyond
    > that. Here are the codes, which were compiled with the -optimize:4
    > option using Compaq Visual Fortran 6.6c.


    For yet more comparison, here are times on a 2.4 GHz P4, timed with
    timeit (10 loops):

    v1 0.36123797857
    v2 0.289118589094
    v3 0.158556464579
    v4 0.148470238536

    If one takes into account the speed difference of the two CPUs, this
    puts both the numarray and psyco solutions within about 50% of the
    Fortran solution, which I'm impressed by. Of course, the Fortran code
    uses x**2 instead of x*x, so it too might be sped up by some tweaks.

    -tim
    Tim Hochberg, Jul 12, 2004
    #12
  13. Karl Pech

    Guest

    Tim Hochberg <> wrote:
    [snip]
    >I can more than double the speed of this under psyco by replacing **2
    >with x*x. I have inside information that pow is extra slow on psyco(*)
    >plus, as I understand it, x*x is preferred over **2 anyway.
    [snip]
    >If one takes into account the speed difference of the two CPUs, this
    >puts both the numarray and psyco solutions within about 50% of the
    >Fortran solution, which I'm impressed by. Of course, the Fortran code
    >uses x**2 instead of x*x, so it too might be sped up by some tweaks.


    With Compaq Visual Fortran, the speed with x**2 and x*x is the same. Recently
    on comp.lang.fortran, a Fortran expert opined that a compiler that did not
    optimize the calculation of integer powers to be as fast as the hand-coded
    equivalent would not be taken seriously.

    Having to write out integer powers in Python is awkward, although the performance
    improvement may be worth it. The code x*x*x*x is less legible than x**4,
    especially if 'x' is replaced by 'longname[1,2,3]'.

    Often, what you want to compute powers of is itself an expression. Is Python
    with Psyco able to run code such as

    z = sin(x)*sin(x)

    as fast as

    y = sin(x)
    z = y*y ?

    Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
    when not using Psyco.
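
    A quick way to check the non-Psyco case (a sketch using the stdlib timeit
    module; the absolute numbers will of course depend on the machine):

    import timeit
    setup = "from math import sin; x = 0.5"
    # two calls to sin() per loop
    print timeit.Timer("z = sin(x)*sin(x)", setup).timeit(1000000)
    # one call, squared by hand
    print timeit.Timer("y = sin(x); z = y*y", setup).timeit(1000000)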



    , Jul 13, 2004
    #13
  14. Tim Hochberg <> writes:

    > I can more than double the speed of this under psyco by replacing **2
    > with x*x. I have inside information that pow is extra slow on psyco(*)


    Thanks for pointing this out. This change sped up all of the
    methods I've tried. It's too bad that Python doesn't optimize this,
    but I guess that's because pow could be redefined at runtime.

    > (*) Psyco's support for floating point operations is considerable
    > slower than its support for integer operations. The latter are
    > optimized all the way to assembly when possible, but the latter are,
    > at best, optimized into C function calls that perform the
    > operation.


    That's unfortunate.

    > Thus there is always some function call overhead. Fixing
    > this would require someone who groks both the guts of Psyco and the
    > intracies of x86 floating point machine code to step forward.


    If someone volunteers, that would be great. I don't know anything
    about x86 floating point; is it a real mess?

    > If one takes into the accout the speed difference of the two CPUs this
    > puts the both the numarray and psyco solutions within about 50% of the
    > Fortran solution, which I'm impressed by.


    I've now run a bunch of timings of all the methods that have been
    proposed (except that I used C instead of Fortran) on one machine,
    a 2.66GHz P4. Here are the results, in seconds:

                        Python 2.3.4    Python 2.4a1

    naive loop:           0.657765        0.589741
    naive loop psyco:     0.159085
    naive loop weave:     0.084898 *
    numarray:             0.115775 **
    imap lots:            1.359815        0.994505
    imap fn:              0.979589        0.758308
    custom gen:           0.841540        0.681277
    gen expr:                             0.694567

    naive loop C:         0.040 *

    Only one of the tests used psyco. I did run the others with psyco
    too, but the times were about the same or slower.

    For me, psyco was about four times slower than C, and numarray was
    almost 3 times slower. I'm surprised by how close psyco and numarray
    were in your runs, and how slow Fortran was in beliavsky's test.
    My C program measures user+sys cpu time for the loop itself, but
    not start-up time for the program. In any case, getting within
    a factor of four of C, with a random number generator that is probably
    better, is pretty good!

    One really should compare the random number generators more carefully,
    since they could take a significant fraction of the time. The lines
    with one * use C's rand(). The line with ** uses numarray's random
    array function. Does anyone know what random number generator is used
    by numarray?

    By the way, note how well Python 2.4 performs compared with 2.3.4.
    (Psyco, weave, numarray not shown because I don't have them for 2.4.)

    I'm still curious about whether it could be possible to get
    really fast loops in Python using iterators and expressions like
    sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
    numbers. Could Python be clever enough to implement that as a
    for loop in C with just two or three C function calls in the loop?

    Dan

    Here's the code I used:

    -----pi.py-----
    from random import random
    from itertools import *
    from math import sqrt
    from operator import pow, sub, add

    from timeTest import *

    try:
        import weave
        haveWeave = True
    except:
        haveWeave = False

    try:
        import numarray
        import numarray.random_array
        haveNumarray = True
    except:
        haveNumarray = False

    def v1(n = 500000):
        "naive loop"
        rand = random
        sqr = sqrt
        sm = 0
        for i in range(n):
            r = rand()
            sm += sqr(1-r*r)
        return 4*sm/n

    def v1weave(n = 500000):
        "naive loop weave"

        support = "#include <stdlib.h>"

        code = """
        double sm;
        float rnd;
        srand(1); // seed random number generator
        sm = 0.0;
        for(int i=0;i<n;++i) {
            rnd = rand()/(RAND_MAX+1.0);
            sm += sqrt(1.0-rnd*rnd);
        }
        return_val = 4.0*sm/n;"""
        return weave.inline(code,('n'),support_code=support)

    if(haveNumarray):
        def v2(n = 500000):
            "numarray"
            a = numarray.random_array.uniform(0, 1, n)
            return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n

    def v3(n = 500000):
        "imap lots"
        rand = random
        pw = pow
        sqr = sqrt
        sq = lambda x: x*x
        sm = sum(imap(sqr,
                      imap(sub, repeat(1),
                           imap(sq,
                                imap(lambda dummy: rand(), repeat(None, n))))))
        return 4*sm/n

    def v4(n = 500000):
        "imap fn"
        rand = random
        pw = pow
        sqr = sqrt

        def fn(dummy):
            r = rand()
            return sqr(1-r*r)

        sm = sum(imap(fn, repeat(None, n)))
        return 4*sm/n

    def custom(n):
        rand = random
        for i in xrange(n):
            yield(sqrt(1-rand()**2))

    def v5(n = 500000):
        "custom gen"
        sm = sum(custom(n))
        return 4*sm/n

    # The commented out psyco runs don't show significant improvement
    # (or are slower).

    if haveWeave:
        v1weave()

    timeTest(v1)
    timeTestWithPsyco(v1)
    if haveWeave:
        timeTest(v1weave)
    if haveNumarray:
        timeTest(v2)
        # timeTestWithPsyco(v2)
    timeTest(v3)
    #timeTestWithPsyco(v3)
    timeTest(v4)
    #timeTestWithPsyco(v4)
    timeTest(v5)
    #psyco.bind(custom)
    #timeTestWithPsyco(v5)

    -----pi2.py----- (only for Python2.4)
    from random import random
    from math import sqrt

    from timeTest import *

    # Even if this is not executed, it is parsed, so Python 2.3 complains.

    def v6(n = 500000):
        "gen expr"
        rand = random
        sqr = sqrt
        sm = sum(sqr(1-rand()**2) for i in xrange(n))
        return 4*sm/n

    timeTest(v6)

    -----timeTest.py-----
    import time

    try:
        import psyco
        havePsyco = True
    except:
        havePsyco = False

    def timeTest(f):
        s = time.time()
        f()
        print "%18s: %f" % (f.__doc__, time.time()-s)

    def timeTestWithPsyco(f):
        if havePsyco:
            g = psyco.proxy(f)
            g.__doc__ = f.__doc__ + " psyco"
            g()
            timeTest(g)

    -----pi.c-----
    #include <stdlib.h>
    #include <math.h>   /* for the sqrt() prototype */

    #include "/home/spin/C/spin_time.h"

    int main()
    {
        double sm;
        float rnd;
        int i;
        int n = 500000;
        spin_time start, end;

        spin_time_set(start);
        srand(1); // seed random number generator
        sm = 0.0;

        for(i=0;i<n;++i) {
            rnd = rand()/(RAND_MAX+1.0);
            sm += sqrt(1.0-rnd*rnd);
        }
        spin_time_set(end);
        printf("%f\n", 4.0*sm/n);
        printf("%18s: %.3f\n", "naive loop C", spin_time_both(start, end));
    }

    -----spin_time.h-----
    #include <time.h>
    #include <sys/times.h>
    #include <stdio.h>

    #define FACTOR (1.0/100.0)

    typedef struct {
        clock_t real;
        struct tms t;
    } spin_time;

    #define spin_time_set(st) st.real = times(&(st.t))

    // floating point number of seconds:
    #define spin_time_both(st_start, st_end) \
    ((st_end.t.tms_utime-st_start.t.tms_utime)*FACTOR + \
    (st_end.t.tms_stime-st_start.t.tms_stime)*FACTOR)
    Dan Christensen, Jul 13, 2004
    #14
  15. Karl Pech

    Tim Hochberg Guest

    wrote:
    > Tim Hochberg <> wrote:

    [HACK]
    >>If one takes into the accout the speed difference of the two CPUs this
    >>puts the both the numarray and psyco solutions within about 50% of the
    >>Fortran solution, which I'm impressed by. Of course, the Fortran code
    >>uses x**2 instead of x*x, so it too, might be sped by some tweaks.
    >>

    >
    >
    > With Compaq Visual Fortran, the speed with x**2 and x*x are the same. Recently
    > on comp.lang.fortran, a Fortran expert opined that a compiler that did not
    > optimize the calculation of integer powers to be as fast as the hand-coded
    > equivalent would not be taken seriously.


    I suppose that's not surprising. In that case, I'm more impressed with
    the psyco results than I was, particularly since Psyco's floating point
    support is incomplete.

    > Having to write out integer powers in Python is awkward, although the performance
    > improvement may be worth it. The code x*x*x*x is less legible than x**4,
    > especially if 'x' is replaced 'longname[1,2,3]'.


    I think that this is something that Psyco could optimize. However,
    currently psyco just punts on pow, passing it off to Python.
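
    In the meantime, one stopgap (just a sketch, and I haven't actually timed it
    under psyco) is a small helper that expands integer powers into plain
    multiplications by repeated squaring:

    def ipow(x, n):
        """x**n for a non-negative integer n, using only multiplications."""
        result = 1.0
        while n:
            if n & 1:
                result *= x
            x = x * x
            n >>= 1
        return result

    # e.g. ipow(longname[1,2,3], 4) instead of longname[1,2,3]**4

    Whether psyco speeds that up as much as writing x*x by hand, I don't know.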

    > Often, what you want to compute powers of is itself an expression. Is Python
    > with Psyco able to run code such as
    >
    > z = sin(x)*sin(x)
    >
    > as fast as
    >
    > y = sin(x)
    > z = y*y ?


    Dunno. I'll check...

    > Does it make two calls to sin()? There is a performance hit for 'y = sin(x)*sin(x)'
    > when not using Psyco.


    It appears that it does not make this optimization. I suppose that's not
    surprising, since my understanding is that Psyco generates really simple
    machine code.

    -tim
    Tim Hochberg, Jul 13, 2004
    #15
  16. Karl Pech

    Tim Hochberg Guest

    Dan Christensen wrote:

    > Tim Hochberg <> writes:
    >
    >
    >>I can more than double the speed of this under psyco by replacing **2
    >>with x*x. I have inside information that pow is extra slow on psyco(*)

    >
    >
    > Thanks for pointing this out. This change sped up all of the
    > methods I've tried. It's too bad that Python does optimize this,
    > but I case pow could be redefined at runtime.


    You're welcome.

    >>(*) Psyco's support for floating point operations is considerable
    >>slower than its support for integer operations. The latter are
    >>optimized all the way to assembly when possible, but the latter are,
    >>at best, optimized into C function calls that perform the
    >>operation.

    >
    >
    > That's unfortunate.


    Yes. There's still no psyco support for complex numbers at all either,
    meaning that they run at standard Python speed. However, I expect that
    to change any day now.

    >>Thus there is always some function call overhead. Fixing
    >>this would require someone who groks both the guts of Psyco and the
    >>intracies of x86 floating point machine code to step forward.

    >
    >
    > If someone volunteers, that would be great. I don't know anything
    > about x86 floating point; is it a real mess?


    From what I understand yes, but I haven't looked at assembly, let alone
    machine code for many, many years and I was never particularly adept at
    doing anything with it.

    >
    >>If one takes into the account the speed difference of the two CPUs this
    >>puts the both the numarray and psyco solutions within about 50% of the
    >>Fortran solution, which I'm impressed by.

    >
    >
    > I've now run a bunch of timings of all the methods that have been
    > proposed (except that I used C instead of Fortran) on one machine,
    > a 2.66GHz P4. Here are the results, in seconds:
    >
    > Python 2.3.4 Python 2.4a1
    >
    > naive loop: 0.657765 0.589741
    > naive loop psyco: 0.159085
    > naive loop weave: 0.084898 *
    > numarray: 0.115775 **
    > imap lots: 1.359815 0.994505
    > imap fn: 0.979589 0.758308
    > custom gen: 0.841540 0.681277
    > gen expr: 0.694567
    >
    > naive loop C: 0.040 *
    >
    > Only one of the tests used psyco. I did run the others with psyco
    > too, but the times were about the same or slower.


    FYI, generators are something that is not currently supported by Psyco.
    For a more complete list, see
    http://psyco.sourceforge.net/psycoguide/unsupported.html. If you
    bug^H^H^H, plead with Armin, maybe he'll implement them.

    > For me, psyco was about four times slower than C, and numarray was
    > almost 3 times slower. I'm surprised by how close psyco and numarray
    > were in your runs, and how slow Fortran was in beliavsky's test.
    > My C program measures user+sys cpu time for the loop itself, but
    > not start-up time for the program. In any case, getting within
    > a factor of four of C, with a random number generator that is probably
    > better, is pretty good!


    I'm using a somewhat old version of numarray, which could be part of it.

    >
    > One really should compare the random number generators more carefully,
    > since they could take a significant fraction of the time. The lines
    > with one * use C's rand(). The line with ** uses numarray's random
    > array function. Does anyone know what random number generator is used
    > by numarray?
    >
    > By the way, note how well Python 2.4 performs compared with 2.3.4.
    > (Psyco, weave, numarray not shown because I don't have them for 2.4.)
    >
    > I'm still curious about whether it could be possible to get
    > really fast loops in Python using iterators and expressions like
    > sum(1 + it1 - 2 * it2), where it1 and it2 are iterators that produce
    > numbers. Could Python be clever enough to implement that as a
    > for loop in C with just two or three C function calls in the loop?


    That would require new syntax, would it not? That seems unlikely. It
    seems that in 2.4, sum(1+i1-2*i2 for (i1, i2) in izip(i1, i2)) would
    work. It also seems that with sufficient work psyco could be made to
    optimize that in the manner you suggest. I wouldn't hold your breath though.
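
    Spelled out with fresh names, so the iterators aren't shadowed by the loop
    variables (an untested sketch, 2.4 only because of the generator expression):

    from itertools import izip, imap, repeat
    from random import random

    n = 500000
    it1 = imap(lambda dummy: random(), repeat(None, n))
    it2 = imap(lambda dummy: random(), repeat(None, n))
    total = sum(1 + a - 2*b for (a, b) in izip(it1, it2))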

    -tim


    Tim Hochberg, Jul 13, 2004
    #16
  17. Karl Pech

    Paul Rubin Guest

    Tim Hochberg <> writes:
    > > If someone volunteers, that would be great. I don't know anything
    > > about x86 floating point; is it a real mess?

    >
    > From what I understand yes, but I haven't looked at assembly, let
    > alone machine code for many, many years and I was never particularly
    > adept at doing anything with it.


    x86 floating point is a simple stack-based instruction set (like an HP
    calculator), plus some hacks that let you improve efficiency by
    shuffling the elements of the stack around in parallel with doing
    other operations (requiring careful instruction scheduling to use
    effectively), plus several completely different and incompatible
    vector-register oriented instruction sets (SSE2, 3DNow!) that are only
    on some processor models and are sort of a bag on the side of the x86.
    So x86 floating point is a mess in the sense that generating really
    optimal code on every cpu model requires understanding all this crap.
    But if you just use the basic stack-based stuff, it's pretty simple to
    generate code for, and that code will certainly outperform interpreted
    code by a wide margin.
    Paul Rubin, Jul 13, 2004
    #17
  18. Karl Pech

    Peter Hansen Guest

    Tim Hochberg wrote:

    > wrote:
    >> Does it make two calls to sin()? There is a performance hit for 'y =
    >> sin(x)*sin(x)'
    >> when not using Psyco.

    >
    > It appears that it does not make this optimization. I suppose that's not
    > suprising since my understanding is that Psyco generates really simple
    > machine code.


    I suppose technically, since this is Python, two consecutive calls
    to sin(x) could easily return different values... It would be
    insane to write code that did this, but the dynamic nature of
    Python surely allows it and it's even possible such a dynamic
    substitution would be of value (or even improve performance) in
    some other situations.
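
    A contrived illustration of the point (purely to make it concrete, not
    something anyone should actually write):

    import math

    _calls = [0]
    def sin(x):
        # a stateful stand-in shadowing math.sin: two consecutive calls with
        # the same argument return different values, so a compiler cannot
        # safely reuse the result of the first call for the second
        _calls[0] += 1
        return math.sin(x) + _calls[0] * 1e-9

    print sin(1.0) == sin(1.0)   # False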

    -Peter
    Peter Hansen, Jul 13, 2004
    #18
  19. Karl Pech

    Guest

    Dan Christensen <> wrote:

    > naive loop C: 0.040 *
    >
    >Only one of the tests used psyco. I did run the others with psyco
    >too, but the times were about the same or slower.
    >
    >For me, psyco was about four times slower than C, and numarray was
    >almost 3 times slower. I'm surprised by how close psyco and numarray
    >were in your runs, and how slow Fortran was in beliavsky's test.
    >My C program measures user+sys cpu time for the loop itself, but
    >not start-up time for the program.


    When I insert timing code within the Fortran program, the measured time is
    0.04 s, so the C and Fortran speeds are the same, although different random
    number generators are used.



    , Jul 13, 2004
    #19
