Python is darn fast (was: How fast is Python)

Discussion in 'Python' started by Michele Simionato, Aug 23, 2003.

  1. I posted this a few weeks ago (remember the C Sharp thread?) but it went
    unnoticed in the large mass of posts, so let me retry. Here I get
    Python+Psyco twice as fast as optimized C, so I would like to know if
    something is wrong on my old laptop and if anybody can reproduce my results.
    Here are my numbers for calling the error function a million times
    (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):

    $ time p23 erf.py
    real 0m0.614s
    user 0m0.551s
    sys 0m0.029s

    This is twice as fast as optimized C:

    $ gcc erf.c -lm -o3
    $ time ./a.out
    real 0m1.125s
    user 0m1.086s
    sys 0m0.006s

    Here is the situation for pure Python:

    $ time p23 erf.jy
    real 0m25.761s
    user 0m25.012s
    sys 0m0.049s

    and, just for fun, here is Jython performance:

    $ time jython erf.jy
    real 0m42.979s
    user 0m41.430s
    sys 0m0.361s

    The source code follows (copied from Alex Martelli's post):

    ----------------------------------------------------------------------

    $ cat erf.py
    import math
    import psyco
    psyco.full()

    def erfc(x):
        exp = math.exp

        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        t = 1.0 / (1.0 + p*x)
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
        return erfcx

    def main():
        erg = 0.0

        for i in xrange(1000000):
            erg += erfc(0.456)

    if __name__ == '__main__':
        main()

    --------------------------------------------------------------------------

    # python/jython version = same without "import psyco; psyco.full()"

    --------------------------------------------------------------------------

    $ cat erf.c
    #include <stdio.h>
    #include <math.h>

    double erfc( double x )
    {
        double p, a1, a2, a3, a4, a5;
        double t, erfcx;

        p = 0.3275911;
        a1 = 0.254829592;
        a2 = -0.284496736;
        a3 = 1.421413741;
        a4 = -1.453152027;
        a5 = 1.061405429;

        t = 1.0 / (1.0 + p*x);
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x);

        return erfcx;
    }

    int main()
    {
        double erg=0.0;
        int i;

        for(i=0; i<1000000; i++)
        {
            erg = erg + erfc(0.456);
        }

        return 0;
    }

    Michele Simionato, Ph. D.

    http://www.phyast.pitt.edu/~micheles
    --- Currently looking for a job ---
    Michele Simionato, Aug 23, 2003
    #1

  2. Michele Simionato wrote:
    > I posted this a few weeks ago (remember the C Sharp thread?) but it went
    > unnoticed in the large mass of posts, so let me retry. Here I get
    > Python+Psyco twice as fast as optimized C, so I would like to know if
    > something is wrong on my old laptop and if anybody can reproduce my results.


    I can. :)

    I had to increase the loop counter by a factor of 10 because it
    ran too fast on my machine (Celeron 533 MHz), and added a print statement
    for the accumulated sum (erg). These are my results:

    [irmen@atlantis]$ gcc -O3 -march=pentium2 -mcpu=pentium2 -lm erf.c

    [irmen@atlantis]$ time ./a.out
    5190039.338694
    4.11user 0.00system 0:04.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (103major+13minor)pagefaults 0swaps

    [irmen@atlantis]$ time python2.3 erf.py
    5190039.33869
    2.91user 0.01system 0:02.92elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (544major+380minor)pagefaults 0swaps

    This is with gcc 3.2.2 on Mandrake 9.1.

    While Python + Psyco is not twice as fast as compiled & optimized C,
    it's still faster by almost 30% on my system, which is still great!!
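
A minimal sketch of the modification Irmen describes (ten times as many iterations, with the accumulated sum made available for printing). It is written for present-day Python 3 without Psyco, which is an assumption of this sketch: the thread itself uses Python 2.3 and `xrange`. The helper name `accumulate` is hypothetical.

```python
import math

def erfc(x):
    # Same polynomial approximation as in erf.py from the first post.
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * x)
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * math.exp(-x * x)

def accumulate(n):
    # Hypothetical helper: sum erfc(0.456) over n calls, like main() in erf.py,
    # but return the sum so the caller can print it.
    erg = 0.0
    for _ in range(n):
        erg += erfc(0.456)
    return erg
```

Calling `print(accumulate(10000000))` corresponds to the longer run described above. Printing the sum also means the result is actually used, which leaves an optimizer less room to drop the loop as dead work.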

    --Irmen
    Irmen de Jong, Aug 23, 2003
    #2

  3. Michele Simionato wrote:

    > $ time p23 erf.py
    > real 0m0.614s
    > user 0m0.551s
    > sys 0m0.029s
    >
    > This is twice as fast as optimized C:
    >
    > $ gcc erf.c -lm -o3
    > $ time ./a.out
    > real 0m1.125s
    > user 0m1.086s
    > sys 0m0.006s
    >
    > Here is the situation for pure Python
    >
    > $time p23 erf.jy
    > real 0m25.761s
    > user 0m25.012s
    > sys 0m0.049s
    >
    > and, just for fun, here is Jython performance:
    >
    > $ time jython erf.jy
    > real 0m42.979s
    > user 0m41.430s
    > sys 0m0.361s


    Mmm...on my machine C is faster. What version of GCC do you have? I think
    2.9x, right?

    These are my timings (Debian GNU Linux Unstable, Duron 1300, Python2.3,
    Psyco CVS, GCC 3.3.2, Java 1.4.1):

    $ time python erf.py

    real 0m0.251s
    user 0m0.207s
    sys 0m0.012s

    $ gcc erf.c -lm -O3
    $ time ./a.out

    real 0m0.162s
    user 0m0.157s
    sys 0m0.001s

    Notice that C is faster than Psyco + Python 2.3 on my machine (the C
    binary needs about 65% of the Psyco run time).

    Without Psyco, Python 2.3 took about 6 seconds:

    $ time python erf.jy

    real 0m6.177s
    user 0m6.040s
    sys 0m0.010s


    And Jython is definitely slower :)

    $ time jython erf.jy

    real 0m10.423s
    user 0m9.506s
    sys 0m0.197s


    --
    Lawrence "Rhymes" Oluyede
    http://loluyede.blogspot.com
    Lawrence Oluyede, Aug 23, 2003
    #3
  4. Michele Simionato wrote:
    > I posted this a few weeks ago (remember the C Sharp thread?) but it went
    > unnoticed in the large mass of posts, so let me retry. Here I get Python+
    > Psyco twice as fast as optimized C
    >
    > $ gcc erf.c -lm -O3


    Try a 3.x series gcc with the appropriate -march=pentium3.
    You'll be pleasantly surprised. I can't understand why
    the sudden improvement in gcc code generation lately hasn't
    been hyped more. If you want to try different machines,
    then http://www.pixelbeat.org/scripts/gcccpuopt will give
    you the appropriate machine-specific gcc options to use.
    Note also that -ffast-math might help a lot in this application.

    cheers,
    Pádraig.
    , Aug 23, 2003
    #4
  5. Irmen de Jong <> writes:

    > wrote:
    >
    > > try a 3.x series gcc with the appropriate -march=pentium3
    > > You'll be pleasantly surprised.

    >
    > In my other reply I mentioned that I still get a Python+Psyco
    > advantage of 30% over a gcc 3.2.2 compiled version.
    > My gcc is doing a lot better than Michele's reported 50% difference,
    > but Python+Psyco still wins :)


    So, the interesting part is: why?


    John
    John J. Lee, Aug 24, 2003
    #5
  6. On Sun, 24 Aug 2003 00:31:15 +0100, John J. Lee wrote:

    > Irmen de Jong <> writes:
    >
    >> wrote:
    >>

    ....
    >> but Python+Psyco still wins :)

    >
    > So, the interesting part is: why?
    >
    >
    > John


    My suspicion is that when Psyco looks at erfc, it
    finds that nothing changes and so replaces the
    function call with the resulting number (am I right? it's the
    same each time?). This is what a "specializing compiler"
    would do, methinks. So, try using a different number
    with each call.
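
The hypothesis above can be illustrated with a small sketch: if the argument never changes, a specializing compiler could in principle evaluate the call once and reuse the result. This only illustrates that idea in present-day Python 3; it is not a description of what Psyco does internally, and the names `loop_every_call` and `loop_specialized` are hypothetical.

```python
import math

def erfc(x):
    # Polynomial approximation from erf.py in the first post.
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * x)
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * math.exp(-x * x)

def loop_every_call(n, x):
    # What the benchmark literally asks for: n full evaluations of erfc(x).
    total = 0.0
    for _ in range(n):
        total += erfc(x)
    return total

def loop_specialized(n, x):
    # What a compiler specializing on the constant argument could reduce it to:
    # one evaluation, then n cheap additions of the cached value.
    value = erfc(x)
    total = 0.0
    for _ in range(n):
        total += value
    return total
```

Both functions return the same sum, but the second performs a single `exp` call. Passing a different argument on each iteration removes that shortcut, which is why it makes a fairer benchmark.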

    Simon.
    Simon Burton, Aug 24, 2003
    #6
  7. Lawrence Oluyede wrote:
    > wrote:
    >
    >>If you want to try different machines
    >>then http://www.pixelbeat.org/scripts/gcccpuopt will give
    >>you the appropriate machine specific gcc options to use.

    >
    > Very cool script, thanks :) Anyway it didn't change so much with erf.c
    > erfCPU is compiled with the flags suggested by gcccpuopt script:
    >
    > $ gcccpuopt
    > -march=athlon-xp -mfpmath=sse -msse -mmmx -m3dnow


    You still need some -O optimization flags. The -m options just let gcc
    generate some nice instructions specific to your Athlon CPU.

    Also, I don't think that script is all that useful because at least some
    (if not all) of those -m options are already implied by -march=athlon-xp
    (I don't recall which ones off the top of my head but I'll find a
    reference for anyone interested... you can also find out by looking at
    the gcc command line option parsing code).

    Anyone who wants some other good ideas for the best flags on their
    machine should check out ccbench:

    http://www.rocklinux.net/packages/ccbench.html

    The problem here of course is that not all applications behave like the
    benchmarks :(

    Van Gale
    Van Gale, Aug 24, 2003
    #7
  8. Van Gale wrote:

    > You still need some -O optimization flags. The -m options just let gcc
    > generate some nice instructions specific to your Athlon CPU.


    I didn't mention it, but I also used the -O3 flag. I don't know why, but
    on my machine the C code is faster than the Psyco code in this test.

    --
    Lawrence "Rhymes" Oluyede
    http://loluyede.blogspot.com
    Lawrence Oluyede, Aug 24, 2003
    #8
  9. Van Gale <> wrote in message news:<XKW1b.4512$>...
    > Michele Simionato wrote:
    > > I posted this a few weeks ago (remember the C Sharp thread?) but it went
    > > unnoticed in the large mass of posts, so let me retry. Here I get Python+
    > > Psyco twice as fast as optimized C, so I would like to know if something
    > > is wrong on my old laptop and if anybody can reproduce my results.
    > > Here are my numbers for calling the error function a million times
    > > (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz):
    > >
    > > $ gcc erf.c -lm -o3

    >
    > Did you really use "-o3" instead of "-O3"? The lowercase -o3 will
    > name the output file "3" instead of enabling optimization.


    Yes, I used -O3; this was a misprint in the e-mail. The compiler was
    gcc 2.96.

    Michele Simionato, Ph. D.

    http://www.phyast.pitt.edu/~micheles
    --- Currently looking for a job ---
    Michele Simionato, Aug 24, 2003
    #9
  10. I finally came to the conclusion that the exceedingly good performance
    of Psyco was due to the fact that the function was called a million
    times with the *same* argument. Evidently Psyco is smart enough to
    notice that. Changing the argument at each call
    (erfc(0.456) -> i/1000000.0) slows down Python+Psyco to 1/4 of C speed.
    Psyco improves Python performance by an order of magnitude, but still it
    is not enough :-(

    I was too optimistic!

    Here are my numbers for Python 2.3, Psyco 1.0, Red Hat Linux 7.3,
    Pentium II 366 MHz:

    $ time p23 erf.py
    real 0m3.245s
    user 0m3.164s
    sys 0m0.037s

    This is more than four times slower than optimized C:

    $ gcc erf.c -lm -O3
    $ time ./a.out
    real 0m0.742s
    user 0m0.725s
    sys 0m0.002s

    Here is the situation for pure Python:

    $ time p23 erf.jy
    real 0m27.470s
    user 0m27.162s
    sys 0m0.023s

    and, just for fun, here is Jython performance:

    $ time jython erf.jy
    real 0m44.395s
    user 0m42.602s
    sys 0m0.389s
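
The relative speeds follow directly from the `real` times reported above. A small sketch of the arithmetic in present-day Python 3 (the dictionary and variable names are hypothetical; the numbers are the ones quoted in this post):

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
real_seconds = {
    "C (gcc -O3)": 0.742,
    "Python + Psyco": 3.245,
    "pure Python": 27.470,
    "Jython": 44.395,
}

c_time = real_seconds["C (gcc -O3)"]

# How many times longer than the C binary each run takes.
slowdown_vs_c = {name: t / c_time for name, t in real_seconds.items()}

# Speedup that Psyco gives over plain Python 2.3.
psyco_speedup = real_seconds["pure Python"] / real_seconds["Python + Psyco"]

for name, factor in slowdown_vs_c.items():
    print("%-15s %5.1fx the C time" % (name, factor))
print("Psyco speedup over pure Python: %.1fx" % psyco_speedup)
```

With these numbers Python+Psyco takes roughly 4.4 times as long as the C binary, and Psyco speeds up pure Python by roughly 8.5 times.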

    ----------------------------------------------------------------------

    $ cat erf.py
    import math
    import psyco
    psyco.full()

    def erfc(x):
        exp = math.exp

        p = 0.3275911
        a1 = 0.254829592
        a2 = -0.284496736
        a3 = 1.421413741
        a4 = -1.453152027
        a5 = 1.061405429

        t = 1.0 / (1.0 + p*x)
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x)
        return erfcx

    def main():
        erg = 0.0

        for i in xrange(1000000):
            erg += erfc(i/1000000.0)

    if __name__ == '__main__':
        main()

    --------------------------------------------------------------------------

    # python/jython version = same without "import psyco; psyco.full()"

    --------------------------------------------------------------------------

    $ cat erf.c
    #include <stdio.h>
    #include <math.h>

    double erfc( double x )
    {
        double p, a1, a2, a3, a4, a5;
        double t, erfcx;

        p = 0.3275911;
        a1 = 0.254829592;
        a2 = -0.284496736;
        a3 = 1.421413741;
        a4 = -1.453152027;
        a5 = 1.061405429;

        t = 1.0 / (1.0 + p*x);
        erfcx = ( (a1 + (a2 + (a3 +
                  (a4 + a5*t)*t)*t)*t)*t ) * exp(-x*x);

        return erfcx;
    }

    int main()
    {
        double erg=0.0;
        int i;

        for(i=0; i<1000000; i++)
        {
            erg = erg + erfc(i/1000000.0);
        }

        return 0;
    }

    Michele Simionato, Ph. D.

    http://www.phyast.pitt.edu/~micheles/
    ---- Currently looking for a job ----
    Michele Simionato, Aug 24, 2003
    #10
  11. Michele Simionato wrote:
    > I finally came to the conclusion that the exceedingly good performance
    > of Psyco was due to the fact that the function was called a million
    > times with the *same* argument. Evidently Psyco is smart enough to
    > notice that. Changing the argument at each call
    > (erfc(0.456) -> i/1000000.0) slows down Python+Psyco to 1/4 of C speed.
    > Psyco improves Python performance by an order of magnitude, but still it
    > is not enough :-(


    This is not surprising. Last I checked, Psyco does not fully compile
    floating point expressions. If I remember correctly (though every time
    I try to delve too deeply into Psyco my brains start oozing out my ears),
    there are three ways in which a given chunk of code is evaluated. At one
    level, which I'll call #1, Psyco generates the machine code(*) for the
    expression. At a second level, #2, Psyco calls out to C helper functions,
    but still works with unboxed values. At the third level, Psyco punts and
    creates a Python object and hands things off to the interpreter.

    Most integer functions operate at level #1, so they tend to be quite
    fast. Most floating point operations operate at level #2, so they have a
    certain amount of overhead, but are still much faster than unpsyco
    (sane?) Python. I believe the reason for this is that x86 floating point
    operations are very messy, so Armin punted...

    (*) Armin is working on a virtual machine implementation of Psyco, so it
    should be available on non-x86 machines soon.

    FWIW,

    -tim


    Tim Hochberg, Aug 24, 2003
    #11
  12. (Michele Simionato) wrote in message
    news:<>...

    > I finally came to the conclusion that the exceedingly good performance
    > of Psyco was due to the fact that the function was called a million
    > times with the *same* argument. Evidently Psyco is smart enough to
    > notice that. Changing the argument at each call
    > (erfc(0.456) -> i/1000000.0) slows down Python+Psyco to 1/4 of C speed.
    > Psyco improves Python performance by an order of magnitude, but still it
    > is not enough :-(
    >

    It's plenty! A factor of 4 from optimized C, considering the newness
    and limited resources behind Psyco, is very encouraging, and good
    enough for most tasks. Java JIT compilers are still around a factor
    of 2 slower than C, and they've had at least 2 orders of magnitude
    more whumpage.

    This is a far cry from the factor of 10-30 I've been seeing with pure
    Python. For performance-critical code, this could be the difference
    between hand-coding 5% versus 20% of your code.

    Excellent news!!
    dan, Aug 25, 2003
    #12
  13. (dan) writes:

    > (Michele Simionato) wrote in message
    > news:<>...

    [...]
    > This is a far cry from the factor of 10-30 I've been seeing with pure
    > python. For performance-critical code, this could be the difference
    > between hand-coding 5% versus 20% of your code.
    >
    > Excellent news!!


    If you care about this a lot, don't forget Pyrex.


    John
    John J. Lee, Aug 26, 2003
    #13
  14. Right, Pyrex -- I looked at that a while ago. Compiled Python with
    C-style type declarations, right? Kinda like Common Lisp??? (I'm
    stretching my memory cells now.)

    Will review.

    dan, Aug 27, 2003
    #14