Re: Python speed vs csharp

Discussion in 'Python' started by David M. Cooke, Jul 31, 2003.

  1. At some point, Mike <> wrote:
    > Here's the chunk of code that I'm spending most of my time executing:
    >
    > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
    > # Fifth order approximation. |error| <= 1.5e-7 for all x
    > #
    > def erfc( x ):
    > p = 0.3275911
    > a1 = 0.254829592
    > a2 = -0.284496736
    > a3 = 1.421413741
    > a4 = -1.453152027
    > a5 = 1.061405429
    >
    > t = 1.0 / (1.0 + p*float(x))
    > erfcx = ( (a1 + (a2 + (a3 +
    > (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
    > return erfcx


    With python2.2, a million iterations takes 9.93 s on my machine.
    With python2.3, 8.13 s.

    Replacing float(x) with just x (multiplying by p (a float) is enough to
    coerce x to a float), with python 2.3 I'm down to 6.98 s. Assigning
    math.exp to a global variable exp cuts this down to 6.44 s:

    exp = math.exp
    def erfc(x):
    [...]
    t = 1.0 / (1.0 + p*x)
    erfcx = ... * exp(-x**2)
    return erfcx

    You can go a little further and look up ways of optimising the
    polynomial evaluation -- Horner's method is a good, general-purpose
    technique, but you can get faster if you're willing to put some effort
    into it. I believe Ralston and Rabonwitz's numerical analysis book has
    a discussion on this (sorry, I don't have the complete reference; the
    book's at home).

    That's about the best you can do in pure python.

    Now look at how you're calling this routine. Can you call it on a
    bunch of numbers at once? Using Numeric arrays (http://numpy.org/),
    calling the above routine on an array of a million numbers takes
    1.6 s. Calling it 1000 times on an array of 1000 numbers takes 0.61 s.

    Using erfc from scipy.special, this takes 0.65 s on a array of a
    million numbers. Scipy (from http://scipy.org/) is a library of
    scientific tools, supplementing Numeric. I believe the routine it uses
    for erfc comes from the Cephes library (http://netlib.org/), which has
    a error < 3.4e-14 for x in (-13,0). Calling it 1000 times on an array
    of 1000 numbers takes 0.43 s.

    (Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
    fast as once on an array of 1000*1000 numbers.)

    Extrapolating from your numbers, your C program is 28x faster. Using
    scipy.special.erfc on moderate-sized arrays is 18.9x faster. That's
    pretty good :)

    For numerical codes in Python, I *strongly* suggest that you use
    Numeric (or numarray, it's successor). It's _much_ better to pass 1000
    numbers around as a aggregate, instead of 1 number 1000 times.

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
     
    David M. Cooke, Jul 31, 2003
    #1
    1. Advertising

  2. David M. Cooke

    Terry Reedy Guest

    "David M. Cooke" <> wrote in message
    news:...
    > At some point, Mike <> wrote:
    > > Here's the chunk of code that I'm spending most of my time

    executing:
    > >
    > > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec.

    7.1.26)
    > > # Fifth order approximation. |error| <= 1.5e-7 for all x
    > > #
    > > def erfc( x ):
    > > p = 0.3275911
    > > a1 = 0.254829592
    > > a2 = -0.284496736
    > > a3 = 1.421413741
    > > a4 = -1.453152027
    > > a5 = 1.061405429
    > >
    > > t = 1.0 / (1.0 + p*float(x))
    > > erfcx = ( (a1 + (a2 + (a3 +
    > > (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
    > > return erfcx


    Does inlining the constants give any speedup? or is local lookup so
    fast that it does not matter?

    TJR
     
    Terry Reedy, Aug 1, 2003
    #2
    1. Advertising

  3. At some point, "Terry Reedy" <> wrote:

    > "David M. Cooke" <> wrote in message
    > news:...
    >> At some point, Mike <> wrote:
    >> > Here's the chunk of code that I'm spending most of my time

    > executing:
    >> >
    >> > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec.

    > 7.1.26)
    >> > # Fifth order approximation. |error| <= 1.5e-7 for all x
    >> > #
    >> > def erfc( x ):
    >> > p = 0.3275911
    >> > a1 = 0.254829592
    >> > a2 = -0.284496736
    >> > a3 = 1.421413741
    >> > a4 = -1.453152027
    >> > a5 = 1.061405429
    >> >
    >> > t = 1.0 / (1.0 + p*float(x))
    >> > erfcx = ( (a1 + (a2 + (a3 +
    >> > (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
    >> > return erfcx

    >
    > Does inlining the constants give any speedup? or is local lookup so
    > fast that it does not matter?


    Should've thought of that too :). It runs in 6.07 s now (my previous
    version was 6.45 s, and the unmodified version above was 8.23 s).

    Here's my best version:

    exp = math.exp
    def erfc( x ):
    t = 1.0 + 0.3275911*x
    return ( (0.254829592 + (
    (1.421413741 + (-1.453152027 + 1.061405429/t)/t)/t - 0.284496736)
    /t)/t ) * exp(-(x**2))


    The reversal of the order of the a2 evaluation is because looking at
    the bytecodes (using the dis module) it turns out that -0.284 was
    being stored as 0.284, then negated. However, -1.45 is stored as that.
    This runs in 5.77 s. You could speed that up a little if you're
    willing to manipulate the bytecode directly :)

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
     
    David M. Cooke, Aug 1, 2003
    #3
  4. David M. Cooke

    Brendan Hahn Guest

    (David M. Cooke) wrote:
    >(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
    >fast as once on an array of 1000*1000 numbers.)


    Probably cache effects.

    --
    brendan DOT hahn AT hp DOT com
     
    Brendan Hahn, Aug 1, 2003
    #4
  5. David M. Cooke

    Andrew Dalke Guest

    David M. Cooke:
    > Here's my best version:
    >
    > exp = math.exp
    > def erfc( x ):


    Change that to

    def erfc(x, exp = math.exp)

    so there's a local namespace lookup instead of a global one. This
    is considered a hack.

    Andrew
     
    Andrew Dalke, Aug 1, 2003
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Martin v. =?iso-8859-15?q?L=F6wis?=

    Re: Python speed vs csharp

    Martin v. =?iso-8859-15?q?L=F6wis?=, Jul 31, 2003, in forum: Python
    Replies:
    1
    Views:
    448
    Bengt Richter
    Jul 31, 2003
  2. Richie Hindle

    Re: Python speed vs csharp

    Richie Hindle, Jul 31, 2003, in forum: Python
    Replies:
    3
    Views:
    451
    Richie Hindle
    Aug 1, 2003
  3. Bengt Richter

    Re: Python speed vs csharp

    Bengt Richter, Aug 1, 2003, in forum: Python
    Replies:
    1
    Views:
    357
    John Machin
    Aug 2, 2003
  4. Greg Brunet

    Re: Python speed vs csharp

    Greg Brunet, Aug 1, 2003, in forum: Python
    Replies:
    13
    Views:
    610
    Michele Simionato
    Aug 4, 2003
  5. Mike

    Re: Python speed vs csharp

    Mike, Aug 2, 2003, in forum: Python
    Replies:
    2
    Views:
    322
Loading...

Share This Page