Re: Python speed vs csharp

Discussion in 'Python' started by David M. Cooke, Jul 31, 2003.

1. David M. Cooke (Guest)

At some point, Mike <> wrote:
> Here's the chunk of code that I'm spending most of my time executing:
>
> # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> # Fifth order approximation. |error| <= 1.5e-7 for all x
> #
> def erfc( x ):
>     p = 0.3275911
>     a1 = 0.254829592
>     a2 = -0.284496736
>     a3 = 1.421413741
>     a4 = -1.453152027
>     a5 = 1.061405429
>
>     t = 1.0 / (1.0 + p*float(x))
>     erfcx = ( (a1 + (a2 + (a3 +
>               (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
>     return erfcx

With python2.2, a million iterations takes 9.93 s on my machine.
With python2.3, 8.13 s.
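For reference, a minimal timing harness along these lines, using the stdlib timeit module and the function as posted (absolute numbers are machine-dependent, so yours will differ):

```python
import math
import timeit

def erfc(x):
    # Abramowitz & Stegun 7.1.26, fifth-order rational approximation
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * float(x))
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * math.exp(-(x ** 2))

# One million calls, matching the timings quoted above.
elapsed = timeit.timeit("erfc(1.0)", globals=globals(), number=1_000_000)
print(f"{elapsed:.2f} s per 1e6 calls")
```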

Replacing float(x) with just x (multiplying by p, a float, is enough to
coerce x to a float), with python 2.3 I'm down to 6.98 s. Assigning
math.exp to a global variable exp cuts this down to 6.44 s:

exp = math.exp
def erfc(x):
    [...]
    t = 1.0 / (1.0 + p*x)
    erfcx = ... * exp(-x**2)
    return erfcx
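Spelled out in full, a sketch of those two changes (drop the float() call, bind math.exp once at module level) looks like this:

```python
import math

exp = math.exp  # module-level binding: one attribute lookup, reused on every call

def erfc(x):
    # Abramowitz & Stegun 7.1.26, fifth-order rational approximation
    p = 0.3275911
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    t = 1.0 / (1.0 + p * x)  # p is a float, so x is coerced without float()
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * exp(-(x ** 2))
```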

You can go a little further and look up ways of optimising the
polynomial evaluation -- Horner's method is a good, general-purpose
technique, but you can get faster if you're willing to put some effort
into it. I believe Ralston and Rabinowitz's numerical analysis book has
a discussion on this (sorry, I don't have the complete reference; the
book's at home).
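To make the comparison concrete, here is a generic Horner evaluator next to the naive power-by-power form (hypothetical helper names; the erfc code above already uses the nested Horner form):

```python
def poly_naive(coeffs, t):
    # a1*t + a2*t**2 + ... + an*t**n, computing one t**k per term
    return sum(c * t ** (k + 1) for k, c in enumerate(coeffs))

def poly_horner(coeffs, t):
    # Same polynomial with n multiplications and n additions total
    acc = 0.0
    for c in reversed(coeffs):
        acc = (acc + c) * t
    return acc

# The a1..a5 coefficients from the erfc approximation above
coeffs = [0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429]
```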

That's about the best you can do in pure python.

Now look at how you're calling this routine. Can you call it on a
bunch of numbers at once? Using Numeric arrays (http://numpy.org/),
calling the above routine on an array of a million numbers takes
1.6 s. Calling it 1000 times on an array of 1000 numbers takes 0.61 s.
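With today's toolchain that would be numpy rather than Numeric; a sketch of the same approximation applied to a whole array in one call:

```python
import numpy as np

def erfc_array(x):
    # A&S 7.1.26 approximation, evaluated elementwise on an ndarray (x >= 0)
    p = 0.3275911
    a1, a2, a3, a4, a5 = (0.254829592, -0.284496736, 1.421413741,
                          -1.453152027, 1.061405429)
    t = 1.0 / (1.0 + p * x)
    return ((a1 + (a2 + (a3 + (a4 + a5 * t) * t) * t) * t) * t) * np.exp(-x ** 2)

xs = np.linspace(0.0, 5.0, 1_000_000)
ys = erfc_array(xs)  # one call replaces a million Python-level calls
```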

Using erfc from scipy.special, this takes 0.65 s on an array of a
million numbers. Scipy (from http://scipy.org/) is a library of
scientific tools, supplementing Numeric. I believe the routine it uses
for erfc comes from the Cephes library (http://netlib.org/), which has
an error < 3.4e-14 for x in (-13,0). Calling it 1000 times on an array
of 1000 numbers takes 0.43 s.
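The same experiment today, hedged with a fallback since scipy may not be installed (np.vectorize keeps the values identical but gives up the speed advantage):

```python
import math
import numpy as np

try:
    from scipy.special import erfc  # Cephes-based, near machine precision
except ImportError:
    # No scipy available: wrap the stdlib erfc so the call still accepts arrays
    erfc = np.vectorize(math.erfc)

xs = np.linspace(-3.0, 3.0, 7)
ys = erfc(xs)  # erfc(0) == 1, and erfc(-x) + erfc(x) == 2
```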

(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
fast as once on an array of 1000*1000 numbers.)

scipy.special.erfc on moderate-sized arrays is 18.9x faster. That's
pretty good.

For numerical codes in Python, I *strongly* suggest that you use
Numeric (or numarray, its successor). It's _much_ better to pass 1000
numbers around as an aggregate, instead of 1 number 1000 times.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

David M. Cooke, Jul 31, 2003

2. Terry Reedy (Guest)

"David M. Cooke" <> wrote in message
news:...
> At some point, Mike <> wrote:
> > Here's the chunk of code that I'm spending most of my time executing:
> >
> > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
> > # Fifth order approximation. |error| <= 1.5e-7 for all x
> > #
> > def erfc( x ):
> >     p = 0.3275911
> >     a1 = 0.254829592
> >     a2 = -0.284496736
> >     a3 = 1.421413741
> >     a4 = -1.453152027
> >     a5 = 1.061405429
> >
> >     t = 1.0 / (1.0 + p*float(x))
> >     erfcx = ( (a1 + (a2 + (a3 +
> >               (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
> >     return erfcx

Does inlining the constants give any speedup? Or is local lookup so
fast that it does not matter?

TJR

Terry Reedy, Aug 1, 2003

3. David M. Cooke (Guest)

At some point, "Terry Reedy" <> wrote:

> "David M. Cooke" <> wrote in message
> news:...
>> At some point, Mike <> wrote:
>> > Here's the chunk of code that I'm spending most of my time executing:
>> >
>> > # Rational approximation for erfc(x) (Abramowitz & Stegun, Sec. 7.1.26)
>> > # Fifth order approximation. |error| <= 1.5e-7 for all x
>> > #
>> > def erfc( x ):
>> >     p = 0.3275911
>> >     a1 = 0.254829592
>> >     a2 = -0.284496736
>> >     a3 = 1.421413741
>> >     a4 = -1.453152027
>> >     a5 = 1.061405429
>> >
>> >     t = 1.0 / (1.0 + p*float(x))
>> >     erfcx = ( (a1 + (a2 + (a3 +
>> >               (a4 + a5*t)*t)*t)*t)*t ) * math.exp(-(x**2))
>> >     return erfcx
>
> Does inlining the constants give any speedup? Or is local lookup so
> fast that it does not matter?

Should've thought of that too. It runs in 6.07 s now (my previous
version was 6.45 s, and the unmodified version above was 8.23 s).

Here's my best version:

exp = math.exp
def erfc( x ):
    t = 1.0 + 0.3275911*x
    return ( (0.254829592 + (
        (1.421413741 + (-1.453152027 + 1.061405429/t)/t)/t - 0.284496736)
        /t)/t ) * exp(-(x**2))

The reversal of the order of the a2 evaluation is because, looking at
the bytecodes (using the dis module), it turns out that -0.284 was
being stored as 0.284 and then negated at run time, whereas -1.45 was
stored as a negative constant. This runs in 5.77 s. You could speed it
up a little more if you're willing to manipulate the bytecode directly.
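For the curious, the inspection itself looks like this. Note the caveat: the behaviour described above is 2003-era CPython; a modern CPython constant-folds the unary minus at compile time, so both negative literals land in co_consts directly:

```python
import dis

def f(t):
    # Both negative literals from the polynomial above
    return (-0.284496736 + -1.453152027 * t) * t

dis.dis(f)  # disassemble to see how the constants are loaded
# On a modern CPython the folded negative constants are stored as-is:
print(sorted(c for c in f.__code__.co_consts if isinstance(c, float)))
```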

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

David M. Cooke, Aug 1, 2003

4. Brendan Hahn (Guest)

(David M. Cooke) wrote:
>(Ok, I don't know why 1000 times on arrays of 1000 numbers is twice as
>fast as once on an array of 1000*1000 numbers.)

Probably cache effects.

--
brendan DOT hahn AT hp DOT com

Brendan Hahn, Aug 1, 2003
5. Andrew Dalke (Guest)

David M. Cooke:
> Here's my best version:
>
> exp = math.exp
> def erfc( x ):

Change that to

def erfc(x, exp=math.exp):

so there's a local namespace lookup instead of a global one. This
is considered a hack.
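Put together, the trick looks like this (a sketch applied to the original multiplicative Horner form; the division variant works the same way). It's a hack because the signature now exposes an implementation detail:

```python
import math

def erfc(x, exp=math.exp):  # default-arg binding: exp is a fast local lookup
    # A&S 7.1.26 with constants inlined, Horner form
    t = 1.0 / (1.0 + 0.3275911 * x)
    return ((0.254829592 + (-0.284496736 + (1.421413741 +
            (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t) * exp(-(x ** 2))
```

A caller can still override the default (e.g. pass a different exp implementation), which is the usual argument against this idiom: the optimization leaks into the API.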

Andrew

Andrew Dalke, Aug 1, 2003