Profiling gives very different predictions of best algorithm


Rick Muller

I'm the main programmer for the PyQuante package, a quantum chemistry
package in Python. I'm trying to speed up one of its rate-determining
steps. Essentially, I have to decide between two algorithms:

1. Packed means that I compute N**4/8 integrals, and then do a bunch
of indexing operations to unpack;
2. Unpacked means that I compute all N**4 integrals, but don't have to
do any indexing.
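[For context, a packed scheme of this kind typically maps the four indices to a single canonical offset by exploiting the 8-fold permutation symmetry of the integrals. A pure-Python sketch of such an index function (an illustration only, not PyQuante's actual ijkl2intindex routine):]

```python
def intindex(i, j, k, l):
    # Map (i,j,k,l) to one canonical packed offset, exploiting the 8-fold
    # symmetry (ij|kl) = (ji|kl) = (ij|lk) = (kl|ij) = ... so that only
    # roughly N**4/8 values need to be stored.
    # Hypothetical stand-in, not PyQuante's actual C routine.
    if i < j:
        i, j = j, i
    if k < l:
        k, l = l, k
    ij = i * (i + 1) // 2 + j
    kl = k * (k + 1) // 2 + l
    if ij < kl:
        ij, kl = kl, ij
    return ij * (ij + 1) // 2 + kl
```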

Raw timing of the two options shows that packed is clearly faster (12.5
sec vs 20.6 sec). However, profiling shows very different results; I
have included the output below. Clearly I'm going to use the packed
scheme. My question to the mailing list is: what am I doing wrong with
my profiling that it gives such poor predictions? I rely on profiling a
great deal to tune my algorithms, and I'm used to seeing differences,
but nothing close to this magnitude.

Here is packed:
ncalls tottime percall cumtime percall filename:lineno(function)
11021725 84.493 0.000 84.493 0.000 :0(ijkl2intindex)
18 62.064 3.448 119.865 6.659 Ints.py:150(getK)
18 32.063 1.781 61.186 3.399 Ints.py:131(getJ)
52975 9.404 0.000 19.658 0.000 CGBF.py:189(coulomb)
313643 2.542 0.000 2.542 0.000 :0(range)
52975 2.260 0.000 2.260 0.000 :0(contr_coulomb)
218200 1.377 0.000 1.377 0.000 CGBF.py:51(norm)
211900 1.337 0.000 1.337 0.000 CGBF.py:53(powers)
211900 1.336 0.000 1.336 0.000 CGBF.py:56(exps)
211900 1.329 0.000 1.329 0.000 CGBF.py:58(pnorms)
211900 1.328 0.000 1.328 0.000 CGBF.py:52(origin)
211900 1.328 0.000 1.328 0.000 CGBF.py:57(coefs)
1 0.979 0.979 21.108 21.108 Ints.py:112(get2ints)
11790 0.197 0.000 0.197 0.000 :0(dot)
11828 0.166 0.000 0.166 0.000 :0(zeros)

Here is unpacked:
ncalls tottime percall cumtime percall filename:lineno(function)
18 16.158 0.898 17.544 0.975 Ints.py:167(getK)
52975 9.301 0.000 19.515 0.000 CGBF.py:189(coulomb)
18 4.584 0.255 5.904 0.328 Ints.py:146(getJ)
313643 2.630 0.000 2.630 0.000 :0(range)
52975 2.254 0.000 2.254 0.000 :0(contr_coulomb)
218200 1.375 0.000 1.375 0.000 CGBF.py:51(norm)
211900 1.330 0.000 1.330 0.000 CGBF.py:58(pnorms)
211900 1.325 0.000 1.325 0.000 CGBF.py:53(powers)
211900 1.325 0.000 1.325 0.000 CGBF.py:57(coefs)
211900 1.323 0.000 1.323 0.000 CGBF.py:56(exps)
211900 1.321 0.000 1.321 0.000 CGBF.py:52(origin)
1 0.782 0.782 20.373 20.373 Ints.py:114(get2ints)
1875 0.156 0.000 0.384 0.000 CGBF.py:106(nuclear)
11790 0.147 0.000 0.147 0.000 :0(dot)
17856 0.112 0.000 0.112 0.000 PGBF.py:63(coef)
 

Terry Reedy

Rick said:
I'm the main programmer for the PyQuante package, a quantum chemistry
package in Python. I'm trying to speed up one of its rate-determining
steps. Essentially, I have to decide between two algorithms:

1. Packed means that I compute N**4/8 integrals, and then do a bunch
of indexing operations to unpack;
2. Unpacked means that I compute all N**4 integrals, but don't have to
do any indexing.

Raw timing of the two options shows that packed is clearly faster (12.5
sec vs 20.6 sec). However, profiling shows very different results; I
have included the output below. Clearly I'm going to use the packed
scheme. My question to the mailing list is: what am I doing wrong with
my profiling that it gives such poor predictions?

That *might* be easier to answer if you were to show exactly what you
did to get the odd-looking results ;-)
 

Rick

Sorry, I'm running the function:

def debugsuite():
    import profile,pstats
    profile.run('runsuite()','prof')
    prof = pstats.Stats('prof')
    prof.strip_dirs().sort_stats('time').print_stats(15)

where runsuite() runs the Hartree-Fock energy of a water molecule, and
is given by:


import unittest,logging
from PyQuante.CI import CIS
from PyQuante.Molecule import Molecule
from PyQuante.MP import MP2
from PyQuante.OEP import oep_hf,oep_hf_an
from PyQuante.PyQuante2 import SCF,SubspaceSolver,DmatSolver

class UnitTests(unittest.TestCase):
    def setUp(self):
        from PyQuante.Molecule import Molecule
        self.h2o = Molecule('h2o',
                            [(8,(0,0,0)),(1,(1.,0,0)),(1,(0,1.,0))],
                            units="Angstrom")

    def testH2OHF(self):
        h2o_hf = SCF(self.h2o,method='HF')
        h2o_hf.iterate()
        self.assertAlmostEqual(h2o_hf.energy,-76.011755864850628,4)

def runsuite(verbose=True):
    # To use psyco, uncomment this line:
    #import psyco; psyco.full()
    if verbose: verbosity=2
    else: verbosity=1
    # If you want more output, uncomment this line:
    #logging.basicConfig(format="%(message)s",level=logging.DEBUG)
    suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)
    unittest.TextTestRunner(verbosity=verbosity).run(suite)
    # Running without verbosity is equivalent to replacing the above
    # two lines with the following:
    #unittest.main()
    return
 

Terry Reedy

Rick wrote:
[Rearranged to put answer after question. Unless your name is Guido and
you are making a short pronouncement that you want to be sure everyone
sees, please do not top-post]
>>> My question to the mailing list is what am I doing wrong with my
>>> profiling that it shows such poor predictions?
[i wrote]
>> That *might* be easier to answer if you were to show exactly what you
>> did to get the odd-looking results ;-)
Sorry, I'm running the function:

def debugsuite():
    import profile,pstats
    profile.run('runsuite()','prof')
    prof = pstats.Stats('prof')
    prof.strip_dirs().sort_stats('time').print_stats(15)

where runsuite() runs the Hartree-Fock energy of a water molecule, and
is given by:


import unittest,logging
from PyQuante.CI import CIS
from PyQuante.Molecule import Molecule
from PyQuante.MP import MP2
from PyQuante.OEP import oep_hf,oep_hf_an
from PyQuante.PyQuante2 import SCF,SubspaceSolver,DmatSolver

As far as I could see, you only actually use Molecule and SCF.

class UnitTests(unittest.TestCase):
    def setUp(self):
        from PyQuante.Molecule import Molecule

and you repeat the Molecule import here.

        self.h2o = Molecule('h2o',
                            [(8,(0,0,0)),(1,(1.,0,0)),(1,(0,1.,0))],
                            units="Angstrom")

    def testH2OHF(self):
        h2o_hf = SCF(self.h2o,method='HF')
        h2o_hf.iterate()

I presume that this is where the time goes.

        self.assertAlmostEqual(h2o_hf.energy,-76.011755864850628,4)

def runsuite(verbose=True):
    # To use psyco, uncomment this line:
    #import psyco; psyco.full()
    if verbose: verbosity=2
    else: verbosity=1
    # If you want more output, uncomment this line:
    #logging.basicConfig(format="%(message)s",level=logging.DEBUG)
    suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)
    unittest.TextTestRunner(verbosity=verbosity).run(suite)
    # Running without verbosity is equivalent to replacing the above
    # two lines with the following:
    #unittest.main()
    return

I presume that in your overall timing test, you ran the two versions of
the algorithm 'naked'. But, for some reason, you are profiling them
embedded inside a test suite and runner. It does not seem that this
should affect relative timing, but I have seen some pretty strange
behaviors. At best, it will add noise.

Let me expand my question: what did you do differently between the two
profile runs?

tjr
 

Rick

I presume that in your overall timing test, you ran the two versions of
the algorithm 'naked'. But, for some reason, you are profiling them
embedded inside a test suite and runner. It does not seem that this
should affect relative timing, but I have seen some pretty strange
behaviors. At best, it will add noise.

Let me expand my question: what did you do differently between the two
profile runs?

When I compute the electron repulsion integrals, the two different
methods are:

if packed:
    ijkl = intindex(i,j,k,l)
    Ints[ijkl] = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
else:
    val = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
    Ints[i,j,k,l] = val
    Ints[j,i,k,l] = val
    Ints[i,j,l,k] = val
    Ints[j,i,l,k] = val
    Ints[k,l,i,j] = val
    Ints[k,l,j,i] = val
    Ints[l,k,i,j] = val
    Ints[l,k,j,i] = val

and when I access the integrals the differences are:

if packed:
    index = intindex(i,j,k,l)
    temp[kl] = Ints[index]
else:
    temp[kl] = Ints[i,j,k,l]
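[As a sanity check that the two schemes address the same data, here is a toy round-trip using a dict for the unpacked store, a list for the packed one, and illustrative stand-ins for intindex and coulomb (neither is PyQuante's actual routine):]

```python
def intindex(i, j, k, l):
    # illustrative canonical packed offset (stand-in, not PyQuante's C routine)
    if i < j: i, j = j, i
    if k < l: k, l = l, k
    ij = i * (i + 1) // 2 + j
    kl = k * (k + 1) // 2 + l
    if ij < kl: ij, kl = kl, ij
    return ij * (ij + 1) // 2 + kl

def coulomb(i, j, k, l):
    # stand-in integral; any function symmetric under the 8 permutations works
    return float(i + j + k + l)

n = 4
npair = n * (n + 1) // 2
packed = [0.0] * (npair * (npair + 1) // 2)   # ~N**4/8-sized store
unpacked = {}                                 # full N**4 store

# loop over canonical quadruples only, then fan out to all 8 permutations
for i in range(n):
    for j in range(i + 1):
        for k in range(n):
            for l in range(k + 1):
                if i * (i + 1) // 2 + j >= k * (k + 1) // 2 + l:
                    val = coulomb(i, j, k, l)
                    packed[intindex(i, j, k, l)] = val
                    for a, b, c, d in ((i,j,k,l), (j,i,k,l), (i,j,l,k), (j,i,l,k),
                                       (k,l,i,j), (l,k,i,j), (k,l,j,i), (l,k,j,i)):
                        unpacked[a, b, c, d] = val

# packed access via intindex and direct unpacked access agree everywhere
assert all(unpacked[i, j, k, l] == packed[intindex(i, j, k, l)]
           for i in range(n) for j in range(n)
           for k in range(n) for l in range(n))
```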

If you look at the profiling, I'm making something like 11M calls to
intindex, which is a routine I've written in C. I thought that storing
all N**4 integrals (rather than the N**4/8 that packed stores) would
remove the need for the intindex calls and speed things up, but it's
50% slower. I can't really figure out why, though, since profiling
makes it look like the slower version is faster. Something like 30% of
the time for the full PyQuante test suite is taken up with calls to
intindex, and I'd like to remove this if possible.
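[This pattern is exactly where profiler choice matters: the pure-Python profile module charges a fixed bookkeeping cost to every call event, which adds up over millions of calls to a cheap function, while the C-implemented cProfile charges far less. A small sketch comparing the two, where tiny() is a hypothetical stand-in for a hot routine like ijkl2intindex:]

```python
import cProfile
import profile
import pstats

def tiny():
    # stand-in for a cheap routine called millions of times
    return 0

def workload(n=100000):
    for _ in range(n):
        tiny()

def calls_and_tottime(profiler_module, label):
    # Run workload() under the given profiler and report the stats
    # attributed to tiny(); both profilers share the Profile/runcall API.
    p = profiler_module.Profile()
    p.runcall(workload)
    st = pstats.Stats(p)
    key = next(k for k in st.stats if k[2] == 'tiny')
    cc, ncalls, tottime, cumtime, callers = st.stats[key]
    print(f"{label}: {ncalls} calls, tottime {tottime:.3f}s")
    return ncalls, tottime

calls_and_tottime(profile, "profile (pure Python)")
calls_and_tottime(cProfile, "cProfile (C)")
```

Both profilers record the same 100000 calls, but the tottime that profile attributes to tiny() is typically far larger, which is how a call-heavy scheme can look slower under profile than it really is.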

I also wrote a version of the code where I stored all of the intindex
values in a Python dictionary rather than calling a C function. That
version appeared to be roughly the same speed when I profiled the
code, although (stupidly) I didn't test it without profiling, because
at that point I didn't recognize the magnitude of the discrepancy
between the profiling and non-profiling timings.
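[For reference, a lookup table of that kind can be precomputed once per basis size; a minimal sketch using an illustrative canonical index formula (not PyQuante's actual intindex):]

```python
def build_intindex_table(n):
    # Precompute {(i, j, k, l): packed_offset} for all n**4 quadruples,
    # so integral access needs a dict lookup instead of a function call.
    def canonical(i, j, k, l):
        if i < j: i, j = j, i
        if k < l: k, l = l, k
        ij = i * (i + 1) // 2 + j
        kl = k * (k + 1) // 2 + l
        if ij < kl: ij, kl = kl, ij
        return ij * (ij + 1) // 2 + kl
    return {(i, j, k, l): canonical(i, j, k, l)
            for i in range(n) for j in range(n)
            for k in range(n) for l in range(n)}

table = build_intindex_table(4)
# all 8 symmetry-related quadruples share one packed slot
assert table[3, 1, 2, 0] == table[1, 3, 0, 2] == table[2, 0, 3, 1]
```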

I was posting to the list mostly because I would like to know whether
I can do something different when I profile my code so that it gives
results that correspond more closely to the non-profiling timings.
 

Rick

I was posting to the list mostly because I would like to know whether
I can do something different when I profile my code so that it gives
results that correspond more closely to the non-profiling timings.

A good comment, emailed to me rather than posted to the list, was to
use cProfile rather than profile. Doing this gave much more realistic
results. (The pure-Python profile module adds substantial per-call
overhead, which inflates the apparent cost of functions that are
called millions of times, such as ijkl2intindex; cProfile, implemented
in C, adds far less.)
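[A cProfile variant of the debugsuite() shown earlier could look like the following; runsuite here is a hypothetical stand-in workload so the sketch is self-contained, and the stats are captured in memory rather than written to a 'prof' file:]

```python
import cProfile
import io
import pstats

def runsuite():
    # stand-in for the real PyQuante test suite
    return sum(i * i for i in range(1000))

def debugsuite():
    # cProfile mirrors profile's API but is implemented in C, so its
    # per-call overhead is far smaller and hot functions are not
    # artificially inflated in the report.
    pr = cProfile.Profile()
    pr.runcall(runsuite)
    out = io.StringIO()
    pstats.Stats(pr, stream=out).strip_dirs().sort_stats('time').print_stats(15)
    return out.getvalue()

print(debugsuite())
```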
 
