Profiling gives very different predictions of best algorithm


Rick Muller

I'm the main programmer for the PyQuante package, a quantum chemistry
package in Python. I'm trying to speed up one of its rate-determining
steps. Essentially, I have to decide between two algorithms:

1. Packed means that I compute N**4/8 integrals, and then do a bunch
of indexing operations to unpack;
2. Unpacked means that I compute all N**4 integrals, but don't have to
do any indexing.
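[For context, a packed scheme of this kind typically maps the four indices to a single canonical offset by exploiting the 8-fold permutation symmetry of the integrals. A pure-Python sketch of such an index function (an illustration only, not PyQuante's actual ijkl2intindex routine):]

```python
def intindex(i, j, k, l):
    # Map (i,j,k,l) to one canonical packed offset, exploiting the 8-fold
    # symmetry (ij|kl) = (ji|kl) = (ij|lk) = (kl|ij) = ... so that only
    # roughly N**4/8 values need to be stored.
    # Hypothetical stand-in, not PyQuante's actual C routine.
    if i < j:
        i, j = j, i
    if k < l:
        k, l = l, k
    ij = i * (i + 1) // 2 + j
    kl = k * (k + 1) // 2 + l
    if ij < kl:
        ij, kl = kl, ij
    return ij * (ij + 1) // 2 + kl
```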

Raw timing of the two options shows that packed is clearly faster (12.5
sec vs 20.6 sec). However, profiling shows very different results; I
have included the output below. Clearly I'm going to use the packed
scheme. My question to the mailing list is: what am I doing wrong with
my profiling that it gives such poor predictions? I rely on profiling a
great deal to tune my algorithms, and I'm used to seeing differences,
but nothing close to this magnitude.

Here is packed:
ncalls tottime percall cumtime percall filename:lineno(function)
11021725 84.493 0.000 84.493 0.000 :0(ijkl2intindex)
18 62.064 3.448 119.865 6.659 Ints.py:150(getK)
18 32.063 1.781 61.186 3.399 Ints.py:131(getJ)
52975 9.404 0.000 19.658 0.000 CGBF.py:189(coulomb)
313643 2.542 0.000 2.542 0.000 :0(range)
52975 2.260 0.000 2.260 0.000 :0(contr_coulomb)
218200 1.377 0.000 1.377 0.000 CGBF.py:51(norm)
211900 1.337 0.000 1.337 0.000 CGBF.py:53(powers)
211900 1.336 0.000 1.336 0.000 CGBF.py:56(exps)
211900 1.329 0.000 1.329 0.000 CGBF.py:58(pnorms)
211900 1.328 0.000 1.328 0.000 CGBF.py:52(origin)
211900 1.328 0.000 1.328 0.000 CGBF.py:57(coefs)
1 0.979 0.979 21.108 21.108 Ints.py:112(get2ints)
11790 0.197 0.000 0.197 0.000 :0(dot)
11828 0.166 0.000 0.166 0.000 :0(zeros)

Here is unpacked:
ncalls tottime percall cumtime percall filename:lineno(function)
18 16.158 0.898 17.544 0.975 Ints.py:167(getK)
52975 9.301 0.000 19.515 0.000 CGBF.py:189(coulomb)
18 4.584 0.255 5.904 0.328 Ints.py:146(getJ)
313643 2.630 0.000 2.630 0.000 :0(range)
52975 2.254 0.000 2.254 0.000 :0(contr_coulomb)
218200 1.375 0.000 1.375 0.000 CGBF.py:51(norm)
211900 1.330 0.000 1.330 0.000 CGBF.py:58(pnorms)
211900 1.325 0.000 1.325 0.000 CGBF.py:53(powers)
211900 1.325 0.000 1.325 0.000 CGBF.py:57(coefs)
211900 1.323 0.000 1.323 0.000 CGBF.py:56(exps)
211900 1.321 0.000 1.321 0.000 CGBF.py:52(origin)
1 0.782 0.782 20.373 20.373 Ints.py:114(get2ints)
1875 0.156 0.000 0.384 0.000 CGBF.py:106(nuclear)
11790 0.147 0.000 0.147 0.000 :0(dot)
17856 0.112 0.000 0.112 0.000 PGBF.py:63(coef)
 

Terry Reedy

Rick said:
I'm the main programmer for the PyQuante package, a quantum chemistry
package in Python. I'm trying to speed up one of its rate-determining
steps. Essentially, I have to decide between two algorithms:

1. Packed means that I compute N**4/8 integrals, and then do a bunch
of indexing operations to unpack;
2. Unpacked means that I compute all N**4 integrals, but don't have to
do any indexing.

Raw timing of the two options shows that packed is clearly faster (12.5
sec vs 20.6 sec). However, profiling shows very different results; I
have included the output below. Clearly I'm going to use the packed
scheme. My question to the mailing list is: what am I doing wrong with
my profiling that it gives such poor predictions?

That *might* be easier to answer if you were to show exactly what you
did to get the odd-looking results ;-)
 

Rick

Sorry, I'm running the function:

def debugsuite():
    import profile,pstats
    profile.run('runsuite()','prof')
    prof = pstats.Stats('prof')
    prof.strip_dirs().sort_stats('time').print_stats(15)

where runsuite() runs the Hartree-Fock energy of a water molecule, and
is given by:


import unittest,logging
from PyQuante.CI import CIS
from PyQuante.Molecule import Molecule
from PyQuante.MP import MP2
from PyQuante.OEP import oep_hf,oep_hf_an
from PyQuante.PyQuante2 import SCF,SubspaceSolver,DmatSolver

class UnitTests(unittest.TestCase):
    def setUp(self):
        from PyQuante.Molecule import Molecule
        self.h2o = Molecule('h2o',
                            [(8,(0,0,0)),(1,(1.,0,0)),(1,(0,1.,0))],
                            units="Angstrom")

    def testH2OHF(self):
        h2o_hf = SCF(self.h2o,method='HF')
        h2o_hf.iterate()
        self.assertAlmostEqual(h2o_hf.energy,-76.011755864850628,4)

def runsuite(verbose=True):
    # To use psyco, uncomment this line:
    #import psyco; psyco.full()
    if verbose: verbosity=2
    else: verbosity=1
    # If you want more output, uncomment this line:
    #logging.basicConfig(format="%(message)s",level=logging.DEBUG)
    suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)
    unittest.TextTestRunner(verbosity=verbosity).run(suite)
    # Running without verbosity is equivalent to replacing the above
    # two lines with the following:
    #unittest.main()
    return
 

Terry Reedy

Rick wrote:
[Rearranged to put answer after question. Unless your name is Guido and
you are making a short pronouncement that you want to be sure everyone
sees, please do not top-post]
>>> My question to the mailing list is what am I doing wrong with my
>>> profiling that it shows such poor predictions?
[i wrote]
>> That *might* be easier to answer if you were to show exactly what you
>> did to get the odd-looking results ;-)
Sorry, I'm running the function:

def debugsuite():
    import profile,pstats
    profile.run('runsuite()','prof')
    prof = pstats.Stats('prof')
    prof.strip_dirs().sort_stats('time').print_stats(15)

where runsuite() runs the Hartree-Fock energy of a water molecule, and
is given by:


import unittest,logging
from PyQuante.CI import CIS
from PyQuante.Molecule import Molecule
from PyQuante.MP import MP2
from PyQuante.OEP import oep_hf,oep_hf_an
from PyQuante.PyQuante2 import SCF,SubspaceSolver,DmatSolver

As far as I could see, you only actually use Molecule and SCF.

class UnitTests(unittest.TestCase):
    def setUp(self):
        from PyQuante.Molecule import Molecule

and you repeat the Molecule import here.

        self.h2o = Molecule('h2o',
                            [(8,(0,0,0)),(1,(1.,0,0)),(1,(0,1.,0))],
                            units="Angstrom")

    def testH2OHF(self):
        h2o_hf = SCF(self.h2o,method='HF')
        h2o_hf.iterate()

I presume that this is where the time goes.

        self.assertAlmostEqual(h2o_hf.energy,-76.011755864850628,4)

def runsuite(verbose=True):
    # To use psyco, uncomment this line:
    #import psyco; psyco.full()
    if verbose: verbosity=2
    else: verbosity=1
    # If you want more output, uncomment this line:
    #logging.basicConfig(format="%(message)s",level=logging.DEBUG)
    suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)
    unittest.TextTestRunner(verbosity=verbosity).run(suite)
    # Running without verbosity is equivalent to replacing the above
    # two lines with the following:
    #unittest.main()
    return

I presume that in your overall timing test, you ran the two versions of
the algorithm 'naked'. But, for some reason, you are profiling them
embedded inside a test suite and runner. It does not seem that this
should affect relative timing, but I have seen some pretty strange
behaviors. At best, it will add noise.

Let me expand my question: what did you do differently between the two
profile runs?

tjr
 

Rick

I presume that in your overall timing test, you ran the two versions of
the algorithm 'naked'. But, for some reason, you are profiling them
embedded inside a test suite and runner. It does not seem that this
should affect relative timing, but I have seen some pretty strange
behaviors. At best, it will add noise.

Let me expand my question: what did you do differently between the two
profile runs?

When I compute the electron repulsion integrals, the two different
methods are:

if packed:
    ijkl = intindex(i,j,k,l)
    Ints[ijkl] = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
else:
    val = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
    Ints[i,j,k,l] = val
    Ints[j,i,k,l] = val
    Ints[i,j,l,k] = val
    Ints[j,i,l,k] = val
    Ints[k,l,i,j] = val
    Ints[k,l,j,i] = val
    Ints[l,k,i,j] = val
    Ints[l,k,j,i] = val

and when I access the integrals the differences are:

if packed:
    index = intindex(i,j,k,l)
    temp[kl] = Ints[index]
else:
    temp[kl] = Ints[i,j,k,l]
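[As a sanity check that the two schemes address the same data, here is a toy round-trip using a dict for the unpacked store, a list for the packed one, and illustrative stand-ins for intindex and coulomb (neither is PyQuante's actual routine):]

```python
def intindex(i, j, k, l):
    # illustrative canonical packed offset (stand-in, not PyQuante's C routine)
    if i < j: i, j = j, i
    if k < l: k, l = l, k
    ij = i * (i + 1) // 2 + j
    kl = k * (k + 1) // 2 + l
    if ij < kl: ij, kl = kl, ij
    return ij * (ij + 1) // 2 + kl

def coulomb(i, j, k, l):
    # stand-in integral; any function symmetric under the 8 permutations works
    return float(i + j + k + l)

n = 4
npair = n * (n + 1) // 2
packed = [0.0] * (npair * (npair + 1) // 2)   # ~N**4/8-sized store
unpacked = {}                                 # full N**4 store

# loop over canonical quadruples only, then fan out to all 8 permutations
for i in range(n):
    for j in range(i + 1):
        for k in range(n):
            for l in range(k + 1):
                if i * (i + 1) // 2 + j >= k * (k + 1) // 2 + l:
                    val = coulomb(i, j, k, l)
                    packed[intindex(i, j, k, l)] = val
                    for a, b, c, d in ((i,j,k,l), (j,i,k,l), (i,j,l,k), (j,i,l,k),
                                       (k,l,i,j), (l,k,i,j), (k,l,j,i), (l,k,j,i)):
                        unpacked[a, b, c, d] = val

# packed access via intindex and direct unpacked access agree everywhere
assert all(unpacked[i, j, k, l] == packed[intindex(i, j, k, l)]
           for i in range(n) for j in range(n)
           for k in range(n) for l in range(n))
```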

If you look at the profiling, I'm making something like 11M calls to
intindex, which is a routine I've written in C. I thought that storing
all N**4 integrals (rather than the N**4/8 that packed stores) would
remove the need for the intindex calls and speed things up, but it's
50% slower. I can't really figure out why, though, since profiling
makes it look like the slower version is faster. Something like 30% of
the time for the full PyQuante test suite is taken up with calls to
intindex, and I'd like to remove this if possible.
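[This pattern is exactly where profiler choice matters: the pure-Python profile module charges a fixed bookkeeping cost to every call event, which adds up over millions of calls to a cheap function, while the C-implemented cProfile charges far less. A small sketch comparing the two, where tiny() is a hypothetical stand-in for a hot routine like ijkl2intindex:]

```python
import cProfile
import profile
import pstats

def tiny():
    # stand-in for a cheap routine called millions of times
    return 0

def workload(n=100000):
    for _ in range(n):
        tiny()

def calls_and_tottime(profiler_module, label):
    # Run workload() under the given profiler and report the stats
    # attributed to tiny(); both profilers share the Profile/runcall API.
    p = profiler_module.Profile()
    p.runcall(workload)
    st = pstats.Stats(p)
    key = next(k for k in st.stats if k[2] == 'tiny')
    cc, ncalls, tottime, cumtime, callers = st.stats[key]
    print(f"{label}: {ncalls} calls, tottime {tottime:.3f}s")
    return ncalls, tottime

calls_and_tottime(profile, "profile (pure Python)")
calls_and_tottime(cProfile, "cProfile (C)")
```

Both profilers record the same 100000 calls, but the tottime that profile attributes to tiny() is typically far larger, which is how a call-heavy scheme can look slower under profile than it really is.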

I also wrote a version of the code where I stored all of the intindex
values in a Python dictionary rather than calling a C function. That
version appeared to be roughly the same speed when I profiled the
code, although (stupidly) I didn't test it without profiling, because
at that point I didn't recognize the magnitude of the discrepancy
between the profiling and non-profiling timings.
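[For reference, a lookup table of that kind can be precomputed once per basis size; a minimal sketch using an illustrative canonical index formula (not PyQuante's actual intindex):]

```python
def build_intindex_table(n):
    # Precompute {(i, j, k, l): packed_offset} for all n**4 quadruples,
    # so integral access needs a dict lookup instead of a function call.
    def canonical(i, j, k, l):
        if i < j: i, j = j, i
        if k < l: k, l = l, k
        ij = i * (i + 1) // 2 + j
        kl = k * (k + 1) // 2 + l
        if ij < kl: ij, kl = kl, ij
        return ij * (ij + 1) // 2 + kl
    return {(i, j, k, l): canonical(i, j, k, l)
            for i in range(n) for j in range(n)
            for k in range(n) for l in range(n)}

table = build_intindex_table(4)
# all 8 symmetry-related quadruples share one packed slot
assert table[3, 1, 2, 0] == table[1, 3, 0, 2] == table[2, 0, 3, 1]
```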

I was posting to the list mostly because I would like to know whether
I can do something different when I profile my code so that it gives
results that correspond more closely to the non-profiling timings.
 

Rick

I was posting to the list mostly because I would like to know whether
I can do something different when I profile my code so that it gives
results that correspond more closely to the non-profiling timings.

A good comment, emailed to me rather than posted to the list, was to
use cProfile rather than profile. Doing this gave much more realistic
results. (The pure-Python profile module adds substantial per-call
overhead, which inflates the apparent cost of functions that are
called millions of times, such as ijkl2intindex; cProfile, implemented
in C, adds far less.)
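[A cProfile variant of the debugsuite() shown earlier could look like the following; runsuite here is a hypothetical stand-in workload so the sketch is self-contained, and the stats are captured in memory rather than written to a 'prof' file:]

```python
import cProfile
import io
import pstats

def runsuite():
    # stand-in for the real PyQuante test suite
    return sum(i * i for i in range(1000))

def debugsuite():
    # cProfile mirrors profile's API but is implemented in C, so its
    # per-call overhead is far smaller and hot functions are not
    # artificially inflated in the report.
    pr = cProfile.Profile()
    pr.runcall(runsuite)
    out = io.StringIO()
    pstats.Stats(pr, stream=out).strip_dirs().sort_stats('time').print_stats(15)
    return out.getvalue()

print(debugsuite())
```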
 
