Testing for performance regressions

Steven D'Aprano · Apr 5, 2011

I'm writing some tests to check for performance regressions (i.e. you
change a function, and it becomes much slower) and I was hoping for some
guidelines or hints.

This is what I have come up with so far:

* The disclaimers about timing code snippets that can be found in the
timeit module apply. If possible, use timeit rather than roll-your-own
timers.

* Put performance tests in a separate test suite, because they're
logically independent of regression tests and functional tests, and
therefore you might not want to run them all the time.

* Never compare the speed of a function to some fixed amount of time,
since that will depend on the hardware you are running on, but compare it
relative to some other function's running time. E.g.:

# Don't do this:
from timeit import Timer
time_taken = Timer(my_func).timeit() # or similar
assert time_taken <= 10
# This is bad, since the test is hardware dependent, and a change
# in environment may cause this to fail even if the function
# hasn't changed.

# Instead do this:
time_taken = Timer(my_func).timeit()
baseline = Timer(simple_func).timeit()
assert time_taken <= 2*baseline
# my_func shouldn't be more than twice as expensive as simple_func
# no matter how fast or slow they are in absolute terms.
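
A minimal runnable sketch of the relative comparison above. The helper name relative_cost, the two stand-in functions, and the loop count of 1000 are placeholders for illustration only. Note that Timer.timeit() defaults to one million iterations, so passing number explicitly keeps a test like this quick.

```python
from timeit import Timer


def relative_cost(func, baseline, number=1000):
    """Return how many times slower func is than baseline."""
    time_taken = Timer(func).timeit(number=number)
    base_time = Timer(baseline).timeit(number=number)
    return time_taken / base_time


# Hypothetical functions standing in for the real code under test.
def simple_func():
    return sum(range(10))


def my_func():
    return sum(range(10000))


ratio = relative_cost(my_func, simple_func)
print("my_func costs about %.1f times as much as simple_func" % ratio)
```

A test would then assert that ratio stays under whatever limit makes sense for the pair of functions being compared.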

Any other lessons or hints I should know?

If it helps, my code will be targeting Python 3.1, and I'm using a
combination of doctest and unittest for the tests.
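
For the separate-suite point, one possible approach with unittest is to make the timing tests opt-in. This is only a sketch: the environment variable RUN_PERF_TESTS, the workloads, and the 50x limit are all made up for the example.

```python
import os
import unittest
from timeit import Timer

# Hypothetical switch: set RUN_PERF_TESTS=1 to include the timing tests.
RUN_PERF = os.environ.get("RUN_PERF_TESTS") == "1"

DATA = list(range(1000))


@unittest.skipUnless(RUN_PERF, "performance tests are opt-in")
class PerformanceTests(unittest.TestCase):
    def test_sorted_vs_copy(self):
        # Placeholder workloads and an arbitrary, generous limit.
        time_taken = Timer(lambda: sorted(DATA)).timeit(number=1000)
        baseline = Timer(lambda: list(DATA)).timeit(number=1000)
        self.assertLessEqual(time_taken, 50 * baseline)
```

With the variable unset, the class is reported as skipped, so the ordinary test run stays fast.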

Thanks in advance,

geremy condra · Apr 5, 2011

> * The disclaimers about timing code snippets that can be found in the
> timeit module apply. If possible, use timeit rather than roll-your-own
> timers.

Huh. In looking into timing attacks, one of the biggest lessons I
learned was actually *not* to use timeit: the overhead and variance
involved in using it will wind up swallowing small changes in
behavior in ways that are fairly opaque until you really take it
apart.

> Any other lessons or hints I should know?

If you can get on it, emulab is great for doing network performance
and correctness testing, and even if you can't it might be worth
running a small one at your company. I wish I'd found out about it
years ago.

Geremy Condra

Steven D'Aprano · Apr 5, 2011

On Mon, Apr 4, 2011 at 7:45 PM, Steven D'Aprano

> Huh. In looking into timing attacks, one of the biggest lessons I
> learned was actually *not* to use timeit: the overhead and variance
> involved in using it will wind up swallowing small changes in behavior in
> ways that are fairly opaque until you really take it apart.

Do you have more details?

I follow the advice in the timeit module, and only ever look at the
minimum value, and never try to calculate a mean or variance. Given the
number of outside influences ("What do you mean starting up a browser
with 200 tabs at the same time will affect the timing?"), I wouldn't
trust a mean or variance to be meaningful.
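
A short sketch of the take-the-minimum approach, using Timer.repeat(). The helper name best_time, the sample workload, and the repeat and number values are placeholders chosen for the example.

```python
from timeit import Timer


def best_time(func, repeat=5, number=1000):
    """Return the fastest of `repeat` runs, each run making `number` calls."""
    return min(Timer(func).repeat(repeat=repeat, number=number))


# Hypothetical workload standing in for the function being tested.
def sample():
    return sorted(range(100))


best = best_time(sample)
print("best of 5 runs: %.6f seconds per 1000 calls" % best)
```

The slower runs are treated as interference from whatever else the machine was doing, which is why only the minimum is kept.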

geremy condra · Apr 5, 2011

> Do you have more details?
>
> I follow the advice in the timeit module, and only ever look at the
> minimum value, and never try to calculate a mean or variance. Given the
> number of outside influences ("What do you mean starting up a browser
> with 200 tabs at the same time will affect the timing?"), I wouldn't
> trust a mean or variance to be meaningful.

I think it's safe to treat timeit as an opaque, medium-precision
benchmark with those caveats. If you need actual timing data, though
(answering the question 'how much faster?' rather than 'which is
faster?'), just taking actual timings seems to provide much, much better
answers. Part of that is because timeit adds the cost of the for loop
to every run. Here's the actual code:

def inner(_it, _timer):
    %(setup)s
    _t0 = _timer()
    for _i in _it:
        %(stmt)s
    _t1 = _timer()
    return _t1 - _t0

(taken from Lib/timeit.py line 81)

where %(setup)s and %(stmt)s are what you passed in. Obviously, if the
magnitude of the change you're looking for is smaller than the
variance in the for loop's overhead, this makes things a lot harder
than they need to be, and the whole proposition gets pretty dodgy for
measuring in the sub-millisecond range, which is where many timing
attacks are going to lie. It also has some problems at the opposite
end of the spectrum: timing large, long-running, or memory-intensive
chunks of code can be deceptive because timeit runs with the GC
disabled. This bit me a while back working on Graphine, actually, and
it confused the hell out of me at the time.
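
To make the two alternatives concrete, here is a rough sketch. The first part times each call individually and leaves the garbage collector alone; the second uses the idiom from the timeit documentation of re-enabling the GC in the setup string. The names time_calls and build_lists and the count of 200 are made up for illustration.

```python
from timeit import Timer, default_timer


def time_calls(func, count=200):
    """Time each call separately, leaving the garbage collector as it is."""
    samples = []
    for _ in range(count):
        start = default_timer()
        func()
        samples.append(default_timer() - start)
    return samples


# Hypothetical allocation-heavy workload.
def build_lists():
    return [[i] for i in range(1000)]


# One timing per call; only the two timer calls surround func().
samples = time_calls(build_lists)

# timeit with the GC switched back on for the duration of the measurement.
with_gc = Timer(build_lists, "gc.enable()").timeit(number=200)
```

The per-call samples still include the cost of the timer calls themselves, so this reduces the overhead rather than removing it.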

I'm also not so sure about the 'take the minimum' advice. There's a
reasonable amount of empirical evidence suggesting that timings taken
at the 30-40% mark are less noisy than those taken at either end of
the spectrum, especially if there's a network performance component.
YMMV, of course.
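
As an illustration of picking a timing from the 30-40% region instead of the minimum, something like the following would do. The function name timing_at, the default fraction of 0.35, and the sample numbers are all invented for the example.

```python
def timing_at(samples, fraction=0.35):
    """Return the sample `fraction` of the way through the sorted timings."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    index = int(fraction * (len(ordered) - 1))
    return ordered[index]


# Made-up timings in seconds, including one obvious outlier.
timings = [0.9, 0.5, 0.7, 0.6, 3.0, 0.8, 0.55, 0.65, 0.75, 0.85, 0.95]

chosen = timing_at(timings)        # 0.65, the 4th smallest of 11
fastest = timing_at(timings, 0.0)  # 0.5, same as min(timings)
```

Passing a fraction of 0.0 gives the take-the-minimum behaviour, so the two strategies can be compared on the same samples.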

Geremy Condra
