coverage.py: "Statement coverage is the weakest measure of code coverage"

Ben Finney

Howdy all,

Ned Batchelder has been maintaining the nice simple tool 'coverage.py'
<URL:http://nedbatchelder.com/code/modules/coverage.html> for
measuring unit test coverage.

On the same site, Ned includes documentation
<URL:http://nedbatchelder.com/code/modules/rees-coverage.html> by the
previous author, Gareth Rees, who says in the "Limitations" section:

Statement coverage is the weakest measure of code coverage. It
can't tell you when an if statement is missing an else clause
("branch coverage"); when a condition is only tested in one
direction ("condition coverage"); when a loop is always taken and
never skipped ("loop coverage"); and so on. See [Kaner 2000-10-17]
<URL:http://www.kaner.com/pnsqc.html> for a summary of test
coverage measures.

So, measuring "coverage of executed statements" reports complete
coverage incorrectly for an inline branch like 'foo if bar else baz',
or a 'while' statement, or a 'lambda' statement. The coverage is
reported complete if these statements are executed at all, but no
check is done for the 'else' clause, or the "no iterations" case, or
the actual code inside the lambda expression.
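A concrete sketch of the inline-branch problem (the function and test here are invented for illustration):

```python
def classify(n):
    # One statement: a statement-coverage tool marks this line covered
    # as soon as it executes, whichever arm of the conditional is taken.
    return "even" if n % 2 == 0 else "odd"

# A test suite that only ever exercises the 'even' arm:
assert classify(2) == "even"
# Statement coverage now reports classify() as fully covered, yet the
# "odd" arm has never been executed.
```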

What approach could we take to improve 'coverage.py' such that it
*can* instrument and report on all branches within the written code
module, including those hidden inside multi-part statements?
 

John Roth


Well, having used it for Python FIT, I've looked at some of its
deficiencies. Not enough to do anything about it (although I did
submit a patch to a different coverage tool), but enough to come to a
few conclusions.

There are two primary limitations: first, it runs off the debug and
trace hooks in the Python interpreter, and second, it has lots of
little problems due to inconsistencies in the way the compiler tools
generate parse trees.

It's not like there are a huge number of ways to do coverage. At the
low end, you just count the number of times you hit a specific point,
and then analyze the counts.

At the high end, you write a trace to disk and analyze that.

Likewise, on the low end you take advantage of existing hooks, like
Python's debug and trace hooks; on the high end you instrument the
program yourself, either by rewriting it to put trace or count
statements everywhere, or by modifying the bytecode to do the same
thing.
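The "low end" hook-based approach can be sketched with Python's standard trace hook (a deliberate simplification for illustration, not how coverage.py is actually written):

```python
import sys
from collections import Counter

hits = Counter()  # (filename, lineno) -> execution count

def tracer(frame, event, arg):
    # The interpreter calls this for each traced event; we only count
    # 'line' events, which fire once per executed line.
    if event == "line":
        hits[(frame.f_code.co_filename, frame.f_lineno)] += 1
    return tracer

def measured(func, *args):
    # Install the hook, run the function, then remove the hook.
    sys.settrace(tracer)
    try:
        return func(*args)
    finally:
        sys.settrace(None)

def sample(x):
    if x > 0:
        return "positive"
    return "non-positive"

measured(sample, 5)
# 'hits' now maps executed lines to counts; the 'return "non-positive"'
# line is absent from it because that branch never ran.
```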

If I were going to do it, I'd start by recognizing that Python doesn't
have hooks where I need them, and it doesn't have a bytecode
dedicated to a debugging hook (I think). In other words, the current
coverage.py tool is getting the most out of the available hooks: the
ones we really need just aren't there.

I'd probably opt to rewrite the programs (automatically, of course) to
add instrumentation statements. Then I could wallow in data to my
heart's content.
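The rewrite-and-instrument approach John describes can be prototyped with the standard 'ast' module. This rough sketch prepends a hypothetical _mark() counter call to every statement; a real tool would need to handle far more cases and preserve source fidelity:

```python
import ast
from collections import Counter

hits = Counter()  # lineno -> execution count

def _mark(lineno):
    hits[lineno] += 1

class Instrument(ast.NodeTransformer):
    """Prepend a _mark(lineno) call to every statement list."""
    def visit(self, node):
        self.generic_visit(node)
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                instrumented = []
                for stmt in stmts:
                    marker = ast.Expr(ast.Call(
                        func=ast.Name("_mark", ast.Load()),
                        args=[ast.Constant(stmt.lineno)], keywords=[]))
                    instrumented.extend([marker, stmt])
                setattr(node, field, instrumented)
        return node

source = """
def sign(x):
    if x >= 0:
        return 1
    else:
        return -1
result = sign(7)
"""

tree = ast.fix_missing_locations(Instrument().visit(ast.parse(source)))
ns = {"_mark": _mark}
exec(compile(tree, "<instrumented>", "exec"), ns)
# 'hits' now records which lines executed; the 'return -1' line is
# missing because sign() was only ever called with a positive argument.
```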

One last little snark: how many of us keep our statement coverage
above 95%? Statement coverage may be the weakest form of coverage, but
it's also the simplest to handle.

John Roth
 

Ben Finney

John Roth said:
If I was going to do it, I'd start by recognizing that Python
doesn't have hooks where I need them, and it doesn't have a byte
code dedicated to a debugging hook (I think).

Is this something that could be added to improve Python? Perhaps
there's a PEP in this.

John Roth said:
One last little snark: how many of us keep our statement coverage
above 95%? Statement coverage may be the weakest form of coverage,
but it's also the simplest to handle.

Yes, I have several projects where statement coverage of unit tests is
98% or above. The initial shock of running 'coverage.py' is in seeing
just how low one's coverage actually is; but it helpfully points out
the exact line numbers of the statements that were not tested.

Once you're actually measuring coverage as part of the development
process (e.g. set up a rule so 'make coverage' does it automatically),
it's pretty easy to see the holes in coverage and either write the
missing unit tests or (even better) refactor the code so the redundant
statements aren't there at all.
 

Kay Schluehr


I once wrote a coverage tool ( maybe I can factor it out of my tool
suite some time ) that works by transformation. Currently it generates
measurement code for statement coverage, and I'm not sure it has more
capabilities than coverage.py: I was primarily interested in the code
generation and monitoring process, so I didn't compare.

Given its nature, it can act transformatively. So a statement:

if a and b:
    BLOCK

can be transformed into

if a:
    if b:
        BLOCK

Also

if a or b:
    BLOCK

might be transformed into

if a:
    BLOCK
elif b:
    BLOCK

So boolean predicates are turned into statements, and statement
coverage keeps up. This is also close to the way the bytecode works,
expressing "and" | "or" predicates using jumps. I'm not sure about
expressions yet, since I cared about statement traces rather than
expression execution.
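That resemblance is easy to check with the standard 'dis' module (the exact opcode names vary between CPython versions, so none are hard-coded here):

```python
import dis

# Compile a bare "a and b" expression and inspect its bytecode.
code = compile("a and b", "<expr>", "eval")
dis.dis(code)

# Short-circuiting shows up as a conditional jump opcode.
has_jump = any("JUMP" in instr.opname for instr in dis.Bytecode(code))
print(has_jump)
```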

The underlying monitoring technology needs to be advanced. I used a
similar approach for an even more interesting purpose: feeding runtime
type information back into a cloned copy of the initial parse tree,
which can be unparsed to type-annotated source code after program
execution. But that's another issue.

The basic idea of all this monitoring is as follows: implement an
identity function with a side effect. I'm not sure how such monitoring
code interacts with deeper reflection ( stack-trace inspection etc. ).
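Kay's "identity function with a side effect" can be sketched like this (all names invented for illustration); a source rewriter would wrap each expression of interest in a probe that records something and returns its argument unchanged:

```python
observations = []

def probe(tag, value):
    # Identity function with a side effect: record runtime type
    # information, then hand the value back unchanged.
    observations.append((tag, type(value).__name__))
    return value

# A rewriter would turn  x = 2 + 3.0  into the instrumented form:
x = probe("sum", probe("lhs", 2) + probe("rhs", 3.0))

# The program's result is unchanged, but 'observations' now carries
# the runtime type of every sub-expression, which could be fed back
# into a cloned parse tree as type annotations.
```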

Kay
 

Ben Finney

Kay Schluehr said:
I once wrote a coverage tool ( maybe I can factor it out of my
tool suite some time )

That'd be wonderful. I'd like to see comparisons between different
test-coverage tools, just as we have the different but comparable
'pyflakes' and 'pylint' code inspection tools.
Kay Schluehr said:
Given its nature, it can act transformatively. So a statement:

if a and b:
    BLOCK

can be transformed into

if a:
    if b:
        BLOCK

I don't see that this actually helps in the cases described in the
original post. The lack of coverage checking isn't "are both sides of
an 'and' or 'or' expression evaluated", since that's the job of the
language runtime, and is outside the scope of our unit test.

What needs to be tested is "do the tests execute both the 'true' and
'false' branches of this 'if' statement", or "do the tests exercise
the 'no iterations' case for this loop", et cetera. That is, whether
all the functional branches are exercised by tests, not whether the
language is parsed correctly.
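For instance, statement coverage cannot see the "no iterations" case for a loop (a hedged, invented example):

```python
def drain(queue):
    # Pops every item off the queue, returning them in pop order.
    items = []
    while queue:
        items.append(queue.pop())
    return items

# This one test executes every line of drain(), so statement coverage
# reports 100%:
assert drain([1, 2]) == [2, 1]
# ...yet the "zero iterations" path, drain([]), was never tested;
# only a loop-coverage-aware tool would flag that.
```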
 

Kay Schluehr

Ben Finney said:
I don't see that this actually helps in the cases described in the
original post. The lack of coverage checking isn't "are both sides of
an 'and' or 'or' expression evaluated", since that's the job of the
language runtime, and is outside the scope of our unit test.

What needs to be tested is "do the tests execute both the 'true' and
'false' branches of this 'if' statement", or "do the tests exercise
the 'no iterations' case for this loop", et cetera. That is, whether
all the functional branches are exercised by tests, not whether the
language is parsed correctly.

You are right. I re-read my coverage tool documentation and also
found the correct expansion for the statement

if a and b:
    BLOCK

which is:

if a:
    if b:
        BLOCK
    else:
        BLOCK
else:
    BLOCK

This will cover all relevant traces. The general idea still holds.

Note that I would like to see some kind of requirements specification
( a PEP-style document ) for the different coverage purposes, and also
a test harness. I'm all for advancing Python and improving the code
base deliberately, not just accidentally. Something in the way of an
MVC framework would also be nice, implementing the UI functions
independently so that the basic coverage functionality can be factored
out into components and improved separately. I do not think it's a
good idea to have 10 coverage tools that each handle presentation
differently.
 

John Roth

Ben Finney said:
I don't see that this actually helps in the cases described in the
original post. The lack of coverage checking isn't "are both sides of
an 'and' or 'or' expression evaluated", since that's the job of the
language runtime, and is outside the scope of our unit test.

What needs to be tested is "do the tests execute both the 'true' and
'false' branches of this 'if' statement", or "do the tests exercise
the 'no iterations' case for this loop", et cetera. That is, whether
all the functional branches are exercised by tests, not whether the
language is parsed correctly.

Since 'and' and 'or' are short-circuit evaluations, you do need
something to determine whether each piece was actually executed.
Turning it into an if-else construct would do this nicely.
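A small invented illustration of John's point:

```python
calls = []

def a():
    calls.append("a")
    return True

def b():
    calls.append("b")
    return True

# A single statement, so statement coverage marks it covered...
if a() or b():
    pass

# ...but short-circuiting means b() never actually ran:
assert calls == ["a"]

# Kay's rewrite makes that visible as a separate, uncovered branch:
#   if a():      # executed
#       pass
#   elif b():    # never reached, so reported as uncovered
#       pass
```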

John Roth
 
