coverage.py: "Statement coverage is the weakest measure of code coverage"

Discussion in 'Python' started by Ben Finney, Oct 28, 2007.

  1. Ben Finney

    Ben Finney Guest

    Howdy all,

    Ned Batchelder has been maintaining the nice simple tool 'coverage.py'
    <URL:http://nedbatchelder.com/code/modules/coverage.html> for
    measuring unit test coverage.

    On the same site, Ned includes documentation
    <URL:http://nedbatchelder.com/code/modules/rees-coverage.html> by the
    previous author, Gareth Rees, who says in the "Limitations" section:

    Statement coverage is the weakest measure of code coverage. It
    can't tell you when an if statement is missing an else clause
    ("branch coverage"); when a condition is only tested in one
    direction ("condition coverage"); when a loop is always taken and
    never skipped ("loop coverage"); and so on. See [Kaner 2000-10-17]
    <URL:http://www.kaner.com/pnsqc.html> for a summary of test
    coverage measures.

    So, measuring "coverage of executed statements" reports complete
    coverage incorrectly for an inline branch like 'foo if bar else baz',
    or a 'while' statement, or a 'lambda' expression. The coverage is
    reported complete if these statements are executed at all, but no
    check is done for the 'else' clause, or the "no iterations" case, or
    the actual code inside the lambda expression.

    What approach could we take to improve 'coverage.py' such that it
    *can* instrument and report on all branches within the written code
    module, including those hidden inside multi-part statements?
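
    To make the blind spot concrete, here is a minimal sketch using plain
    'sys.settrace' (nothing from coverage.py itself); the 'tracer' and
    'classify' names are just illustrative:

```python
import sys

executed = set()

def tracer(frame, event, arg):
    # record the line number of every 'line' event we see
    if event == "line":
        executed.add(frame.f_lineno)
    return tracer

def classify(n):
    return "big" if n > 10 else "small"   # one line, two branches

sys.settrace(tracer)
classify(100)   # only the 'big' branch ever runs
sys.settrace(None)

body_line = classify.__code__.co_firstlineno + 1
# The line is marked as executed, so statement coverage says 100%,
# even though the 'small' branch was never exercised.
print(body_line in executed)
```

    A statement-level report has no way to distinguish this from a run
    that exercised both branches.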

    --
    \ "Technology is neither good nor bad; nor is it neutral." |
    `\ —Melvin Kranzberg's First Law of Technology |
    _o__) |
    Ben Finney
     
    Ben Finney, Oct 28, 2007
    #1

  2. John Roth

    John Roth Guest

    On Oct 28, 4:56 pm, Ben Finney <> wrote:
    > What approach could we take to improve 'coverage.py' such that it
    > *can* instrument and report on all branches within the written code
    > module, including those hidden inside multi-part statements?


    Well, having used it for Python FIT, I've looked at some of its
    deficiencies. Not enough to do anything about them (although I did
    submit a patch to a different coverage tool), but enough to come to a
    few conclusions.

    There are two primary limitations: first, it runs off the debug and
    trace hooks in the Python interpreter, and second, it's got lots of
    little problems due to inconsistencies in the way the compiler tools
    generate parse trees.

    It's not like there are a huge number of ways to do coverage. At the
    low end you just count the number of times you hit a specific point,
    and then analyze that.

    At the high end, you write a trace to disk, and analyze that.

    Likewise, on the low end you take advantage of existing hooks, like
    Python's debug and trace hooks, on the high end you instrument the
    program yourself, either by rewriting it to put trace or count
    statements everywhere, or by modifying the bytecode to do the same
    thing.
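
    The low end of that spectrum can be sketched in a few lines with the
    existing trace hook; 'clamp' here is just a toy function to measure,
    and this hit counter is only an illustration, not what coverage.py
    actually does internally:

```python
import sys
from collections import Counter

hits = Counter()

def tracer(frame, event, arg):
    # count how many times each (function, line) point is hit
    if event == "line":
        hits[(frame.f_code.co_name, frame.f_lineno)] += 1
    return tracer

def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

sys.settrace(tracer)
for v in (-5, 3, 99):
    clamp(v, 0, 10)
sys.settrace(None)

# the analysis step: dump hit counts for clamp's lines
for (name, line), n in sorted(hits.items()):
    if name == "clamp":
        print(line, n)
```

    Everything past this point is analysis of the collected counts,
    which is where the interesting design choices live.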

    If I was going to do it, I'd start by recognizing that Python doesn't
    have hooks where I need them, and it doesn't have a byte code
    dedicated to a debugging hook (I think). In other words, the current
    coverage.py tool is getting the most out of the available hooks: the
    ones we really need just aren't there.

    I'd probably opt to rewrite the programs (automatically, of course) to
    add instrumentation statements. Then I could wallow in data to my
    heart's content.
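
    A rough sketch of that rewrite-and-instrument approach, using the
    stdlib 'ast' module ('record' and 'Instrumenter' are invented names,
    and a real tool would have to handle many more node types):

```python
import ast

HITS = set()

def record(lineno):
    # the injected probe: an expression statement with a side effect
    HITS.add(lineno)

class Instrumenter(ast.NodeTransformer):
    """Insert a record(lineno) call before every statement in a block."""

    def _instrument(self, body):
        new_body = []
        for stmt in body:
            probe = ast.Expr(ast.Call(
                func=ast.Name(id="record", ctx=ast.Load()),
                args=[ast.Constant(stmt.lineno)], keywords=[]))
            new_body.append(probe)
            new_body.append(self.visit(stmt))
        return new_body

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_If(self, node):
        node.body = self._instrument(node.body)
        node.orelse = self._instrument(node.orelse)
        return node

source = """\
def sign(x):
    if x >= 0:
        return 1
    else:
        return -1
"""
tree = ast.fix_missing_locations(Instrumenter().visit(ast.parse(source)))
env = {"record": record}
exec(compile(tree, "<instrumented>", "exec"), env)
env["sign"](5)
print(sorted(HITS))   # the 'return -1' line never appears: not covered
```

    The payoff is exactly the "wallowing in data" above: every executed
    statement leaves a mark, independent of the interpreter's hooks.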

    One last little snark: how many of us keep our statement coverage
    above 95%? Statement coverage may be the weakest form of coverage, but
    it's also the simplest to handle.

    John Roth
     
    John Roth, Oct 29, 2007
    #2

  3. Ben Finney

    Ben Finney Guest

    John Roth <> writes:

    > On Oct 28, 4:56 pm, Ben Finney <> wrote:
    > > What approach could we take to improve 'coverage.py' such that it
    > > *can* instrument and report on all branches within the written
    > > code module, including those hidden inside multi-part statements?
    >
    > If I was going to do it, I'd start by recognizing that Python
    > doesn't have hooks where I need them, and it doesn't have a byte
    > code dedicated to a debugging hook (I think).


    Is this something that could be added to improve Python? Perhaps
    there's a PEP in this.

    > One last little snark: how many of us keep our statement coverage
    > above 95%? Statement coverage may be the weakest form of coverage,
    > but it's also the simplest to handle.


    Yes, I have several projects where statement coverage of unit tests is
    98% or above. The initial shock of running 'coverage.py' is in seeing
    just how low one's coverage actually is; but it helpfully points out
    the exact line numbers of the statements that were not tested.

    Once you're actually measuring coverage as part of the development
    process (e.g. set up a rule so 'make coverage' does it automatically),
    it's pretty easy to see the holes in coverage and either write the
    missing unit tests or (even better) refactor the code so the redundant
    statements aren't there at all.

    --
    \ "I'd like to see a nude opera, because when they hit those high |
    `\ notes, I bet you can really see it in those genitals." -- Jack |
    _o__) Handey |
    Ben Finney
     
    Ben Finney, Oct 29, 2007
    #3
  4. Kay Schluehr

    Kay Schluehr Guest

    On Oct 28, 11:56 pm, Ben Finney <> wrote:
    > What approach could we take to improve 'coverage.py' such that it
    > *can* instrument and report on all branches within the written code
    > module, including those hidden inside multi-part statements?


    A while ago I wrote a coverage tool ( maybe I can factor it out of
    my tool suite some time ). Currently it generates measurement code
    for statement coverage, and I'm not sure it has more capabilities
    than coverage.py, because I was primarily interested in the code
    generation and monitoring process, so I didn't compare.

    Given its nature, it could act transformatively. So a statement:

        if a and b:
            BLOCK

    can be transformed into

        if a:
            if b:
                BLOCK

    Also

        if a or b:
            BLOCK

    might be transformed into

        if a:
            BLOCK
        elif b:
            BLOCK

    So boolean predicates are turned into statements, and statement
    coverage can keep up. This is also close to the way the bytecode
    expresses "and" / "or" predicates, using jumps. I'm not sure about
    expressions in general yet, since I cared about traces rather than
    expression execution.
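
    For the curious, the stdlib 'dis' module shows the jump I mean
    ( the exact opcode names vary between CPython versions, so take the
    output as illustrative ):

```python
import dis

def both(a, b):
    if a and b:           # 'and' compiles to a conditional jump:
        return "both"     # when a is falsy, b is never evaluated
    return "not both"

dis.dis(both)   # look for the conditional-jump opcode after loading a
```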

    The underlying monitoring technology needs to be more advanced. I
    used a similar approach for an even more interesting purpose: feeding
    runtime type information back into a cloned copy of the initial parse
    tree, which can then be unparsed into type-annotated source code
    after program execution. But that's another issue.

    The basic idea behind all of this monitoring is as follows: implement
    an identity function with a side effect. I'm not sure yet how such
    monitoring code interacts with deeper reflection ( stack-trace
    inspection etc. )
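
    A minimal sketch of that idea ( 'probe' is an invented name; the
    wrapped operands behave exactly as before, but their evaluation
    becomes observable ):

```python
TRACE = []

def probe(value, label):
    # identity function with a side effect: note that this operand
    # was actually evaluated, then pass the value through unchanged
    TRACE.append((label, bool(value)))
    return value

# instrumented form of 'if a and b: ...'
def check(a, b):
    if probe(a, "a") and probe(b, "b"):
        return "both"
    return "not both"

check(False, True)
print(TRACE)   # [('a', False)] -- short-circuit: 'b' never evaluated
```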

    Kay
     
    Kay Schluehr, Oct 29, 2007
    #4
  5. Ben Finney

    Ben Finney Guest

    Kay Schluehr <> writes:

    > A while ago I wrote a coverage tool ( maybe I can factor it out
    > of my tool suite some time )


    That'd be wonderful. I'd like to see comparisons between different
    test-coverage tools, just as we have the different but comparable
    'pyflakes' and 'pylint' code inspection tools.

    > Given its nature, it could act transformatively. So a statement:
    >
    >     if a and b:
    >         BLOCK
    >
    > can be transformed into
    >
    >     if a:
    >         if b:
    >             BLOCK


    I don't see that this actually helps in the cases described in the
    original post. The lack of coverage checking isn't "are both sides of
    an 'and' or 'or' expression evaluated", since that's the job of the
    language runtime, and is outside the scope of our unit test.

    What needs to be tested is "do the tests execute both the 'true' and
    'false' branches of this 'if' statement", or "do the tests exercise
    the 'no iterations' case for this loop", et cetera. That is, whether
    all the functional branches are exercised by tests, not whether the
    language is parsed correctly.
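
    The loop case is easy to see in a made-up example: one test with a
    non-empty list executes every statement, so statement coverage
    reports 100%, yet the zero-iteration path was never taken:

```python
def total(items):
    result = 0
    for item in items:   # the 'no iterations' trace: items == []
        result += item
    return result

# this single call lights up every statement in total(),
# but the empty-sequence path through the loop is untested
print(total([1, 2, 3]))
```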

    --
    \ "Know what I hate most? Rhetorical questions." -- Henry N. Camp |
    `\ |
    _o__) |
    Ben Finney
     
    Ben Finney, Oct 29, 2007
    #5
  6. Kay Schluehr

    Kay Schluehr Guest

    On Oct 29, 4:15 am, Ben Finney <> wrote:
    > I don't see that this actually helps in the cases described in the
    > original post. The lack of coverage checking isn't "are both sides of
    > an 'and' or 'or' expression evaluated", since that's the job of the
    > language runtime, and is outside the scope of our unit test.
    >
    > What needs to be tested is "do the tests execute both the 'true' and
    > 'false' branches of this 'if' statement", or "do the tests exercise
    > the 'no iterations' case for this loop", et cetera. That is, whether
    > all the functional branches are exercised by tests, not whether the
    > language is parsed correctly.


    You are right. I re-read my coverage tool documentation and also
    found the correct expansion for the statement

        if a and b:
            BLOCK

    which is:

        if a:
            if b:
                BLOCK
            else:
                BLOCK
        else:
            BLOCK

    This will cover all relevant traces. The general idea still holds.

    Note that I would like to see some kind of requirements specification
    ( a PEP-style document ) covering the different coverage purposes,
    and also a test harness. I'm all for advancing Python and improving
    the code base deliberately, not just accidentally. Something in the
    way of an MVC framework would also be nice, implementing the UI
    functions independently so that the basic coverage functionality can
    be factored out into components and improved separately. I don't
    think it's a good idea to have ten coverage tools that each handle
    presentation differently.
     
    Kay Schluehr, Oct 29, 2007
    #6
  7. John Roth

    John Roth Guest

    On Oct 28, 9:15 pm, Ben Finney <> wrote:
    > I don't see that this actually helps in the cases described in the
    > original post. The lack of coverage checking isn't "are both sides of
    > an 'and' or 'or' expression evaluated", since that's the job of the
    > language runtime, and is outside the scope of our unit test.
    >
    > What needs to be tested is "do the tests execute both the 'true' and
    > 'false' branches of this 'if' statement", or "do the tests exercise
    > the 'no iterations' case for this loop", et cetera. That is, whether
    > all the functional branches are exercised by tests, not whether the
    > language is parsed correctly.

    Since 'and' and 'or' are short-circuit evaluations, you do need
    something to determine if each piece was actually executed. Turning it
    into an if-else construct would do this nicely.
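
    The equivalence of the two forms is easy to check directly ( a toy
    sketch; 'original' and 'transformed' are invented names ):

```python
def original(a, b):
    if a and b:            # one line: a line tracer can't see inside
        return "taken"
    return "skipped"

def transformed(a, b):
    if a:                  # each condition now on its own line, so a
        if b:              # line tracer records which operand failed
            return "taken"
    return "skipped"

# same behaviour on every input combination
for a in (False, True):
    for b in (False, True):
        assert original(a, b) == transformed(a, b)
print("equivalent")
```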

    John Roth

     
    John Roth, Oct 29, 2007
    #7
  8. Ned Batchelder

    Ned Batchelder Guest

    I don't know how to extend coverage.py to do more extensive checking,
    but I know it would be both difficult and fascinating. To help spur
    some thought, I've sketched out some problems with statement coverage:
    http://nedbatchelder.com/blog/20071030T084100.html

    --Ned.

    On Oct 28, 6:56 pm, Ben Finney <> wrote:
    > What approach could we take to improve 'coverage.py' such that it
    > *can* instrument and report on all branches within the written code
    > module, including those hidden inside multi-part statements?
     
    Ned Batchelder, Oct 30, 2007
    #8
