Unit testing guidelines

Jacob

Hendrik said:
This discussion about whether or not to use random inputs in tests makes
me curious: is it that important at all?

Not at all.

It was included as an issue in the guidelines of the original
post, but it has been taken out of context in a way that seems
to leave many with the impression that I think random testing
is *the* way to perform unit testing.

It is explicitly (and I will consider emphasizing this) suggested
as an add-on to the conventional "typical" cases and "border"
cases to improve test coverage further.

I have done unit testing for many years, and since this simple
practice has actually helped me discover many errors, it was
included in the guidelines.

(And the discussion has been quite interesting. :)
 
Hendrik Maryns

(e-mail address removed) wrote:
Unless of course you pass in an invalid string (too long, too short,
not unique, etc.), your setter silently fixes or fails, your getter then
fails because of that, and you get a false failure on your assertion.

Then you should have preconditions or postconditions for your setter
method which take care of that, and integrate them into the test.
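
(For illustration, a sketch of what that could look like, JUnit 3 style -- the Person class and all of its names are hypothetical, not from the original post:)

class Person {
    static final int MAX_NAME_LENGTH = 64;   // illustrative limit
    private String name;

    public void setName(String name) {
        // Precondition: reject invalid input instead of silently "fixing" it.
        if (name == null || name.length() == 0 || name.length() > MAX_NAME_LENGTH) {
            throw new IllegalArgumentException("invalid name: " + name);
        }
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

public class PersonTest extends junit.framework.TestCase {
    public void testSetValidName() {
        Person p = new Person();
        p.setName("Hendrik");
        assertEquals("Hendrik", p.getName());
    }

    public void testSetInvalidNameIsRejected() {
        Person p = new Person();
        try {
            p.setName("");   // violates the precondition
            fail("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            // The precondition did its job; no false failure in a getter assertion.
        }
    }
}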

H.
--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
 
Scott.R.Lemke

Hendrik said:
(e-mail address removed) wrote:

Then you should have preconditions or postconditions for your setter
method which take care of that, and integrate them into the test.


And what if every one of your random choices fails those conditions,
and the test is never run?

The point I was trying to make is that this type of random testing is
actually a form of another type of test, often referred to as monkey
testing, and by dropping the labels "unit" and "monkey" and instead
stating the purpose and context, you eliminate this whole argument.
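
(To make the hazard concrete -- a sketch, JUnit 3 style, reusing the hypothetical Person setter from the earlier post; if every generated string happens to violate the precondition, the loop skips every iteration and the test passes without asserting anything:)

public class RandomSetterTest extends junit.framework.TestCase {

    public void testSetNameWithRandomInput() {
        java.util.Random rnd = new java.util.Random();
        Person p = new Person();                       // hypothetical class under test
        for (int i = 0; i < 100; i++) {
            String candidate = randomString(rnd);
            try {
                p.setName(candidate);
            } catch (IllegalArgumentException rejected) {
                continue;                              // precondition not met; nothing tested
            }
            assertEquals(candidate, p.getName());
        }
        // Nothing above checks that at least one candidate was actually accepted,
        // so the test can "pass" while never exercising the setter/getter pair.
    }

    private static String randomString(java.util.Random rnd) {
        int length = rnd.nextInt(200);                 // may easily exceed any length limit
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }
}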
 
Adam Maass

Jacob said:
Adam said:
Story time! Consider your reaction to a failing test case.

"Gee, that's odd. The tests passed last time..."

"What's different this time?"

"Well, I just modified the file FooBar.java. The failure must have
something to do with the change I just made there."

"But the test case that is failing is called 'testBamBazzAdd1'. How could
a change to FooBar.java cause that case to fail?"

[Many hours later...]

"There is no possible way that FooBar.java has anything to do with the
failing test case."

"Ohhhh.... you know, we saw a novel input in the test case
testBamBazzAdd1. I wonder how that happened?"

"Well, let's fix the code to account for the novel input..."

[Make some changes, but do not add a new test case. The change doesn't
actually fix the error.]

"Well, that's a relief... the test suite now runs to completion without
error."

Given that there is an error in the baseline, I'd rather have a team
of developers tracing it for hours than have a test suite that
tells me everything is OK.

One has to wonder about the failure in this scenario -- it is a novel input
generated by a randomness generator. If the failure were critical to the
operation of the system, (one hopes that) it would have been noted, and
probably fixed, in other, earlier test cycles. (Perhaps not a unit test...
maybe a system test run by a QA.) Since this is a new failure that has not
been fixed in earlier cycles, the behavior of the system on these novel
inputs must not be that critical. If this is the case, I'd rather have my
developers finish the work they were doing on FooBar.java than trace the
failure in testBamBazzAdd1. (Of course, in a Utopian world, they would have
the time to do both.)

Ultimately, I'd like developers to be able to use a heuristic to determine
where to look for errors when a unit-test fails. That heuristic is "The
error is almost certainly caused by some delta in the code since the last
time you ran the test suite." (Note that controlling the size of the deltas
is an issue, which is why we get recommendations to make the test suite easy
and fast to run -- so that developers aren't afraid to run the suite very
frequently.)

If the unit-test suite also contains some randomly generated inputs, then
there are two heuristics that the developers must apply to determine where
the failure is:

1. "The error could be caused by a delta in the code since the last time you
ran the test suite"; or
2. "The error could be caused by an input value the test suite has generated
that we've never seen before."

Deciding which of these cases applies complicates the task of the developer
when faced with a failure.


-- Adam Maass
 
davidrubin

Jacob said:
You *do* know the input!

Consider testing this method:

double square(double v)
{
    return v * v;
}

Below is a typical unit test that verifies that the
method behaves correctly on typical input:

double v = 2.0;
double v2 = square(v); // You know the input: It is 2.0!
assertEquals(v2, 4.0);

This is fine.
The same test using random input:

double v = getRandomDouble();
double v2 = square(v); // You know the input: It is v!
assertEquals(v2, v*v);

This is completely broken. You can't test an implementation of 'square'
with an identical implementation. You need a separate representation
for your expected result. Otherwise, you are not testing anything.
If the test fails, all the details will be in the error
report.

And this method actually *does* fail for a majority of all
possible inputs (abs of v exceeding sqrt(maxDouble)).
This will be revealed instantly using the random approach.

This may not ever be revealed using random inputs, but in the case of
'square' this is a moot point. The contract of 'square' must stipulate
that the input (v) is invalid unless
'v * v < "max double"'. Since such inputs are invalid by the contract,
there is no point in testing them.
For an experienced programmer the limitation of square()
might be obvious, so border cases are probably covered
sufficiently in both the code and the test. But for more
complex logic this might not be as apparent, and throwing
in random input (in ADDITION to the typical cases and all
obvious border cases) has proven quite helpful, at least
to me.

This is also wrong. The boundaries of the input are stated in the
function's contract. They are not something determined by the user's level
of experience. Your test cases must cover the boundary conditions
stipulated by the function's documented contract *as* *well* *as*
boundary conditions based on white-box knowledge of the function's
implementation. If you cover these cases, plus a small assortment of
well-chosen "sanity" values, you don't need to waste time with large
amounts of random data.

If you can't test your function in this way, it is probably not
factored correctly.
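
(A sketch of what such a test could look like, JUnit 3 style; the boundary and sanity values below are illustrative choices based on the contract discussed above, not taken from the post:)

public class SquareContractTest extends junit.framework.TestCase {

    // The contract discussed above: v is valid only if v * v fits in a double.

    public void testSanityValues() {
        assertEquals(4.0, square(2.0), 1e-9);
        assertEquals(0.25, square(-0.5), 1e-9);
    }

    public void testContractBoundaries() {
        assertEquals(0.0, square(0.0), 0.0);
        // Near the upper edge of the valid range (|v| < sqrt(Double.MAX_VALUE)):
        // the result must still be finite.
        assertTrue(!Double.isInfinite(square(1.0e154)));
        // Inputs outside the contract are invalid by definition, so they are
        // not asserted on here.
    }

    private static double square(double v) {
        return v * v;
    }
}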
 
Jacob

davidrubin said:
This is completely broken. You can't test an implementation of 'square'
with an identical implementation. You need a separate representation
for your expected result. Otherwise, you are not testing anything.

I've already answered this in a different posting: The unit test
reflects the requirements. The requirement for square() is to
return the square of the input: v*v. From a black-box perspective
I don't know the implementation of square(). It can be anything.

davidrubin said:
This is also wrong. The boundaries of the input are stated in the
function's contract. They are not something determined by the user's level
of experience. Your test cases must cover the boundary conditions
stipulated by the function's documented contract *as* *well* *as*
boundary conditions based on white-box knowledge of the function's
implementation. If you cover these cases, plus a small assortment of
well-chosen "sanity" values, you don't need to waste time with large
amounts of random data.

This is all correct, given you are able to identify the boundary
cases up front. In some cases you are, but for more complex ones
you easily forget some, in the same way you forget to handle these
cases in the original code (that's why there are bugs after all).

Imagine implementing a tree container. In order to test correct
removal of nodes, some of the boundary cases might be:

remove root
remove intermediate node
remove leaf node
remove root when this is the only node
remove root with exactly one leaf
remove root with exactly one intermediate node
remove intermediate node with one child
remove intermediate node with many children
remove leaf node without siblings
remove leaf node with siblings
remove intermediate node with root parent
remove intermediate node with only leaf nodes
remove intermediate node with leaf nodes and other intermediate nodes
remove intermediate node with only other intermediate node children
remove non-existing node
remove null
remove node with unique name
remove node with non-unique name
etc.

The above might or might not be boundary cases; that actually depends
on the implementation: a good implementation has few! From experience
you "know" which cases are more likely to contain bugs, even
without knowing the implementation.

I'm not saying you shouldn't cover the boundary cases explicitly;
of course you should (see #13 in the guidelines).

But when that is in place I would have built a tree at random, containing
a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at
random and performed a random (add, remove, move, copy, whatever) operation
on those, a random number of times (0 to 10,000 perhaps), and verified that the
operation behaves as expected and that the tree is always in a consistent state
afterwards. This would leave me with the confidence that if there are
cases I've forgotten (or that appear during code refactoring) they might
be trapped by this additional test.
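
(Roughly, such a test could be sketched like this -- the Tree/Node API and its method names are hypothetical, and the sizes are scaled down for illustration:)

public class TreeRandomOperationTest extends junit.framework.TestCase {

    public void testRandomOperationsKeepTreeConsistent() {
        java.util.Random rnd = new java.util.Random();
        Tree tree = buildRandomTree(rnd, rnd.nextInt(10000));

        int operations = rnd.nextInt(1000);
        for (int i = 0; i < operations; i++) {
            Node node = tree.isEmpty() ? null : tree.randomNode(rnd);       // hypothetical accessor
            switch (rnd.nextInt(3)) {
                case 0: tree.add(node, new Node("n" + i)); break;           // null parent => new root
                case 1: if (node != null) tree.remove(node); break;
                case 2: if (node != null) tree.move(node, tree.randomNode(rnd)); break;
            }
            // The invariant check after every operation is what gives the test its value.
            assertTrue("tree inconsistent after operation " + i, tree.isConsistent());
        }
    }

    private static Tree buildRandomTree(java.util.Random rnd, int size) {
        Tree tree = new Tree();
        for (int i = 0; i < size; i++) {
            Node parent = tree.isEmpty() ? null : tree.randomNode(rnd);
            tree.add(parent, new Node("node" + i));
        }
        return tree;
    }
}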
 
Jacob

Adam said:
1. "The error could be caused by a delta in the code since the last time you
ran the test suite"; or
2. "The error could be caused by an input value the test suite has generated
that we've never seen before."

Deciding which of these cases applies complicates the task of the developer
when faced with a failure.

If I add a test to your test suite that is able to reveal a flaw in your code,
you still don't want it because when it fails your developers will be confused
about what happened?

I am not sure I get it. You should all be happy you identified an error, shouldn't
you? The failing unit test should make it pretty clear what went wrong anyway.
 
davidrubin

Jacob said:
I've already answered this in a different posting: The unit test
reflects the requirements. The requirement for square() is to
return the square of the input: v*v. From a black-box perspective
I don't know the implementation of square(). It can be anything.

This is why black-box tests are not entirely sufficient. You must
(especially for unit tests) use some white-box knowledge to test the
boundary conditions of both the contract and the implementation.

[snip - tree stuff]
But when that is in place I would have built a tree at random, containing
a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at
random and performed a random (add, remove, move, copy, whatever) operation
on those, a random number of times (0 to 10,000 perhaps), and verified that the
operation behaves as expected and that the tree is always in a consistent state
afterwards. This would leave me with the confidence that if there are
cases I've forgotten (or that appear during code refactoring) they might
be trapped by this additional test.

I went to Brian Kernighan's site at Princeton a while back. One of his
assignments was to implement associative arrays similar to those in
awk. Then, he provided a script generator that produces random operations
(add, remove, lookup, etc.). You are supposed to run this script against
both awk and your own implementation, and compare the results. So, I
think you would probably appreciate this.
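
(The same idea in miniature -- a sketch that drives a hypothetical MyAssociativeArray and java.util.HashMap, used as the reference implementation, with the same random operations and compares the results:)

public class DifferentialMapTest extends junit.framework.TestCase {

    public void testAgainstReferenceImplementation() {
        java.util.Random rnd = new java.util.Random(42);   // fixed seed keeps runs reproducible
        java.util.Map<String, String> reference = new java.util.HashMap<String, String>();
        MyAssociativeArray candidate = new MyAssociativeArray();   // hypothetical class under test

        for (int i = 0; i < 10000; i++) {
            String key = "k" + rnd.nextInt(100);
            switch (rnd.nextInt(3)) {
                case 0:   // insert/update in both implementations
                    reference.put(key, "v" + i);
                    candidate.put(key, "v" + i);
                    break;
                case 1:   // remove from both implementations
                    reference.remove(key);
                    candidate.remove(key);
                    break;
                case 2:   // lookup must agree with the reference
                    assertEquals(reference.get(key), candidate.get(key));
                    break;
            }
        }
        assertEquals(reference.size(), candidate.size());
    }
}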

Also, John Lakos' new book is due to be published later this year. In
it, he promises to address the issue of component-level testing in
great detail, including a section on random testing, which I think you
will find very interesting.
 
Adam Maass

Jacob said:
If I add a test to your test suite that is able to reveal a flaw in your code,
you still don't want it because when it fails your developers will be confused
about what happened?

Let me clarify. I don't want it in the /unit/ test suite if it relies on
generation of random inputs, due to this confusion issue. If, however, the
inputs are hard-coded, then the confusion issue does not apply, and I'd be
perfectly happy to have it in the unit test suite.
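
(For example, once a randomly generated input has exposed a failure, it can be frozen into an ordinary hard-coded case -- a sketch; the value and the expected behaviour here are purely illustrative:)

public class SquareRegressionTest extends junit.framework.TestCase {

    // This input was (hypothetically) first discovered by a random run. Pinning it
    // down makes the case deterministic: a future failure points at a code delta,
    // not at a novel input.
    public void testSquareOfVeryLargeInputOverflows() {
        double v = 1.5e200;   // |v| > sqrt(Double.MAX_VALUE), so v * v overflows
        assertTrue(Double.isInfinite(square(v)));
    }

    private static double square(double v) {
        return v * v;
    }
}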

If there's a level of testing during which we generate random inputs to
improve the quality of the code, then that is where it belongs. If there
isn't this kind of testing already in the project, perhaps we ought to
start. It just doesn't belong in the /unit/ test suite.

Jacob said:
I am not sure I get it. You should all be happy you identified an error, shouldn't
you? The failing unit test should make it pretty clear what went wrong anyway.

Finding and fixing failures is, in general, a good thing, however it
happens. But a /unit/ test suite should give developers a really good idea
of where any failure originates from, and having to decide whether a failure
is due to a delta in the code under test or a novel input just overly
complicates a /unit/ test suite. The confusion issue is especially of
concern if a failure on one run of the suite simply disappears on the next
run because it didn't generate a set of inputs that causes the code to fail.
[If I saw a unit test suite with this behavior, I wouldn't have much
confidence in the value of passing all the tests -- because the next run
could just as easily produce a failure as a pass.]

Note too that there are some failures that are acceptable to tolerate, even
in shipping product. (Perhaps: It's an obscure corner case that no-one ever
actually encounters in production. It's in some subsystem that hardly anyone
uses. Or a variety of other justifications...) The critical cases should be
covered by hard-coded inputs. That leaves the non-critical cases -- and if
something non-critical fails, then it should be fixed but perhaps there are
more important things to do before it gets fixed.

-- Adam Maass
 
