A simple unit test framework


Phlip

Pete said:
As a practical matter, you can't test every path through the code. You get
killed by combinatorial explosions.

Yep. That's why it takes smarts to only work on the edge cases and the
important aspects. After the code exists.

Before the code exists, write simpler tests just to make it exist. Don't
expect them to do path coverage, and don't even advertise that they do
branch coverage. But the test-first cycle will force the code to be simple
and testable. The only thing better than "design for testing" is "design by
testing".
 

Alf P. Steinbach

* Phlip:
They use TDD up the wazoo.

It sure shows in their pitiful quality, long time to market, customer
dissatisfaction, and snowed executives, huh?

;-)

Please translate to plain English.
 

Gianni Mariani

James said:
Then write it up, and publish it, because you have obviously
struck on something new, which no one else has been able to
measure.

I just did here...
... But you must mean something else by Monte Carlo
testing than has been meant in the past. Because the
probability of finding an error by just throwing random data at
a problem is pretty low for any code which has passed code
review.

For a large class of real world problems it works just fine.
The regions that can fail are often of the order of 2 or 3
machine instructions. In a block of several million.

That's a bit excessive for a unit test, but OK - a 3 GHz CPU issuing 3
instructions per cycle on a 4-core processor gives you 36 billion
instructions per second. A 30-second test gives you a coverage factor of 1
on the interactions that may be tested (in a simplistic model). Run that
test 10 times longer (5 minutes) and you have a high level of confidence
that your code is tested adequately for race conditions. In a practical
sense the time needed is far less than this, because natural cache coherency
stalls each processor and synchronization with the critical sections happens
automatically, thereby increasing the probability that critical interactions
are tested.

Really - several million instructions in your unit test loop? Keep
those unit tests smaller. Testability is a primary objective!
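
For concreteness, a minimal sketch of the kind of randomized multi-threaded
stress loop being described: several threads hammering a small unit under
test with random operations for a fixed time budget. The ThreadSafeQueue
stand-in, the pthreads plumbing, and the 30-second budget are illustrative
assumptions, not code from this thread.

#include <pthread.h>
#include <cstdlib>
#include <ctime>
#include <queue>

class ThreadSafeQueue {                          // stand-in for the unit under test
public:
    ThreadSafeQueue() { pthread_mutex_init(&m_, 0); }
    void push(int v) {
        pthread_mutex_lock(&m_);
        q_.push(v);
        pthread_mutex_unlock(&m_);
    }
    bool pop(int& v) {
        pthread_mutex_lock(&m_);
        bool ok = !q_.empty();
        if (ok) { v = q_.front(); q_.pop(); }
        pthread_mutex_unlock(&m_);
        return ok;
    }
private:
    std::queue<int> q_;
    pthread_mutex_t m_;
};

ThreadSafeQueue queue_under_test;

void* hammer(void*)                              // each thread mixes random pushes and pops
{
    std::time_t end = std::time(0) + 30;         // the 30-second budget from the estimate above
    while (std::time(0) < end) {
        int v;
        if (std::rand() & 1) queue_under_test.push(std::rand());
        else                 queue_under_test.pop(v);
    }
    return 0;
}

int main()
{
    pthread_t t[4];                              // one thread per core, as in the example
    for (int i = 0; i < 4; ++i) pthread_create(&t[i], 0, hammer, 0);
    for (int i = 0; i < 4; ++i) pthread_join(t[i], 0);
    return 0;
}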

.... And in
some cases, the actual situation can only occur less often than
that: there is a threading error in the current implementation
of std::basic_string, in g++, but I've yet to see a test program
which will trigger it.

What's the error?
Practically an infinity.

Really. Well, write better code. Again, if you can't test it, by
definition it's bad code.
So perhaps, after a couple of centuries, you can say that your
code is reliable.

In practice it usually finds problems in milliseconds.
Certainly. Testing can only prove the existence of errors.
Never the absence. Well-run shops don't count on testing,
because of this. (That doesn't mean that they don't test. Just
that they don't count on testing, alone, to ensure quality.)

Yah - this discussion is about unit testing. Or did I miss something?
 

Branimir Maksimovic

James Kanze wrote:

...




Common misconception.

1. Testability of code is a primary objective. (i.e. code that can't be
tested is unfit for purpose)

2. Any testing (MT or not) is about a level of confidence, not absoluteness.

I have discovered that MT test cases that push the limits of the code
using random input do provide sufficient coverage to produce a level
of confidence that makes the target "testable".
I have exactly the opposite experience. Multi-threading program failures
show up only with specially crafted input.
That is, I have to look into the code first and then make tests that
specifically target places in the code.
But such errors are usually the most simplistic ones; those are common
threading errors and can usually be caught by eye.
If you consider what happens when you have multiple processors
interacting randomly in a consistent system, you end up testing more
possibilities than can present themselves in a more systematic system.
However, with threading, it's not really systematic because external
events cause what would normally be systematic to be random. Now
consider what happens in a race condition failure. This normally
happens when two threads enter sections of code that should be mutually
exclusive. Usually there are a few thousand instructions in your test
loop (for a significant test). The regions that can fail are usually
tens of instructions, sometimes hundreds. If you are able to push
randomness, how many times do you need to reschedule one thread to hit a
potential problem? Given cache latencies, pre-emption from other
threads, program randomness (like memory allocation variances) you can
achieve pretty close to full coverage of every possible race condition
in about 10 seconds of testing. There are some systematic start-up
effects that may not be found, but you mitigate that by running
automated testing. (In my shop, we run unit tests on the build machine
around the clock - all the time.)
In my opinion, in this way you can catch just the typical threading
errors. Those errors are discovered, and they usually show up pretty
quickly (if all cases are met).
But I'm talking about errors imposed by incorrect C++ programs, or
programs incorrect with regard to threads, with undefined behavior
that appears to work. Say you have a local C++ guru who is great with
the language itself, but not so great with the combination of C++ and
threads. Since C++ does not define threads, there are no books one can
learn from, so he makes assumptions about how things should be done,
and he writes the thread foundation code on which the company and other
programmers rely.
The code is tested pretty thoroughly on the machines where it will be
installed (multi-processor etc.), and the company is pretty confident
that it works.
So that leaves us with the level of confidence point. You can't achieve
perfect testing all the time, but you can achieve a high level of
confidence in your testing all of the time.

To continue my story, one day new hardware arrived and
suddenly the company's programs stopped working there ;)
Nobody knew what the problem was, and nobody suspected
the local guru's code. But his code had imposed UB all the
time and passed *all* test cases ;)
So the company had to bind the software to a single CPU on the new
machine in order to ship working software ;)
It does require a true multi-processor system to test adequately.

Not just an MP system, but *all* possible MP systems, which is
of course impossible.

Greetings, Branimir.
 

Branimir Maksimovic

Works well for me. Again, it's clear you have never tried it.



Now that I'd like to see.

Speaking of threads:
Say Test is a class that starts a thread, passing its own context to it.
Since this code is wrong, as the thread is obviously still running while
the object is being destructed, make a reliable test case for the Test
class that shows that this code fails.
Test::~Test() { join(); }
The class has private constructors, instantiation is through a static
member function, and it has usable member variables.
Greetings, Branimir.
 

Branimir Maksimovic

nw said:
Hi,

I previously asked for suggestions on teaching testing in C++. Based
on some of the replies I received I decided that best way to proceed
would be to teach the students how they might write their own unit
test framework, and then in a lab session see if I can get them to
write their own. To give them an example I've created the following
UTF class (with a simple test program following). I would welcome any
suggestions on how anybody here feels this could be improved:

Thanks for your time!

Well, in my opinion, it is better to teach them to test the test classes
and to know the quirks of C++ ;)
class UnitTest {
.......
public:

UnitTest(std::string test_set_name_in) : tests_failed(0), ...........
void begin_test_set(std::string description, const char *filename) {
current_description = description;
current_file = filename;
int main(void) {
// create a rectangle at position 0,0 with sides of length 10
UnitTest ut("Test Shapes");
My test case for the test class would be:
UnitTest ut(NULL);
........
// Test Class Rectangle
ut.begin_test_set("Rectangle",__FILE__);
and for begin_test_set:
ut.begin_test_set(NULL, NULL);

What result of the test do you expect ;) ?
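
Since nw's original posting is elided above, here is a minimal sketch of a
UnitTest class along those lines (member names beyond the quoted ones are
assumptions). Branimir's counter-tests hinge on the fact that constructing a
std::string from NULL is undefined behaviour, so there is no defined result
to expect from them.

#include <iostream>
#include <string>

class UnitTest {
public:
    explicit UnitTest(std::string test_set_name)
        : tests_failed(0), tests_passed(0), test_set_name_(test_set_name) {}
    void begin_test_set(std::string description, const char* filename) {
        current_description_ = description;
        current_file_ = filename ? filename : "<unknown>";
    }
    void expect(bool condition, std::string what) {
        (condition ? tests_passed : tests_failed)++;
        if (!condition)
            std::cout << "FAIL [" << current_file_ << "] " << what << '\n';
    }
    int tests_failed;
    int tests_passed;
private:
    std::string test_set_name_, current_description_, current_file_;
};

int main() {
    UnitTest ut("Test Shapes");
    ut.begin_test_set("Rectangle", __FILE__);
    ut.expect(2 + 2 == 4, "arithmetic sanity check");
    // UnitTest bad(NULL);                // Branimir's counter-test: std::string(NULL) is UB
    // ut.begin_test_set(NULL, NULL);     // likewise undefined
    return ut.tests_failed;
}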

Greetings, Branimir.
 

Branimir Maksimovic

Each line has coverage. That's still not the ultimate coverage, where tests
cover every path through the entire program. But the effect on code makes
such coverage less important. The tests force the code to be simpler,
because it must pass simple tests.

I think that you are missing a major point here. Test-driven development
(or how I imagine it) cannot work with C++, as this language has
programs with undefined behavior. There is no test to detect undefined
behavior, by its very definition. UB can only be reliably detected by eye.
TDD will work OK for languages that are well controlled and where
program errors are just errors in algorithms. For C++, only source
inspection *with* tests works, IMO. The other thing is (I really don't
know anything about TDD), with lots of test code out there, and
following this discussion, I have a slight feeling that test code
is assumed to be almost bug-free? Testers could make bugs too;
who checks them?

Greetings, Branimir.
 

Phlip

Branimir said:
I think that you are missing a major point here. Test-driven development
(or how I imagine it) cannot work with C++, as this language has
programs with undefined behavior.

Nobody said we write tests to catch every possible bug.

Next, much of C++ is simply exposed plumbing. Things like smart pointers. I
can think of a test-first for a pointer, and one to make it smart, but one
generally doesn't bother with such things. The point is to make tests that
reduce debugging and help designing.
I have a slight feeling that test code is assumed almost to be bug free?

Besides the pair programming, code review, continuous deployment, and
40-hour work weeks, TDD has a major advantage there. It's not perfect, but
it's still very powerful.

When you make a test fail, you carefully inspect that it is failing for the
right reason. The workflow here is to run the test and announce out loud to
your pair what you expect the test to do. It should "fail because the target
method doesn't exist", or "fail on that assertion because the method returns
the old value", or "pass", etc.

This tests the tests. If they fail for the wrong reason, you stop and
re-evaluate things.
 

James Kanze

What does short mean to you?

It depends on the context, and the release target. Internal
releases are rarely more than a week apart. (Obviously, you
shouldn't have more than one, or at the most two, external
releases per year.)
 

James Kanze

[First, a meta-comment: I am, of course, exaggerating to make
a point, and I don't really suspect Ian of trying to use any
technique to rip off his customers.]
So by helping them to get what they really wanted, rather than forcing
them to commit to what they thought they wanted, I'm ripping them off?

How does testing help the customer to get what he really wants?
Some prototyping is useful for this, although the code used for
it rarely ends up in the final product. And some things don't
even need prototyping: local law specifies how numerical values
are to be rounded in financial transactions, for example, and
neither the customer nor I have any room to manoeuvre concerning
"what we want".

Roughly speaking, the user interface needs prototyping; the rest
of the code needs specification. But I don't really see a role
for testing. Typical customers wouldn't understand the test
code anyway.
The person I'm ripping off is me, I'm doing myself out of all the bug
fixing and rework jobs.
Man you have a strange view of customer focused development.

Customer focused means talking to the customer in his language,
not in yours. A test suite doesn't do this. Getting a customer
to sign off a project on the basis of a test that he's not
capable of understanding is not, IMHO, showing him much respect.

To come back to an obvious case: for certain types of thread
safety problems, I don't know how to write a test which is
guaranteed to fail. If I could convince my customer to
use "passing the test" as the sole contractual requirement, then
I can ignore these issues. Personally, I would consider that
"ripping the customer off". I write my code to be thread safe,
even if I don't know how to test for it.
 

Branimir Maksimovic

Nobody said we write tests to catch every possible bug.

Next, much of C++ is simply exposed plumbing. Things like smart pointers. I
can think of a test-first for a pointer, and one to make it smart, but one
generally doesn't bother with such things. The point is to make tests that
reduce debugging and help designing.

That is the problem. Usually it is code that no one tests that makes
those problems.
For example, I have yet to see thread-safe reference-counted object
destruction. I have to admit I don't know how to write one; therefore
addref/release when sharing an object across threads is controlled
and black-boxed in my code.
This post prompted me to inspect the boost::shared_ptr implementation
for thread safety of reference-count destruction.
The code uses add_ref_lock to increase the reference count and
release to decrease it. It has an internal reference count that, when
it reaches 0, results in a call to "delete this".
Function add_ref_lock has a safety check for use_count_ being 0,
but it makes the assumption that between acquisition of the mutex
and the check for 0 the object will always exist.
Function "release" decreases use_count_, releases the lock, and
if the count is 0, calls dispose on the object, then decreases the
reference count of the count object itself by calling weak_release.
Function weak_release locks the same mutex again and, if the weak
count is 0, does destruct, which is "delete this".
Problem: from this code I can't see anything that stops the
object being destructed in one thread while another
thread is blocked in add_ref_lock waiting to acquire
the same mutex (which is part of the object).
The only thing I can hope for is that the mutex is some
special kind that *assures* the other thread will
acquire it between the two refcount releases,
but IMO, without looking, I guess this is not so, as
the code using the mutex looks pretty generic.
So if I am right (could be wrong though, I just looked at the code
and I am really tired), and your company, for example,
uses this pointer in MT code, then there is a bug hanging around
which is neither tested nor inspected?
I think that none of the test cases would hit this bug in the
foreseeable future, so inspection by eye is the most efficient
approach in this case?

code follows (boost 1.32):

void add_ref_lock()
{
#if defined(BOOST_HAS_THREADS)
    mutex_type::scoped_lock lock(mtx_);
#endif
    if(use_count_ == 0)
        boost::throw_exception(boost::bad_weak_ptr());
    ++use_count_;
}

void release() // nothrow
{
    {
#if defined(BOOST_HAS_THREADS)
        mutex_type::scoped_lock lock(mtx_);
#endif
        long new_use_count = --use_count_;

        if(new_use_count != 0) return;
    }

    dispose();
    weak_release();
}

void weak_add_ref() // nothrow
{
#if defined(BOOST_HAS_THREADS)
    mutex_type::scoped_lock lock(mtx_);
#endif
    ++weak_count_;
}

void weak_release() // nothrow
{
    long new_weak_count;

    {
#if defined(BOOST_HAS_THREADS)
        mutex_type::scoped_lock lock(mtx_);
#endif
        new_weak_count = --weak_count_;
    }

    if(new_weak_count == 0)
    {
        destruct();
    }
}
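
A sketch of the interleaving being questioned, with illustrative names: one
thread drops the last shared_ptr (the release/dispose/weak_release path)
while another tries to promote a weak_ptr (which goes through add_ref_lock).
This only sets up the race as described; it does not prove the
implementation is broken.

#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>
#include <boost/thread/thread.hpp>

struct Widget { int value; };

boost::shared_ptr<Widget> strong(new Widget());     // the last strong reference
boost::weak_ptr<Widget> observer(strong);           // held by the other thread

void dropper()  { strong.reset(); }                 // release() -> dispose() -> weak_release()
void promoter() {
    boost::shared_ptr<Widget> p = observer.lock();  // goes through add_ref_lock()
    if (p) ++p->value;
}

int main() {
    boost::thread a(dropper);
    boost::thread b(promoter);
    a.join();
    b.join();
    return 0;
}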
When you make a test fail, you carefully inspect that it is failing for the
right reason. The workflow here is to run the test and announce out loud to
your pair what you expect the test to do. It should "fail because the target
method doesn't exist", or "fail on that assertion because the method returns
the old value", or "pass", etc.

This tests the tests. If they fail for the wrong reason, you stop and
re-evaluate things.

The problem with this is that when code imposes undefined behavior, you
cannot say whether the test will pass or fail, for any reason. So before
doing any testing, the code should be inspected by eye to check for
possible UB, and then tested.

Greetings, Branimir.
 

James Kanze

I just did here...

I certainly didn't see it. All I saw were some wild,
unjustified claims.
For a large class of real world problems it works just fine.

For what definition of "works".

I guess it depends on the level of quality you expect.

[...]
... And in
What's the error?

Bug 21334. Found by code review; I've yet to be able to create
a test which systematically triggers it. (On the other hand, it
does cause problems in real code.)
Really. Well, write better code. Again, if you can't test it, by
definition it's bad code.

So you exclude multithreading and floating point. Since neither
can really be tested.
In practice it usually finds problems in milliseconds.

So you write bad code to begin with. Or simply prefer to ignore
real problems that don't show up trivially.
 

James Kanze

I have exactly the opposite experience. Multi-threading program failures
show up only with specially crafted input.
That is, I have to look into the code first and then make tests that
specifically target places in the code.
But such errors are usually the most simplistic ones; those are common
threading errors and can usually be caught by eye.

Even when you know exactly where the error is, it's sometimes
impossible to write code which reliably triggers it. Consider
std::string, in g++: if you have an std::string object shared by
two threads, and one thread copies it, and the other does [] or
grabs an iterator at exactly the same time, you can end up with
a dangling pointer---an std::string object whose implementation
memory has been freed. The probability of doing so, however, is
very small, and even knowing exactly where the error is, I've
yet to be able to write a program which will reliably trigger
it.
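
A sketch of that scenario (the pthreads plumbing and iteration counts are
illustrative): one thread repeatedly copies a shared string while another
calls the non-const operator[], which forces unsharing in a copy-on-write
implementation. As noted, even knowing where the bug is, runs like this
almost never trigger it, so it is not a reliable test.

#include <pthread.h>
#include <string>

std::string shared_string(1000, 'x');            // one std::string object shared by two threads

void* copier(void*) {
    for (int i = 0; i < 1000000; ++i) {
        std::string copy(shared_string);         // copy bumps the shared (COW) reference count
    }
    return 0;
}

void* mutator(void*) {
    volatile char c = 0;
    for (int i = 0; i < 1000000; ++i)
        c = shared_string[0];                    // non-const operator[] may unshare the buffer
    return 0;
}

int main() {
    pthread_t a, b;
    pthread_create(&a, 0, copier, 0);
    pthread_create(&b, 0, mutator, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return 0;
}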

Consider some variations on DCL. In at least one case, you only
get into trouble if one of the processors has read memory in the
same cache line as the pointer just before executing the
critical code. Which means that the function can work perfectly
in one application, and fail when you link it into another
application.
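
For reference, a minimal sketch of the classic (pre-C++11) double-checked
locking pattern being alluded to, with illustrative names. The unsynchronized
first read of instance_ means the pointer can become visible before the
object's members do on a weakly ordered machine, so whether it misbehaves can
depend on surrounding code and cache-line contents, as described above.

#include <pthread.h>

class Singleton {
public:
    static Singleton* instance() {
        if (instance_ == 0) {                    // unsynchronized first read: no memory barrier
            pthread_mutex_lock(&mutex_);
            if (instance_ == 0)
                instance_ = new Singleton();     // store may become visible before the members do
            pthread_mutex_unlock(&mutex_);
        }
        return instance_;
    }
    int value() const { return value_; }
private:
    Singleton() : value_(42) {}
    int value_;
    static Singleton* instance_;
    static pthread_mutex_t mutex_;
};

Singleton* Singleton::instance_ = 0;
pthread_mutex_t Singleton::mutex_ = PTHREAD_MUTEX_INITIALIZER;

int main() { return Singleton::instance()->value() == 42 ? 0 : 1; }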

As I mentioned in a response to another poster, it may just
depend on what you consider acceptable quality. I've worked on
critical systems a lot in the past. With contractual penalties
for downtime. So I tend to set my standards high.
Interestingly enough, however, it turns out that developing code
to such high standards is actually cheaper than just churning it
out. As a rough, back of the envelope figure: correcting an
error found in code review costs one tenth of correcting the
same error found in unit tests, which costs one tenth of
correcting the same error found in integration tests, which
costs one tenth of correcting the same error found in the field.
Not to mention the advantages of the transfer of knowledge which
takes place in code review. The result is that any time an
integration test finds an error, it is considered first and
foremost an error in the process; we analyse the process to find
out where it failed. And there is even a tendency to do this
for unit tests, although perhaps not as systematically.

[...]
Not just an MP system, but *all* possible MP systems, which is
of course impossible.

It's perhaps worth pointing out that most current low-end MP
systems use a more or less synchronized memory model. This is
not true, however, for Alpha processors, nor the Itanium, nor, I
think, top-of-the-line Sparcs, nor, probably, future desktop
processors. That means that the fact that code works on a
current four processor system based on Intel 32 bits means
nothing with regards to future processors.
 

anon

Pete said:
One for each floating-point value. <g>

good joke ;)
discussion). The point is that TDD is iterative; as you understand the
implementation of log() better, you recognize it's bad spots, add test
cases that hit them, and then change the implementation in response to
those tests. But that's a more sophisticated statement than "The latest
trends are to write tests first which demonstrates the requirements,
then code (classes+methods). In this case you will not have to do a
coverage, but it is a plus", which started this subthread.

Might be. English is not my native language.
 

anon

James said:
Which, of course, is entirely backwards.

It is, but you get better code
That is, of course, one solution. It's theoretically possible
for log, but it will take several hundred centuries of CPU time,
which means that it's not very practical. In practice, the way
you verify that a log function is correct is with code review,
with testing of the border cases (which implies what Pete is
calling white box testing), to back up the review.

If you were to write a log() function, would you test it against all floats?
In this case, I think that the behavior is specified before
hand. It is a mathematical function, after all, and we can know
the precise result for every possible input. In practice, of
course, it isn't at all possible to test it, at least not
exhaustively.


So your log function only has to produce correct results for the
limited set of values you use to test it? I hope I never have
to use a library you wrote.

If you take some sample numbers (for example 0.0, 0.007, 0.29, 0.999,
1.0, 1.0001, 1.9900, 555.0, 999999.0) and your log function gives correct
results for these numbers (with a small enough error), you can be sure
your log function is good.
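
A sketch of that spot-check approach (my_log is just a placeholder for the
function under test, and the tolerance is an assumption). As the rest of the
thread argues, passing these points builds confidence but is far from
exhaustive verification.

#include <cassert>
#include <cmath>
#include <cstdio>

double my_log(double x) { return std::log(x); }  // placeholder for the log() under test

bool close_enough(double got, double want, double rel_tol = 1e-12) {
    return std::fabs(got - want) <= rel_tol * std::fabs(want);
}

int main() {
    const double inputs[] = { 0.007, 0.29, 0.999, 1.0001, 1.99, 555.0, 999999.0 };
    const int n = sizeof(inputs) / sizeof(inputs[0]);
    for (int i = 0; i < n; ++i)
        assert(close_enough(my_log(inputs[i]), std::log(inputs[i])));
    assert(my_log(1.0) == 0.0);                  // exact border case
    std::printf("log spot checks passed\n");
    return 0;
}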
 

Ian Collins

James said:
It depends on the context, and the release target. Internal
releases are rarely more than a week apart. (Obviously, you
shouldn't have more than one, or at the most two, external
releases per year.)
That depends on the application. The last Web application I produced
has had monthly releases, at the client's request. They gather user
feedback and send me a list of updates each month. The application is
generally deployed a couple of days after I deliver it.
 

anon

James said:
The latest trend where? Certainly not in any company concerned
with good management, or quality software.

So, what's quality software for you? Maybe we should all write some crap
without testing it at all? I guess assuming it works should be enough.
And will not necessarily meet requirements, or even be useful.

If you write tests according to the requirements, you can be 100% sure
that statement is correct
 

James Kanze

I don't have the exact before figures,

Which is typical for most process improvements:). At least at
the beginning; one of the problems in the initial process (or
lack of it) is that it doesn't generate figures.
but there were dozens of bugs in
the system for the previous version of the product and they took a
significant amount of developer and test time. The lack of unit tests
made the code extremely hard to fix without introducing new bugs.
Comprehensive unit tests are the only way to break out of this cycle.

Comprehensive unit tests are important for many reasons. I'm
not arguing against them. I'm saying that they aren't
sufficient, if they are the only measure. For that matter, how
do you know the tests are comprehensive? The only means I know
is to review them at the same time you review the code.

Comprehensive unit tests are most important for maintenance.
New code should always be reviewed. But a review is not without
costs, and reviewing an entire module because someone has
changed just a couple of characters in just one line isn't
really cost effective. Comprehensive unit tests, on the other
hand, are very effective at catching slips of the finger in the
editor.

Another point---one thing that eXtreme Programming does get
right (but we were doing it before eXtreme Programming came
along)---is that you never correct an error detected in
integration or in the field without first writing a test which
detects it. (If possible, of course. Some errors just aren't
testable.) During everyday maintenance, we do use something
along the lines of what you suggest: don't fix the error until
you have a (unit) test which fails. But if you've got a good
process, everyday maintenance is well under 10% of your total
activity; errors in integration or in the field are exceptional
events.
We didn't bother tracking bugs for the replacement product, there were
so few of them and due to their minor nature, they could be fixed within
a day of being reported. We had about 6 in the first year.

And why weren't they being tracked? 6 is, IMHO, a lot, and can
certainly be reduced. Find out why the bug crept in there to
begin with, and modify the process to eliminate it. (I've
released code where the only bug, for the lifetime of the
product, was a spelling error in a log message. That happened
to be the product where there weren't any unit tests, but
admittedly, it was a very small project---not even 100 KLoc in
all.)
Sounds like you weren't pairing.

We tried that, but found that it's nowhere near as cost-effective
as good code review. The code review brings in an
"outside" view---someone who's not implicated in the code. In
practice, it turns out that it's this external viewpoint which
really finds the most bugs. And of course, code review costs
less than pairing (although not by a large amount).

Ideally, you do both, but that can be overly expensive, and the
incremental benefits of pairing aren't worth the cost, once you
have established good code reviews.

If you don't have an effective review process, of course,
pairing has definite benefits.
So you practiced full on XP for a few months and measured the results?

One team used full XP for about six months. We stopped the
effort when we saw their error rate shooting up significantly.
(We were at about 1 error per 100 KLoc, going into integration,
at that time. The team using full XP ended up at about ten
times that.)
 

Ian Collins

James said:
[First, a meta-comment: I am, of course, exaggerating to make
a point, and I don't really suspect Ian of trying to use any
technique to rip off his customers.]
So by helping them to get what they really wanted, rather than forcing
them to commit to what they thought they wanted, I'm ripping them off?

How does testing help the customer to get what he really wants?

By being an integral part of the process.
Roughly speaking, the user interface needs prototyping; the rest
of the code needs specification. But I don't really see a role
for testing.

Not every customer is in a position to provide specifications and those
that think they can often change their minds once the development has
started.
Customer focused means talking to the customer in his language,
not in yours. A test suite doesn't do this. Getting a customer
to sign off a project on the basis of a test that he's not
capable of understanding is not, IMHO, showing him much respect.
If you were to take the time to look into XP or Scrum, you would see
that you are to some extent preaching to the choir. That's why there
are (web driven) test suites like FIT or Selenium that are specifically
designed for the customer to understand and in some cases, use them
selves. Don't forget the customer acceptance tests are system
behavioral tests, not code unit tests (unplug module A and module A
missing trap is sent kind of tests). How the application behaves is
what the customer wants to see.
 
