C unit testing and regression testing


James Harris

What do you guys use for testing C code? From web searches there seem to be
common ways for some languages but for C there doesn't seem to be any
particular consensus.

I am looking for an approach in preference to a package or a framework.
Simple and adequate is better than comprehensive and complex.

If I have to use a package (which is not cross-platform) this is to run on
Linux. But as I say an approach would be best - something I can write myself
without too much hassle. At the moment I am using ad-hoc code but there has
to be a better way.

My main focus is on how best to do unit testing and regression testing so
any comments on those would be appreciated.

BTW, I have considered more esoteric approaches like behavioural testing.
That approach looked great - though was mainly focussed on acceptance
tests - but I couldn't see how behavioural checks would adequately manage
changes of state in the system under test. Most of my code changes state as
it runs. Nor did behavioural verification seem good for testing range limits.
However, it looked like it could be made very useful - possibly with a
variation in approach. I mention it purely in case anyone else has had
similar experiences and wanted to discuss.

James
 

James Kuyper

What do you guys use for testing C code? From web searches there seem to be
common ways for some languages but for C there doesn't seem to be any
particular consensus.

I'm surprised to hear that there's a consensus in any language. I'd
expect any "one size fits all" approach to fail drastically. I realize
that this is not what you were looking to discuss, but if you would,
could you at least summarize what those "common ways for some
languages" are?
I am looking for an approach in preference to a package or a framework.
Simple and adequate is better than comprehensive and complex.

For me, "simple" and "adequate" would be inherently conflicting goals,
while "comprehensive" would be implied by "adequate" - but it all
depends upon what you mean by those terms.
 

Les Cargill

James said:
What do you guys use for testing C code?

1) More 'C' code.
2) External script based drivers.
From web searches there seem to be
common ways for some languages but for C there doesn't seem to be any
particular consensus.

I don't find COTS test frameworks valuable.
I am looking for an approach in preference to a package or a framework.
Simple and adequate is better than comprehensive and complex.

If I have to use a package (which is not cross-platform) this is to run on
Linux. But as I say an approach would be best - something I can write myself
without too much hassle. At the moment I am using ad-hoc code but there has
to be a better way.

Why must there be a better way?
 

Malcolm McLean

For me, "simple" and "adequate" would be inherently conflicting goals,
while "comprehensive" would be implied by "adequate" - but it all
depends upon what you mean by those terms.
Adequate means that a bug might slip through, but the software is likely still
to be usable. Comprehensive would mean that, as far as humanly possible,
all bugs are caught, regardless of expense. Comprehensive only equates to
adequate if any bug at all renders the software unacceptable, which might
be the case in a life support system (but only if the chance of a software
failure is of the same or greater order of magnitude as the chance of a
hardware component failing).
The real secret is to write code so that it is easily unit-testable. That
means with as few dependencies and as simple an interface as possible. If
it takes a "world" parameter, it can't be tested except in conjunction with
the rest of the program that sets up and populates that world. So effectively
it can't be unit-tested. If it just takes a "grid", you can test it on a 3x3
example, ascertain that it handles all the corner cases correctly, and be
extremely confident that it will scale up to a full 100x100 world grid.
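
For example, a test for such a grid function can be tiny (a sketch only;
count_live_neighbours and its row-by-row layout are invented for illustration):

#include <assert.h>
#include <stdio.h>

/* Hypothetical leaf function: counts the live neighbours of cell (x,y)
   in a width*height grid stored row by row. */
static int count_live_neighbours(const unsigned char *grid,
                                 int width, int height, int x, int y)
{
    int dx, dy, count = 0;
    for (dy = -1; dy <= 1; dy++)
        for (dx = -1; dx <= 1; dx++) {
            int nx = x + dx, ny = y + dy;
            if ((dx || dy) && nx >= 0 && nx < width &&
                ny >= 0 && ny < height && grid[ny * width + nx])
                count++;
        }
    return count;
}

int main(void)
{
    /* 3x3 grid with the centre and one corner set. */
    const unsigned char grid[9] = { 1, 0, 0,
                                    0, 1, 0,
                                    0, 0, 0 };

    assert(count_live_neighbours(grid, 3, 3, 1, 1) == 1); /* centre sees corner */
    assert(count_live_neighbours(grid, 3, 3, 0, 0) == 1); /* corner sees centre */
    assert(count_live_neighbours(grid, 3, 3, 2, 2) == 1); /* far corner */
    assert(count_live_neighbours(grid, 3, 3, 2, 0) == 1); /* edge cell */
    puts("grid tests passed");
    return 0;
}

Because the function sees only the grid and its dimensions, the test needs
none of the surrounding program and runs in an instant.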

Techniques include use of function pointers to remove hard-coded dependencies,
separation of bit-shuffling code from IO, heavy use of malloc() to enable
functions to operate on arbitrary sized inputs, and the use of basic types
wherever possible to cut dependencies. Another useful tip is to put trivial
functions like strdup as statics rather than in separate files, to make as
many source files "leaf" as possible. Then you can take the file, test it,
and then put it back, knowing that it's been debugged.
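
As a sketch of the function-pointer point (all names invented): the production
caller passes the real reporting routine, a test passes a stub, and the file
stays a testable leaf:

#include <stdio.h>

/* The code under test reports progress through a caller-supplied function
   pointer instead of calling printf (or a GUI, or a socket) directly, so a
   test can substitute a silent counter. */
typedef void (*report_fn)(const char *msg, void *ctx);

static int process_items(int n, report_fn report, void *ctx)
{
    int i, processed = 0;
    for (i = 0; i < n; i++) {
        processed++;
        if (report)
            report("item done", ctx);
    }
    return processed;
}

/* Test double: just counts how many times it was called. */
static void counting_report(const char *msg, void *ctx)
{
    (void)msg;
    (*(int *)ctx)++;
}

int main(void)
{
    int calls = 0;
    int done = process_items(5, counting_report, &calls);
    printf("%s\n", (done == 5 && calls == 5) ? "OK" : "FAIL");
    return 0;
}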
 

Jorgen Grahn

What do you guys use for testing C code? From web searches there seem to be
common ways for some languages but for C there doesn't seem to be any
particular consensus.

I am looking for an approach in preference to a package or a framework.
Simple and adequate is better than comprehensive and complex.

Here's what I use.

1. A script which goes through the symbols in an object file or archive,
finds the C++ functions named testSomething() and generates code to
call all of these and print "OK" or "FAIL" as needed.

2. A C++ header file which defines the exception which causes the
"FAIL". Also a bunch of template functions named assert_eq(a, b)
etc for generating them.

3. And then I write tests. One function is one test case. No support
for setup/teardown functions and stuff; the user will have to take
care of that if needed.

Doesn't really force the user to learn C++ ... and I suppose something
similar can be done in C. Longjmp instead of exceptions?
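
Roughly like this, perhaps (an untested sketch, all names invented): setjmp in
the driver loop, with longjmp from the assertion macro playing the role of the
thrown exception:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf test_fail;

/* The assertion macro reports the location and jumps back to the driver. */
#define ASSERT_EQ(a, b) \
    do { \
        if ((a) != (b)) { \
            printf("  assert_eq failed at %s:%d\n", __FILE__, __LINE__); \
            longjmp(test_fail, 1); \
        } \
    } while (0)

/* Invented test cases. */
static void test_addition(void) { ASSERT_EQ(2 + 2, 4); }
static void test_broken(void)   { ASSERT_EQ(2 + 2, 5); }

int main(void)
{
    struct { const char *name; void (*fn)(void); } tests[] = {
        { "test_addition", test_addition },
        { "test_broken",   test_broken   },
    };
    volatile size_t i;   /* volatile so the loop index survives the longjmp */

    for (i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        if (setjmp(test_fail) == 0) {
            tests[i].fn();
            printf("OK   %s\n", tests[i].name);
        } else {
            printf("FAIL %s\n", tests[i].name);
        }
    }
    return 0;
}

The script from point 1 could generate the tests[] table instead of it being
written by hand.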

/Jorgen
 

Ian Collins

James said:
What do you guys use for testing C code? From web searches there seem to be
common ways for some languages but for C there doesn't seem to be any
particular consensus.

I am looking for an approach in preference to a package or a framework.
Simple and adequate is better than comprehensive and complex.

If I have to use a package (which is not cross-platform) this is to run on
Linux. But as I say an approach would be best - something I can write myself
without too much hassle. At the moment I am using ad-hoc code but there has
to be a better way.

My main focus is on how best to do unit testing and regression testing so
any comments on those would be appreciated.

For unit (and by extension, regression) testing I use CppUnit or Google
Test. Both are C++ frameworks, but I think you really benefit from the
extra expressive power of C++ when writing unit tests.

I have a mock C function generator I use to replace external (to the
code under test) functions. The ability to provide generic mock
functions is where C++ really makes a difference.
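
In plain C, a hand-rolled version of the same idea might look roughly like
this (a sketch only; send_packet and everything around it are invented, and
the test build links this double in place of the real function):

#include <stdio.h>

/* Hand-written mock for an external function send_packet().  It records how
   it was called so the test can inspect the interaction afterwards. */
static int    mock_calls;
static size_t mock_last_len;
static int    mock_return_value = 0;

int send_packet(const void *buf, size_t len)   /* same signature as the real one */
{
    (void)buf;
    mock_calls++;
    mock_last_len = len;
    return mock_return_value;
}

/* Code under test (normally in another file). */
int send_greeting(void)
{
    return send_packet("hello", 5);
}

int main(void)
{
    mock_return_value = 0;
    if (send_greeting() == 0 && mock_calls == 1 && mock_last_len == 5)
        puts("OK");
    else
        puts("FAIL");
    return 0;
}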
 

James Kuyper

Adequate means that a bug might slip through, but the software is likely still
to be usable.

With very few exceptions, for my software, most of the things that could
possibly go wrong with it render its products unusable. It is possible
for it to contain errors too small to matter, but most of the plausible
ways a defect could occur cause unacceptable errors.
... Comprehensive would mean that, as far as humanly possible,
all bugs are caught, regardless of expense. ...

The amount of expense that could be expended by humans to catch all bugs
in a piece of software has no upper limit that I'm aware of, just an
unbounded increase in the marginal cost per bug caught as the expense
gets higher.
... Comprehensive only equates to
adequate if any bug at all renders the software unacceptable, which might
be the case in a life support system (but only if the chance of a software
failure is of the same or greater order of magnitude as the chance of a
hardware component failing).

For my group, "adequate" testing has been officially defined as testing
that each branch of the code gets exercised by at least one of the test
cases, and that it has been confirmed that the expected test results
have been achieved for each of those cases, and that the test plan has
been designed to make sure that test cases corresponding to different
branches have distinguishable expected results (I'm amazed at how often
people writing test plans forget that last issue).

I also consider that fairly comprehensive testing, even though it is in
fact humanly possible to do more to catch bugs, if we could afford to
spend the time needed to do so. We can't quite afford the time needed to
do "adequate" testing, as defined above.
 

Malcolm McLean

James Kuyper said:

For my group, "adequate" testing has been officially defined as testing
that each branch of the code gets exercised by at least one of the test
cases, and that it has been confirmed that the expected test results
have been achieved for each of those cases, and that the test plan has
been designed to make sure that test cases corresponding to different
branches have distinguishable expected results (I'm amazed at how often
people writing test plans forget that last issue).
Your group might have officially defined the term "adequate testing". But
most of us don't work for your group, and are unlikely to adopt the
definition.

Coverage is a reasonable criterion, however. It doesn't prove a program
is correct, because you've got a combinatorial problem with branch
points. But it will catch most bugs.
However, a lot of code has branch points for memory allocation failures
which are extremely unlikely to happen. You could argue that if it's worth
handling the allocation failure, it's also worth testing it. But it does
increase the difficulty of testing considerably - either you've got to
alter the source code, or you need a special malloc package, or you need
to fiddle with a debugger. You can make a strong case that not going to
the trouble is "adequate".
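
For what it's worth, a sketch of what I mean by a special malloc package
(invented names, not a drop-in tool): the test build routes allocations
through a wrapper that can be told to fail the Nth request, which drives the
error branch deliberately:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Test-build malloc wrapper: fails the Nth allocation on demand. */
static int alloc_count;
static int fail_at = -1;          /* -1 means never fail */

static void *test_malloc(size_t size)
{
    if (++alloc_count == fail_at)
        return NULL;
    return malloc(size);
}

/* In the code under test, a test build might do
       #define malloc(n) test_malloc(n)
   or route all allocations through an xmalloc() indirection. */

/* Invented function under test: duplicates a string, returns NULL on failure. */
static char *dup_string(const char *s)
{
    char *p = test_malloc(strlen(s) + 1);
    if (p == NULL)
        return NULL;              /* the branch we want to exercise */
    strcpy(p, s);
    return p;
}

int main(void)
{
    char *ok, *bad;

    fail_at = -1;
    ok = dup_string("hello");
    printf("normal case: %s\n", ok && strcmp(ok, "hello") == 0 ? "OK" : "FAIL");
    free(ok);

    alloc_count = 0;
    fail_at = 1;                  /* make the very next allocation fail */
    bad = dup_string("hello");
    printf("failure case: %s\n", bad == NULL ? "OK" : "FAIL");
    return 0;
}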
 

James Kuyper

Your group might have officially defined the term "adequate testing". But
most of us don't work for your group, and are unlikely to adopt the
definition.

I made no claim to the contrary. As I said, it's a definition "for my
group", and I brought it up as an example of how "it all depends upon
what you mean by those terms." The terms that James Harris used:
"simple", "adequate", "comprehensive", and "complex", are all judgement
calls - they will inherently be judged differently by different people
in different contexts.
 

Jorgen Grahn

Shell scripting!

It takes thoughtful API preparation, but that's sort of a bonus. And it's
exceedingly portable, future-proof, and has no library dependencies.

Basically, each unit file has a small section at the bottom with a
preprocessor-guarded main() section. I then write code that parses command-line
options, forwards them to the relevant functions, and translates the output.
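
So presumably the bottom of each unit file looks something like this (my
guess, with invented names; built with -DUNIT_TEST it becomes its own little
test program):

/* minutes.c - normal contents of the unit ... */
int minutes_to_hhmm(int minutes, int *hh, int *mm)
{
    if (minutes < 0)
        return -1;
    *hh = minutes / 60;
    *mm = minutes % 60;
    return 0;
}

#ifdef UNIT_TEST
/* Self-test driver, only built with -DUNIT_TEST.
   Usage: ./minutes <minutes>   prints HH:MM */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int hh, mm;
    if (argc != 2 || minutes_to_hhmm(atoi(argv[1]), &hh, &mm) != 0) {
        fprintf(stderr, "usage: %s <minutes>\n", argv[0]);
        return 1;
    }
    printf("%02d:%02d\n", hh, mm);
    return 0;
}
#endif

and a regression script then just runs ./minutes 90 and checks that the
output is 01:30.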

So you have a test.c, and you can build and run two tests as e.g.

./test t4711 t9112

right? Or perhaps you mix the functionality, the tests and the
command-line parser in the same file.
Or
I write miniprograms which use several related routines, driven by
command-line switches and options.
This requires writing exceptionally modular code, [...]

My regression tests are just shell scripts. Easy to write; easy to tweak;
easy to expand with new tests.

This is the part I don't get: what do these shell scripts do?
As far as I can tell they can only do one thing: select a number
of tests to run. And usually you want to run all of them.

Or do you write text-oriented wrappers for all the code you want to
test, and then write the tests themselves as shell scripts? E.g.

x=`my_test my_add_function 2 3`
[ $x = 5 ] || fail
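
That is, my_test being a tiny C program along these lines (my guess only;
my_add_function stands in for whatever is really under test):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Code under test (normally in its own file or library). */
static int my_add_function(int a, int b) { return a + b; }

/* Text-oriented wrapper: the shell script picks the function by name,
   passes arguments as text, and checks the printed result. */
int main(int argc, char **argv)
{
    if (argc == 4 && strcmp(argv[1], "my_add_function") == 0) {
        printf("%d\n", my_add_function(atoi(argv[2]), atoi(argv[3])));
        return 0;
    }
    fprintf(stderr, "unknown test: %s\n", argc > 1 ? argv[1] : "(none)");
    return 2;
}
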
Regression
testing just isn't a place where it pays to be on the bleeding edge. You
want simple... reusable... scriptable. In the end only the shell can
provide that long-term.

C is rather long-term and not very bleeding edge. I'm missing
something, and am curious to know what it is.

(It's not that I dislike shell scripts or don't use them for testing;
I just don't see how they help with unit testing.)

/Jorgen
 

Ian Collins

Malcolm said:
Adequate means that a bug might slip through, but the software is likely still
to be usable. Comprehensive would mean that, as far as humanly possible,
all bugs are caught, regardless of expense. Comprehensive only equates to
adequate if any bug at all renders the software unacceptable, which might
be the case in a life support system (but only if the chance of a software
failure is of the same or greater order of magnitude as the chance of a
hardware component failing).
The real secret is to write code so that it is easily unit-testable.

The best way to do that is to write the unit tests before the code.
That
means with as few dependencies and as simple an interface as possible.

Which tended to be a consequence of writing the unit tests before the code.
 

Ian Collins

Malcolm said:
Your group might have officially defined the term "adequate testing". But
most of us don't work for your group, and are unlikely to adopt the
definition.

Coverage is a reasonable criterion, however. It doesn't prove a program
is correct, because you've got a combinatorial problem with branch
points. But it will catch most bugs.
However, a lot of code has branch points for memory allocation failures
which are extremely unlikely to happen. You could argue that if it's worth
handling the allocation failure, it's also worth testing it. But it does
increase the difficulty of testing considerably - either you've got to
alter the source code, or you need a special malloc package, or you need
to fiddle with a debugger.

Or you use a testing framework that can mock malloc and friends.
You can make a strong case that not going to
the trouble is "adequate".

Not really, if the code isn't covered by a test, it shouldn't be there.
One of my teams used to pay the testers who did the product acceptance
testing in beer if they found bugs in our code. Most "bugs" turned out
to be ambiguities in the specification.
 

James Harris

James Kuyper said:
I'm surprised to hear that there's a consensus in any language. I'd
expect any "one size fits all" approach to fail drastically. I realize
that this is not what you were looking to discuss, but if you would,
could you at least summarize what are those "common ways for some
languages" are?

I mentioned common ways for some languages. For Java the biggie seems to be
JUnit. For Python there's Unittest, possibly because it is built in. Also,
Nose and Cucumber seem popular.

James
 

James Harris

James Kuyper said:
I made no claim to the contrary. As I said, it's a definition "for my
group", and I brought it up as an example of how "it all depends upon
what you mean by those terms." The terms that James Harris used:
"simple", "adequate", "comprehensive", and "complex", are all judgement
calls - they will inherently be judged differently by different people
in different contexts.

To me, "adequate" means something allowing the job to be done, albeit
without nice-to-have extras. In this context, "comprehensive" means complex
and all-encompassing. I was thinking of a testing approach providing
comprehensive facilities, not of being able to carry out a comprehensive set
of tests. Anything that allows a comprehensive set of tests is, er,
adequate!

James
 

James Harris

Les Cargill said:
1) More 'C' code.
2) External script based drivers.


I don't find COTS test frameworks valuable.


Why must there be a better way?

For a number of reasons. Ad-hoc code is bespoke, unfamiliar, irregular. More
structured approaches, on the other hand, are easier to understand, modify
and develop. Also, most activities which have a common thread can be made
more regular and the common parts abstracted out to make further similar
work shorter and easier.

James
 

James Harris

Jorgen Grahn said:
Here's what I use.

1. A script which goes through the symbols in an object file or archive,
finds the C++ functions named testSomething() and generates code to
call all of these and print "OK" or "FAIL" as needed.

2. A C++ header file which defines the exception which causes the
"FAIL". Also a bunch of template functions named assert_eq(a, b)
etc for generating them.

When such an assert_eq discovers a mismatch, how informative is it about the
cause? I see it has no parameter for descriptive text.

That gives me an idea. I wonder if C's macros could be a boon here in that
they could be supplied with any needed parameters to generate good testing
code. Possibly something like the following

EXPECT(function(parms), return_type, expected_result, "text describing the
test")
EXPECT(function(parms), return_type, expected_result, "text describing the
test")

Then a whole series of such EXPECT calls could carry out the simpler types
of test. For any that fail the EXPECT call could state what was expected,
what was received, and produce a relevant message identifying the test which
failed such as

Expected 4, got 5: checking the number of zero elements in the array

where the text at the end comes from the last macro argument.

Of course, the test program could write the number of successes and failures
at the end.
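
A first stab at such a macro might look like this (a rough sketch: it only
copes with results that print as integers, so a real version would need
per-type variants or C11 _Generic; count_zeroes is invented):

#include <stdio.h>

static int tests_run, tests_failed;

/* Sketch of the EXPECT idea: evaluate the call, compare with the expected
   value, and report a descriptive message on mismatch. */
#define EXPECT(call, type, expected, text) \
    do { \
        type expect_got_ = (call); \
        type expect_want_ = (expected); \
        tests_run++; \
        if (expect_got_ != expect_want_) { \
            tests_failed++; \
            printf("Expected %ld, got %ld: %s\n", \
                   (long)expect_want_, (long)expect_got_, text); \
        } \
    } while (0)

/* Invented function under test. */
static int count_zeroes(const int *a, int n)
{
    int i, z = 0;
    for (i = 0; i < n; i++)
        if (a[i] == 0)
            z++;
    return z;
}

int main(void)
{
    int a[] = { 0, 3, 0, 0, 7, 0 };

    EXPECT(count_zeroes(a, 6), int, 4, "checking the number of zero elements in the array");
    EXPECT(count_zeroes(a, 0), int, 0, "empty array has no zero elements");

    printf("%d tests, %d failed\n", tests_run, tests_failed);
    return tests_failed != 0;
}

With a deliberately wrong expectation it prints the "Expected ..., got ..."
line described above, and the totals come out at the end.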

3. And then I write tests. One function is one test case. No support
for setup/teardown functions and stuff; the user will have to take
care of that if needed.

Doesn't really force the user to learn C++ ... and I suppose something
similar can be done in C. Longjmp instead of exceptions?

It's a good idea but I'm not sure that longjmp would be possible without
modifying the code under test.

James
 

James Harris

Jorgen Grahn said:
So you have a test.c, and you can build and run two tests as e.g.

./test t4711 t9112

right? Or perhaps you mix the functionality, the tests and the
command-line parser in the same file.

It's an interesting approach to test whole programs. Some time ago I wrote a
tester that could be controlled by a grid where each row was a test and the
columns represented 1) the command, 2) what would be passed to the program
under test via its stdin, 3) what to look for coming out of the program on
its stdout and stderr, and 4) what return code to expect.

That approach did have some advantages. It could test command line programs
written in any language. The language did not matter as all it cared about
was their inputs and outputs. Further, being able to see and edit tests in a
grid form (e.g. via a spreadsheet) it was very easy to understand which
tests were present and which were missing. There was also a script interface
to the same tester.

However, that's got to be an unusual approach, hasn't it? I imagine it's more
normal to compile test code with the program being tested and interact with
it at a lower level. That would certainly be more flexible.

James
 

Ian Collins

James said:
When such an assert_eq discovers a mismatch, how informative is it about the
cause? I see it has no parameter for descriptive text.

If it's anything like common unit test frameworks, it would output
something like "failure in test whatever at line bla, got this, expected
that" which is all you really need.
That gives me an idea. I wonder if C's macros could be a boon here in that
they could be supplied with any needed parameters to generate good testing
code. Possibly something like the following

EXPECT(function(parms), return_type, expected_result, "text describing the
test")
EXPECT(function(parms), return_type, expected_result, "text describing the
test")

There is a common tool called "expect" (http://expect.sourceforge.net)
which is frequently used for acceptance testing.

I use something similar with my unit test framework. For example, if I
want to test that write gets called with the expected file descriptor and
data size, but don't care about the data, I would write something like:

write::expect( 42, test::Ignore, size );

functionUnderTest();

CPPUNIT_ASSERT( write::called );

The harness maps mocked functions to objects, so they can have state and
perform generic actions such as checking parameter values.
Then a whole series of such EXPECT calls could carry out the simpler types
of test. For any that fail the EXPECT call could state what was expected,
what was received, and produce a relevant message identifying the test which
failed such as

Expected 4, got 5: checking the number of zero elements in the array

where the text at the end comes from the last macro argument.

Of course, the test program could write the number of successes and failures
at the end.

That's normal behaviour for a test harness.
It's a good idea but I'm not sure that longjmp would be possible without
modifying the code under test.

It's way easier with exceptions, another reason for using a C++ harness.
 

Jorgen Grahn

It's an interesting approach to test whole programs. Some time ago I wrote a
tester that could be controlled by a grid where each row was a test and the
columns represented 1) the command, 2) what would be passed to the program
under test via its stdin, 3) what to look for coming out of the program on
its stdout and stderr, and 4) what return code to expect.

That approach did have some advantages. It could test command line programs
written in any language. The language did not matter as all it cared about
was their inputs and outputs. Further, being able to see and edit tests in a
grid form (e.g. via a spreadsheet) it was very easy to understand which
tests were present and which were missing. There was also a script interface
to the same tester.

I see it as one of the benefits of designing your programs as
non-interactive command-line things, just like you say.

Although I don't think I would use a spreadsheet -- too much
duplication of test data between different test cases. This is one
case where I might use shell scripts to implement the tests.
However, that's got to be an unusual approach, hasn't it? I imagine it's more
normal to compile test code with the program being tested and interact with
it at a lower level. That would certainly be more flexible.

It's two different things. The latter is unit test. The former is
testing the system itself[1], for a case where the system is unusually
well-suited to system test (no GUI to click around in and so on).

You might want to do both. Personally I enjoy testing the system as a
whole more: it's more obviously useful, and it helps me stay focused
on the externally visible behavior rather than internals.

At any rate, I don't trust a bunch of unit tests to show that the
system works as intended.

/Jorgen

[1] Or a subsystem, because maybe the whole system isn't just a
single command-line tool but several, or several connected by
shell scripts, or ...
 

Malcolm McLean

Or you use a testing framework that can mock malloc and friends.


Not really, if the code isn't covered by a test, it shouldn't be there.
You can make that argument.
But you've got to balance the costs and difficulty of the test against
the benefits. I don't have such a testing framework. That's not to say
I might not use one if I could find a good one. But I don't have one at
the moment. I have written malloc wrappers which fail roughly 10% of
allocation requests, but I have rarely used that technique. It's too
burdensome for the benefit, for the code that I write.
One of my teams used to pay the testers who did the product acceptance
testing in beer if they found bugs in our code. Most "bugs" turned out
to be ambiguities in the specification.
Knuth offered 1 cent for the first bug in his Art of Computer Programming,
rising exponentially to 2 cents, 4 cents, etc. He soon had to default on
the policy, to avoid going bankrupt.
Beer is a better idea. But sometimes an ambiguous specification is better
than one which is written in strange language designed to avoid
any possibility of misinterpretation. If it's hard to read and to
understand, then it costs more.
Most programs and even functions need to do a job which can be expressed
in plain informal English, and the details don't matter. For example I
might want an embedded, low strength chess-playing algorithm. That can
be fulfilled by someone nabbing one from the web, with the right licence.
It's just adding unnecessary expense to be more specific.
 
