[Followups set to comp.lang.c]
Robert Seacord said:
We would like to invite the C community to review and comment on the
current version of the CERT C Secure Coding Standard available online at
www.securecoding.cert.org <
http://www.securecoding.cert.org> before
Version 1.0 is published.
Here are my comments on the preprocessor section, PRE00-A to PRE31-C.
These comments can also be found at
http://www.cpax.org.uk/prg/portable/c/reviews/seccode.php
I will endeavour to get to the rest of the document as and when I can find
the time.
PRE00-A. Prefer inline functions to macros
The claim that 'macros are dangerous' can, to a certain extent, be
justified, in much the same way that the claim 'crossing the road is
dangerous' can be justified. Crossing the road is dangerous, even to the
experienced road user, if done carelessly. Likewise, using macros can be
problematic. SECCODE quotes the classic i++ abuse of a macro call, and
suggests that inline functions can eliminate this problem.
This is certainly true where inline functions are available. It is true
that C99 guarantees that inline functions are available, but nowhere is
there any guarantee that a C99 implementation is available! Yes, within
the terms of reference of SECCODE, the suggestion to use inline functions
has obvious merit. But for those of us without access to inline functions,
it is impractical. SECCODE itself recognises this, by pointing out
portability concerns.
For the experienced C programmer, the use of UPPER CASE for macro names is
generally sufficient to act as a warning that macro arguments with side
effects should be avoided.
In the first code fragment, we see a #define that is clearly local to a
function (because the fragment contains executable code). This is
generally considered to be unwise. I can see why the fragment is written
that way, of course - it's to save space! But in a document which purports
to provide a guide to best practice, it's still a curious way to write
code.
The expansion given in the second code fragment is (trivially) incorrect.
Strictly, it should be:
int a = 81 / ((++i) * (++i) * (++i));
The SECCODE version of the expansion omits the extra parentheses that its
previous definition requires.
The third code fragment, which demonstrates the inline version of the code,
is poorly written. When we're cubing a value, even a moderate input can
result in overflow. For example, on 16-bit-int systems, i = 32 will
produce overflow in the call to cube. On 32-bit-int systems, i = 1291
(hardly excessive) will produce overflow. Precisely how one would solve
this problem depends on one's requirements. If it is known that excessive
inputs will never occur, an assertion is appropriate. If they may well
occur, it must be decided whether they constitute an error. If so, the
function needs a way to report that error. Otherwise, an alternative
strategy (e.g. bignums) must be considered. In any event, the function as
written is insecure, because it can produce undefined behaviour.
The fourth fragment illustrates that a clash between a macro parameter name
and a file scope object name can cause incorrect results. It seems to me,
however, that this is not so much a reason not to use macros as a reason
not to use file scope objects!
In the 'SWAP' example, the hoary old XOR trick is used to swap two values.
This is a really bad idea, for two reasons: firstly, it only works on
integer types: SWAP(mydoubleA, mydoubleB) will fail to compile. Secondly,
if you do this: SWAP(myintA, myintA), you don't get a swap - you get 0!
The example, then, is a poor one. Whilst the suggested replacement
sidesteps the first problem neatly (because it suggests a function that
takes int *, so nobody can reasonably expect it to swap doubles), it fails
to address the second problem. Far better to use a temp:
void swap(int *pa, int *pb)
{
int tmp = *pa;
*pa = *pb;
*pb = tmp;
}
This is guaranteed to work, provided only that pa and pb point to valid int
objects.
The example where the interaction of two macros and a file scope object
produce incorrect results is rather contrived. It's an argument against
using macros to adjust file scope object values. It's an argument against
file scope object values themselves. It's an argument against tight
coupling. But it's not an argument against macros.
The claim that the execution of functions cannot be interleaved 'so
problematic orderings are not possible' ignores the fact that the order of
evaluation of multiple functions called in the same statement is not
specified. Whilst this doesn't cause a problem in the example given, it
can certainly cause problems in cases where the called functions have
related side effects (e.g. updating the same object, writing to the same
stream, or whatever). Although y = f(x) + g(x); is harmless in the example
given, it is not a recipe for success.
Lest the wrong impression be garnered from the above, let me reiterate that
the danger of side effects being unwittingly duplicated by macros is
significant, and the careful programmer should ensure that (a) macros
don't evaluate arguments more than once if at all possible; (b) macro
names are written in UPPER CASE to draw attention to them; (c) macros are
only used if there is no sensible alternative that meets the project
requirements.
PRE01-A. Use parentheses within macros around parameter names
It is of course a good idea to parenthesise macro parameters. I would,
however, take issue with the reasoning that it is not necessary to do this
when the parameter names are surrounded by commas in the replacement text
'because commas have lower precedence than any other operator'! The
example given is: #define FOO(a, b, c) bar(a, b, c) - which suggests that
bar is a function. The commas that separate arguments in a function call
are comma separators, not comma operators, so precedence doesn't enter
into it.
PRE02-A. Macro replacement lists should be parenthesized
Good advice - macro replacement lists should indeed be parenthesised. The
example given, EOF, is a strange one, because the claim is that defining
this to -1 rather than (-1) results in a typographical error of if(c EOF)
(a typo for if(c != EOF)) failing to be diagnosed, whereas the parentheses
make the expansion of the typo syntactically invalid. That, of course, is
quite wrong. if(c(-1)) is perfectly valid, syntactically speaking. It is a
function call!
PRE03-A. Prefer typedefs to defines for encoding types
In my experience, typedef should be used sparingly, although there are
definite cases where it is useful. If, however, you are going to 'encode'
(create a new name for) a type anyway, then you should certainly use
typedef rather than a macro! So, to that extent, I agree with SECCODE
here.
The example, however, is less than ideal. Because ISO reserves identifiers
of the form str[a-z]* in most situations, it is generally wise to avoid
creating any such identifiers yourself. I haven't bothered to look up
whether typedef char * string; breaches ISO rules (feel free to do so
yourself). But if I were writing code that used such a typedef, I'd be
forced to look it up. Rather than bother to do that, I'd choose a
different name - e.g. cstring, or something like that. Worse still, the
name is misleading - char * is not a synonym for string in C, and to
suggest (via the typedef) that it is, is a disservice to the code reader.
PRE04-A. Do not reuse a standard header file name
Quite right - don't reuse a standard header name. Note, however, that
headers are not required to be files. Implementations have considerable
licence with headers, and are not obliged to provide or use files to
encapsulate the information that headers are required to represent.
PRE05-A. Understand macro replacement
Whilst I certainly agree that it is a great idea to understand macro
replacement, I'm not quite sure why this advice appears in SECCODE. It
would seem to be better suited to an introductory tutorial or an FAQ.
PRE06-A. Enclose header file in an inclusion sandwich
I'm not sure why SECCODE refers to an 'inclusion sandwich' rather than
'inclusion guards', but the idea itself is obviously sound. SECCODE fails
to mention, however, that the identifier used for the guard should observe
ISO rules on reserved identifiers. It's an easy trap to fall into, if you
use the convention that the identifier HEADER_H is used for guarding
header.h - this means that, say, errors.h will be guarded by ERRORS_H,
which violates ISO rules that reserve identifiers starting with E and
continuing with another upper case letter. A more robust convention is to
protect header.h with H_HEADER (so errors.h's guard becomes H_ERRORS).
PRE07-A. Avoid using repeated question marks
The point of this advice is to protect you from accidental trigraphs! And
of course it's sound advice. The example, though, is an interesting one.
It introduces a problem for which the blame should probably be shared
about evenly between single-line comments and trigraphs, and presents a
solution in which both have been eliminated. I'm not complaining about
this, but it seems to lack focus. (Note that the simplest fix would have
been the introduction of a space after, or even just the removal of, the
second question mark.)
If you don't use trigraphs, see if you can get your implementation to
disable them. (Some do this by default - you actually have to turn them
on, rather than off.) If you can't disable them (either because your
implementation won't let you or because you do actually use them), it's
worth grepping your source code occasionally in search of trigraph
sequences, so that you can decide on a case-by-case basis whether you
intended to use a trigraph in that situation.
PRE08-A. Guarantee that header filenames are unique
The advice given here is good. In general, you should favour short,
descriptive, unambiguous header names. Yes, I know - how can they be
descriptive and unambiguous if they're short? Nevertheless, the Standard
does limit the length that you're guaranteed to be able to use.
Fortunately, there are 1,572,120,576 different combinations of alpha + 7
alphanumeric, so you should be able to find something that is, at the very
least, short and unambiguous!
PRE30-C. Do not create a universal character name through concatenation
This advice is C99-specific, obviously, but it's still a good idea to avoid
accidental use of UCNs in C89 code, lest you choose to port to a C99
implementation some day.
PRE31-C. Never invoke an unsafe macro with arguments containing assignment,
increment, decrement, or function call
By 'unsafe macro', SECCODE means a macro that evaluates at least one of its
arguments more than once. It is clearly a bad idea to pass to such a macro
any argument that has side effects. The four side effects that are
singled(?!) out in the title are in fact the only four I can think of -
but if you do manage to think of any others, don't pass those to macros
either, okay?
The example is strange, because it's completely pointless. Would it really
have taken the author too much time to write a = ABS(n); rather than just
ABS(n)?