I'm not certain the last section is still relevant (page was last
modified on 1998-May-15 22:22:53).
----v----
Include files
Simple rule: include files should never include include files. If
instead they state (in comments or implicitly) what files they need to
have included first, the problem of deciding which files to include is
pushed to the user (programmer) but in a way that's easy to handle and
that, by construction, avoids multiple inclusions. Multiple inclusions
are a bane of systems programming. It's not rare to have files included
five or more times to compile a single C source file. The Unix
/usr/include/sys stuff is terrible this way.
There's a little dance involving #ifdef's that can prevent a file
being read twice, but it's usually done wrong in practice - the #ifdef's
are in the file itself, not the file that includes it. The result is
often thousands of needless lines of code passing through the lexical
analyzer, which is (in good compilers) the most expensive phase.
Just follow the simple rule.
----^----
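The dance Pike refers to looks roughly like this ("foo.h" and FOO_H
are made-up names for illustration). With the guard inside the
included file, the file is still opened and lexed on every inclusion
and only its body is skipped; putting the test into the including
file avoids reading it again at all:
----v----
/* Guard inside the included file -- the common variant Pike
 * criticizes: */
/* foo.h */
#ifndef FOO_H
#define FOO_H
/* ... declarations ... */
#endif

/* Guard in the including file -- the variant that avoids rereading: */
/* bar.c */
#ifndef FOO_H
#include "foo.h"
#endif
----^----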
a) I think if a header declares a function with external linkage then it
should also declare all necessary function parameter types. A single
inclusion should make all constructs declared there completely usable.
For example, if you provide
int mangle_context(struct context *ctx);
in "mangle_context.h", then inclusion of "mangle_context.h" should make
immedately possible to create/initialize (at least) such an object.
Either an initialization function taking a pointer and a complete struct
declaration (so that manual auto allocation / sizeof is possible), or an
incomplete ("opaque") declaration and a "factory" function (like
fopen()).
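As a sketch of those two options (every name except mangle_context()
is made up):
----v----
/* Option 1: complete struct declaration plus an initializer, so the
 * caller can allocate the object itself (auto storage, sizeof): */
struct context {
    int state;                      /* members are hypothetical */
};
void context_init(struct context *ctx);

/* Option 2: incomplete ("opaque") declaration plus a factory
 * function, as with fopen(): */
struct context;
struct context *context_create(void);

int mangle_context(struct context *ctx);
----^----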
For example, C99 7.20.3.3 "The malloc function" gives the following
synopsis for malloc():
----v----
#include <stdlib.h>
void *malloc(size_t size);
----^----
and 7.20 "General utilities <stdlib.h>" refers to 7.17 "Common
definitions <stddef.h>" when introducing "size_t" (among others).
Indeed, the standard doesn't talk about the inclusion of <stddef.h>, but
<stdlib.h> makes "size_t" visible and its definition obviously cannot
clash with that in <stddef.h>.
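A minimal illustration (my own, not taken from the standard) that a
single inclusion suffices:
----v----
#include <stdlib.h>         /* no <stddef.h> needed */

int main(void)
{
    size_t n = 16;          /* size_t is visible via <stdlib.h> alone */
    void *p = malloc(n);
    free(p);
    return 0;
}
----^----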
Thus I see my general guideline, to make everything needed available,
vindicated. The programmer is then left with a choice of how to
implement this in his/her own headers: one way is to repeat
declarations explicitly, which I find horrendous; the other is
recursive inclusion.
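The latter, with the customary guard, might look like this (the inner
header name "context.h" is hypothetical):
----v----
/* mangle_context.h */
#ifndef MANGLE_CONTEXT_H
#define MANGLE_CONTEXT_H

#include "context.h"    /* defines struct context */

int mangle_context(struct context *ctx);

#endif /* MANGLE_CONTEXT_H */
----^----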
b) "the lexical analyzer [...] is (in good compilers) the most expensive
phase" -- I'm not sure what Rob Pike thinks about gcc, but gcc's -O
options increase compilation time noticeably. I'd think the
translation phases commonly implemented by a separate preprocessor
program are the least computationally intensive ones.
Even though ccache [0] states
----v----
ccache is a compiler cache. It acts as a caching pre-processor to C/C++
compilers, using the -E compiler switch and a hash to detect when a
compilation can be satisfied from cache. This often results in a 5 to 10
times speedup in common compilations.
----^----
it's not spelled out whether the compilation time saved by ccache is
IO- or CPU-dominated. I'd figure the former: hard disk seeks are very
slow, and commonly used declarations are scattered over a lot of
small files. Simply serializing an inclusion tree into a single,
sequentially allocated file should be a huge performance benefit (one
that should vanish with a warm buffer cache), and that has nothing to
do with the amount of text the lexer has to tokenize. Thus the
speedup won by ccache doesn't prove Rob Pike's point (unless he also
meant IO time, which I doubt).
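One rough way to separate the two factors (a sketch; "foo.c" stands
for any source file): flatten the inclusion tree once with -E, then
compare a cold-cache compilation of the flattened file against one of
the original source.
----v----
gcc -E foo.c -o foo.i    # serialize the whole inclusion tree once
gcc -c foo.i             # same amount of text for the lexer,
                         # but a single sequential read from disk
----^----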
Cheers,
lacos
[0] http://ccache.samba.org/