Hi all,
(Apologies if this is in a FAQ somewhere, I couldn't find anything).
Almost every time I do any significant amount of coding in C or C++, I
end up wishing C-style comments would nest. It would make rapid
debugging much more convenient (vs. #if 0/#endif or editor macros).
Rapid meaning what, exactly? That you save a few keystrokes over #if 0?
Anyway, was this an explicit design decision, or some sort of historical
artifact? (e.g. too expensive to parse at the time). What other sorts of
reasons might exist against nesting comments?
Consideration 1:
In order for comments to be suitable as a feature for commenting out code,
the language has to be defined differently. Firstly, /* and */ have to be
tokens. Secondly, the "commented out" material between the /* and */ tokens has
to also be delimited into tokens.
This way, for instance, the following will not break:
/* char *comment_end = "*/"; */
Here, "*/" is embedded in a string literal token and so loses
the comment-closing meaning.
If you don't have this kind of robustness, there is no point in allowing nested
comments.
Either solve the embedding problem 100% or don't bother.
Note that if you allow only tokens between /* and */, then it becomes
more difficult to write comments, because they can no longer be freeform text.
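To make the trade-off concrete, here is a rough sketch of what a token-aware scanner for nesting comments might look like. The function name is made up for this illustration, and as a simplification it treats only string literals as tokens (a real definition would also have to deal with character constants and the rest). It shows both sides: a delimiter inside a string literal is harmless, but a lone quote mark in freeform text swallows the rest of the comment.

```c
#include <stddef.h>

/* Hypothetical helper: length of the nesting comment that starts at s,
   or 0 if s does not start a comment or the comment never closes.
   Simplification: only string literals are treated as tokens. */
size_t token_aware_comment_length(const char *s)
{
    size_t i = 2;
    unsigned depth = 1;

    if (s[0] != '/' || s[1] != '*')
        return 0;

    while (s[i] != '\0') {
        if (s[i] == '"') {
            /* Skip a string literal, honoring backslash escapes. */
            i++;
            while (s[i] != '\0' && s[i] != '"') {
                if (s[i] == '\\' && s[i + 1] != '\0')
                    i++;
                i++;
            }
            if (s[i] == '\0')
                return 0;       /* unterminated string: comment never ends */
            i++;
        } else if (s[i] == '/' && s[i + 1] == '*') {
            depth++;            /* nested opener */
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            i += 2;
            if (--depth == 0)   /* matching closer for the outermost opener */
                return i;
        } else {
            i++;
        }
    }
    return 0;
}
```

With this, a commented-out declaration whose string literal contains a comment delimiter is consumed whole, whereas a comment body such as: say "hi (with no closing quote) is reported as unterminated.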
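The #if 0 directive solves the embedding problem, because material which is
skipped by the preprocessor is still decomposed into tokens.
This is also why we don't use #if 0 for writing comments: freeform text does
not always tokenize cleanly (a stray apostrophe, for example, looks like the
start of a character constant).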
The upshot is that /* */ is for writing comments and #if ... #endif
is for "compiling out" code you don't want.
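A small sketch of the #if 0 approach, with a made-up function name. The disabled region contains both an ordinary comment and a string literal holding a comment terminator, and neither causes trouble, because the skipped text is still split into tokens.

```c
#include <string.h>

/* Hypothetical helper, named only for this illustration. */
size_t marker_length(void)
{
    const char *marker = "*/";   /* live code */
#if 0
    /* Disabled code: the preprocessor skips it, but still tokenizes it,
       so the string literal below is a single token. */
    const char *comment_end = "*/";
    marker = comment_end + 1;
#endif
    return strlen(marker);
}
```

Since the region between #if 0 and #endif is compiled out, marker_length() returns 2, the length of the untouched live string.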
Consideration 2:
Nesting C comments are not recognizable by a finite automaton (they do not
form a regular language). They require a push-down automaton or at least a
counter: something to keep track of the nesting level, so that every
opening /* is balanced by a closing */.
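The counter idea can be sketched in a few lines. This is only an illustration with a hypothetical function name; it is deliberately naive and looks at raw characters, so it ignores the string literal issue from Consideration 1.

```c
#include <stddef.h>

/* Hypothetical helper: length of the nesting comment that starts at s,
   or 0 if s does not start a comment or the nesting never balances.
   Works on raw characters only; string literals get no special treatment. */
size_t nested_comment_length(const char *s)
{
    size_t i = 0;
    unsigned depth = 0;          /* the counter a finite automaton lacks */

    if (s[0] != '/' || s[1] != '*')
        return 0;

    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {
            depth++;             /* one level deeper */
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            depth--;             /* one level back out */
            i += 2;
            if (depth == 0)
                return i;        /* outermost comment is closed */
        } else {
            i++;
        }
    }
    return 0;                    /* ran out of input while still nested */
}
```

The depth variable is unbounded in principle, which is exactly what a fixed set of automaton states cannot represent.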
This is a minor thing, but it does mean that in a lexical analyzer generated
with a tool like lex, you would have to write dedicated code for nesting comments.
(This is sometimes done in practice anyway even for non-nesting comments,
because the regex for C comments is convoluted and ugly if the regex language
does not support advanced operators like a non-greedy Kleene star, or
complement.)
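For reference, one commonly seen form of the "ugly" regex for ordinary (non-nesting) C comments is /\*([^*]|\*+[^*/])*\*+/ and it needs neither a non-greedy star nor complement. The sketch below tries it out using the POSIX <regex.h> interface, which is an assumption on my part (it is POSIX, not ISO C); the function name is made up for the example.

```c
#include <regex.h>   /* POSIX, not ISO C */
#include <stddef.h>

/* Hypothetical helper: length of the first non-nesting C comment found
   in text, or -1 if there is none (or the pattern fails to compile). */
long first_comment_length(const char *text)
{
    /* Opener, then any run of "non-star" or "stars followed by something
       that is neither star nor slash", then stars and the closing slash. */
    static const char pattern[] = "/\\*([^*]|\\*+[^*/])*\\*+/";
    regex_t re;
    regmatch_t m;
    long length = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        length = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return length;
}
```

Note that on nested-looking input the match stops at the first closing delimiter, which is the standard non-nesting behavior.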
It is nevertheless a useful fact that C can be tokenized purely with regexes,
ugly or not, including the recognition of comments.