Why don't C comments nest?

Aron Packer

Hi all,

(Apologies if this is in a FAQ somewhere, I couldn't find anything).

Almost every time I do any significant amount of coding in C or C++, I
end up wishing C-style comments would nest. It would make rapid
debugging much more convenient (vs. #if 0/#endif or editor macros).

Anyway, was this an explicit design decision, or some sort of historical
artifact? (e.g. too expensive to parse at the time). What other sorts of
reasons might exist against nesting comments?

Thanks!
 
nroberts

Well, it does make parsing a little easier. If you try to write a C-
comment remover, which I believe is an exercise in K&R, you'd probably
see why.
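For the curious, here is a minimal sketch of that K&R exercise (in the second edition it is Exercise 1-23). The function name strip_comments is hypothetical. It handles only /*...*/ comments, copies string and character literals unchanged so that delimiters inside quotes are left alone, and replaces each comment with one space, as C's translation phase 3 does. It assumes no // comments, no backslash-newline splices and no trigraphs.

```c
#include <stddef.h>

// Copies `in` to `out`, replacing each /*...*/ comment with one space and
// returning the number of characters written (excluding the final '\0').
// String and character literals are copied unchanged, so delimiters that
// appear inside quotes are not treated as comments.
// Sketch assumptions: no // comments, no backslash-newline splices, and
// `out` has room for at least strlen(in) + 1 characters.
size_t strip_comments(const char *in, char *out)
{
    enum { CODE, COMMENT, STRING, CHARCONST } state = CODE;
    size_t n = 0;

    while (*in != '\0') {
        switch (state) {
        case CODE:
            if (in[0] == '/' && in[1] == '*') {
                state = COMMENT;
                in += 2;
                continue;
            }
            if (*in == '"')
                state = STRING;
            else if (*in == '\'')
                state = CHARCONST;
            out[n++] = *in++;
            break;
        case COMMENT:
            if (in[0] == '*' && in[1] == '/') {
                out[n++] = ' ';     // a comment becomes a single space
                state = CODE;
                in += 2;
            } else {
                in++;               // drop comment text
            }
            break;
        case STRING:
        case CHARCONST:
            if (in[0] == '\\' && in[1] != '\0') {
                out[n++] = *in++;   // copy an escape sequence as a pair
                out[n++] = *in++;
                break;
            }
            if ((state == STRING && *in == '"') ||
                (state == CHARCONST && *in == '\''))
                state = CODE;
            out[n++] = *in++;
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

Even without nesting, the scanner already needs separate states for code, comments, strings and character constants; a nesting variant would additionally need a depth counter.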
 
James Kuyper

The Rationale (Revision 5.1.0 April-2003) says:
 
Kaz Kylheku

Rapid meaning what? You save a few keystrokes over #if 0?

Consideration 1:

In order for comments to be suitable as a feature for commenting out code,
the language has to be defined differently. Firstly, /* and */ have to be
tokens. Secondly, the "commented out" material between the /* and */ tokens has
to also be delimited into tokens.

This way, for instance, the following will not break:

/* char *comment_end = "/*"; */

Here, "/*" is embedded in a string literal token and so loses
the comment-closing meaning.

If you don't have this kind of robustness, there is no point in allowing nested
comments.

Either solve the embedding problem 100% or don't bother.

Note that if you allow only tokens between /* and */, then it becomes
more difficult to write comments, since they can no longer be freeform text.
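To make this concrete, here is a minimal sketch of one hypothetical rule that is not part of C: comments nest, and double-quoted string literals inside a comment are skipped as units. The function name is also hypothetical. It copes with the `/* char *comment_end = "/*"; */` example, but an ordinary freeform comment containing a lone `"` never terminates, which is exactly the trade-off described above.

```c
#include <stddef.h>

// Hypothetical rule (not C): comments nest, and string literals inside a
// comment are skipped, so a "/*" or "*/" between double quotes is ignored.
// Returns the length of the comment at s, or 0 if it never closes.
size_t nested_tokenized_comment(const char *s)
{
    unsigned long depth = 0;
    size_t i = 0;

    if (s[0] != '/' || s[1] != '*')
        return 0;
    while (s[i] != '\0') {
        if (s[i] == '"') {                  // skip a string literal
            i++;
            while (s[i] != '\0' && s[i] != '"') {
                if (s[i] == '\\' && s[i + 1] != '\0')
                    i++;                    // skip the escaped character
                i++;
            }
            if (s[i] == '\0')
                return 0;                   // unterminated string
            i++;                            // closing quote
        } else if (s[i] == '/' && s[i + 1] == '*') {
            depth++;
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            depth--;
            i += 2;
            if (depth == 0)
                return i;
        } else {
            i++;
        }
    }
    return 0;
}
```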

The #if 0 directive solves the embedding problem, because material which is
skipped by the preprocessor is still decomposed into tokens.
This is why we don't use it for writing comments.

The upshot is that /* */ is for writing comments and #if ... #endif
is for "compiling out" code you don't want.
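A small illustration with a hypothetical function: the disabled lines carry their own /* */ comments, which would end an enclosing /* */ comment early but are harmless inside #if 0, because the skipped group is only split into preprocessing tokens and then discarded.

```c
// Hypothetical example: the old body is compiled out with #if 0.
int answer(void)
{
#if 0
    int x = 41;   /* old value, off by one */
    return x;     /* never compiled */
#endif
    return 42;
}
```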

Consideration 2:

Nested C comments are not recognizable by a finite automaton (they do not
form a regular language). They require a push-down automaton or counter: something
to keep track of the nesting levels so every open /* is balanced
by a closing */.
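A minimal sketch of the difference, with hypothetical function names: standard C comments can be matched with a fixed two-state machine, while a nesting rule needs a depth counter that can grow without bound (string literals are ignored here for simplicity). On the input /* /* x */ */ the first function stops at the first */, while the second consumes the whole string.

```c
#include <stddef.h>

// Standard (non-nesting) rule: a finite automaton with two states inside
// the comment, BODY (ordinary text) and STAR (just saw '*').
// Returns the length of the comment starting at s, or 0 if there is none.
size_t match_flat_comment(const char *s)
{
    enum { BODY, STAR } state = BODY;
    size_t i;

    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (i = 2; s[i] != '\0'; i++) {
        if (state == STAR && s[i] == '/')
            return i + 1;           // the first "*" then "/" ends the comment
        state = (s[i] == '*') ? STAR : BODY;
    }
    return 0;                       // unterminated
}

// Hypothetical nesting rule: needs an unbounded depth counter, which no
// fixed set of states can provide. String literals are not handled here.
size_t match_nested_comment(const char *s)
{
    unsigned long depth = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {
            depth++;
            i += 2;
        } else if (depth == 0) {
            return 0;               // s does not start with a comment
        } else if (s[i] == '*' && s[i + 1] == '/') {
            i += 2;
            if (--depth == 0)
                return i;
        } else {
            i++;
        }
    }
    return 0;                       // unbalanced
}
```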

This is a minor thing but it does mean that in a lexical analyzer generator
based on a tool like lex, you have to write dedicated code for C comments.
(This is sometimes done in practice anyway even for non-nesting comments,
because the regex for C comments is convoluted and ugly if the regex language
does not support advanced operators like a non-greedy Kleene star, or
complement.)

It is nevertheless a useful fact that C can be tokenized purely with regexes,
ugly or not, including the recognition of comments.
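As an illustration of how convoluted that regex gets, here is one commonly cited POSIX extended regular expression for a complete, non-nesting C comment, exercised through the POSIX regex.h API (POSIX, not ISO C). The wrapper name is hypothetical, and the pattern says nothing about string literals, which a lexer would match with a separate rule.

```c
#include <regex.h>
#include <stddef.h>

// Pattern (ERE):  /\*[^*]*\*+([^/*][^*]*\*+)*/
// Reads as: "/*", any non-star characters, one or more stars; then zero or
// more groups of (a character other than '/' or '*', non-stars, stars);
// and finally the closing '/'.
// Returns the length of a comment that starts at s[0], or 0 if none does.
size_t regex_comment_len(const char *s)
{
    regex_t re;
    regmatch_t m;
    size_t len = 0;

    if (regcomp(&re, "/\\*[^*]*\\*+([^/*][^*]*\\*+)*/", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, s, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (size_t)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```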
 
BartC

Kaz Kylheku said:
Rapid meaning, what you save a few keystrokes over #if 0?


Consideration 1:

In order for comments to be suitable as a feature for commenting out code,
the language has to be defined differently. Firstly, /* and */ have to be
tokens. Secondly, the "commented out" material between the /* and */ tokens has
to also be delimited into tokens.

I don't think that's necessary, but you have to consider that */ inside a
string literal, or as part of a // comment, may cause problems.
This way, for instance, the following will not break:

/* char *comment_end = "/*"; */

Here, "/*" is embedded in a string literal token and so loses
the comment-closing meaning.

You mean "*/"? In which case you can't even write it whether you have nested
comments or not:

/* char *comment_end = "*/"; */

I've programmed nested comments in lexers before, and most of the time they
work fine. (The problems seem to come mostly from commenting out blocks of
lexer code, which are full of literals and comments to do with comment
processing.)
If you don't have this kind of robustness, there is no point in allowing nested
comments.

Either solve the embedding problem 100% or don't bother.

The above example (and there are a few more) shows that even normal
comments aren't 100% robust, so what's the difference? Nobody's going to
force anyone to use them.

Note that if you allow only tokens between /* and */, then it becomes
more difficult to write comments, which cannot be freeform text.

The #if 0 directive solves the embedding problem, because material which is

Perhaps have #comment ... #end then

The upshot is that /* */ is for writing comments and #if ... #endif
is for "compiling out" code you don't want.

Suppose the code you want to comment out is in the middle of a line that is
already using /*...*/?
 
Keith Thompson

BartC said:
Perhaps have #comment ... #end then

Hmm? What problem would #comment ... #end solve that #if 0 ... #endif
doesn't already solve?
Suppose the code you want to comment out is the middle of a line, which is
already using /*...*/ ?

Well then I guess you've got a bit of a problem. But I don't think I've
ever run into that situation.

Let's see if I can come up with a plausible example.

some_func(10 /* param1 */, 20 /* param2 */, 30 /* param3 */);

The comments are there to show the names of the actual parameters,
without which "some_func(10, 20, 30);" isn't nearly as clear.

Then some_func is changed so it only takes 2 parameters, but you want to
keep the information about the 3rd one in the source for some reason.

First, consider whether that's really worth doing. You can just delete
the third argument altogether:

some_func(10 /* param1 */, 20 /* param2 */);

and if you need to see the 3-argument version of the call, you can just
look in your source control system. (You *are* using one, right?)

But ok, let's say there's a good reason to keep the original version as
a comment. Here's how I might do it:

#if 0
some_func(10 /* param1 */, 20 /* param2 */, 30 /* param3 */);
#else
some_func(10 /* param1 */, 20 /* param2 */);
#endif

Any definition of how comments work (nesting vs. non-nesting,
single-line vs. end-of-line vs. partial-line vs. multi-line, etc.) is
going to be inconvenient for some purposes. None of these
inconveniences are insurmountable.

And of course changing the rules now would break existing code. I've
seen code that specifically depends on the fact that /**/ comments
*don't* nest.
 
James Kuyper

My preference is

some_func(10 /* param1 */, 20 /* param2 */
#if 0
, 30 /* param3 */
#endif
);

But I can understand why some might consider that ugly.
 
Keith Thompson

James Kuyper said:
On 11/11/2011 04:16 PM, Keith Thompson wrote: [...]
But ok, let's say there's a good reason to keep the original version as
a comment. Here's how I might do it:

#if 0
some_func(10 /* param1 */, 20 /* param2 */, 30 /* param3 */);
#else
some_func(10 /* param1 */, 20 /* param2 */);
#endif

My preference is

some_func(10 /* param1 */, 20 /* param2 */
#if 0
, 30 /* param3 */
#endif
);

But I can understand why some might consider that ugly.

If a comma were permitted after the last argument, you
could just write:

some_function(10 /* param1 */,
20 /* param2 */,
#if 0
30 /* param3 */,
#endif
);

C99 changed the syntax to permit a trailing comma in an enum
declaration. It would have made sense to permit a trailing
comma for *any* comma-separated list. (Probably not for the comma
operator, though.)
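A short sketch of the places where C already accepts a trailing comma, using hypothetical names: C89 allowed one at the end of an initializer list, and C99 added it for enumerator lists, so in those two spots the last entry can be disabled with #if 0 without touching the line above it. Function-call argument lists still do not allow this.

```c
#include <stddef.h>

// C99 and later: trailing comma after the last enumerator is allowed.
enum color {
    RED,
    GREEN,
#if 0
    BLUE,
#endif
};

// C89 and later: trailing comma in an initializer list is allowed.
static const int weights[] = {
    10,
    20,
#if 0
    30,
#endif
};

size_t weight_count(void)
{
    return sizeof weights / sizeof weights[0];
}
```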
 
BartC

Hmm? What problem would #comment ... #end solve that #if 0 ... #endif
doesn't already solve?

It's more self-explanatory than #if 0.

Well then I guess you've got a bit of a problem. But I don't think I've
ever run into that situation.

Let's see if I can come up with a plausible example.

The problem is placing unnecessary restrictions on how C code is laid out.

Insisting on #if 0 means imposing a line-oriented structure, while C is
supposed to be free-format.

Besides, any example anyone comes up with can be rewritten to use multiple
lines that would allow the use of #if 0. In which case, why bother having
/*...*/ comments at all?

The advantage of /*...*/ is starting and/or ending the comment in the middle
of a line. If that is desirable, why shouldn't one be able to do it on a
section of code (which could span multiple lines) which happens to already
have a comment in there?

Maybe there are several existing comments, some of which are whole lines
and some partial. Or it's a massive bit of code and it may not be easy to check
whether there are comments there or not! But it shouldn't be necessary to
care.
 
Ian Collins

Suppose the code you want to comment out is the middle of a line, which is
already using /*...*/ ?

Check in, remove code, compile, test and if required, revert.
 
Ben Pfaff

BartC said:
Besides, any example anyone comes up with, can be rewritten to use multiple
lines that would allow the use of #if 0. If which case, why bother
having /*...*/ comments at all?

#if 0
Because this won't compile.
#endif
 
Phil Carmody

BartC said:
I don't think that's necessary, but you have to consider that */ inside a
string literal, or as part of a // comment, may cause problems.



You mean "*/"? In which case you can't even write it whether you have nested
comments or not:

That example wouldn't demonstrate nesting. I prefer the "/*" example,
as, with nesting, you can't use the final */ to close the comment, only
to decrease its comment nesting to 1.

Phil
 
Keith Thompson

BartC said:
Keith Thompson said:
BartC said:
[...]
The #if 0 directive solves the embedding problem, because material which is

Perhaps have #comment ... #end then
Hmm? What problem would #comment ... #end solve that #if 0 ... #endif
doesn't already solve?

It's more self-explanatory than #if 0

So use #ifdef SOME_MEANINGFUL_IDENTIFIER, and don't provide a definition
for that identifier.
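A minimal sketch of that suggestion: OLD_THREE_ARG_CALL is a hypothetical identifier that nothing defines, so the first group is skipped just as it would be under #if 0 (and, as with #if 0, the skipped lines must still be valid preprocessing tokens).

```c
// OLD_THREE_ARG_CALL is hypothetical and deliberately never defined, so
// the first group is skipped; the name documents why the code is there.
int arg_count(void)
{
#ifdef OLD_THREE_ARG_CALL
    return 3;   /* original three-argument call */
#else
    return 2;
#endif
}
```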
The problem is placing unnecessary restrictions on how C code is laid out.

Insisting on #if 0 means imposing a line-oriented structure, while C is
supposed to be free-format.

Besides, any example anyone comes up with can be rewritten to use multiple
lines that would allow the use of #if 0. In which case, why bother having
/*...*/ comments at all?

Because /*...*/ comments are more convenient in many cases.

(Personally, I wouldn't mind having just // comments and no /*...*/
comments, but of course we can't change that now. There are plenty of
languages that only have end-of-line comments.)
The advantage of /*...*/ is starting and/or ending the comment in the middle
of a line. If that is desirable, why shouldn't one be able to do it on a
section of code (which could span multiple lines) which happens to already
have a comment in there?

Because nothing is perfect, and you can't have everything.
Maybe there are several existing comments, some can be whole lines and some
partial. Or it's a massive bit of code and it may not be easy to check
whether there are comments there or not! But it shouldn't be necessary to
care.

I have no doubt that the syntax of C comments could be tweaked in some
way that would be more to your liking. Any such change would result in
something that's less to someone else's liking. I don't suggest that
your preferences are less important than someone else's; the deciding
factor has to be backward compatibility.

Come up with a proposed change that (a) does what you want, and (b)
doesn't break existing code, and we can discuss it. (But the current
system isn't sufficiently broken that any such change is likely to be
adopted.)
 
BartC

Ben Pfaff said:
#if 0
Because this won't compile.
#endif

OK, so lines between #if 0 and #endif are tokenised (as I found out earlier)
and need to contain well-formed char/string literals and ordinary comments,
even though the preprocessor doesn't care about other tokens or about syntax.

(Which also means it might be unusable for some kinds of multi-line
comments, where you want to temporarily comment out a half-finished block of
code where some tokens are not complete. Although there is the same problem
with enclosing the block in a /*...*/ comment when there is an open /*...*/
comment inside.)

I would have expected #if 0 to completely ignore any lines that didn't start
with #.
 
BartC

Keith Thompson said:
Because /*...*/ comments are more convenient in many cases.

(Personally, I wouldn't mind having just // comments and no /*...*/
comments, but of course we can't change that now. There are plenty of
languages that only have end-of-line comments.)

// comments aren't too bad, other than problems with imposing
line-orientation on the source code (so breaking a line inside a // comment
causes a problem, and so does joining a //-line to a non-// one).

And they behave much better with nested comments.
I have no doubt that the syntax of C comments could be tweaked in some
way that would be more to your liking.

Don't worry about me; I don't use C enough for it to be that important.

I'm just saying that /*...*/ can be made to nest without any serious
problems, most of the time; no new syntax is needed.

(I've programmed nested comments for other languages, and they work fine;
mostly I used {...}, which would be problematic for C, and have experimented
with \...\ and {/.../} (which frees up { and }). However, 99% of the time I
use //-like comments to end-of-line, but with a single character, such as !,
# or \.)
Any such change would result in
something that's less to someone else's liking.

(Ideally you would just tell an editor to comment/uncomment some highlighted
text, and it would just do it. Nested comments are still important, and
maybe it could even fix up any troublesome characters, such as a stray */,
that might cause a problem. In this case the actual comment syntax becomes
less critical.)
 
Eric Sosman

[...]
Besides, any example anyone comes up with can be rewritten to use multiple
lines that would allow the use of #if 0. In which case, why bother
having /*...*/ comments at all?

Because the text inside /*...*/ can be anything at all except */.

strcat(p, "'"); /* can't use " here */

If you want comments to nest and *not* to be tripped up by examples
like Kaz' `/* char *comment_end = "/*"; */', then you need to insist
that the comment's content be lex-able, and comments like the one in
my example would become impossible.
 
Kaz Kylheku

You mean "*/"? In which case you can't even write it whether you have nested
comments or not:

That's my point. You can't write it without nested comments, but nested
comments don't fix anything.
Perhaps have #comment ... #end then

That's useless syntactic sugar for #if 0; furthermore, it's badly
named since this feature is not for commenting. Your "comment" has
to be written in valid C preprocessor tokens.

Really, do you need to be able to invoke undefined behavior in comments? :)
Suppose the code you want to comment out is the middle of a line, which is
already using /*...*/ ?

There is this:

C89:

#define IGN(X)

IGN( foo() );

void bar(IGN( const ) int *ptr)
{
}

Now, an unparenthesized comma doesn't work, e.g. IGN(A, B).

C99 variadic macros to the rescue:

#define IGN(...)

struct x y = { 1, 2 IGN( , 3, 4 ) };
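Here are both macros in one compilable sketch. They are renamed IGN1 and IGNV only so that both can coexist in one file, and struct x, first() and calls() are hypothetical. The discarded argument still has to consist of valid preprocessing tokens with balanced parentheses.

```c
// C89 form: discards a single argument (commas must be parenthesized).
#define IGN1(X)
// C99 form: discards everything, including unparenthesized commas.
#define IGNV(...)

struct x { int a, b, c, d; };

// Expands to: static int first(int *ptr)
static int first(IGN1( const ) int *ptr)
{
    return *ptr;
}

static int calls(void)
{
    int n = 0;
    IGN1( n++ );          // expands to an empty statement
    return n;
}

// Expands to { 1, 2 }, so members c and d are zero-initialized.
static struct x y = { 1, 2 IGNV( , 3, 4 ) };
```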

Or you could just edit the darn code. Remove the inner /* */ and add the outer
ones. There are few enough people working in C, and out of those few enough who
have this problem, that it's not some kind of widespread economic problem.
 
BartC

Kaz Kylheku said:
That's a useless syntactic sugar for #if 0; furthermore, it's badly
named since this feature is not for commenting. Your "comment" has
to be written in valid C preprocessor tokens.

That was before I realised that #if 0 didn't work as I expected, i.e. by
ignoring everything except what was necessary to find a matching #endif.
 
BartC

Eric Sosman said:
On 11/11/2011 5:14 PM, BartC wrote:
strcat(p, "'"); /* can't use " here */

Sorry I don't understand your point here.
If you want comments to nest and *not* to be tripped up by examples
like Kaz' `/* char *comment_end = "/*"; */', then you need to insist
that the comment's content be lex-able, and comments like the one in
my example would become impossible.

But if someone wants to write a /* ... */ comment then they will be tripped
up by anything containing a */:

// */gobbledygook/*

The above is fine (maybe they intended the comment to be bold/italic on
Usenet). Now comment out the section using /*...*/:

/*
// */gobbledygook/*
*/

Now it no longer compiles (not unless gobbledygook is a valid macro).
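This can be checked directly. In this sketch the hypothetical macro gobbledygook is defined as empty, which is the only reason the function still compiles: the first comment ends at the */ on the // line, the identifier is left over as ordinary code, and a second comment runs from the trailing /* to the final */.

```c
// gobbledygook is a hypothetical empty macro; delete this definition and
// the function below no longer compiles.
#define gobbledygook

int leftover_demo(void)
{
    int x = 1;
/*
// */gobbledygook/*
*/
    return x;
}
```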

So /*...*/ comments are flawed whether they are nested or not. Since they
are still widely used despite this flaw, I can't see why the similarly
flawed nested ones can't also have been in use from the start (I agree
introducing them now might throw up a few odd errors).
 
Willem

Keith Thompson wrote:
) If a comma were permitted after the last parameter declaration, you
) could just write:
)
) some_function(10 /* param1 */,
) 20 /* param2 */,
) #if 0
) 30 /* param3 */,
) #endif
) );

Which is why I have changed my preference to writing this:

some_function(10 /* param1 */
,20 /* param2 */
#if 0
,30 /* param3 */
#endif
);

And not only in C, but in almost any language.

The point is that this plays a lot nicer with revision control systems,
because adding or removing an argument is a 1-line change, whereas in the
other case it can potentially change the previous line as well.
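A compilable version of this layout, assuming a hypothetical two-parameter some_function: because each argument line carries its own leading comma, the third argument can be disabled (or later restored) by touching only its own lines.

```c
// Hypothetical function that now takes only two parameters.
static int some_function(int param1, int param2)
{
    return param1 + param2;
}

int call_it(void)
{
    return some_function(10 /* param1 */
                         ,20 /* param2 */
#if 0
                         ,30 /* param3 */
#endif
                         );
}
```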


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
