How to remove // comments

jacob navia

Recently, a heated debate started because poor Mr. Heathfield
was unable to compile a program with // comments.

Here is a utility for him, so that he can (at last) compile my
programs :)

More seriously, this code takes 560 bytes. Amazing, isn't it? C is very
compact, you can do great things in a few bytes.

Obviously I have avoided here, in consideration for his pedantic
compiler flags, any C99 issues, so it will compile in obsolete
compilers, and with only ~600 bytes you can run it in the toaster!

--------------------------------------------------------------cut here

/* This program reads a C source file and writes it modified to stdout
All // comments will be replaced by /* ... */ comments, to easy the
porting to old environments or to post it in usenet, where
// comments can be broken in several lines, and messed up.
*/

#include <stdio.h>

/* This function reads a character and writes it to stdout */
static int Fgetc(FILE *f)
{
    int c = fgetc(f);
    if (c != EOF)
        putchar(c);
    return c;
}

/* This function skips strings */
static int ParseString(FILE *f)
{
    int c = Fgetc(f);
    while (c != EOF && c != '"') {
        if (c == '\\')
            c = Fgetc(f);
        if (c != EOF)
            c = Fgetc(f);
    }
    if (c == '"')
        c = Fgetc(f);
    return c;
}
/* Skips multi-line comments */
static int ParseComment(FILE *f)
{
    int c = Fgetc(f);

    while (1) {
        while (c != '*') {
            c = Fgetc(f);
            if (c == EOF)
                return EOF;
        }
        c = Fgetc(f);
        if (c == '/')
            break;
    }
    return Fgetc(f);
}

/* Skips // comments. Note that we use fgetc here and NOT Fgetc */
/* since we want to modify the output before gets echoed */
static int ParseCppComment(FILE *f)
{
    int c = fgetc(f);

    while (c != EOF && c != '\n') {
        putchar(c);
        c = fgetc(f);
    }
    if (c == '\n') {
        puts(" */");
        c = Fgetc(f);
    }
    return c;
}

/* Checks if a comment is followed after a '/' char */
static int CheckComment(int c,FILE *f)
{
    if (c == '/') {
        c = fgetc(f);
        if (c == '*') {
            putchar('*');
            c = ParseComment(f);
        }
        else if (c == '/') {
            putchar('*');
            c = ParseCppComment(f);
        }
        else {
            putchar(c);
            c = Fgetc(f);
        }
    }
    return c;
}

/* Skips chars between simple quotes */
static int ParseQuotedChar(FILE *f)
{
    int c = Fgetc(f);
    while (c != EOF && c != '\'') {
        if (c == '\\')
            c = Fgetc(f);
        if (c != EOF)
            c = Fgetc(f);
    }
    if (c == '\'')
        c = Fgetc(f);
    return c;
}


int main(int argc,char *argv[])
{
    FILE *f;
    int c;
    if (argc == 1) {
        fprintf(stderr,"Usage: %s <file.c>\n",argv[0]);
        return EXIT_FAILURE;
    }
    f = fopen(argv[1],"r");
    if (f == NULL) {
        fprintf(stderr,"Can't find %s\n",argv[1]);
        return EXIT_FAILURE;
    }
    c = Fgetc(f);
    while (c != EOF) {
        /* Note that each of the switches must advance the character */
        /* read so that we avoid an infinite loop. */
        switch (c) {
        case '"':
            c = ParseString(f);
            break;
        case '/':
            c = CheckComment(c,f);
            break;
        case '\'':
            c = ParseQuotedChar(f);
            break;
        default:
            c = Fgetc(f);
        }
    }
    fclose(f);
    return 0;
}
 
Richard Heathfield

jacob navia said:
> Recently, a heated debate started because poor Mr. Heathfield
> was unable to compile a program with // comments.

Not so. It's not difficult to compile a program with // "comments" under
gcc. All I have to do is invoke gcc in non-conforming mode, thus forgoing
opportunities for useful diagnostic messages - something I'm not prepared
to do lightly.

> Here is a utility for him, so that he can (at last) compile my
> programs :)

Alas, not yet. You see, the utility itself won't compile:

foo.c: In function `main':
foo.c:104: `EXIT_FAILURE' undeclared (first use in this function)
foo.c:104: (Each undeclared identifier is reported only once
foo.c:104: for each function it appears in.)
make: *** [foo.o] Error 1

Sometimes, words fail me.
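
For reference, EXIT_FAILURE and EXIT_SUCCESS are declared in <stdlib.h>, not in <stdio.h>, which is why the diagnostic above appears on a conforming implementation. A minimal sketch of the pattern follows; the helper name usage_status is hypothetical and only stands in for the argument check in main:

```c
#include <assert.h>
#include <stdlib.h>   /* EXIT_SUCCESS and EXIT_FAILURE are declared here */

/* Hypothetical helper: maps an argument count to the exit status that
   main() would return after checking its command line. */
int usage_status(int argc)
{
    assert(argc >= 0);
    return (argc < 2) ? EXIT_FAILURE : EXIT_SUCCESS;
}
```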
 
Peter Nilsson

jacob said:
> ... poor Mr. Heathfield ... Here is a utility for him ...

Tediously childish.

> --------------------------------------------------------------cut here
>
> /* This program reads a C source file and writes it modified to stdout
> All // comments will be replaced by /* ... */ comments, to easy the

Perhaps you should write a utility that also fixes nested comments that
are not allowed by C90 or C99.

> porting to old environments or to post it in usenet, where
> // comments can be broken in several lines, and messed up.
> */

I'm sure there are alternative one-line Perl scripts floating around.

> #include <stdio.h>

Does this header define the identifier EXIT_FAILURE which you use
further on? If so, your implementation is not conforming.

<snip>

Some test cases for you to consider...

int c = a //* ... */
b;
int d = '??''; // this is a // comment, is it translated?
 
Old Wolf

jacob said:
> Recently, a heated debate started because poor Mr. Heathfield
> was unable to compile a program with // comments.
>
> Here is a utility for him, so that he can (at last) compile my
> programs :)

Hey, thanks :) One of the things on my TODO list was to
write such a utility, so I can compile a large project in ANSI
conformance mode and see if the compiler throws up any
errors. The project source is conforming (afaik!) except for the
use of // comments.
 
Richard Heathfield

Peter Nilsson said:
> Some test cases for you to consider...
>
> int c = a //* ... */
> b;
> int d = '??''; // this is a // comment, is it translated?

After I hacked the code to get it to compile, it failed both those tests,
and it also failed the following two tests:

/\
/ this is a BCPL-style comment

and

// /* Comment */
 
Keith Thompson

jacob navia said:
> Obviously I have avoided here, in consideration for his pedantic
> compiler flags, any C99 issues,

Yes, you have.

> so it will compile in obsolete
> compilers,

No, it won't.

[...]

You *really* *really* need to try compiling your code before you post
it.

If whatever compiler you used actually accepted the code you posted,
then it's buggy.

You might also consider not acting as if portability is some horrible
burden being imposed on you personally, rather than just a good idea.
 
Walter Bright

Peter said:
> Some test cases for you to consider...
>
> int c = a //* ... */
> b;
> int d = '??''; // this is a // comment, is it translated?

A trigraph case:

char* d = "??/""; // "

but of course I've never seen trigraphs outside of a test suite.

There's also backslash line splicing:

// this is the start of a comment \
that continues on this line
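
A rough sketch of what line splicing (translation phase 2) does to a buffer, to show why the example above is a single comment. The function name splice_lines is an assumption made for illustration; it is not part of the posted utility:

```c
#include <assert.h>
#include <stddef.h>

/* Removes, in place, every backslash that is immediately followed by a
   newline, together with that newline. Returns the new string length. */
size_t splice_lines(char *s)
{
    size_t r = 0;   /* read position  */
    size_t w = 0;   /* write position */

    assert(s != NULL);
    while (s[r] != '\0') {
        if (s[r] == '\\' && s[r + 1] == '\n') {
            r += 2;             /* drop the backslash-newline pair */
        } else {
            s[w++] = s[r++];    /* keep everything else unchanged  */
        }
    }
    s[w] = '\0';
    return w;
}
```

Applied to a slash, a backslash-newline, and then "/ comment", this yields "// comment", so a converter that only looks for two adjacent slashes in the raw text will miss a comment introducer split this way.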
 
jacob navia

Walter said:
> A trigraph case:
>
> char* d = "??/""; // "
>
> but of course I've never seen trigraphs outside of a test suite.

Me neither. But I do not support trigraphs anyway. They are an
unnecessary feature. We had several lengthy discussions about this in
comp.std.c.

> There's also backslash line splicing:
>
> // this is the start of a comment \
> that continues on this line

Yes, I added that one.
 
Ben Bacarisse

Richard Heathfield said:
> jacob navia said:
>> Recently, a heated debate started because poor Mr. Heathfield
>> was unable to compile a program with // comments.
>
> Not so. It's not difficult to compile a program with // "comments" under
> gcc. All I have to do is invoke gcc in non-conforming mode, thus forgoing
> opportunities for useful diagnostic messages - something I'm not prepared
> to do lightly.
>
>> Here is a utility for him, so that he can (at last) compile my
>> programs :)
>
> Alas, not yet. You see, the utility itself won't compile:
>
> foo.c: In function `main':
> foo.c:104: `EXIT_FAILURE' undeclared (first use in this function)
> foo.c:104: (Each undeclared identifier is reported only once
> foo.c:104: for each function it appears in.)
> make: *** [foo.o] Error 1
>
> Sometimes, words fail me.

I think there is a deeper irony. Did you relax your compiler options to get
this far? If so, it allowed the non-standard nested comment to pass.
I get a syntax error at the word "easy".

A program to correct non-C89 comments relies on an extension to the
comment syntax to compile!
 
Ben Bacarisse

jacob navia said:
> Recently, a heated debate started because poor Mr. Heathfield
> was unable to compile a program with // comments.
>
> Here is a utility for him, so that he can (at last) compile my
> programs :)
>
> More seriously, this code takes 560 bytes. Amazing, isn't it? C is very
> compact, you can do great things in a few bytes.

You can do *some* things in 560 bytes. Great things, it seems, need a
few more. You need to:

(a) include <stdlib.h>
(b) remove the nested comment.
(c) fix the logic bugs.

Given the admittedly clumsy but valid:

return 1//* what divisor? */2;

your program produces the invalid:

return 1/** what divisor? */2; */

And on the more plausible:

// I don't like */ delimiters

we get:

/* I don't like */ delimiters */
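
One possible way to handle the second case, sketched under the assumption that the comment text has already been isolated from the line: insert a space wherever the body contains the two-character sequence that would close a block comment. The name line_comment_to_block and its buffer interface are hypothetical, not taken from the posted program:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrites `body` (the text after the two leading slashes, without the
   trailing newline) into `out` as a block comment. A space is inserted
   between any star and a slash that directly follows it, so the body
   cannot end the comment early. Returns 0 on success, -1 if `out` is
   too small for the worst case. */
int line_comment_to_block(const char *body, char *out, size_t outsize)
{
    size_t n = 0;
    size_t i;
    char prev = '\0';

    assert(body != NULL && out != NULL);
    /* Worst case: one inserted space per character, plus the opening
       delimiter (2), the closing " star slash" (3) and the NUL (1). */
    if (outsize < 2 * strlen(body) + 6)
        return -1;

    out[n++] = '/';
    out[n++] = '*';
    for (i = 0; body[i] != '\0'; i++) {
        if (prev == '*' && body[i] == '/')
            out[n++] = ' ';     /* break up the closing sequence */
        out[n++] = body[i];
        prev = body[i];
    }
    out[n++] = ' ';
    out[n++] = '*';
    out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

This changes the comment text slightly, which seems unavoidable: a block comment has no way to contain its own terminator.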
 
jacob navia

Ben said:
> You can do *some* things in 560 bytes. Great things, it seems, need a
> few more. You need to:
>
> (a) include <stdlib.h>

yeah

> (b) remove the nested comment.

yes

> (c) fix the logic bugs.
>
> Given the admittedly clumsy but valid:
>
> return 1//* what divisor? */2;

This is the same as finding a spurious */ in
a cpp comment. Corrected

> your program produces the invalid:
>
> return 1/** what divisor? */2; */
>
> And on the more plausible:
>
> // I don't like */ delimiters
>
> we get:
>
> /* I don't like */ delimiters */

!!!!

Corrected, thanks
 
Walter Bright

jacob said:
> Me neither. But I do not support trigraphs anyway. They are an
> unnecessary feature. We had several lengthy discussions about this in
> comp.std.c.

Trigraphs are a worthless feature. Nevertheless, they are in the
standard, and it's much less effort to implement them than it is to
constantly have to justify otherwise.

Aside from such trivial defects, overall the C standard was a vast
improvement over existing practice at the time: having multiple compiler
switches to be quirk-compatible with this or that dialect.
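
To illustrate how small the replacement step is, here is a sketch of trigraph substitution (translation phase 1) over a string. The function names are hypothetical. Note that a caller's test strings need to be written with an escaped question mark ("?\?"), otherwise a compiler with trigraphs enabled would replace them before the function ever runs:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the character denoted by two question marks followed by
   `third`, or 0 if that three-character sequence is not a trigraph. */
char trigraph_value(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Replaces every trigraph in `s` in place. Returns the new length. */
size_t replace_trigraphs(char *s)
{
    size_t r = 0;   /* read position  */
    size_t w = 0;   /* write position */

    assert(s != NULL);
    while (s[r] != '\0') {
        char t = 0;

        /* s[r + 2] is only read when s[r + 1] is not the terminator. */
        if (s[r] == '?' && s[r + 1] == '?')
            t = trigraph_value(s[r + 2]);
        if (t != 0) {
            s[w++] = t;
            r += 3;
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

Run before comment detection, this turns the quoted-character test case from earlier in the thread into an ordinary '^' constant.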
 
jacob navia

I edited the code before posting, without recompiling it.
Big mistake. I added a nested comment, and when replacing the
EXIT_FAILURE because of Keith's remarks in another thread
I forgot to add the stdlib.h include.

Besides, I have fixed the few logic bugs pointed out by you:
1) continuation lines that become comments
e.g. /\
/ comment
will become
/*
comment */
2) If a sequence */ is found in a cpp comment it will be replaced by
* /, i.e. a blank will be inserted. There is no other way to do that.
3) Trigraphs are NOT supported.

Thanks to everyone who participated. Updated program below.
----------------------------------------------------cut here
#include <stdio.h>
#include <stdlib.h>
/* This function reads a character and writes it to stdout */
static int Fgetc(FILE *f)
{
    int c = fgetc(f);
    if (c != EOF)
        putchar(c);
    return c;
}

/* Skips strings */
static int ParseString(FILE *f)
{
    int c = Fgetc(f);
    while (c != EOF && c != '"') {
        if (c == '\\')
            c = Fgetc(f);
        if (c != EOF)
            c = Fgetc(f);
    }
    if (c == '"')
        c = Fgetc(f);
    return c;
}
/* Skips multi-line comments */
static int ParseComment(FILE *f)
{
    int c = Fgetc(f);

    while (1) {
        while (c != '*') {
            c = Fgetc(f);
            if (c == EOF)
                return EOF;
        }
        c = Fgetc(f);
        if (c == '/')
            break;
    }
    return Fgetc(f);
}

/* Skips // comments */
static int ParseCppComment(FILE *f)
{
    int c = fgetc(f);

    while (c != EOF && c != '\n') {
        int last;
        putchar(c);
        last = c;
        c = fgetc(f);
        if (c == '/' && last == '*')
            putchar(' ');
    }
    if (c == '\n') {
        puts(" */");
        c = Fgetc(f);
    }
    return c;
}

/* Checks if a comment is followed after a '/' char */
static int CheckComment(int c,FILE *f)
{
    c = fgetc(f);
    if (c == '*') {
        putchar('*');
        c = ParseComment(f);
    }
    else if (c == '/') {
        putchar('*');
        c = ParseCppComment(f);
    }
    else if (c == '\\') {
        c = fgetc(f);
        if (c == '\n') {
            c = fgetc(f);
            if (c == '/') {
                printf("*\n");
                ParseCppComment(f);
            }
            else printf("\\\n%c",c);
        }
        else {
            putchar('\\');
            putchar(c);
        }
    }
    else {
        putchar(c);
        c = Fgetc(f);
    }
    return c;
}

/* Skips chars between simple quotes */
static int ParseQuotedChar(FILE *f)
{
    int c = Fgetc(f);
    while (c != EOF && c != '\'') {
        if (c == '\\')
            c = Fgetc(f);
        if (c != EOF)
            c = Fgetc(f);
    }
    if (c == '\'')
        c = Fgetc(f);
    return c;
}


int main(int argc,char *argv[])
{
    FILE *f;
    int c;
    if (argc == 1) {
        fprintf(stderr,"Usage: %s <file.c>\n",argv[0]);
        return EXIT_FAILURE;
    }
    f = fopen(argv[1],"r");
    if (f == NULL) {
        fprintf(stderr,"Can't find %s\n",argv[1]);
        return EXIT_FAILURE;
    }
    c = Fgetc(f);
    while (c != EOF) {
        /* Note that each of the switches must advance the character */
        /* read so that we avoid an infinite loop. */
        switch (c) {
        case '"':
            c = ParseString(f);
            break;
        case '/':
            c = CheckComment(c,f);
            break;
        case '\'':
            c = ParseQuotedChar(f);
            break;
        default:
            c = Fgetc(f);
        }
    }
    fclose(f);
    return 0;
}
 
Ben Bacarisse

jacob navia said:
> This is the same as finding a spurious */ in
> a cpp comment.

I don't think so. The point of this test case is that it does not
*have* a CPP comment at all.

> Corrected

No. Your new version produces:

return 1/** what divisor? * /2; */

(which is not a valid statement) from

return 1//* what divisor? */2;

which is, I think, a valid way to write return 1/2;
 
jacob navia

Ben said:
> I don't think so. The point of this test case is that it does not
> *have* a CPP comment at all.
>
> No. Your new version produces:
>
> return 1/** what divisor? * /2; */
>
> (which is not a valid statement) from
>
> return 1//* what divisor? */2;
>
> which is, I think, a valid way to write return 1/2;

No. MSVC, for instance, will preprocess your statement to
return 1

without anything beyond the //.
gcc will do the same.
lcc-win32 will do the same.
 
Walter Bright

Ben said:
> No. Your new version produces:
>
> return 1/** what divisor? * /2; */
>
> (which is not a valid statement) from
>
> return 1//* what divisor? */2;
>
> which is, I think, a valid way to write return 1/2;

Jacob has that right. //* is lexed as the start of a // comment, not a
divide followed by the start of a /* comment. It's the same reason that:

i/*p;
i++; /* comment */
*p+3;

is parsed as:

(i * p) + 3;

i.e. the maximal munch rule.

Walter Bright
www.digitalmars.com C, C++, D programming language compilers
 
Ben Bacarisse

jacob navia said:
> No. MSVC for instance will preprocess your statement to
> return 1
>
> without anything beyond the //
> gcc will do the same
> lcc-win32 will do the same

Yes, I had assumed your program would be "C89 safe", but I can see now that
there is no reasonable way that it could be.
 
Ben Bacarisse

Walter Bright said:
> Jacob has that right. //* is lexed as the start of a // comment

Yes, ack'd already. I had stupidly thought the program should be C89
neutral, but the input will never be C89 if it has // intended as a
comment.

So, who wants to do moving declarations up to the top of the enclosing
block? :)
 
Bart

jacob said:
> Besides, I have fixed the few logic bugs pointed out by you:
> 1) continuation lines that become comments
> e.g. /\
> / comment
> will become
> /*
> comment */

But the more likely

//\
comment

Won't work.

You also forgot the case:

#include <ftp://domain.com/myfile.h>

And your program output is very misleading when given the input:

#error // comments not allowed

Regards,
Bart.
 
jacob navia

Bart said:
> But the more likely
>
> //\
> comment
>
> Won't work.

Fixed

> You also forgot the case:
>
> #include <ftp://domain.com/myfile.h>

????
Well, URLs in #include directives...

Not yet.

> And your program output is very misleading when given the input:
>
> #error // comments not allowed

If they are not allowed...
 
