Please critique my code/program

G

G Patel

I wrote the following program to remove C89 type comments from stdin
and send it to stdout (as per exercise in K&R2) and it works but I was
hoping more experienced programmer would critique the layout/style/etc.
[I accidentally posted this before in comp.std.c, not trying to post to
several groups on purpose]

Any comments will be helpful. Thank you.

/*
****************************************************
C89/90 COMMENT REMOVER
======================

Description: This program removes all C89/90 style
commnets from a c source (file).
The program does not support C++/C99
style comments.

Input/Ouput: Input for the c source is taken from
stdin and output is to stdout.
Command-line piping of c source files
is expected.

Author: Gaya Patel

Date: Feb 24, 2005

****************************************************
*/

#include <stdio.h>

#define OUT 0
#define IN 1

int main()
{
int c; /* input character for state machine cycle */

/* state variables */
int comstate = OUT; /* state variable C, in explanation below */
int entry = OUT; /* state variable E, in explanation below */
int quostate = OUT; /* state variable Q, in explanation below */

/* --------------- */

/*************************************************************
OVERVIEW OF STATE MACHINE DESIGN:
*********************************

State machine input: c = getchar()
State machine cycle: on every c = getchar() input

State A: (C:E:Q) = (OUT: OUT: OUT) = (0: 0: 0)
-----------------------------------------------------
If input is " transition to STATE Q
If input is / transition to STATE B
Otherwise no transition

State B: (C:E:Q) = (OUT: IN: OUT) = (0: 1: 0)
-----------------------------------------------------
If input is * transition to STATE C
If input is " transition to STATE Q and putchar('/')
Otherwise transition to STATE A and putchar('/')

State C: (C:E:Q) = (IN: OUT: OUT) = (1: 0: 0)
------------------------------------------------------
If input is * transition to STATE D
Otherwise no transition

State D: (C:E:Q) = (IN: IN: OUT) = (1: 1: 0)
------------------------------------------------------
If input is / transition to STATE D
Otherwise transition to STATE C

State E (C:E:Q) = (IN: IN: IN) = (1: 1: 1)
------------------------------------------------------
Any input transition to A

STATE Q: (C:E:Q) = (OUT: OUT: IN) = (0: 0: 1)
------------------------------------------------------
If input is " transition to STATE A
Otherwise no transition

OUTPUT FORMING LOGIC SPECIAL CASE FOR STATE B
**********************************************
The output forming logic of the state machine checks the
post-transition state variables to decide whether or not
to output the input char, or to discard it (not output).
In the case of post-transition STATE B, at that point, it
is still not sure whether the '/' is the start of a
comment or not. The output forming logic simply discards
this character (assumes start of comment). Therefore,
within the state machine design, an extra '/' is outputted
upon any transition from B other than B->C.

******************************************************************/

while((c = getchar()) != EOF)
{
/* ***** state transition logic ***** */

if(comstate == OUT && entry == OUT && quostate == OUT)
{ /* IF IN STATE A */
if(c == '"')
{
quostate = IN; /* TRANSITION TO STATE Q */
}
else if(c == '/')
{
entry = IN; /* TRANSITION TO STATE B */
}
else
{
/* STAY IN STATE A */
}
}
else if(comstate == OUT && entry == IN && quostate == OUT)
{ /* IF IN STATE B */
if(c == '*')
{
comstate = IN; /* TRANSITION TO STATE C */
entry = OUT;
}
else if('"')
{
quostate = IN; /* TRANSITION TO STATE Q */
entry = OUT;
putchar('/'); /* Print '/' */
}
else
{
entry = OUT; /* TRANSITION TO STATE A */
putchar('/'); /* Print '/' */
}

}
else if(comstate == IN && entry == OUT && quostate == OUT)
{ /* IF IN STATE C */
if(c == '*')
{
entry = IN; /* TRANSITION TO STATE D */
}
else
{
/* STAY IN C */
}
}
else if(comstate == IN && entry == IN && quostate == OUT)
{ /* IF IN STATE D */
if(c == '/')
{
quostate = IN; /* TRANSITION TO STATE E */

}
else
{
entry = OUT; /* TRANSITION TO STATE C */

}
}
else if(comstate == IN && entry == IN && quostate == IN)
{ /* IF IN STATE E */
comstate = OUT;
entry = OUT; /* TRANSITION TO STATE A */
quostate = OUT;
}
else if(quostate == IN) /* IF IN STATE Q */
{
if(c == '"')
{
quostate = OUT; /* TRANSITION TO STATE A */
}
else
{
/* STAY IN STATE Q */
}
}
else
{
/* INVALID STATE */
}

/**********************************************************
OUTPUT FORMING LOGIC
********************
* The output forming logic is simply an if statement
that checks the post-transition state.

* If the state is A or Q, the output is allowed,
otherwise no output occur on that cycle

* See state B's special case in the description of
the state machine (top of module).

*********************************************************/

/* ***** output forming logic ***** */

if(c == '\n')
{
putchar('\n');
}
else if(comstate == OUT && entry == OUT)
{
putchar(c);
}

} /* end of large while loop */

return 0;
}
/**/
 
T

Thad Smith

G said:
> I wrote the following program to remove C89 type comments from stdin
> and send it to stdout (as per exercise in K&R2) and it works but I was
> hoping more experienced programmer would critique the layout/style/etc.
> [I accidentally posted this before in comp.std.c, not trying to post to
> several groups on purpose]
>
> Any comments will be helpful. Thank you.

As far as style, I think the C, E, and Q binary variables are an
unnecessary complexity. You also describes the states as A, B, C,...
You can simply define an enum with these (or better yet, mnemonic) state
names.

The code fails in several cases:
1. The end of comment in /***/ will not be detected.
2. If fails to detect a comment following /:
a = b//*days*/2;
3. It fails to omit detection of a comment start within a character
constant: '/*' (yes, that's allowed in C90).
4. It fails to protect end of string detection from escaped quotes:
" Not end:\" since quote was escaped".
5. And finally, it fails to handle the trigraph ??/, which translates to
\, thus affecting both comment delimiters and string and character
constant escapes.

Thad
 
C

CBFalconer

G said:
I wrote the following program to remove C89 type comments from
stdin and send it to stdout (as per exercise in K&R2) and it works
but I was hoping more experienced programmer would critique the
layout/style/etc. [I accidentally posted this before in comp.std.c,
not trying to post to several groups on purpose]

Any comments will be helpful. Thank you.

.... snip code ...

On a cursory look it appears to be a good exercise in defining a
state machine. However I consider a state machine to be overly
complex for this particular job. Well formed source naturally
terminates each comment and/or string, and the constructs do not
nest. So my version is as follows:

/* File uncmntc.c - demo of a text filter
Strips C comments. Tested to strip itself
by C.B. Falconer. 2002-08-15
Public Domain. Attribution appreciated
report bugs to <mailto:[email protected]>
*/

/* With gcc3.1, must omit -ansi to compile eol comments */

#include <stdio.h>
#include <stdlib.h>

static int ch, lastch;

/* ---------------- */

static void putlast(void)
{
if (0 != lastch) fputc(lastch, stdout);
lastch = ch;
ch = 0;
} /* putlast */

/* ---------------- */

/* gobble chars until star slash appears */
static int stdcomment(void)
{
int ch, lastch;

ch = 0;
do {
lastch = ch;
if (EOF == (ch = fgetc(stdin))) return EOF;
} while (!(('*' == lastch) && ('/' == ch)));
return ch;
} /* stdcomment */

/* ---------------- */

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
int ch, lastch;

ch = '\0';
do {
lastch = ch;
if (EOF == (ch = fgetc(stdin))) return EOF;
} while (!(('\n' == ch) && ('\\' != lastch)));
return ch;
} /* eolcomment */

/* ---------------- */

/* echo chars until '"' or EOF */
static int echostring(void)
{
putlast();
if (EOF == (ch = fgetc(stdin))) return EOF;
do {
putlast();
if (EOF == (ch = fgetc(stdin))) return EOF;
} while (!(('"' == ch) && ('\\' != lastch)));
return ch;
} /* echostring */

/* ---------------- */

int main(void)
{
lastch = '\0';
while (EOF != (ch = fgetc(stdin))) {
if ('/' == lastch)
if (ch == '*') {
lastch = '\0';
if (EOF == stdcomment()) break;
ch = ' ';
putlast();
}
else if (ch == '/') {
lastch = '\0';
if (EOF == eolcomment()) break;
ch = '\n';
putlast(); // Eolcomment here
// Eolcomment line \
with continuation line.
}
else {
putlast();
}
else if (('"' == ch) && ('\\' != lastch)
&& ('\'' != lastch)) {
if ('"' != (ch = echostring())) {
fputs("\"Unterminated\" string\n", stderr);
fputs("checking for\
continuation line string\n", stderr);
fputs("checking for" "concat string\n", stderr);
return EXIT_FAILURE;
}
putlast();
}
else {
putlast();
}
} /* while */
putlast(/* embedded comment */);
return 0;
} /* main */
 
G

Gregory Pietsch

G said:
I wrote the following program to remove C89 type comments from stdin
and send it to stdout (as per exercise in K&R2) and it works but I was
hoping more experienced programmer would critique the layout/style/etc.
[I accidentally posted this before in comp.std.c, not trying to post to
several groups on purpose]

Any comments will be helpful. Thank you.

/*
****************************************************
C89/90 COMMENT REMOVER
======================

Description: This program removes all C89/90 style
commnets from a c source (file).
The program does not support C++/C99
style comments.

Input/Ouput: Input for the c source is taken from
stdin and output is to stdout.
Command-line piping of c source files
is expected.

Author: Gaya Patel

Date: Feb 24, 2005

****************************************************
*/

#include <stdio.h>

#define OUT 0
#define IN 1

int main()
{
int c; /* input character for state machine cycle */

/* state variables */
int comstate = OUT; /* state variable C, in explanation below */
int entry = OUT; /* state variable E, in explanation below */
int quostate = OUT; /* state variable Q, in explanation below */

/* --------------- */

/*************************************************************
OVERVIEW OF STATE MACHINE DESIGN:
*********************************

State machine input: c = getchar()
State machine cycle: on every c = getchar() input

State A: (C:E:Q) = (OUT: OUT: OUT) = (0: 0: 0)
-----------------------------------------------------
If input is " transition to STATE Q
If input is / transition to STATE B
Otherwise no transition

State B: (C:E:Q) = (OUT: IN: OUT) = (0: 1: 0)
-----------------------------------------------------
If input is * transition to STATE C
If input is " transition to STATE Q and putchar('/')
Otherwise transition to STATE A and putchar('/')

State C: (C:E:Q) = (IN: OUT: OUT) = (1: 0: 0)
------------------------------------------------------
If input is * transition to STATE D
Otherwise no transition

State D: (C:E:Q) = (IN: IN: OUT) = (1: 1: 0)
------------------------------------------------------
If input is / transition to STATE D
Otherwise transition to STATE C

State E (C:E:Q) = (IN: IN: IN) = (1: 1: 1)
------------------------------------------------------
Any input transition to A

STATE Q: (C:E:Q) = (OUT: OUT: IN) = (0: 0: 1)
------------------------------------------------------
If input is " transition to STATE A
Otherwise no transition

OUTPUT FORMING LOGIC SPECIAL CASE FOR STATE B
**********************************************
The output forming logic of the state machine checks the
post-transition state variables to decide whether or not
to output the input char, or to discard it (not output).
In the case of post-transition STATE B, at that point, it
is still not sure whether the '/' is the start of a
comment or not. The output forming logic simply discards
this character (assumes start of comment). Therefore,
within the state machine design, an extra '/' is outputted
upon any transition from B other than B->C.

******************************************************************/

while((c = getchar()) != EOF)
{
/* ***** state transition logic ***** */

if(comstate == OUT && entry == OUT && quostate == OUT)
{ /* IF IN STATE A */
if(c == '"')
{
quostate = IN; /* TRANSITION TO STATE Q */
}
else if(c == '/')
{
entry = IN; /* TRANSITION TO STATE B */
}
else
{
/* STAY IN STATE A */
}
}
else if(comstate == OUT && entry == IN && quostate == OUT)
{ /* IF IN STATE B */
if(c == '*')
{
comstate = IN; /* TRANSITION TO STATE C */
entry = OUT;
}
else if('"')
{
quostate = IN; /* TRANSITION TO STATE Q */
entry = OUT;
putchar('/'); /* Print '/' */
}
else
{
entry = OUT; /* TRANSITION TO STATE A */
putchar('/'); /* Print '/' */
}

}
else if(comstate == IN && entry == OUT && quostate == OUT)
{ /* IF IN STATE C */
if(c == '*')
{
entry = IN; /* TRANSITION TO STATE D */
}
else
{
/* STAY IN C */
}
}
else if(comstate == IN && entry == IN && quostate == OUT)
{ /* IF IN STATE D */
if(c == '/')
{
quostate = IN; /* TRANSITION TO STATE E */

}
else
{
entry = OUT; /* TRANSITION TO STATE C */

}
}
else if(comstate == IN && entry == IN && quostate == IN)
{ /* IF IN STATE E */
comstate = OUT;
entry = OUT; /* TRANSITION TO STATE A */
quostate = OUT;
}
else if(quostate == IN) /* IF IN STATE Q */
{
if(c == '"')
{
quostate = OUT; /* TRANSITION TO STATE A */
}
else
{
/* STAY IN STATE Q */
}
}
else
{
/* INVALID STATE */
}

/**********************************************************
OUTPUT FORMING LOGIC
********************
* The output forming logic is simply an if statement
that checks the post-transition state.

* If the state is A or Q, the output is allowed,
otherwise no output occur on that cycle

* See state B's special case in the description of
the state machine (top of module).

*********************************************************/

/* ***** output forming logic ***** */

if(c == '\n')
{
putchar('\n');
}
else if(comstate == OUT && entry == OUT)
{
putchar(c);
}

} /* end of large while loop */

return 0;
}
/**/

Right track, but you need to clean up the execution. And it's too long.

We had a contest in June 2000 regarding this assignment. For your
perusal, and for comparison, here was my entry:

/* Gregory Pietsch <[email protected]> */
#include <stdio.h>
char p[] =
"0/!10\"040\'050.001/011*!21\"/41\'/51./02*!32.!23/ "
"03*!33.!24\"004\\064.045\'005\\075.056.047.05";
int main(){int c,i,d;char s,n;s='0';while((c=getchar())
!=EOF){d=0;for(i=0;p!='\0'&&d==0;i=i+4){if(p==s&&
(p[i+1]==c||p[i+1]=='.')){if(p[i+2]=='0')putchar(c);else
if(p[i+2]=='/'){putchar('/');putchar(c);}else if(p[i+2]
==' ')putchar(' ');n=p[i+3];d=1;}}s=n;}return 0;}

-- Gregory
 
G

G Patel

Thad said:
G said:
I wrote the following program to remove C89 type comments from stdin
and send it to stdout (as per exercise in K&R2) and it works but I was
hoping more experienced programmer would critique the layout/style/etc.
[I accidentally posted this before in comp.std.c, not trying to post to
several groups on purpose]

Any comments will be helpful. Thank you.

As far as style, I think the C, E, and Q binary variables are an
unnecessary complexity. You also describes the states as A, B, C,...
You can simply define an enum with these (or better yet, mnemonic) state
names.

The code fails in several cases:
1. The end of comment in /***/ will not be detected.
2. If fails to detect a comment following /:
a = b//*days*/2;
3. It fails to omit detection of a comment start within a character
constant: '/*' (yes, that's allowed in C90).
4. It fails to protect end of string detection from escaped quotes:
" Not end:\" since quote was escaped".
5. And finally, it fails to handle the trigraph ??/, which translates to
\, thus affecting both comment delimiters and string and character
constant escapes.

Ohh my, I didn't test any of those cases (tested it using interactive
mode :S )

I am not sure what you mean by 3). You have 2 characters inside the
hypens (which is illegal unless they make up an escape sequence, right)
?

Thank you for pointing out the fringe cases (#1 was even in the source
code for the original code, but I didn't use it for testing).
 
G

G Patel

Gregory said:
Right track, but you need to clean up the execution. And it's too long.

We had a contest in June 2000 regarding this assignment. For your
perusal, and for comparison, here was my entry:

/* Gregory Pietsch <[email protected]> */
#include <stdio.h>
char p[] =
"0/!10\"040\'050.001/011*!21\"/41\'/51./02*!32.!23/ "
"03*!33.!24\"004\\064.045\'005\\075.056.047.05";
int main(){int c,i,d;char s,n;s='0';while((c=getchar())
!=EOF){d=0;for(i=0;p!='\0'&&d==0;i=i+4){if(p==s&&
(p[i+1]==c||p[i+1]=='.')){if(p[i+2]=='0')putchar(c);else
if(p[i+2]=='/'){putchar('/');putchar(c);}else if(p[i+2]
==' ')putchar(' ');n=p[i+3];d=1;}}s=n;}return 0;}

-- Gregory


It works, but what happened to the spacing? Where are the comments?
Was the contest based on number of source code lines (irrespective of #
of statement in each line)?
 
R

Rouben Rostamian

Gregory said:
Right track, but you need to clean up the execution. And it's too long.

We had a contest in June 2000 regarding this assignment. For your
perusal, and for comparison, here was my entry:

/* Gregory Pietsch <[email protected]> */
#include <stdio.h>
char p[] =
"0/!10\"040\'050.001/011*!21\"/41\'/51./02*!32.!23/ "
"03*!33.!24\"004\\064.045\'005\\075.056.047.05";
int main(){int c,i,d;char s,n;s='0';while((c=getchar())
!=EOF){d=0;for(i=0;p!='\0'&&d==0;i=i+4){if(p==s&&
(p[i+1]==c||p[i+1]=='.')){if(p[i+2]=='0')putchar(c);else
if(p[i+2]=='/'){putchar('/');putchar(c);}else if(p[i+2]
==' ')putchar(' ');n=p[i+3];d=1;}}s=n;}return 0;}

-- Gregory


It works, but what happened to the spacing? Where are the comments?
Was the contest based on number of source code lines (irrespective of #
of statement in each line)?


Actually Gregory's code is pretty clever, once you de-obfuscate it.

In particular, the code takes care to leave at least one
blank space in place of a comment. Otherwise imagine
what would happen to the input:

int/* this is a comment*/x;
 
G

Gregory Pietsch

Rouben said:
Gregory said:
Right track, but you need to clean up the execution. And it's too long.

We had a contest in June 2000 regarding this assignment. For your
perusal, and for comparison, here was my entry:

/* Gregory Pietsch <[email protected]> */
#include <stdio.h>
char p[] =
"0/!10\"040\'050.001/011*!21\"/41\'/51./02*!32.!23/ "
"03*!33.!24\"004\\064.045\'005\\075.056.047.05";
int main(){int c,i,d;char s,n;s='0';while((c=getchar())
!=EOF){d=0;for(i=0;p!='\0'&&d==0;i=i+4){if(p==s&&
(p[i+1]==c||p[i+1]=='.')){if(p[i+2]=='0')putchar(c);else
if(p[i+2]=='/'){putchar('/');putchar(c);}else if(p[i+2]
==' ')putchar(' ');n=p[i+3];d=1;}}s=n;}return 0;}

-- Gregory


It works, but what happened to the spacing? Where are the comments?
Was the contest based on number of source code lines (irrespective of #
of statement in each line)?


Actually Gregory's code is pretty clever, once you de-obfuscate it.

In particular, the code takes care to leave at least one
blank space in place of a comment. Otherwise imagine
what would happen to the input:

int/* this is a comment*/x;


Actually, the C standard mandates the space. Otherwise, you'd use a
macro with the token pasting operator "##".

Gregory Pietsch
 
G

Gregory Pietsch

G said:
Gregory said:
Right track, but you need to clean up the execution. And it's too long.

We had a contest in June 2000 regarding this assignment. For your
perusal, and for comparison, here was my entry:

/* Gregory Pietsch <[email protected]> */
#include <stdio.h>
char p[] =
"0/!10\"040\'050.001/011*!21\"/41\'/51./02*!32.!23/ "
"03*!33.!24\"004\\064.045\'005\\075.056.047.05";
int main(){int c,i,d;char s,n;s='0';while((c=getchar())
!=EOF){d=0;for(i=0;p!='\0'&&d==0;i=i+4){if(p==s&&
(p[i+1]==c||p[i+1]=='.')){if(p[i+2]=='0')putchar(c);else
if(p[i+2]=='/'){putchar('/');putchar(c);}else if(p[i+2]
==' ')putchar(' ');n=p[i+3];d=1;}}s=n;}return 0;}

-- Gregory


It works, but what happened to the spacing?


What spacing?
Where are the comments?

I ran the code on itself. ;-)
Was the contest based on number of source code lines (irrespective of #
of statement in each line)?

None of the other entries was as short as this.

What tripped me up (and a lot of the other entries) before I fixed mine
was that you could divide by a character (e.g. 6/'\2') or even a string
(e.g. 6/"\2"[0]) even though this, of course, is only of interest to
compiler writers and folks who love obfuscated coding. I only had to
add a few characters to p to cover those two, but others had a problem
with it.

Gregory
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,132
Latest member
TeresaWcq1
Top