How to remove // comments

Richard Heathfield · Oct 20, 2006

Keith Thompson said:

Also, why do you use trigraphs rather than digraphs?

Because not all compiler vendors have caught up with 1995 (let alone 1999!).
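
For readers who have not met digraphs: they were added by Amendment 1 to C90 (published in 1995), and unlike trigraphs they are ordinary tokens, so they are left alone inside string literals. The fragment below is only an illustrative sketch; the function names are made up, and it assumes a compiler that accepts digraphs.

```c
%:include <string.h>

/* Written with digraphs: %: stands for #, <% %> for { }, <: :> for [ ]. */
int digraph_sum(void)
<%
    int a<:3:> = <% 1, 2, 3 %>;
    return a<:0:> + a<:1:> + a<:2:>;
%>

/* Digraphs are tokens, so inside a string literal they stay as typed. */
size_t digraph_in_string(void)
<%
    return strlen("<:");    /* two characters: '<' and ':' */
%>
```

A compiler that predates the 1995 amendment would reject this file, which is the portability concern raised in the reply above.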

Richard Heathfield · Oct 20, 2006

Ben Bacarisse said:

I think there is a deeper irony. Did you relax your compiler options to get
this far?
No.

If so, it allowed the non standard nested comment to pass.
I get a syntax error at the word "easy".

Ah, that may explain it. I appear to have omitted to grab that introductory
comment when compiling the code.

Keith Thompson · Oct 20, 2006

Richard Heathfield said:
Walter Bright said:

If trigraphs were *not* supported in the Standard, you'd have a heck of a
job getting the same source base to run on, say, MS-DOS (or, nowadays,
Windows) and MVS. Just because you don't use 'em yourself, that doesn't
mean they're not useful.

The source would have to be translated between EBCDIC and ASCII
anyway. If trigraphs weren't supported by the standard, some other
solution (or even the same one?) would undoubtedly be supported by
mainframe compilers, and there would be utilities that would perform
both EBCDIC<->ASCII translation and whatever mapping is necessary.

Keith Thompson · Oct 20, 2006

jacob navia said:
This is NONSENSE for all users that are NOT EBCDIC and do NOT work in
mainframes. By the way, the venerable 3270 is DEAD SINCE CONCEPTION
and one of the nice things about the microcomputers that appeared in the
eighties was these wonderful KEYBOARDS where we could type any character
we wish... Nice, isn't it?

Tell that to "Jalapeno", a real live trigraph user who's been posting
in this very thread.

Walter Bright · Oct 20, 2006

Jalapeno said:
Character translation is only necessary if the text originates on an
ASCII system. Since all the "home grown" code here (and that supplied
by IBM) originates on EBCDIC systems absolutely no translations are
necessary and trigraphs are useful. All the world is not a PC. The
standard acknowledges that. I also understand that you don't find much
reason to have trigraphs supported. Some people use them, a lot. IBM's
Mainframes haven't disappeared, they've just been renamed "Servers" ;o).

I understand that. My (badly explained) point was that since trigraphs
failed to make C source code portable, trigraphs shouldn't have been
part of the C standard.

Mark McIntyre · Oct 20, 2006

Me neither. But I do not support trigraphs anyway.

Just to be clear, you confirm that your C implementation is
deliberately nonconforming.

They are an unnecessary feature.

And you feel able to speak for the *entire* C programming community
when you make that statement, and the C standards committee of experts
from throughout the world are wrong.

We had several lengthy discussions about this in
comp.std.c.

No doubt.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mark McIntyre · Oct 20, 2006

I guess you have never seen a system without the following chars in
its char set.

ISTR that Jacob believes that only Intel 32-bit Windows platforms
exist, and all other Osen are a figment of everyone's imagination.

I mean, who could possibly build a machine that doesn't have a # or {
symbol on the keyboard or in the character set? Other than IBM, DEC
and Apple of course. Who don't exist.

And he wonders why he attracts flames.

Mark McIntyre · Oct 20, 2006

By the way, the venerable 3270 is DEAD SINCE CONCEPTION

What a complete mutt you are. There are entire banks out there whose
entire back offices run entirely on IBM mainframes with 3270s hanging
off them. Sure, emulators these days, but still 3270s.

and one of the nice things about the microcomputers that appeared in the
eighties was these wonderful KEYBOARDS where we could type any character
we wish..

Go on then, type a # on a UK G3 Apple Mac keyboard. Or on a Tektronix
4100 keyboard, if memory serves me correctly (or was it { and } they
don't have?). And while we're at it, try £ on any US keyboard.

Mark McIntyre · Oct 20, 2006

This is NONSENSE

Have you noticed that by making a series of pointless throwaway
inflammatory remarks, you have diverted all attention from your code?

Nobody is bothering to read it any more. That's a shame as it might
have been interesting.

Mark McIntyre · Oct 20, 2006

No. MSVC for instance will preprocess your statement to
return 1

Only if invoked in non-conforming mode. Remember that MSVC is not a
C99 compiler, it adheres to C89.
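
The statement under discussion is not quoted in this post, so what follows is only the classic illustration of the same effect, not necessarily the code being argued about. It is a sketch with a made-up function name: the same line means different things depending on whether the compiler treats // as a comment introducer.

```c
/*
 * Returns 10 when the compiler treats // as a comment (C99 and later),
 * and 5 when it follows C89/C90 rules, where the line reads 10 / 2.
 */
int slash_slash_probe(void)
{
    int r = 10 //* divisor follows in C90 mode */ 2
        ;
    return r;
}
```

Under C89 rules the first slash is a division operator and the comment starts at the second slash, so the 2 survives. Under C99 rules everything after the two slashes is discarded.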

jacob navia · Oct 20, 2006

Mark said:
Have you noticed that by making a series of pointless throwaway
inflammatory remarks, you have diverted all attention from your code?

Nobody is bothering to read it any more. That's a shame as it might
have been interesting.

You are right.

Excuse me for this polemic.

jacob

CBFalconer · Oct 21, 2006

jacob said:
Walter Bright wrote:
.... snip ...

Me neither. But I do not support trigraphs anyway. They are an
unnecessary feature. We had several lengthy discussions about
this in comp.std.c.

Consider the following scenario. Joe Q Customer has this large
monstrous set of source files, containing a few hundred K lines.
It is C89 compatible, and was used on IBMery or some such without
those characters, so it uses trigraphs throughout. It compiles and
executes correctly on any standards compliant C system.

Now Joe wants to port it to a PC, and he unsuspectingly gets your
compiler to do the job. Many thousands of errors, about 3 or 4 per
line. Will Joe turn to you for any future business? Or will he
run around in circles badmouthing your system? Or do you think he
will laboriously revise all that source to satisfy your peculiar
attitude?

Keith Thompson · Oct 21, 2006

jacob navia said:
You are right.

Excuse me for this polemic.

jacob, this is the second time recently that I've seen you admit to an
error or misjudgement. I just wanted to say, with no sarcasm or
criticism intended, that this is A Good Thing. Thank you.

Walter Bright · Oct 21, 2006

Richard said:
Walter Bright said:

Trigraphs are a worthless feature.

This "worthless feature" is sometimes the only way you can get C code to
compile on a particular implementation, because the native character set of
the implementation doesn't contain such fancy characters as { or [ - so to
dismiss it as worthless is to display mere parochialism. I've worked on a
system that had no end of trouble with [ and ] but was quite at home with
??( and ??)

EBCDIC is parochialism, not ASCII. ASCII covers 99.99999% of the systems
out there. No sane person is going to invent a new character encoding
that doesn't include ASCII.

Trigraphs would be great if they solved the problem you mentioned. But
they don't. People overwhelmingly write C code using fancy characters {
and [, and that source code fails on EBCDIC systems. You're going to
have to run the source through a translator whether trigraphs are in the
standard or not.

So what have trigraphs in the Standard bought you? Nothing. They don't
even work with RADIX50.

Nevertheless, they are in the standard and C compilers should implement
them. Digital Mars C does.

Walter Bright
www.digitalmars.com C, C++, D programming language compilers

Walter Bright · Oct 21, 2006

jacob said:
Why should *I* bother about that?

Because:

1) It's only about 10-15 lines of code to implement, and that's far
easier than arguing about it.

2) Because standards compliance is important, even if one doesn't agree
with all of it.
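
To give a feel for the size of the job in point 1, here is a minimal sketch of trigraph replacement over a string buffer. It is not Digital Mars code or any other compiler's code; the names `trigraph_value` and `untrigraph` are invented for this example, and a real compiler would do the replacement in its input stage rather than on a whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map the third character of a trigraph to its replacement, or 0 if none. */
static char trigraph_value(char third)
{
    static const char from[] = "=()<>!'-/";
    static const char to[]   = "#[]{}|^~\\";
    const char *p = (third != '\0') ? strchr(from, third) : NULL;
    return p ? to[p - from] : '\0';
}

/* Replace trigraphs in place; returns the new length of the string. */
size_t untrigraph(char *s)
{
    char *out = s;
    const char *in = s;

    assert(s != NULL);
    while (*in != '\0') {
        char r = (in[0] == '?' && in[1] == '?') ? trigraph_value(in[2]) : '\0';
        if (r != '\0') { *out++ = r; in += 3; }   /* three chars become one */
        else           { *out++ = *in++; }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

The two lookup strings are index-aligned, so each third character sits at the same position as its replacement. The string literals deliberately avoid two adjacent question marks so that the sketch itself is unaffected by trigraph translation.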

jxh · Oct 21, 2006

jacob said:
Recently, a heated debate started because poor Mr. Heathfield
was unable to compile a program with // comments.

Here is a utility for him, so that he can (at last) compile my
programs.

The code below is considerably larger, but it should get the job done.
It actually removes all comments.

--
James

/*
* cstripc: A C program to strip comments from C files.
* Usage:
* cstripc [file [...]]
* cstripc [-t]
*
* The '-t' option is used for testing. It prints some pointers to strings
* that are interlaced with comment characters.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*****************/
/**** GLOBALS ****/
/*****************/

static const char *progname;
static int debug_flag;

/**********************/
/**** MAIN PROGRAM ****/
/**********************/

static void print_usage(void);
static void print_test(void);

static FILE * open_input_file(const char *filename);
static void close_input_file(FILE *infile);
static void parse_input_file(FILE *infile);

int
main(int argc, char *argv[])
{
progname = argv[0];
if (progname == 0) {
progname = "cstripc";
}

while (argc > 1) {

if ((*argv[1] != '-') || (strcmp(argv[1], "-") == 0)) {
break;
}

if (strcmp(argv[1], "-t") == 0) {
print_test();
exit(0);
} else if (strcmp(argv[1], "-d") == 0) {
debug_flag = 1;
} else {
fprintf(stderr, "%s: Unrecognized option '%s'\n",
progname, argv[1]);
print_usage();
exit(EXIT_FAILURE);
}

--argc;
++argv;
}

if (argc <= 1) {
parse_input_file(stdin);
exit(0);
}

while (argc > 1) {
FILE *infile;

parse_input_file(infile = open_input_file(argv[1]));
close_input_file(infile);

--argc;
++argv;
}

return 0;
}

/**************************/
/**** PRINT USAGE/TEST ****/
/**************************/

static const char *usage_string =
"%s: A C program to strip comments from C files.\n"
"Usage:\n"
" %s [file [...]]\n"
" %s [-t]\n"
"\n"
"The '-t' option is used for testing. It prints some pointers to strings\n"
"that are interlaced with comment characters.\n"
;

static void
print_usage(void)
{
fprintf(stderr, usage_string, progname, progname, progname);
}

static const char *a;
static const char *b;
static const char *c;

static void
print_test(void)
{
if (a) puts(a);
if (b) puts(b);
if (c) puts(c);
}

/*******************************/
/**** OPEN/CLOSE INPUT FILE ****/
/*******************************/

static const char *input_file_name;

static FILE *
open_input_file(const char *filename)
{
FILE *infile;

input_file_name = filename;

if (filename == 0) {
return 0;
}

if (strcmp(filename, "-") == 0) {
return stdin;
}

infile = fopen(filename, "r");
if (infile == 0) {
fprintf(stderr, "%s: Could not open '%s' for reading.\n",
progname, filename);
}

return infile;
}

static void
close_input_file(FILE *infile)
{
if (infile) {
if (infile != stdin) {
if (fclose(infile) == EOF)
fprintf(stderr, "%s: Could not close '%s'.\n",
progname, input_file_name);
} else {
clearerr(stdin);
}
}
}

/**************************/
/**** PARSE INPUT FILE ****/
/**************************/

typedef struct scan_state scan_state;
typedef struct scan_context scan_context;

struct scan_context {
const scan_state *ss;
char *sbuf;
unsigned sbufsz;
unsigned sbufcnt;
};

struct scan_state {
const scan_state *(*scan)(scan_context *ctx, int input);
const char *name;
};

static scan_context initial_scan_context;

static void
parse_input_file(FILE *infile)
{
int c;
scan_context ctx;

if (infile == 0) {
return;
}

ctx = initial_scan_context;

while ((c = fgetc(infile)) != EOF) {
if (debug_flag) {
fprintf(stderr, "%s\n", ctx.ss->name);
}
ctx.ss = ctx.ss->scan(&ctx, c);
}
}

/***********************/
/**** STATE MACHINE ****/
/***********************/

/*
***************************************************************************
* Assume input is a syntactically correct C program.
*
* The basic algorithm is:
*   Scan character by character:
*     Treat trigraphs as a single character.
*     If the sequence does not start a comment, emit the sequence.
*     Otherwise,
*       Scan character by character:
*         Treat trigraphs as a single character.
*         Treat the sequence '\\' '\n' as no character.
*         If the sequence does not end a comment, continue consuming.
*         Otherwise, emit a space, and loop back to top.
***************************************************************************
*/

#define SCAN_STATE_DEFINE(name) \
static const scan_state * name##_func(scan_context *ctx, int input); \
static const scan_state name##_state = { name##_func, #name }

SCAN_STATE_DEFINE(normal);
SCAN_STATE_DEFINE(normal_maybe_tri_1);
SCAN_STATE_DEFINE(normal_maybe_tri_2);
SCAN_STATE_DEFINE(string);
SCAN_STATE_DEFINE(string_maybe_tri_1);
SCAN_STATE_DEFINE(string_maybe_tri_2);
SCAN_STATE_DEFINE(string_maybe_splice);
SCAN_STATE_DEFINE(char);
SCAN_STATE_DEFINE(char_maybe_tri_1);
SCAN_STATE_DEFINE(char_maybe_tri_2);
SCAN_STATE_DEFINE(char_maybe_splice);
SCAN_STATE_DEFINE(slash);
SCAN_STATE_DEFINE(slash_maybe_tri_1);
SCAN_STATE_DEFINE(slash_maybe_tri_2);
SCAN_STATE_DEFINE(slash_maybe_splice);
SCAN_STATE_DEFINE(slashslash);
SCAN_STATE_DEFINE(slashslash_maybe_tri_1);
SCAN_STATE_DEFINE(slashslash_maybe_tri_2);
SCAN_STATE_DEFINE(slashslash_maybe_splice);
SCAN_STATE_DEFINE(slashsplat);
SCAN_STATE_DEFINE(slashsplat_splat);
SCAN_STATE_DEFINE(slashsplat_splat_maybe_tri_1);
SCAN_STATE_DEFINE(slashsplat_splat_maybe_tri_2);
SCAN_STATE_DEFINE(slashsplat_splat_maybe_splice);

#define SCAN_STATE(name) (&name##_state)

static scan_context initial_scan_context = { SCAN_STATE(normal), 0, 0, 0 };

static void sbuf_append_char(scan_context *ctx, int c);
static void sbuf_append_string(scan_context *ctx, char *s);
static void sbuf_clear(scan_context *ctx);
static void sbuf_emit(scan_context *ctx);

static const scan_state *
normal_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(normal_maybe_tri_1);
case '"': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(string);
case '\'': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(char);
case '/': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(slash);
default: sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(normal);
}
}

static const scan_state *
normal_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_append_char(ctx, input);
return SCAN_STATE(normal_maybe_tri_2);
default: sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
normal_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '?': putchar(input);
return SCAN_STATE(normal_maybe_tri_2);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-':
case '/': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(normal);
default: sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
string_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(string_maybe_tri_1);
case '"': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(normal);
case '\\': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(string_maybe_splice);
default: sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(string);
}
}

static const scan_state *
string_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_append_char(ctx, input);
return SCAN_STATE(string_maybe_tri_2);
default: sbuf_emit(ctx);
return SCAN_STATE(string)->scan(ctx, input);
}
}

static const scan_state *
string_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '?': putchar(input);
return SCAN_STATE(string_maybe_tri_2);
case '/': sbuf_append_char(ctx, input);
return SCAN_STATE(string_maybe_splice);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(string);
default: sbuf_emit(ctx);
return SCAN_STATE(string)->scan(ctx, input);
}
}

static const scan_state *
string_maybe_splice_func(scan_context *ctx, int input)
{
switch (input) {
case '\n':
default: sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(string);
}
}

static const scan_state *
char_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(char_maybe_tri_1);
case '\'': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(normal);
case '\\': sbuf_emit(ctx);
sbuf_append_char(ctx, input);
return SCAN_STATE(char_maybe_splice);
default: sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(char);
}
}

static const scan_state *
char_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_append_char(ctx, input);
return SCAN_STATE(char_maybe_tri_2);
default: sbuf_emit(ctx);
return SCAN_STATE(char)->scan(ctx, input);
}
}

static const scan_state *
char_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '?': putchar(input);
return SCAN_STATE(char_maybe_tri_2);
case '/': sbuf_append_char(ctx, input);
return SCAN_STATE(char_maybe_splice);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-': sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(char);
default: sbuf_emit(ctx);
return SCAN_STATE(char)->scan(ctx, input);
}
}

static const scan_state *
char_maybe_splice_func(scan_context *ctx, int input)
{
switch (input) {
case '\n':
default: sbuf_emit(ctx);
putchar(input);
return SCAN_STATE(char);
}
}

static const scan_state *
slash_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_append_char(ctx, input);
return SCAN_STATE(slash_maybe_tri_1);
case '\\': sbuf_append_char(ctx, input);
return SCAN_STATE(slash_maybe_splice);
case '/': sbuf_clear(ctx);
return SCAN_STATE(slashslash);
case '*': sbuf_clear(ctx);
return SCAN_STATE(slashsplat);
default: sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
slash_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': return SCAN_STATE(slash_maybe_tri_2);
default: sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
slash_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '?': sbuf_emit(ctx);
sbuf_append_string(ctx, "??");
return SCAN_STATE(normal_maybe_tri_2);
case '/': sbuf_append_char(ctx, '?');
sbuf_append_char(ctx, input);
return SCAN_STATE(slash_maybe_splice);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-': sbuf_append_char(ctx, '?');
sbuf_append_char(ctx, input);
sbuf_emit(ctx);
return SCAN_STATE(normal);
default: sbuf_append_char(ctx, '?');
sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
slash_maybe_splice_func(scan_context *ctx, int input)
{
switch (input) {
case '\n': sbuf_append_char(ctx, input);
return SCAN_STATE(slash);
default: sbuf_emit(ctx);
return SCAN_STATE(normal)->scan(ctx, input);
}
}

static const scan_state *
slashslash_func(scan_context *ctx, int input)
{
/* UNUSED */ ctx = ctx;
switch (input) {
case '?': return SCAN_STATE(slashslash_maybe_tri_1);
case '\\': return SCAN_STATE(slashslash_maybe_splice);
case '\n': putchar(' ');
putchar(input);
return SCAN_STATE(normal);
default: return SCAN_STATE(slashslash);
}
}

static const scan_state *
slashslash_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': return SCAN_STATE(slashslash_maybe_tri_2);
default: return SCAN_STATE(slashslash)->scan(ctx, input);
}
}

static const scan_state *
slashslash_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '?': return SCAN_STATE(slashslash_maybe_tri_2);
case '/': return SCAN_STATE(slashslash_maybe_splice);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-': return SCAN_STATE(slashslash);
default: return SCAN_STATE(slashslash)->scan(ctx, input);
}
}

static const scan_state *
slashslash_maybe_splice_func(scan_context *ctx, int input)
{
switch (input) {
case '\n': return SCAN_STATE(slashslash);
default: return SCAN_STATE(slashslash)->scan(ctx, input);
}
}

static const scan_state *
slashsplat_func(scan_context *ctx, int input)
{
/* UNUSED */ ctx = ctx;
switch (input) {
case '*': return SCAN_STATE(slashsplat_splat);
default: return SCAN_STATE(slashsplat);
}
}

static const scan_state *
slashsplat_splat_func(scan_context *ctx, int input)
{
switch (input) {
case '?': return SCAN_STATE(slashsplat_splat_maybe_tri_1);
case '\\': return SCAN_STATE(slashsplat_splat_maybe_splice);
case '/': putchar(' ');
return SCAN_STATE(normal);
default: return SCAN_STATE(slashsplat)->scan(ctx, input);
}
}

static const scan_state *
slashsplat_splat_maybe_tri_1_func(scan_context *ctx, int input)
{
switch (input) {
case '?': return SCAN_STATE(slashsplat_splat_maybe_tri_2);
default: return SCAN_STATE(slashsplat)->scan(ctx, input);
}
}

static const scan_state *
slashsplat_splat_maybe_tri_2_func(scan_context *ctx, int input)
{
switch (input) {
case '/': return SCAN_STATE(slashsplat_splat_maybe_splice);
case '=':
case '(':
case ')':
case '<':
case '>':
case '!':
case '\'':
case '-': return SCAN_STATE(slashsplat);
default: return SCAN_STATE(slashsplat)->scan(ctx, input);
}
}

static const scan_state *
slashsplat_splat_maybe_splice_func(scan_context *ctx, int input)
{
switch (input) {
case '\n': return SCAN_STATE(slashsplat_splat);
default: return SCAN_STATE(slashsplat)->scan(ctx, input);
}
}

/*************************/
/**** BUFFER HANDLING ****/
/*************************/

static void
sbuf_append_char(scan_context *ctx, int c)
{
/* Grow when there is no room for the new character plus the '\0'. */
if (ctx->sbuf == 0 || ctx->sbufcnt + 1 >= ctx->sbufsz) {
unsigned newsz = ctx->sbufsz ? ctx->sbufsz * 2 : 128;
char *p = realloc(ctx->sbuf, newsz);
if (p == 0) {
fprintf(stderr, "%s: memory allocation failure\n",
progname);
exit(EXIT_FAILURE);
}
ctx->sbuf = p;
ctx->sbufsz = newsz;
}

ctx->sbuf[ctx->sbufcnt++] = c;
ctx->sbuf[ctx->sbufcnt] = '\0';
}

static void
sbuf_append_string(scan_context *ctx, char *s)
{
while (*s != '\0') {
sbuf_append_char(ctx, *s++);
}
}

static void
sbuf_clear(scan_context *ctx)
{
ctx->sbufcnt = 0;
if (ctx->sbuf) {
ctx->sbuf[ctx->sbufcnt] = '\0';
}
}

static void
sbuf_emit(scan_context *ctx)
{
if (ctx->sbuf == 0 || ctx->sbufcnt == 0) {
return;
}

printf("%s", ctx->sbuf);
sbuf_clear(ctx);
}

/********************/
/**** TEST CASES ****/
/********************/

/* a comment */
/\
* a comment split */
/\
\
* a comment split twice */
/*
block comment
*/
/* comment, trailing delimiter split *\
/
/* comment, trailing delimiter split twice *\
\
/
/* comment, trailing delimiter split once, and again by trigraph *\
??/
/

static const char *a = /* comment in code line "*/"Hello, "/**/"World!";
static const char *b = /\
* comment on code line split */ "Hello, " /\
\
* comment on code line split twice */ "World!";

#define FOO ??/* this does not start a comment */

#if defined(__STDC__) && (__STDC__ == 1)
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
//*** MORE TEST CASES ***//
/\
/ // comment split
/\
\
/ // comment split twice
static const char *c = // // comment on code line
"Hello, " /\
/ // comment on code line split
"World!" /\
\
/ // comment on code line split twice.
;

#define BAR ??// this does not start a comment

// This is a // comment \
on two lines

#else
static const char *c = "STDC without STD_VERSION";
#endif
#endif
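
One step from the algorithm comment in the listing above, "treat the sequence backslash, newline as no character", can be looked at in isolation. The following is a minimal sketch of that single step on a string buffer. It is not part of cstripc, the name `remove_splices` is made up, and it ignores the fact that a trigraph can also produce the backslash.

```c
#include <assert.h>
#include <stddef.h>

/* Delete every backslash-newline pair in place; returns the new length. */
size_t remove_splices(char *s)
{
    char *out = s;
    const char *in = s;

    assert(s != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;            /* spliced: both characters vanish */
        } else {
            *out++ = *in++;     /* everything else is copied unchanged */
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

After this step a comment delimiter that was split across lines, as in several of the test cases, becomes an ordinary two-character delimiter again.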

CBFalconer · Oct 21, 2006

jxh said:
The code below is considerably larger, but it should get the job
done. It actually removes all comments.

.... snip code ...

If you just want to delete all comments, my public domain uncmnt.c
is considerably shorter. 109 lines in place of your 740 odd. It
doesn't handle trigraphs. It does maintain the original line
numbering. See:

<http://cbfalconer.home.att.net/download/>

It should be fairly easily modified to convert the comments.
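
For comparison, here is a minimal sketch of the simpler approach described above: no trigraphs, no line splicing, and newlines inside comments are kept so that line numbers do not shift. This is not uncmnt.c, which is not reproduced in this thread; the function name and the string-to-string interface are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum state { CODE, STRING, CHARLIT, BLOCK, LINE };

/*
 * Copy src to dst without comments. dst must hold at least
 * strlen(src) + 1 bytes. Each comment becomes one space, and newlines
 * inside block comments are kept so the line count is unchanged.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char c = *src++;
        switch (st) {
        case CODE:
            if (c == '/' && *src == '*')      { st = BLOCK; src++; *dst++ = ' '; }
            else if (c == '/' && *src == '/') { st = LINE;  src++; *dst++ = ' '; }
            else {
                if (c == '"')  st = STRING;
                if (c == '\'') st = CHARLIT;
                *dst++ = c;
            }
            break;
        case STRING:
        case CHARLIT:
            *dst++ = c;
            if (c == '\\' && *src != '\0') *dst++ = *src++;  /* keep escape pair */
            else if (c == (st == STRING ? '"' : '\'')) st = CODE;
            break;
        case BLOCK:
            if (c == '*' && *src == '/') { st = CODE; src++; }
            else if (c == '\n')          *dst++ = c;
            break;
        case LINE:
            if (c == '\n') { st = CODE; *dst++ = c; }
            break;
        }
    }
    *dst = '\0';
}
```

Five states are enough here because string and character literals only need to be tracked so that comment delimiters inside them are not misread.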

Richard Heathfield · Oct 21, 2006

Walter Bright said:

Richard said:

Walter Bright said:

Trigraphs are a worthless feature.

This "worthless feature" is sometimes the only way you can get C code to
compile on a particular implementation, because the native character set
of the implementation doesn't contain such fancy characters as { or [ -
so to dismiss it as worthless is to display mere parochialism. I've
worked on a system that had no end of trouble with [ and ] but was quite
at home with ??( and ??)

EBCDIC is parochialism, not ASCII.

I didn't say ASCII was parochialism. I said that an attitude that assumes it
is.

ASCII covers 99.99999% of the systems
out there.

Nevertheless, there are still an awful lot of mainframes around, and they
are a very important part of the C world.

No sane person is going to invent a new character encoding
that doesn't include ASCII.

...unless it makes business sense or technical sense to do that, which it
might, one day. (The Microsoft Office guys had much the same opinion of int:
"the compiler guys wouldn't change the size of an int on us - they know
it'd break all our code", but the compiler guys changed it anyway.)

Trigraphs would be great if they solved the problem you mentioned. But
they don't. People overwhelmingly write C code using fancy characters {
and [, and that source code fails on EBCDIC systems. You're going to
have to run the source through a translator whether trigraphs are in the
standard or not.

That's mostly true, yes, although I did work on one site which required the
programmers to use trigraphs in their code (which was written and debugged
on PCs before being moved up to the mainframe for testing).

CBFalconer · Oct 21, 2006

Richard said:
.... snip ...

That's mostly true, yes, although I did work on one site which
required the programmers to use trigraphs in their code (which
was written and debugged on PCs before being moved up to the
mainframe for testing).

A useful pair of filter utilities would be:

entrigph
untrigph

I don't know if it is possible to cater to all possible source.
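
A naive sketch of the first of those filters, working on strings rather than streams, might look like the following. The function name `entrigraph` and its interface are invented for the example. It does not cater to all possible source: in particular, input that already contains two question marks followed by one of the nine third characters will change meaning once trigraphs are enabled, and this sketch does nothing about that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write src to dst with each of the nine affected characters replaced
 * by its trigraph. dst must hold at least 3 * strlen(src) + 1 bytes.
 */
void entrigraph(const char *src, char *dst)
{
    static const char plain[] = "#[]{}|^~\\";
    static const char third[] = "=()<>!'-/";

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        const char *p = strchr(plain, *src);
        if (p != NULL) {
            *dst++ = '?';
            *dst++ = '?';
            *dst++ = third[p - plain];   /* tables are index-aligned */
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```

The reverse filter is the same table lookup run in the other direction.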

Peter Nilsson · Oct 21, 2006

Keith said:
Fascinating. There have been raging arguments about trigraphs both
here and in comp.std.c for years. I think you're the first person
I've seen who actually *uses* them.

Old Mac programmers (pre-OS X) certainly knew of the ??' trigraph, because
it cropped up in the multi-character constant '????' that was used as a
default file type. Even though such code is obviously platform specific, you
would still see the better quality programs using '???\?' to avoid potential
trigraph translation.
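
The same escape works in string literals, where the usual victim is a run of question marks followed by an exclamation mark. The fragment below is only a sketch with a made-up function name. The `\?` escape stands for a plain question mark, so the source never contains two adjacent question marks and no trigraph can form, whatever mode the compiler is in.

```c
#include <string.h>

/*
 * The escape keeps the two question marks apart in the source text,
 * so the result is the same with or without trigraph translation.
 */
const char *safe_question(void)
{
    return "What?\?!";   /* seven characters at run time */
}
```

Written without the escape, the last three characters would be read as the trigraph for a vertical bar by a compiler that performs trigraph translation.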
