Something about stripping C/C++ comments in perldoc


Xicheng Jia

Hi folks:

I have recently been reading Jeffrey Friedl's book "Mastering Regular
Expressions" (O'Reilly, 2nd edition), and found that something in
perldoc might be out of date and not fully in step with Perl's
development.

perldoc -q comment

This gives me a C comment stripper (created by Jeffrey Friedl and later
modified by Fred Curtis):

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

I think several parts of it are not optimized, or could be simplified
using features of Perl's regex flavor:

1) /\*[^*]*\*+([^/*][^*]*\*+)*/
This pattern removes a normal C comment of the form /* ..... */, and
was developed when there were no lazy quantifiers. As Jeffrey
mentioned in his book, a much simpler pattern is /\*.*?\*/, which is
obviously much easier to understand.

2) "(\\.|[^"\\])*"
This pattern captures the whole content of a C string (double-quoted
stuff), and the unrolled version of this pattern,
"[^"\\]*(?:\\.[^"\\]*)*", developed by Jeffrey, can be much more
efficient (as he mentioned in his book). The same approach can be
applied to the single-quoted stuff.

3) Several capturing parentheses whose contents are never used could be
changed to the non-capturing (?: ) form, which may improve the
performance of the regex somewhat.
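A quick way to check points 1 to 3 is to run both versions of each pattern side by side. The sketch below assumes Perl 5 with strict and warnings; the sample inputs and the first_match helper are made up for illustration, and agreement on a handful of inputs is a sanity check, not a proof that the patterns are equivalent.

```perl
use strict;
use warnings;

# Point 1: perlfaq6's comment pattern (group made non-capturing) and
# the lazy-quantifier version.
my $c_unrolled = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};
my $c_lazy     = qr{/\*.*?\*/}s;    # /s lets . match newlines too

# Point 2: alternation form and Friedl's unrolled-loop form of a
# double-quoted string with backslash escapes.
my $s_naive    = qr{"(?:\\.|[^"\\])*"}s;
my $s_unrolled = qr{"[^"\\]*(?:\\.[^"\\]*)*"}s;

# Hypothetical helper: first match of $re in $text, or undef.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

my @comments = (
    "a /** two **/ b",
    "a /* multi\n   line */ b",
    "a /* x * y / z */ b /* second */",
);
my @strings = (
    q{x = "with \"escaped\" quotes";},
    q{x = "ends in a backslash \\\\";},
    q{x = "a /* not a comment */ b";},
);

for my $t (@comments) {
    my ($u, $l) = map { first_match($_, $t) } $c_unrolled, $c_lazy;
    printf "%-9s %s\n", ($u eq $l ? 'same' : 'DIFFERENT'), $l;
}
for my $t (@strings) {
    my ($n, $u) = map { first_match($_, $t) } $s_naive, $s_unrolled;
    printf "%-9s %s\n", ($n eq $u ? 'same' : 'DIFFERENT'), $u;
}

# Point 3: with perlfaq6's capturing groups the text to keep is in $2;
# once the unneeded groups are (?: ), it moves to $1.
my $faq  = qr{/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)}s;
my $lean = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*)}s;

my $src = qq{a /* c */ "s" 'x'\n};
(my $out_faq  = $src) =~ s/$faq/defined $2 ? $2 : ""/ge;
(my $out_lean = $src) =~ s/$lean/defined $1 ? $1 : ""/ge;
print $out_faq, $out_lean;    # the same line, printed twice
```

Besides the small speed gain mentioned in point 3, dropping the unneeded captures is what lets the rewritten expressions below refer to the kept text as $1 instead of $2.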

Based on the above, some modifications can be made, and the s///
expression could be rewritten as, e.g.:

s#/\*.*?\*/|//.*?\n|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^'"/]+)#
$1 or "" #gse

or in another form:

s{
/\*.*?\*/ ## strip normal C comments
| ## or
//[^\n]* ## strip C++ comments
| ## or
( ## capture $1
"[^"\\]*(?:\\.[^"\\]*)*" ## double-quoted stuff
| ## or
'[^'\\]*(?:\\.[^'\\]*)*' ## single-quoted stuff
| ## or
[^"'/]+ ## strings that guarantee a non-comment
) ## end of capturing $1
}{ $1 or "" }gsxe

which I think might be better than the one in 'perldoc -q comment'. I
didn't experiment much with these s/// expressions, though. Just
my $0.02. Thanks for any comments,

Xicheng
=====
USENET is a classroom, for me.:)
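A self-contained sketch for trying the /x form above on a small, invented C fragment. It keeps the pattern as posted (with the comments lightly reworded) but writes the replacement as defined $1 ? $1 : "" rather than $1 or "", because a captured text of exactly "0" is false in Perl and would be dropped by or.

```perl
use strict;
use warnings;

# Invented C input: a block comment spanning two lines, a string that
# contains comment-like text, a line comment, a char literal holding a
# double quote, and a division slash.
my $c_source = <<'END_C';
int x = 1; /* block
              comment */
char *s = "not /* a comment */ here"; // line comment
char c = '"'; int y = x / 2;
END_C

(my $stripped = $c_source) =~ s{
    /\*.*?\*/                       ## strip normal C comments
  |                                 ## or
    //[^\n]*                        ## strip C++ comments
  |                                 ## or
    (                               ## capture group 1: text to keep
        "[^"\\]*(?:\\.[^"\\]*)*"    ## double-quoted stuff
      |                             ## or
        '[^'\\]*(?:\\.[^'\\]*)*'    ## single-quoted stuff
      |                             ## or
        [^"'/]+                     ## text that cannot start a comment
    )                               ## end of group 1
}{ defined $1 ? $1 : "" }gsxe;

print $stripped;
```

A lone / that does not start a comment matches none of the alternatives, so s///g simply skips over it and leaves it in place, which is why x / 2 survives.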
 

Xicheng Jia

Xicheng said:
> s#/\*.*?\*/|//.*?\n|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^'"/]+)#
> $1 or "" #gse

The //.*?\n should be //[^\n]*; otherwise the newline that ends a C++
comment is deleted along with it.
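A small sketch of the difference, using a made-up two-line input whose last comment has no trailing newline: //.*?\n swallows the newline that ends the first comment and cannot match the last comment at all, while //[^\n]* removes only the comment text.

```perl
use strict;
use warnings;

# Made-up input; note there is no newline after the last comment.
my $src = "a; // one\nb; // two";

(my $lazy_nl = $src) =~ s#//.*?\n##gs;   # pattern from the first post
(my $negated = $src) =~ s#//[^\n]*##g;   # corrected pattern

print "[$lazy_nl]\n";    # the lines are joined and the last comment stays
print "[$negated]\n";    # each comment is removed, newlines are kept
```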

A one-liner for testing under Linux could roughly be written as follows
(note: all of the single-quote handling is removed):

perl -0777pe '
s#/\*.*?\*/|//[^\n]*|("[^"\\]*(?:\\.[^"\\]*)*"|[^"/]+)# $1 or "" #gse
' myfile.cpp

Xicheng
 

Lukas Mai

Xicheng Jia said:
> s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
>
> I think several parts of it are not optimized, or could be simplified
> using features of Perl's regex flavor:
>
> 1) /\*[^*]*\*+([^/*][^*]*\*+)*/
> This pattern removes a normal C comment of the form /* ..... */, and
> was developed when there were no lazy quantifiers. As Jeffrey
> mentioned in his book, a much simpler pattern is /\*.*?\*/, which is
> obviously much easier to understand.

There are rumors on the internets that non-greedy quantifiers are slower
than their normal counterparts. I don't know if that's true, but .*?
still feels "unclean" to me.

[other improvements]
> s#/\*.*?\*/|//.*?\n|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^'"/]+)#
> $1 or "" #gse

Of course, this regex is still incomplete because it completely ignores
trigraphs and continuation lines:

/??/
* this is a comment */

??/ is a trigraph for \; the resulting \<newline> is removed (line splicing),
and then /* ... */ is parsed as a comment.
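One way to handle this, sketched below under the assumption that only the ??/ trigraph matters, is to follow the order of the C translation phases: replace ??/ with a backslash, delete every backslash-newline pair, and only then strip comments (replacing each with one space). The strip_after_splicing name is hypothetical, other trigraphs are ignored, and the output no longer keeps the original line breaks where lines were spliced.

```perl
use strict;
use warnings;

# Compact comment-stripping pattern from earlier in the thread; group 1
# holds text that must be kept.
my $strip_re = qr{/\*.*?\*/|//[^\n]*|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^"'/]+)}s;

# Hypothetical helper: a sketch of C translation phases 1 to 3, limited
# to the ??/ trigraph.
sub strip_after_splicing {
    my ($text) = @_;
    $text =~ s{\?\?/}{\\}g;    # phase 1 (partial): ??/ becomes a backslash
    $text =~ s{\\\n}{}g;       # phase 2: delete each backslash-newline pair
    # phase 3: each comment becomes a single space, other text is kept
    $text =~ s/$strip_re/defined $1 ? $1 : " "/ge;
    return $text;
}

print strip_after_splicing("/??/\n* this is a comment */x"), "\n";
```

Doing the splicing as a separate pass keeps the comment regex simple; the price is that the result is no longer line-for-line comparable with the input, which matters if you need to report line numbers.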

Another problem is that comments are semantically equivalent to
whitespace, so something like "int/**/main" should turn into "int main",
not "intmain".

Here's my own version:

#!/usr/local/bin/perl -p0777

s!
/
(?: (?: \\ | \?\?/ ) \n )*
(?:
/
(?:
(?: \\ | \?\?/ ) \n
|
[^\n]
)*
|
\*
[^*]* \*+
(?: (?: \\ | \?\?/ ) \n )*
(?:
[^/*]
[^*]* \*+
(?: (?: \\ | \?\?/ ) \n )*
)*
(/)
)
|
(
"
(?:
(?: \\ | \?\?/ ) .
|
[^"]
)*
"
|
'
(?:
(?: \\ | \?\?/ ) .
|
[^']
)*
'
|
. [^'"/]*
)
!(defined $1 ? ' ' : '') . $2!gsex
__END__
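The same pattern restated with comments and wrapped in a function, so it can be tried on strings instead of files. The comments are one reading of what each part does, not Lukas's own notes; strip_c_comments and the test strings are hypothetical. The replacement tests defined $2 instead of concatenating $2 directly, which avoids "uninitialized value" warnings under use warnings when a comment is matched.

```perl
use strict;
use warnings;

# A line splice: a backslash, or the ??/ trigraph, right before a newline.
my $splice = qr{ (?: \\ | \?\?/ ) \n }x;

my $comment_or_keep = qr{
    /  $splice*                          # a slash, maybe followed by spliced lines
    (?:
        /                                # a second slash starts a line comment
        (?: $splice | [^\n] )*           # ... which runs to an unspliced newline
      |
        \*                               # a star starts a block comment
        [^*]* \*+ $splice*               # body up to a run of stars
        (?: [^/*] [^*]* \*+ $splice* )*  # more body, each chunk ending in stars
        (/)                              # closing slash, captured as group 1
    )
  |
    (                                    # group 2: text that is kept as-is
        " (?: (?: \\ | \?\?/ ) . | [^"] )* "    # string literal
      |
        ' (?: (?: \\ | \?\?/ ) . | [^'] )* '    # character literal
      |
        . [^'"/]*                        # one char plus a run with no quote or slash
    )
}xs;

# Hypothetical wrapper: block comments become one space, line comments
# are deleted (their newline stays), everything else is copied through.
sub strip_c_comments {
    my ($text) = @_;
    $text =~ s/$comment_or_keep/defined $1 ? ' ' : (defined $2 ? $2 : '')/ge;
    return $text;
}

print strip_c_comments("int/**/main"), "\n";
```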
 

robic0

> s{
> /\*.*?\*/ ## strip normal C comments
> | ## or
> //[^\n]* ## strip C++ comments

What if you have something like this:
// comments: ... /*<newline>
some code example <newline>
more code // embedded comment, code example <newline>
/* more code and comments <newline>
*/ <newline>
// comments <newline>
*/ <newline>
It does matter if it won't compile, but then you have to invoke
the compiler and parse its output.

The opposite construction as well:
/* .... /* .... */ this is left in */

I don't know whether the above won't compile on today's compilers; it didn't
compile (if the code was bad) on VC6 and below.
For '//' the end delimiter might be the eol or, if continuations are allowed,
the eol on the next line. A 'rolling' regexp parse (global) will have problems
with nesting.

There doesn't seem to be a defined standard for comments. There may be, dunno.
XML runs into the same problem with COMMENT/CDATA sections.
The difference might be that the XML standard has a clearly defined path
of precedence. It's chiseled in stone. There's no ambiguity.
Your code may work for perfectly constructed comments (the ones that compile within
C/C++ code), as the idea of such exists in your mind, but don't fool yourself about
the flaws in this regexp.

It's not really flawed for what it does, in your mind;
it's that the idea is a conceptual *error*.

The glaring flaw is that s///g is not compatible with this, *if* nesting is to
be taken into account and allowed. I don't think there's a standards committee for
C/C++ comments; compilers give you what you get.

If you want to pursue an *all cases* approach, check the just-posted RXParse XML parser
to see how it effectively deals with COMMENT/CDATA.
 

robic0

Lukas Mai said:
> There are rumors on the internets that non-greedy quantifiers are slower
> than their normal counterparts. I don't know if that's true, but .*?
> still feels "unclean" to me.
Hogwash!!!


> [other improvements]
> s#/\*.*?\*/|//.*?\n|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^'"/]+)#
> $1 or "" #gse

> Of course, this regex is still incomplete because it completely ignores
> trigraphs and continuation lines:
huh, trigraphs?
> /??/
> * this is a comment */
>
> ??/ is a trigraph for \, \<newline> is removed, then /* ... */ is parsed
> as a comment.
huh?

> Another problem is that comments are semantically equivalent to
> whitespace, so something like "int/**/main" should turn into "int main",
> not "intmain".

int/**/main doesn't compile on my machine

> Here's my own version:

/* .... /* .... */ what's this? */
 

Xicheng Jia

Lukas said:
> There are rumors on the internets that non-greedy quantifiers are
> slower than their normal counterparts. I don't know if that's true, but
> .*? still feels "unclean" to me.
From Jeffrey's book "Mastering Regular Expressions" (2nd edition,
O'Reilly):

"Lazy versus Greedy", page 256:
"It's not always obvious which is best... If the data is random,
and you have no idea which will be more likely, use a greedy
quantifier, as they are generally optimized a bit better than
non-greedy quantifier, especially when what follows in the regex
disallows the character following lazy quantifier
optimization (page 249)."

"Specific versus Lazy", page 257:
"Generally, using a negated class is much more efficient than a lazy
quantifier. One exception is Perl, because it has that character
following lazy quantifier optimization"

From the above, because Perl supports the "character following lazy
quantifier" optimization, I feel that non-greedy quantifiers are not as
bad as the rumors you heard. :)

Moreover, on page 272 of Jeffrey's book, 4th paragraph:
"... So, with modern versions of Perl, I'd just use /\*.*?\*/ to
match C comments and be done with it."
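Whether the lazy or the unrolled comment pattern is faster depends on the perl version and the input, so measuring beats relying on rumors either way. A sketch using the core Benchmark module on made-up input; the timings it prints will vary from machine to machine, and only the check that both patterns find the same comments is deterministic.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input: 200 short lines, each with one comment containing a
# double star, so the unrolled pattern has to loop at least once.
my $text = join '',
    map { sprintf "int v%d = %d; /* comment number %d ** with stars */\n", $_, $_, $_ } 1 .. 200;

my $lazy     = qr{/\*.*?\*/}s;
my $unrolled = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

# Both patterns should find exactly the same comments.
my @by_lazy     = $text =~ /$lazy/g;
my @by_unrolled = $text =~ /$unrolled/g;
printf "%d comments each, identical: %s\n", scalar @by_lazy,
    join("\0", @by_lazy) eq join("\0", @by_unrolled) ? 'yes' : 'no';

# Relative speed for this input, this perl and this machine only.
cmpthese(-1, {
    lazy     => sub { my @m = $text =~ /$lazy/g },
    unrolled => sub { my @m = $text =~ /$unrolled/g },
});
```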

> Of course, this regex is still incomplete because it completely
> ignores trigraphs and continuation lines:
> /??/
> * this is a comment */
>
> ??/ is a trigraph for \, \<newline> is removed, then /* ... */ is
> parsed as a comment.
>
> Another problem is that comments are semantically equivalent to
> whitespace, so something like "int/**/main" should turn into "int
> main", not "intmain".

I guess the regex we have discussed so far was written for traditional C
rather than ANSI C. As far as I know, in traditional C (old K&R),
int/**/main is parsed into "intmain" instead of "int main", and it also
does not support trigraphs. :)

> Here's my own version:

Could you please group your patterns and add some comments to them?
That way, we can see what you did and how. Many thanks.

Xicheng
=====
USENET is a classroom, for me:)
 

Xicheng Jia

robic0 said:
> For '//' the end delimiter might be the eol or, if continuations
> are allowed, the eol on the next line.

What do you mean by "eol"? Isn't that "\n"? Or do you mean there is a
backslash at the end of the line, so the next line is a continuation
line? Huh, I guess that would be a problem. :)

> A 'rolling' regexp parse (global) will have problems with nesting.

I am trying Jeffrey's "unrolling the loop" version of the regex.

> There doesn't seem to be a defined standard for comments. There may be, dunno.
> XML runs into the same problem with COMMENT/CDATA sections.
> The difference might be that the XML standard has a clearly defined path
> of precedence. It's chiseled in stone. There's no ambiguity.
> Your code may work for perfectly constructed comments (the ones that compile within
> C/C++ code), as the idea of such exists in your mind, but don't fool yourself about
> the flaws in this regexp.

I know the regex can NOT handle everything; I just want to learn
something from trying it. :) Thanks anyway for your suggestions. :)
 

Xicheng Jia

Lukas said:
> Of course, this regex is still incomplete because it completely ignores
> trigraphs and continuation lines:
> /??/
> * this is a comment */
>
> ??/ is a trigraph for \, \<newline> is removed, then /* ... */ is
> parsed as a comment.

I just checked the trigraph ??/, which is exactly a backslash, so it
comes down to the same situation robic0 raised: a trailing backslash
on the same line as a "//" comment. However, writing C code in the
following ways is quite unusual, isn't it? :)

/\
* this is a comment *\
/

// this is \
a comment
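A sketch of what happens to the second example above with the thread's //[^\n]* pattern, compared with a variant that lets a backslash-newline pair continue the comment (as Lukas's version does). The first example is not recognized as a comment at all by /\*.*?\*/, because the slash and the star are separated by a backslash-newline until the lines are spliced.

```perl
use strict;
use warnings;

# The two examples above as Perl strings (\\\n is a backslash followed
# by a newline).
my $block = "/\\\n* this is a comment *\\\n/\nint x;\n";
my $line  = "// this is \\\na comment\nint y;\n";

my $plain_line   = qr{//[^\n]*};            # line comment as used in the thread
my $spliced_line = qr{//(?:\\\n|[^\n])*};   # a backslash-newline does not end it

(my $plain   = $line) =~ s/$plain_line//;
(my $spliced = $line) =~ s/$spliced_line//;
print "plain:   [$plain]\n";    # "a comment" is left behind as code
print "spliced: [$spliced]\n";  # the whole two-line comment is gone

# Before splicing, the block example contains no "/*" at all.
my $found = $block =~ m{/\*.*?\*/}s ? 'found' : 'not found';
print "block example: $found\n";
```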
__________________________
> Another problem is that comments are semantically equivalent to
> whitespace, so something like "int/**/main" should turn into "int main",
> not "intmain".

For ANSI C, this kind of comment is converted into a SPACE at the
preprocessing stage, so fixing it might be as easy as changing the
replacement part of the regex from:

{ $1 or "" }gsxe to { $1 or " " }gsxe
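A sketch comparing replacement expressions on the compact form of the regex, with a contrived input line. With " " the comment in int/**/main does become a space; however, $1 or ... treats a captured text of exactly "0" as false, so in x/0/y the 0 is replaced as well. Testing defined $1 avoids that. The input is made up specifically to trigger that case.

```perl
use strict;
use warnings;

# Contrived line: a comment between two tokens, a "0" between two
# division slashes, and a trailing line comment.
my $code = "int/**/main(void) { return x/0/y; } // bye\n";

# Compact form of the regex discussed above; group 1 is text to keep.
my $re = qr{/\*.*?\*/|//[^\n]*|("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|[^"'/]+)}s;

(my $empty   = $code) =~ s/$re/ $1 or "" /ge;                # original replacement
(my $space   = $code) =~ s/$re/ $1 or " " /ge;               # proposed fix
(my $defined = $code) =~ s/$re/ defined $1 ? $1 : " " /ge;   # also keeps a lone 0

print $empty, $space, $defined;
```

Note that with the original replacement the lost 0 turns x/0/y into x//y, which a second pass would then treat as a line comment.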

Xicheng. :)
 

Lukas Mai

robic0 wrote:
> What if you have something like this:
> // comments: ... /*<newline>

That's a single comment.
> some code example <newline>
> more code // embedded comment, code example <newline>

Code followed by a comment.
> /* more code and comments <newline>
> */ <newline>

A comment.
> // comments <newline>

A comment.
> */ <newline>

Syntax error, expecting value before /.
> It does matter if it won't compile, but then you have to invoke
> the compiler and parse its output.

> The opposite construction as well:
> /* .... /* .... */ this is left in */
|--------------------------|

Comment, code, syntax error.
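C comments do not nest: a block comment ends at the first */, which is why the line above reads as a comment, then code, then a syntax error. The lazy pattern stops at the same place, whereas a greedy .* would run to the last */ and delete text that the compiler treats as code. A minimal check on that example:

```perl
use strict;
use warnings;

my $nested = "/* .... /* .... */ this is left in */";

(my $lazy   = $nested) =~ s{/\*.*?\*/}{}s;   # ends at the first */, like C
(my $greedy = $nested) =~ s{/\*.*\*/}{}s;    # runs to the last */

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```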
> I don't know whether the above won't compile on today's compilers; it
> didn't compile (if the code was bad) on VC6 and below.
> For '//' the end delimiter might be the eol or, if continuations are
> allowed, the eol on the next line. A 'rolling' regexp parse (global)
> will have problems with nesting.

How about actually reading the relevant docs instead of babbling?
> There doesn't seem to be a defined standard for comments. There may be,
> dunno. XML runs into the same problem with COMMENT/CDATA sections.
> The difference might be that the XML standard has a clearly defined path
> of precedence. It's chiseled in stone. There's no ambiguity.
> Your code may work for perfectly constructed comments (the ones that
> compile within C/C++ code), as the idea of such exists in your mind,
> but don't fool yourself about the flaws in this regexp.

More babbling.
> It's not really flawed for what it does, in your mind;
> it's that the idea is a conceptual *error*.
>
> The glaring flaw is that s///g is not compatible with this, *if*
> nesting is to be taken into account and allowed. I don't think there's
> a standards committee for C/C++ comments; compilers give you what you
> get.

Uh. I recommend you take a look at ISO 9899:1999. There is no separate
standard for C comments because they're part of C.
> If you want to pursue an *all cases* approach, check the just-posted
> RXParse XML parser to see how it effectively deals with COMMENT/CDATA.

How does an XML parser help with tokenizing C?
 

robic0

> robic0 wrote:
>
> That's a single comment.
>
> Code followed by a comment.
>
> A comment.
>
> A comment.
>
> Syntax error, expecting value before /.
>
> |--------------------------|
>
> Comment, code, syntax error.
>
> How about actually reading the relevant docs instead of babbling?
>
> More babbling.
>
> Uh. I recommend you take a look at ISO 9899:1999. There is no separate
> standard for C comments because they're part of C.
>
> How does an XML parser help with tokenizing C?

You must be from another planet; you sound like you have written all these
exceptions and caveats... I mean, other than the written English you write
here.
 
