Determining if a line is a comment

Christopher

Hello all,

I've got a question of comment logic. I'm trying to determine if a
given line of C/C++/C#/Java is a comment or not. Can you please
comment on my logic.

I need to define some terms first:
starts with = has only white space on the line until it reaches ...
first preceding valid /* = looking back, the first /* from the current
point in the file where the line does not have a // before the /*

a line is a pure comment line if:
the line starts with //
or
the line starts with /*
or
the first preceding valid /* occurs after the first preceding */
 
Phlip

Christopher said:
a line is a pure comment line if:
the line starts with //
or
the line starts with /*
or
the first preceding valid /* occurs after the first preceding */

Parsing is an endless topic. Consider this comment:

std::string foo = " /* not a comment */ ";

Now from here, you could borrow the source to 'astyle', which probably has a
primitive parser in it.

Or you could write a Regular Expression that matches comments.

Or you could punt, and ignore comments in strings.

So the question arises why you want to parse your source to find comments?

This might be for a metric, such as "number of comments", or "number of
non-comment code lines". If so, understand that there are many more
important metrics (such as number of lines _removed_). The LOC metric can be
easily abused in many ways.

You can get a cheap approximation of the number of code statements,
disregarding structural lines like { or }, by counting the number of ";\n"
formations. The end-of-statement delimiter...
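As a rough illustration of that ";\n" count, here is a minimal sketch. The function name approx_statement_count is made up for this example, and it assumes Unix line endings (a ";\r\n" sequence would not be counted). It makes no attempt to skip comments or strings, so treat the result as an approximation only.

```cpp
#include <cstddef>
#include <string>

// Rough statement count: the number of places where ';' is immediately
// followed by a newline. Hypothetical helper; comments and string
// literals are deliberately NOT excluded.
std::size_t approx_statement_count(const std::string& source) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = source.find(";\n", pos)) != std::string::npos) {
        ++count;
        pos += 2;  // continue searching after the matched ";\n"
    }
    return count;
}
```

Note that a line such as "for (;;) {" contributes nothing, because none of its semicolons is followed directly by a newline.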
 
Eric Pruneau

Christopher said:
Hello all,

I've got a question of comment logic. I'm trying to determine if a
given line of C/C++/C#/Java is a comment or not. Can you please
comment on my logic.

I need to define some terms first:
starts with = has only white space on the line until it reaches ...
first preceding valid /* = looking back, the first /* from the current
point in the file where the line does not have a // before the /*

a line is a pure comment line if:
the line starts with //
or
the line starts with /*
or
the first preceding valid /* occurs after the first preceding */

What about tab (\t characters)..... a tab is not a white space!
 
Eric Pruneau

Victor Bazarov said:
Eric said:
[..]
What about tab (\t characters)..... a tab is not a white space!

Huh?

If a line begins with a tab followed by //, it is a pure comment...
but when you read the text file the tab character and the space character
are 2 different characters...
 
David Harmon

On 8 Aug 2006 10:19:08 -0700 in comp.lang.c++, "Christopher"
first preceding valid /* = looking back, the first /* from the current
point in the file where the line does not have a // before the /*

I think you will do better by scanning forward than by trying to go
back. Don't forget lines with /**/ followed by code. Don't forget
"/*" strings. Set a flag if you see a comment, another if you see
non-comment, at the end of the line you have your answer.
 
red floyd

Eric said:
Victor Bazarov said:
Eric said:
[..]
What about tab (\t characters)..... a tab is not a white space!
Huh?

If a line begins with a tab followed by //, it is a pure comment...
but when you read the text file the tab character and the space character
are 2 different characters...

There is a difference. A tab char is not a space char, but they are
both "white space". There's a difference.
 
David Harmon

On Tue, 8 Aug 2006 14:27:44 -0400 in comp.lang.c++, "Eric Pruneau"
but when you read the text file the tab character and the space character
are 2 different characters...

isspace('\t') still returns true.
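For instance, the "starts with" test from the original post could be sketched with std::isspace so that tabs and spaces are treated alike. The helper name starts_with_after_whitespace is made up for this example, and the default "C" locale is assumed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if only white space (spaces, tabs, ...) precedes `token` on the line.
// Hypothetical helper illustrating the "starts with" definition.
bool starts_with_after_whitespace(const std::string& line, const std::string& token) {
    std::size_t i = 0;
    // The cast avoids undefined behaviour for negative char values.
    while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i]))) {
        ++i;
    }
    return line.compare(i, token.size(), token) == 0;
}
```

With this, a line of "\t// note" passes the test for "//" exactly as a line of "  // note" does.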
 
Phlip

David said:
I think you will do better by scanning forward than by trying to go
back. Don't forget lines with /**/ followed by code. Don't forget
"/*" strings. Set a flag if you see a comment, another if you see
non-comment, at the end of the line you have your answer.

If you then say "set a flag if you have a string", you have multiple
booleans where you should have a true state table. Add a couple more
booleans, and you instantly get tangled code.

So start with a Regular Expression. You could probably Google for [regex C++
comment] and hit one right away.

red said:
There is a difference. A tab char is not a space char, but they are
both "white space". There's a difference.

Call them "blanks". Sometimes they are not spaces, and sometimes they are
not white...
 
David Harmon

Phlip said:
David said:
I think you will do better by scanning forward than by trying to go
back. Don't forget lines with /**/ followed by code. Don't forget
"/*" strings. Set a flag if you see a comment, another if you see
non-comment, at the end of the line you have your answer.

If you then say "set a flag if you have a string", you have multiple
booleans where you should have a true state table. Add a couple more
booleans, and you instantly get tangled code.

The flags do not add any complexity to the state table or whatever
you are using to scan. The flags merely collect information on what
you have seen, and don't change the scanning behavior in any way
whatsoever.

Phlip said:
So start with a Regular Expression. You could probably Google for [regex C++
comment] and hit one right away.

They might well be viable, but I don't see it. At first thought,
constructing a regex to handle all the possibilities sounds like a
nightmare. Perhaps you could expand with an example.
 
Christopher

David,

Yeah, I see how going forward would be a lot easier. Unfortunately
that really isn't much of an option.

I'm taking a unified diff output and I'm trying to determine if the
output is a line of code or a comment. To determine if the line is a
comment, I'll have to actually look at the file. The diff simply gives
me a good starting point. I'm really trying to avoid parsing the
entire file.

I'm thinking about doing a forward parse through the entire file and
then capturing the line #s of fully commented lines.

As for concerns about using this as the only metric, I agree.
Obviously only measuring what was added does not fully capture how much
work was done. Nevertheless, it can be a useful metric when you use it
appropriately.


In the meantime, I'm simply not going to distinguish between a SLOC
and a comment. They both represent effort.

-Chris
 
Christopher

For what it's worth to everyone, I've basically concluded that I'll
have to get to know flex.

best of luck!

-Christopher
 
Default User

Phlip said:
David Harmon wrote:

Call them "blanks". Sometimes they are not spaces, and sometimes they
are not white...

Call them "white space" because that's what the Standard does.




Brian
 
David Harmon

On 8 Aug 2006 12:12:51 -0700 in comp.lang.c++, "Christopher"
I'm taking a unified diff output and I'm trying to determine if the
output is a line of code or a comment. To determine if the line is a
comment, I'll have to actually look at the file. The diff simply gives
me a good starting point.

To find a position in a file, given a line number, you have to scan
anyway. To find if you are in a comment or not (for sure) you have
to go back an unbounded amount. I think your goal should be to scan
the file *only once*. Saving a list of line numbers sounds good.

char bogus[] = "What the heck\
/* do you call */\
this?";
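Once a single scan has produced the list of comment line numbers, matching it against the diff could go something like this sketch. It assumes a unified diff covering a single file, and that the comment line numbers were collected from the new version of that file by some scanner. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// New-file line numbers of the lines added by a single-file unified diff.
std::vector<std::size_t> added_line_numbers(const std::string& diff) {
    std::vector<std::size_t> added;
    std::istringstream in(diff);
    std::string row;
    std::size_t new_line = 0;
    bool in_hunk = false;
    while (std::getline(in, row)) {
        if (row.compare(0, 2, "@@") == 0) {
            // Hunk header: "@@ -old[,count] +new[,count] @@"
            const std::size_t plus = row.find(" +");
            if (plus == std::string::npos) continue;
            new_line = static_cast<std::size_t>(
                std::strtoul(row.c_str() + plus + 2, nullptr, 10));
            in_hunk = true;
        } else if (in_hunk) {
            if (row.empty() || row[0] == ' ') {
                ++new_line;  // context line (some tools strip the lone space)
            } else if (row[0] == '+') {
                added.push_back(new_line++);
            }
            // '-' lines exist only in the old file; '\' lines are markers.
        }
        // The "---" / "+++" file header before the first hunk is skipped.
    }
    return added;
}

// Added lines that the scan classified as pure comment lines.
std::vector<std::size_t> added_comment_lines(const std::string& diff,
                                             const std::set<std::size_t>& comment_lines) {
    std::vector<std::size_t> result;
    for (std::size_t n : added_line_numbers(diff)) {
        if (comment_lines.count(n) != 0) result.push_back(n);
    }
    return result;
}
```

The file is still scanned only once; the diff just selects which of the saved line numbers matter.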
 
red floyd

Phlip said:
Call them "blanks". Sometimes they are not spaces, and sometimes they are
not white...


2.1/3 "The source file is decomposed into preprocessing tokens (2.4) and
sequences of white-space characters (including comments)."

2.4.2 "Preprocessing tokens can be separated by /white space/; this
consists of comments (2.7), or /white-space characters/ (space,
horizontal tab, new-line, vertical tab, and form-feed), or both."

Given that the Standard calls it white space, I'll do the same.
 
Alex Vinokur

Christopher said:
Hello all,

I've got a question of comment logic. I'm trying to determine if a
given line of C/C++/C#/Java is a comment or not. Can you please
comment on my logic.
[snip]

Look at https://sourceforge.net/projects/cncc/

-----------------------------------------------------
NAME
cncc - count C/C++ source lines and bytes

SYNOPSIS
cncc [OPTIONS]... [FILE]...

DESCRIPTION
Count code-lines, empty-lines, comment-lines,
code-fields, empty-fields, comment-fields of C/C++-sources
which have been successfully compiled.
 
