Flex source for parsing files with multiline comments

Uwe Ziegenhagen

Hello,

My colleagues and I are implementing a C++ tool that splits blank/tab
separated files into <number>, <text>, <c-singlelinecomment> and
<multilinecomment> tokens. It works reasonably well so far, with one
problem: when I run the a.exe that gcc compiles on the following
text file (./a.exe < test.txt), it does not match the multiline comments
correctly.

*test.txt contains:

123 456
hello
5674
/* hello hello hellp
something
more */
123

*output of a.exe < test.txt is:

Number
Number

Command

Number

hello hello hellp
something
more Comments
Number

That's wrong: the hellos should be suppressed, and only "Comments" should be
printed.

Any ideas?

Uwe
################################## fourth.l ##############
%{
#define NUMBER 400
#define COMMENT 401
#define TEXT 402
#define COMMAND 403
#define schlump 405
#define mlcomment 406

/* <COMMENTS>\n {return mlcomment; }
<COMMENTS>.\n {return mlcomment; }

<COMMENTS>"*/"[ \t]*\n { BEGIN 0; return mlcomment;}
<COMMENTS>"*/" { BEGIN 0; return mlcomment;}

*/
%}

%x COMMENTS

%%
[ \t]*"/*".*"*/"[ \t]*\n {return mlcomment; }
[ \t]*"/*" { BEGIN COMMENTS;}

<COMMENTS>.* "*/"[ \t]*\n { BEGIN 0; return mlcomment;}
<COMMENTS>.*"*/" { BEGIN 0; return mlcomment;}

[+-]?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {return NUMBER;}
(\/\/)+[^\n]* {return COMMENT;}
\"[^\"\n]*\" {return TEXT;}
[0-9]+[a-zA-Z]+ {return schlump;}
[a-zA-Z][a-zA-Z0-9]+ {return COMMAND;}
.. ;
..*\n ;

%%
#include <stdio.h>

int yywrap(){return 1;}

main (argc,argv)
int argc;
char *argv[];

{
int val;

//

while (val = yylex()) {
switch (val)
{
case 400 :
{printf ("Number\n");break;}
case 401 :
{printf ("Comment\n");break;}
case 402 :
{printf ("Text\n");break;}
case 403 :
{printf ("Command\n");break;}
case 405 :
{printf ("Schlumpf\n");break;}
case 406 :
{printf ("Comments\n");break;}
default :
printf("etwas anderes\n");
}
}
}
 
Ivan Vecerina

| My colleagues and I are implementing a C++ tool that splits blank/tab
| separated files into <number>, <text>, <c-singlelinecomment> and
| <multilinecomment> tokens. It works reasonably well so far, with one
| problem: when I run the a.exe that gcc compiles on the following
| text file (./a.exe < test.txt), it does not match the multiline comments
| correctly.
NB: specific tools such as Flex are OT in this NG ... but I'll take on...

| That's wrong: the hellos should be suppressed, and only "Comments" should be
| printed.
|
| Any ideas?

I believe your multiline comment handling does not have to be that
complicated.
What about using a single rule such as:

"/*"([^\*]|\*[^/])"*/" { return mlcomment; }

hth,
 
Jim Fischer

Uwe said:
Thank you very much, what is the best group for lex and bison stuff?

I don't know of any newsgroups for these programs, but 'flex' and
'bison' are GNU programs so you might take a look at the newsgroup

gnu.utils.help

FWIW, there are also mailing lists you can subscribe to -- i.e., you ask
questions (and read/respond to other people's questions) via email
rather than by posting messages to a newsgroup server. Visit the GNU
'flex' and 'bison' web sites for more info:

http://www.gnu.org/software/flex/flex.html
http://www.gnu.org/software/bison/bison.html
 
Frank Schmitt

Uwe Ziegenhagen said:
Thank you very much, what is the best group for lex and bison stuff?

I'd try comp.compilers and comp.compilers.tools

HTH & kind regards
frank
 
Ivan Vecerina wrote: "/*"([^\*]|\*[^/])"*/" { return mlcomment; }

Because the group ([^\*]|\*[^/]) is not repeated, this will only allow
comments of the form /*<single character>*/ (or /**<single character>*/).

I think you meant: "/*"([^\*]|\*[^/])*"*/"
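
To make the difference between the two rules concrete, here is a minimal sketch in C that checks them with POSIX regcomp/regexec instead of flex. The patterns are hand translations into POSIX extended syntax (flex's quoted strings become escaped literals, and ^ and $ are added to force a whole-string match), and the names whole_match, RULE_AS_POSTED, RULE_CORRECTED and RULE_STAR_RUNS are made up for this example. The third pattern is not from the thread: it is the commonly used variant that also accepts a run of stars right before the closing slash, as in /* a **/, which the corrected rule appears to reject.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` as a whole matches the POSIX extended regex `pattern`,
   0 if it does not, and -1 if the pattern fails to compile.
   Hypothetical helper, not part of flex. */
static int whole_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hand-translated POSIX ERE versions of the flex rules (assumed equivalent). */

/* Rule as first posted: the group occurs exactly once. */
static const char *RULE_AS_POSTED = "^/\\*([^*]|\\*[^/])\\*/$";

/* Corrected rule: the group may repeat any number of times. */
static const char *RULE_CORRECTED = "^/\\*([^*]|\\*[^/])*\\*/$";

/* Not from the thread: variant that tolerates runs of '*' before the '/'. */
static const char *RULE_STAR_RUNS = "^/\\*([^*]|\\*+[^*/])*\\*+/$";
```

With these definitions, whole_match(RULE_AS_POSTED, "/* hello */") returns 0, while whole_match(RULE_CORRECTED, ...) returns 1 for the three-line comment from test.txt, since a negated character class also matches newlines.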

Cheers
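
On the original symptom: in flex, input that no active rule matches falls through to the default rule, which copies it to the output, and "." never matches a newline. So if the rules active in the exclusive COMMENTS start condition do not cover every character of the comment body, line breaks included, the uncovered text is echoed, which would be consistent with the stray "hello hello hellp" lines in the output. The sketch below is not flex code. It is a hand-written C state machine, with hypothetical names (scan_comments, IN_CODE, IN_COMMENT), that shows the behaviour the start condition is meant to have: everything between the delimiters is consumed silently and a single "Comments" is reported at the closing delimiter.

```c
#include <stdio.h>

/* The two states play the role of flex's INITIAL and COMMENTS conditions. */
enum state { IN_CODE, IN_COMMENT };

/* Copies `src` to `out`, replacing each multiline comment with the word
   "Comments". Returns the number of complete comments seen.
   Hypothetical helper for illustration only. */
static int scan_comments(const char *src, FILE *out)
{
    enum state st = IN_CODE;
    int count = 0;

    for (const char *p = src; *p != '\0'; ++p) {
        if (st == IN_CODE) {
            if (p[0] == '/' && p[1] == '*') {
                st = IN_COMMENT;        /* like BEGIN COMMENTS */
                ++p;                    /* skip the '*' of the opener */
            } else {
                fputc(*p, out);         /* stand-in for the other token rules */
            }
        } else {
            if (p[0] == '*' && p[1] == '/') {
                st = IN_CODE;           /* like BEGIN 0 */
                ++p;                    /* skip the '/' of the closer */
                ++count;
                fputs("Comments", out);
            }
            /* Any other character, newlines included, is consumed silently. */
        }
    }
    return count;
}
```

Calling scan_comments on the comment from test.txt writes "Comments" in place of the comment and none of its body. In flex itself the same effect is usually obtained by giving the start condition rules that swallow non-star text, stars not followed by a slash, and newlines, so that nothing inside the comment reaches the default rule.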
 