Parser to list function names in C++?

Henrik Goldman

Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
}

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there exist a dozen parsers which are all pretty advanced and
require lots of background knowledge, but for my simple needs I think they
might be a bit overkill.
The parser should be in C++ too, since the rest of the app is also C++.

Any ideas how to proceed?

-- Henrik
 
Gianni Mariani

Henrik said:
Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
}

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there exist a dozen parsers which are all pretty advanced and
require lots of background knowledge, but for my simple needs I think they
might be a bit overkill.
The parser should be in C++ too, since the rest of the app is also C++.

Any ideas how to proceed?

A true C++ parser is a lot of work.

You could take an open source program that has a parser and teach it to
do what you want.

Perhaps you can look at doxygen or gcc.


G
 
jussij

Henrik said:
I would like to create a simplistic parser which goes through
each .h file and finds each function prototype (or inline
implementation) along with class names and member
functions.
....
Any ideas how to proceed?

One approach would be to use a regular expression engine
to do the searching.

For example if I load your 'test.h' example header file
into Zeus and search for this regular expression:

[_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+

it only finds these lines:

void f1();
inline int f2() {return 0;}
void f3();
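
For anyone who wants to try the same search from C++ rather than from an editor, here is a minimal sketch using std::regex. The function name find_candidates is made up for illustration, and the case-insensitive flag is my own assumption so that identifiers containing A-Z are also matched. It works line by line, so it shares all the limits of the pattern itself (declarations split across lines, constructors, operators, and so on).

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Returns every line of `in` that looks like it starts a function
// declaration according to the pattern above. Heuristic only.
std::vector<std::string> find_candidates(std::istream& in) {
    // "\t" in a normal string literal embeds a real tab character,
    // so the regex engine never has to interpret an escape sequence.
    static const std::regex pattern(
        "[_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, pattern)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Opening test.h with a std::ifstream and passing it to find_candidates should return the same three lines as listed above.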

Jussi Jumppanen
Zeus For Windows - "The ultimate programmer's editor/IDE"
http://www.zeusedit.com
 
CTG

First of all, sit down and work out the rules.
Examples:

The declaration of each function has a '(' followed by a ')' and a ';'
at the end, except in the case of an inline one.


I don't think it's hard at all.
 
Evan

Henrik said:
Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
}

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there exist a dozen parsers which are all pretty advanced and
require lots of background knowledge, but for my simple needs I think they
might be a bit overkill.

I see roughly two approaches. One is to use text pattern matching, as
jussij suggests. (Though remember to also search for A-Z and, if you
want to be pedantic, stuff like $ that some compilers also accept in
identifiers but probably no one actually uses. Also, his regex won't
spot things like constructors (no return type), functions where there
are newlines in the whitespace (you can't use grep for those),
operators, and probably some other special cases.) There's a variant of
this which would use something like Flex to create a lexer, in which
case you just have to deal with whole tokens. This might be easier if
you know at least a little Flex (or the ideas behind it) and can find
the file that GCC or something similar uses to do its lexing. Then
again, it might not.
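
To make the "whole tokens" idea concrete without bringing in Flex, here is a rough hand-written sketch. The names tokenize and names_before_paren are hypothetical, and the tokenizer is deliberately naive: it knows nothing about comments, string literals, preprocessor lines, or multi-character operators. The point is only that once whitespace is gone, a name followed by '(' is found no matter how the declaration is split across lines.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True for characters that may appear in an identifier or number.
static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Splits source text into words (identifiers, keywords, numbers) and
// single-character punctuation tokens. Whitespace is discarded.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;
        } else if (is_word_char(src[i])) {
            std::size_t start = i;
            while (i < src.size() && is_word_char(src[i])) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Returns every word token that is immediately followed by "(".
std::vector<std::string> names_before_paren(const std::vector<std::string>& tokens) {
    std::vector<std::string> names;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i + 1] == "(" && is_word_char(tokens[i][0])) {
            names.push_back(tokens[i]);
        }
    }
    return names;
}
```

Note that this still reports function calls and things like `if (`, so it narrows the search rather than solving it.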

The problem with that is that I'm not sure how hard it would be to get
just the lines in question. I mean, I know that jussij probably didn't
spend a lot of time working on that and could get something more to the
point with some more effort, but I suspect that it would be very
difficult to get something that works in full generality. At the same
time, if your results don't have to be perfect, this solution could be
very lightweight, even to the point of running a slightly modified
version of jussij's regex over your code with grep.

Now, if you want exact answers, you might have to go with one of
those parsers. I'll just give a shoutout for one that I know personally
called Elsa. It is complete and accurate enough to parse its own source
then output the source again in a form where it can be compiled and the
rebuilt version used to run the regression suite. At least, I think it
is, though I'm not quite sure how, because I'm currently fixing a
number of "pretty-printing" bugs that block correct translation of the
GCC 3.4 headers. (I'm working on a project that uses it for
source-to-source transformations.) There is one semi-show-stopping bug
in the parsing end though, which is that code containing endl or flush
confuses it. However, replacing endl with "\n" except in the definition
(I use a regex for telling apart uses and the definition; it's not
perfect either) will let things work right. (I know it's not quite
semantics preserving.) However, if you can stand to do that change,
it's quite easy to write an extension that will do what you want.
http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/semgrep.cc
has about a two and a half page long program that is "semantic grep";
you give it a variable name, and it will tell you all the places a
variable with that name is declared or used. On the other hand, if you
want to include it in another project... probably this is not the best
option. See www.cubewano.org/oink.

So the pro of the parser approach is that it's very robust, modulo bugs
in the implementation (which, in the case of Elsa, will hopefully go
away in the fairly near future... Mozilla is eyeing the Oink project,
which now more or less includes Elsa, for helping them), but the con
is that it is pretty much by definition quite heavyweight. And there
are of course other options here. The other one that might be useful is
OpenC++, though I don't know much about that project. You could try to
hack the GCC front end. Those are all the open-source C++ parsers I know
of.

Evan Driscoll
 
AnonMail2005

Henrik said:
Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
}

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there exist a dozen parsers which are all pretty advanced and
require lots of background knowledge, but for my simple needs I think they
might be a bit overkill.
The parser should be in C++ too, since the rest of the app is also C++.

Any ideas how to proceed?

-- Henrik

The tool for this will depend on what you want to do with
the output.

As someone else mentioned, you could get the output using
doxygen. I spent a day and a half playing around with its
options and got it to produce what you need plus a ton of
other dependency-related diagrams: class dependencies,
include file dependencies, and function call dependencies.

It's very flexible. I produced HTML output, but it can also
produce XML output which can then be processed by some
other program.
 
Henrik Goldman

Hi,

AnonMail2005 said:
As someone else mentioned, you could get the output using
doxygen. I spent a day and a half playing around with its
options and got it to produce what you need plus a ton of
other dependency-related diagrams: class dependencies,
include file dependencies, and function call dependencies.

It's very flexible. I produced HTML output, but it can also
produce XML output which can then be processed by some
other program.

That actually sounds like a very useful idea. I just had a quick look and it
certainly looks interesting. It seems to give what I need but generates a lot
of output, so I must look into which files need to be parsed, etc.

-- Henrik
 
Henrik Goldman

Hi Evan,

Thanks for the suggestions.

I did look into Elsa but found it rather huge for my simple needs. Basically
I am trying to create an obfuscator which just changes the names of functions
and classes. Elsa can probably do a lot more than just this, but the time to
learn how things work far exceeds the needs of my project.

-- Henrik
 
Default User

CTG said:
you first of all sit down and work out the rules:


Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:

CTG said:
examples:

declaration of each function has a '(' followed by a ')' and a ';'
at the end, except in the case of an inline one.

How do you distinguish that from a function call?

CTG said:
I don't think it's hard at all.

That probably means you haven't thought enough.

Such prototype declarations are not required by the language.

You have to be able to handle this as well:


void f()
{
return;
}

int main()
{
f();
return 0;
}


So no semicolon and no inline keyword to help. I recommend not trying
to roll your own on this. Use one of the prefab programs mentioned
elsewhere.
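
For readers who still want to experiment, here is a rough sketch of one way around the function-call problem: whenever a name, a parenthesised list, and then '{' are seen, record the name and skip the whole balanced body, so calls inside bodies are never looked at. The names skip_balanced and list_functions are hypothetical, and the sketch assumes no comments, string literals, macros, qualifiers such as const, or constructor initialiser lists, which is exactly why the prefab tools are the safer choice.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// src[pos] must be `open`. Returns the index just past the matching
// `close`, or src.size() if the delimiter is never closed.
std::size_t skip_balanced(const std::string& src, std::size_t pos,
                          char open, char close) {
    int depth = 0;
    for (; pos < src.size(); ++pos) {
        if (src[pos] == open) ++depth;
        else if (src[pos] == close && --depth == 0) return pos + 1;
    }
    return src.size();
}

// Lists names that are followed by "(...)" and then ';' (prototype)
// or '{' (definition). Function bodies are skipped entirely.
std::vector<std::string> list_functions(const std::string& src) {
    std::vector<std::string> names;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] != '(') { ++i; continue; }

        // Walk backwards over whitespace, then over the name itself.
        std::size_t end = i;
        while (end > 0 && std::isspace(static_cast<unsigned char>(src[end - 1]))) --end;
        std::size_t begin = end;
        while (begin > 0 && (std::isalnum(static_cast<unsigned char>(src[begin - 1])) ||
                             src[begin - 1] == '_')) --begin;

        // Find what follows the closing ')'.
        std::size_t after = skip_balanced(src, i, '(', ')');
        std::size_t next = after;
        while (next < src.size() && std::isspace(static_cast<unsigned char>(src[next]))) ++next;

        if (begin < end && next < src.size()) {
            if (src[next] == ';') {
                names.push_back(src.substr(begin, end - begin));
            } else if (src[next] == '{') {
                names.push_back(src.substr(begin, end - begin));
                after = skip_balanced(src, next, '{', '}');  // skip the body
            }
        }
        i = after;
    }
    return names;
}
```

On the example above this should report f and main but not the call to f inside main. Class and namespace braces are simply scanned through, so member prototypes are still seen.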



Brian
 
