Parser to list function names in C++?

Discussion in 'C++' started by Henrik Goldman, Dec 5, 2006.

  1. Hi,

    I would like to create a simplistic parser which goes through each .h file
    and finds each function prototype (or inline implementation) along with
    class names and member functions.

    Examples:

    test.h:

    void f1();
    inline int f2() {return 0;}

    class A
    {
        void f3();
    };

    How would I approach this from a simple viewpoint, without a steep learning
    curve? I know there exist a dozen parsers which are all pretty advanced and
    require lots of background knowledge, but for my simple needs I think they
    might be a bit overkill.
    The parser should be in C++ too, since the rest of the app is also C++.

    Any ideas how to proceed?

    -- Henrik
    Henrik Goldman, Dec 5, 2006
    #1

  2. Henrik Goldman wrote:
    > Hi,
    >
    > I would like to create a simplistic parser which goes through each .h file
    > and finds each function prototype (or inline implementation) along with
    > class names and member functions.
    >
    > ....
    >
    > Any ideas how to proceed?


    A true C++ parser is a lot of work.

    You could take an open source program that has a parser and teach it to
    do what you want.

    Perhaps you can look at doxygen or gcc.


    G
    Gianni Mariani, Dec 5, 2006
    #2

  3. Henrik Goldman wrote:

    > I would like to create a simplistic parser which goes through
    > each .h file and finds each function prototype (or inline
    > implementation) along with class names and member
    > functions.
    > ....
    > Any ideas how to proceed?


    One approach would be to use a regular expression engine
    to do the searching.

    For example, if I load your 'test.h' example header file
    into Zeus and search for this regular expression:

    [_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+

    it only finds these lines:

    void f1();
    inline int f2() {return 0;}
    void f3();
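
    The same line-by-line search can be sketched in C++ itself. This is only a
    rough sketch, assuming a C++11 compiler with std::regex; the helper name
    find_candidate_lines is made up for illustration, and the pattern is used
    unchanged, so as written it is case-sensitive and only sees lower-case names.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the lines of `header` that match the regular
// expression quoted above. The pattern is copied verbatim, so identifiers
// containing upper-case letters are not matched unless A-Z is added to it.
std::vector<std::string> find_candidate_lines(const std::string& header)
{
    static const std::regex pattern(
        R"([_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+)");

    std::vector<std::string> hits;
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        // regex_search accepts a match anywhere in the line.
        if (std::regex_search(line, pattern)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

    Note that this only yields candidates: a statement such as "return g();"
    also fits the pattern, so the hits still need checking.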

    Jussi Jumppanen
    Zeus For Windows - "The ultimate programmer's editor/IDE"
    http://www.zeusedit.com
    , Dec 5, 2006
    #3
  4. I suggest you have a look at flex/bison, or ANTLR.

    Joseph.
    Joseph Paterson, Dec 6, 2006
    #4
  5. First of all, sit down and work out the rules.
    For example:

    The declaration of each function has a '(' followed by a ')' and a ';'
    at the end, except in the case of an inline one.


    I don't think it's hard at all.

    Henrik Goldman wrote:
    > Hi,
    >
    > I would like to create a simplistic parser which goes through each .h file
    > and finds each function prototype (or inline implementation) along with
    > class names and member functions.
    >
    > ....
    >
    > Any ideas how to proceed?
    >
    > -- Henrik
    CTG, Dec 6, 2006
    #5
  6. Henrik Goldman wrote:
    > Hi,
    >
    > I would like to create a simplistic parser which goes through each .h file
    > and finds each function prototype (or inline implementation) along with
    > class names and member functions.
    >
    > ....
    >
    > How would I approach this from a simple viewpoint, without a steep learning
    > curve? I know there exist a dozen parsers which are all pretty advanced and
    > require lots of background knowledge, but for my simple needs I think they
    > might be a bit overkill.


    There are sort of two approaches I see. One is to use text pattern
    matching like jussij suggests. (Though remember to also search for A-Z
    and, if you want to be pedantic, stuff like $ that some compilers also
    accept in identifiers but probably no one actually uses. Also, his regex
    won't spot things like constructors (no return type), functions where there
    are newlines in the whitespace (you can't use grep for those), operators,
    and probably some other special cases.) There's a variant of this which
    would use something like Flex to create a lexer, in which case you just
    have to deal with whole tokens. This might be easier if you know
    at least a little Flex (or the ideas behind it) and can find the file
    that GCC uses or something to do their lexing. Then again, it might
    not.
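
    To give an idea of what "dealing with whole tokens" means, here is a tiny
    hand-written tokenizer instead of a Flex-generated one. It is a sketch under
    heavy assumptions: the names tokenize and names_before_paren are invented
    for illustration, and string literals, character literals and preprocessor
    lines get no special treatment.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

static bool is_word_char(char ch)
{
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Splits source text into word tokens (identifiers, keywords, numbers) and
// single-character punctuation tokens. Whitespace, including newlines, and
// comments are dropped.
std::vector<std::string> tokenize(const std::string& src)
{
    std::vector<std::string> tokens;
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: skip to the end of the line.
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: skip past the closing "*/" (or to end of input).
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (is_word_char(c)) {
            // Consume the whole word.
            std::size_t j = i;
            while (j < n && is_word_char(src[j])) ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
        } else {
            tokens.push_back(std::string(1, c));
            ++i;
        }
    }
    return tokens;
}

// Returns every word token that is directly followed by a "(" token and does
// not start with a digit.
std::vector<std::string> names_before_paren(const std::vector<std::string>& tokens)
{
    std::vector<std::string> names;
    for (std::size_t k = 0; k + 1 < tokens.size(); ++k) {
        const char first = tokens[k][0];
        const bool is_name = is_word_char(first) &&
                             !std::isdigit(static_cast<unsigned char>(first));
        if (is_name && tokens[k + 1] == "(") {
            names.push_back(tokens[k]);
        }
    }
    return names;
}
```

    Because newlines are just whitespace here, a declaration split across lines
    is found, and so is a constructor. The price is that calls and keywords such
    as "if (" are reported too, so some filtering is still needed afterwards.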

    The problem with that is that I'm not sure how hard it would be to get
    just the lines in question. I mean, I know that jussij probably didn't
    spend a lot of time working on that and could get something more to the
    point with some more effort, but I suspect that it would be very
    difficult to get something that works in full generality. At the same
    time, if your results don't have to be perfect, this solution could be
    very lightweight, even to the point of running a slightly modified
    version of jussij's regex over your code with grep.

    Now, as for if you want exact answers, you might have to go with one of
    those parsers. I'll just give a shoutout for one that I know personally
    called Elsa. It is complete and accurate enough to parse its own source
    then output the source again in a form where it can be compiled and the
    rebuilt version used to run the regression suite. At least, I think it
    is, though I'm not quite sure how, because I'm currently fixing a
    number of "pretty-printing" bugs that block correct translation of the
    GCC 3.4 headers. (I'm working on a project that uses it for
    source-to-source transformations.) There is one semi-show-stopping bug
    in the parsing end though, which is that code containing endl or flush
    confuses it. However, replacing endl with "\n" except in the definition
    (I use a regex for telling apart uses and the definition; it's not
    perfect either) will let things work right. (I know it's not quite
    semantics preserving.) However, if you can stand to do that change,
    it's quite easy to write an extension that will do what you want.
    http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/semgrep.cc
    has about a two and a half page long program that is "semantic grep";
    you give it a variable name, and it will tell you all the places a
    variable with that name is declared or used. On the other hand, if you
    want to include it in another project... probably this is not the best
    option. See www.cubewano.org/oink.

    So the pro of the parser approach is that it's very robust, modulo bugs in
    the implementation (which, in the case of Elsa, will hopefully go away
    in the fairly near future... Mozilla is eyeing the Oink project,
    which now more or less includes Elsa, for helping them), but the con
    is that it is pretty much by definition quite heavyweight. And there
    are of course other options here. The other one that might be useful is
    OpenC++, though I don't know much about that project. You could try to
    hack the GCC front end. Those are all the open-source C++ parsers I know
    of.

    Evan Driscoll
    Evan, Dec 6, 2006
    #6
  7. Henrik Goldman wrote:
    > Hi,
    >
    > I would like to create a simplistic parser which goes through each .h file
    > and finds each function prototype (or inline implementation) along with
    > class names and member functions.
    >
    > ....
    >
    > Any ideas how to proceed?
    >
    > -- Henrik

    Your tool to do this will depend on what you want to do with
    the output.

    As someone else mentioned, you could get the output using
    doxygen. I spent a day and a half playing around with its
    options and got it to produce what you need plus a ton of
    other dependency-related diagrams: class dependencies,
    include file dependencies, and function call dependencies.

    It's very flexible. I produced HTML output but it can also
    produce XML output which can then be processed by some
    other program.
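
    For orientation, a Doxyfile fragment along the following lines asks doxygen
    for XML only. The option names are standard doxygen settings, but the values
    (the "include" directory, the "*.h" pattern, the "xml" output directory) are
    assumptions for illustration, not the configuration described above.

```
# Doxyfile fragment (illustrative values)
INPUT            = include
FILE_PATTERNS    = *.h
RECURSIVE        = YES
# List members even when they carry no documentation comments
EXTRACT_ALL      = YES
# Include private members such as A::f3 in the example header
EXTRACT_PRIVATE  = YES
GENERATE_HTML    = NO
GENERATE_LATEX   = NO
GENERATE_XML     = YES
XML_OUTPUT       = xml
```

    Restricting INPUT and FILE_PATTERNS to the headers of interest is also the
    simplest way to keep the amount of generated output down.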
    , Dec 6, 2006
    #7
  8. Hi,

    > As someone else mentioned, you could get the output using
    > doxygen. I spent a day and a half playing around with its
    > options and got it to produce what you need plus a ton of
    > other dependency-related diagrams: class dependencies,
    > include file dependencies, and function call dependencies.
    >
    > It's very flexible. I produced HTML output but it can also
    > produce XML output which can then be processed by some
    > other program.


    That actually sounds like a very useful idea. I just had a quick look and it
    certainly looks interesting. It seems to give what I need but generates a lot
    of output, so I must look into which files need to be parsed etc.

    -- Henrik
    Henrik Goldman, Dec 6, 2006
    #8
  9. Hi Evan,

    Thanks for the suggestions.

    I did look into Elsa but found it rather huge for my simple needs. Basically
    I am trying to create an obfuscator which just changes the names of functions
    and classes. Elsa can probably do a lot more than just this, but the time to
    learn how things work far exceeds the needs of my project.

    -- Henrik
    Henrik Goldman, Dec 6, 2006
    #9
  10. CTG wrote:

    > First of all, sit down and work out the rules.



    Please don't top-post. Your replies belong following or interspersed
    with properly trimmed quotes. See the majority of other posts in the
    newsgroup, or the group FAQ list:
    <http://www.parashift.com/c++-faq-lite/how-to-post.html>

    > For example:
    >
    > The declaration of each function has a '(' followed by a ')' and a ';'
    > at the end, except in the case of an inline one.


    How do you distinguish that from a function call?

    >
    > I don't think it's hard at all.


    That probably means you haven't thought enough.

    Such prototype declarations are not required by the language.

    You have to be able to handle this as well:


    void f()
    {
        return;
    }

    int main()
    {
        f();
        return 0;
    }


    So no semicolon and no inline keyword to help. I recommend not trying
    to roll your own on this. Use one of the prefab programs mentioned
    elsewhere.
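
    To show the kind of bookkeeping this takes, here is a rough sketch that
    tells definitions from calls by skipping everything inside function bodies.
    The name declared_functions is invented for illustration, and the code
    assumes input without comments, string literals, preprocessor lines,
    templates, operators or constructor initializer lists. Real code breaks it
    quickly, which is rather the point.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lists names followed by '(' that appear at namespace
// scope or directly inside a class/struct/namespace body. Anything inside
// other braces (function bodies) is skipped, so calls there are not reported.
std::vector<std::string> declared_functions(const std::string& src)
{
    std::vector<std::string> names;
    std::vector<bool> skipped;      // per open '{': true if its contents are skipped
    std::string word;               // identifier currently being read
    std::string last_word;          // most recent complete identifier
    bool word_is_adjacent = false;  // no punctuation seen since last_word
    bool scope_keyword = false;     // class/struct/namespace seen in this statement
    int skip_depth = 0;             // number of enclosing skipped blocks
    int paren_depth = 0;            // nesting level of '(' ... ')'

    for (std::size_t i = 0; i <= src.size(); ++i) {
        const char ch = (i < src.size()) ? src[i] : '\n';  // sentinel flushes the last word
        const unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalnum(uc) || ch == '_') {
            word += ch;
            continue;
        }
        if (!word.empty()) {
            if (word == "class" || word == "struct" || word == "namespace")
                scope_keyword = true;
            last_word = word;
            word_is_adjacent = true;
            word.clear();
        }
        if (std::isspace(uc)) continue;

        if (ch == '(') {
            const bool is_name = !last_word.empty() &&
                !std::isdigit(static_cast<unsigned char>(last_word[0]));
            if (word_is_adjacent && is_name && skip_depth == 0 && paren_depth == 0)
                names.push_back(last_word);
            ++paren_depth;
            scope_keyword = false;  // a parameter list means the next '{' is a body
        } else if (ch == ')') {
            if (paren_depth > 0) --paren_depth;
        } else if (ch == '{') {
            // Only class/struct/namespace bodies are scanned for declarations.
            const bool skip = !(scope_keyword && skip_depth == 0);
            skipped.push_back(skip);
            if (skip) ++skip_depth;
            scope_keyword = false;
        } else if (ch == '}') {
            if (!skipped.empty()) {
                if (skipped.back()) --skip_depth;
                skipped.pop_back();
            }
        } else if (ch == ';') {
            scope_keyword = false;
        }
        word_is_adjacent = false;
    }
    return names;
}
```

    On the example above this reports f and main but not the call to f inside
    main. An initializer at namespace scope such as "int x = g(1);" would still
    be misreported as a declaration of g, which is one more reason to prefer an
    existing tool.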



    Brian
    Default User, Dec 6, 2006
    #10