C++ Source Reverse Engineering - How to write a parser?

Discussion in 'C++' started by Herby, Jun 6, 2007.

  1. Herby

    Herby Guest

    Hi,

    I'm interested in reverse engineering C++ source code into a form more
    comprehensible than the source itself.

    I want to write a basic tool myself, so obviously I need to write a
    parser for the source code.
    Although this has some overlap with, say, a compiler, it would also seem
    significantly different.

    Can anyone provide me with links etc. on how one would go about writing
    such a parser?
    No doubt I would also need a reference to the syntax rules of C++ etc.


    Secondly, can anyone recommend a good tool that currently exists to do
    the job?

    Thanks.
     
    Herby, Jun 6, 2007
    #1

  2. Herby

    Ian Collins Guest

    Herby wrote:
    > Hi,
    >
    > I'm interested in reverse engineering C++ source code into a form more
    > comprehensible than the source itself.
    >
    > I want to write a basic tool myself, so obviously I need to write a
    > parser for the source code.
    > Although this has some overlap with, say, a compiler, it would also seem
    > significantly different.
    >
    > Can anyone provide me with links etc. on how one would go about writing
    > such a parser?
    > No doubt I would also need a reference to the syntax rules of C++ etc.
    >

    The gcc source.

    --
    Ian Collins.
     
    Ian Collins, Jun 6, 2007
    #2

  3. Herby wrote:
    > I'm interested in reverse engineering C++ source code into a form more
    > comprehensible than the source itself.
    >
    > I want to write a basic tool myself, so obviously I need to write a
    > parser for the source code.
    > Although this has some overlap with, say, a compiler, it would also seem
    > significantly different.
    >
    > Can anyone provide me with links etc. on how one would go about writing
    > such a parser?
    > No doubt I would also need a reference to the syntax rules of C++ etc.


    If you have to ask this question, IMHO you had better start with a
    smaller project.

    > Secondly, can anyone recommend a good tool that currently exists to do
    > the job?


    I don't know if it's a good one, because it's a bit outdated, but it may
    be worth a try: gccxml uses the gcc front end to parse the sources and
    creates XML output which can be read easily.

    Mathias
     
    Mathias Waack, Jun 6, 2007
    #3
  4. Herby

    Zeppe Guest

    Herby wrote:
    > Hi,
    >
    > I'm interested in reverse engineering C++ source code into a form more
    > comprehensible than the source itself.


    An ambitious goal, I think, but I hope you will succeed somehow :)
    The problem that I see is that human-written source code is usually
    the most comprehensible expression of an algorithm that still embodies
    all the details of the algorithm itself. Of course you can find some
    sort of compromise: for example, there are UML programs that are able
    to sketch diagrams from the source files.

    > I want to write a basic tool myself, so obviously I need to write a
    > parser for the source code.
    > Although this has some overlap with, say, a compiler, it would also seem
    > significantly different.


    You are right. It may be much simpler, especially if you decide up front
    the abstraction level at which you want to stop.

    > Can anyone provide me with links etc. on how one would go about writing
    > such a parser?


    I have a link and a suggestion. The link is:
    http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/
    Elsa is an open-source C/C++ parser; I think it's quite accurate. Of
    course, using it means that you need a deep knowledge of the C/C++
    syntax.

    The suggestion is: look at the source code of some open-source UML
    editors. They do something similar to what you are trying to do, so you
    can probably find some interesting hints.
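
    As a rough illustration of stopping at a low abstraction level, the
    sketch below pulls (derived, base) pairs out of class heads with a
    regular expression, which is about the minimum needed to draw an
    inheritance arrow in a UML-style diagram. The function name
    inheritanceEdges is hypothetical, and this is not how any particular
    UML tool works. It is deliberately not a parser: it only sees the first
    base class, and it is fooled by comments, macros, templates and things
    like "enum class Color : int".

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (derived, first base) pairs for every
// `class X : Base` or `struct X : Base` head found in `src`.
// Purely textual; no preprocessing, no scopes, no templates.
std::vector<std::pair<std::string, std::string>>
inheritanceEdges(const std::string& src) {
    static const std::regex head(
        R"((?:class|struct)\s+(\w+)\s*:\s*(?:(?:public|protected|private|virtual)\s+)*(\w+))");
    std::vector<std::pair<std::string, std::string>> edges;
    for (std::sregex_iterator it(src.begin(), src.end(), head), end;
         it != end; ++it) {
        assert(it->size() == 3);  // whole match plus two capture groups
        edges.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return edges;
}
```

    Feeding it the text of a header gives a list of edges that could be
    written out in whatever diagram format you prefer. Anything beyond toy
    input needs a real front end such as the ones mentioned in this thread.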

    > No doubt i would also need a reference to the syntax rules of C++ etc.


    The C++ standard. Everything is in there, including the BNF syntax
    specification of the language.


    Regards,

    Zeppe
     
    Zeppe, Jun 6, 2007
    #4
  5. Herby

    Guest

    On Jun 6, 10:58 am, Herby wrote:
    > Hi,
    >
    > I'm interested in reverse engineering C++ source code into a form more
    > comprehensible than the source itself.
    >
    > I want to write a basic tool myself, so obviously I need to write a
    > parser for the source code.
    > Although this has some overlap with, say, a compiler, it would also seem
    > significantly different.
    >
    > Can anyone provide me with links etc. on how one would go about writing
    > such a parser?
    > No doubt I would also need a reference to the syntax rules of C++ etc.


    The classical approach (besides building one by hand) is to use tools
    like lex and yacc (or bison). You should read up on compiler
    construction, lexing and parsing; what you want to build is a compiler,
    if I understand you correctly (translating your-own-language-tm into
    C++).
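
    To make the "by hand" option concrete, here is a minimal sketch of a
    hand-written lexer for a tiny C++-like subset. The names TokKind, Token
    and tokenize are hypothetical, chosen only for this example. It handles
    identifiers, decimal integers, single-character punctuation, whitespace
    and // comments, and nothing else; a real C++ lexer also has to deal
    with string and character literals, /* */ comments, multi-character
    operators, preprocessor directives and line splicing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token categories for a deliberately tiny subset (hypothetical names).
enum class TokKind { Identifier, Number, Punct };

struct Token {
    TokKind kind;
    std::string text;
};

// Split `src` into tokens, skipping whitespace and `//` comments.
std::vector<Token> tokenize(const std::string& src) {
    auto isIdentChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;  // skip to end of line
            continue;
        }
        const std::size_t start = i;
        TokKind kind;
        if (std::isalpha(c) || c == '_') {
            kind = TokKind::Identifier;
            while (i < src.size() && isIdentChar(src[i])) ++i;
        } else if (std::isdigit(c)) {
            kind = TokKind::Number;
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else {
            kind = TokKind::Punct;  // any other single character
            ++i;
        }
        assert(i > start);  // every branch must consume at least one character
        out.push_back(Token{kind, src.substr(start, i - start)});
    }
    return out;
}
```

    A generated lexer from lex or flex does the same job from a table of
    regular expressions; writing one by hand mainly buys control over the
    awkward cases listed above.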

    If you want to stay inside C++ you can use boost::spirit, which is
    similar to using yacc, but without the need for an extra tool.

    Note that spirit is a library that basically takes a modified form of
    the EBNF syntax and embeds it into C++. Take a close look at how it is
    implemented, because the technique used might be a better approach to
    solving your problem (just a wild guess, since I do not know what
    problem you are trying to solve).

    If you go the spirit route there is also boost::wave, which is a full
    implementation of the C++ preprocessor (in fact, IIRC, the only FULL
    implementation of it). Someone told me that there is also a person
    working on a full C++ parser using spirit, but I have not yet
    seen any further details on it.

    --
    Fabio Fracassi
     
    , Jun 6, 2007
    #5
  6. Herby

    Ira Baxter Guest

    "Herby" <> wrote in message
    news:...
    > Hi,
    >
    > I'm interested in reverse engineering C++ source code into a form more
    > comprehensible than the source itself.
    >
    > I want to write a basic tool myself, so obviously I need to write a
    > parser for the source code.


    You don't need to write a parser to do reverse engineering.
    It is probably true that to do reverse engineering, you will need a parser.

    Building a C++ parser is a lot harder than people who have not
    done it think it is. You need a lexer covering all the standard's
    dark-corner requirements.
    You need a preprocessor. You need a non-standard parsing
    engine because the C++ grammar isn't LALR(1), so yacc won't work.
    You need a grammar not just for ANSI C++ but for the dialect
    of C++ you actually have (Sun? GNU? Microsoft?).
    If you are a realist, you'll need a scope-accurate symbol table
    telling you where names are defined and what they are defined as.
    Expect building a robust parser to take several man-years
    at a minimum; we have considerably more than that in ours
    to address the above issues.
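
    One well-known reason the grammar alone is not enough: the statement
    "a * b;" declares a pointer b if a names a type, and multiplies two
    values otherwise, so the parser has to ask a symbol table. The sketch
    below shows only that idea, with a stack of scopes in which an inner
    declaration hides an outer one. The names ScopedSymbols, NameKind and
    classifyStar are hypothetical, and this is not how DMS or any other
    product mentioned here is implemented. Real C++ name lookup
    (namespaces, classes, templates, dependent names) is far more involved.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class NameKind { Type, Variable, Unknown };

// Hypothetical scope-aware symbol table: a stack of name -> kind maps.
class ScopedSymbols {
public:
    ScopedSymbols() { enter(); }  // start with the global scope
    void enter() { scopes_.emplace_back(); }
    void leave() {
        assert(scopes_.size() > 1);  // never pop the global scope
        scopes_.pop_back();
    }
    void declare(const std::string& name, NameKind kind) {
        scopes_.back()[name] = kind;
    }
    // Innermost declaration wins, as with block-scope hiding.
    NameKind lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return NameKind::Unknown;
    }
private:
    std::vector<std::unordered_map<std::string, NameKind>> scopes_;
};

enum class StmtKind { PointerDeclaration, Multiplication };

// Classify a statement of the shape `lhs * rhs;` by what `lhs` names.
StmtKind classifyStar(const ScopedSymbols& syms, const std::string& lhs) {
    return syms.lookup(lhs) == NameKind::Type ? StmtKind::PointerDeclaration
                                              : StmtKind::Multiplication;
}
```

    With "a" declared as a type in the outer scope and redeclared as a
    variable in an inner one, the same token sequence is classified
    differently depending on where it appears, which is why the table has
    to be scope accurate.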

    > Although this has some overlap with say a compiler it would also seem
    > significantly different too.


    Ours captures comments and most preprocessor conditionals unexpanded.

    > Can anyone provide me with links etc. on how one would go about writing
    > such a parser?
    > No doubt I would also need a reference to the syntax rules of C++ etc.


    Check comp.compilers and various conferences on reverse engineering.
    You won't find a lot of specific detail; you'll find tantalizing hints of
    how to solve problems but that won't remove the sweat
    equity required. I've been down that route.

    > Secondly, can anyone recommend a good tool that currently exists to do
    > the job?


    Depends on what you mean by "reverse engineering".
    If what you want are all the above features packaged in a form in
    which you can construct a reverse engineering tool,
    then DMS may suit your needs:

    www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html

    If you mean "a tool that does reverse engineering", then Scientific
    Toolworks may have what you want.

    > Thanks.


    --
    Ira Baxter, CTO
    www.semanticdesigns.com
     
    Ira Baxter, Jun 6, 2007
    #6
  7. Herby

    Herby Guest

    Guys, thanks for all the interesting responses.

    I have worked as a software developer for 10+ years, mostly in
    maintenance mode on medium to large C++ projects. Usually these
    projects do not have any kind of design roadmap to guide you into
    them.
    I feel this is much more the reality.

    At best you have some kind of source browser within your IDE: find all
    references, go to definition, etc.

    In this time I have come up with some ideas of my own that build on
    these, and I would really like to try them out. So I am reversing the
    source into something more abstract that allows me to reason more
    effectively about the source I may be about to modify.

    http://www.objectmentor.com/resources/downloads.html

    The above link has a script that gives some design quality metrics for
    a set of header files.
    It's a good start, but I'd like to write something proper and take the
    idea much further...
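
    For a sense of what a header-level metric can look like without a full
    parser, here is a small sketch that counts, for each header in a set,
    how many other headers of the set it includes (fan-out) and how many
    include it (fan-in). It is an assumption-laden illustration, not a
    description of what the script linked above computes: the names
    Coupling and measureCoupling are hypothetical, headers are passed in as
    file name and text rather than read from disk, include paths are
    matched literally, and #include lines inside comments or disabled
    #if blocks are counted as well.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <set>
#include <string>

struct Coupling {
    int fanOut = 0;  // distinct headers of the set that this header includes
    int fanIn = 0;   // headers of the set that include this header
};

// Hypothetical helper. `headers` maps a header's file name to its text.
std::map<std::string, Coupling>
measureCoupling(const std::map<std::string, std::string>& headers) {
    // Matches only the quoted form: #include "name"
    static const std::regex includeLine(R"re(#\s*include\s*"([^"]+)")re");
    std::map<std::string, Coupling> result;
    for (const auto& entry : headers) result[entry.first];  // one row each
    for (const auto& entry : headers) {
        const std::string& text = entry.second;
        std::set<std::string> seen;  // count each target once per header
        for (std::sregex_iterator it(text.begin(), text.end(), includeLine), end;
             it != end; ++it) {
            const std::string target = (*it)[1].str();
            if (target == entry.first) continue;         // ignore self-include
            if (headers.count(target) == 0) continue;    // outside the set
            if (!seen.insert(target).second) continue;   // duplicate include
            ++result[entry.first].fanOut;
            ++result[target].fanIn;
        }
    }
    assert(result.size() == headers.size());
    return result;
}
```

    Headers with a high fan-in are the ones where a change ripples furthest,
    which is the kind of thing worth knowing before modifying code you did
    not write.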

    Again, these are some of the tools on the market -
    http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

    So I hope this makes it clear what I'm trying to achieve.
     
    Herby, Jun 7, 2007
    #7