C parser yielding syntax tree data structure?

Discussion in 'C Programming' started by (Jamie Andrews), Apr 8, 2006.

  1. For a research project, we're looking for a reliable parser for C
    that will take an ANSI C program and yield a tree representation of
    the program (as a Java or C++ object). Of course a grammar e.g. in
    jflex/jbison that will yield the same thing is fine too. We have been
    able to find some grammars and parsers, of unknown reliability, that
    don't yield a syntax tree; we want to avoid starting with a flaky
    parser and/or adding the syntax tree code.

    Preferably the tokens in the tree will contain information
    on the line number and character number of the token, but if it
    is sufficiently easy to add that code, then we can do that too.
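
    As a rough illustration of what "adding that code" could involve, here
    is a minimal sketch of a lexer that records a line and column for each
    token. The names Token and tokenize and the tiny token pattern are
    hypothetical and are not taken from any parser mentioned in this thread;
    a real C lexer would also have to handle comments, string literals,
    multi-character operators, and preprocessor output.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str
    line: int    # 1-based line number
    column: int  # 1-based column of the token's first character


# Deliberately tiny pattern: identifiers, integer literals, or any single
# non-space character. This is an assumption for illustration only.
_TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> List[Token]:
    """Split source into tokens, tagging each with its line and column."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _TOKEN_RE.finditer(line):
            tokens.append(Token(match.group(), line_no, match.start() + 1))
    return tokens
```

    A parser built on such a lexer could copy the position fields from the
    tokens into the tree nodes it creates.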

    Thanks for any info you can give.

    --Jamie. (efil4dreN)
    (Jamie Andrews), Apr 8, 2006
    #1

  2. "Jamie Andrews" <> wrote in message
    > For a research project, we're looking for a reliable parser for C
    > that will take an ANSI C program and yield a tree representation of
    > the program (as a Java or C++ object). Of course a grammar e.g. in
    > jflex/jbison that will yield the same thing is fine too. We have been
    > able to find some grammars and parsers, of unknown reliability, that
    > don't yield a syntax tree; we want to avoid starting with a flaky
    > parser and/or adding the syntax tree code.
    >
    > Preferably the tokens in the tree will contain information
    > on the line number and character number of the token, but if it
    > is sufficiently easy to add that code, then we can do that too.


    (Since this is cross-posted, for those on comp.lang.c: yes, I've
    posted most of these links previously...)

    I don't know which if any of these may fulfill your needs, but they may be
    worth a look. I also noticed some of the links are bad as I posted, but
    they may still help you to track them down.

    CIL - C Intermediate Language - C to C transformation
    http://manju.cs.berkeley.edu/cil/

    WCC - A C Subset Compiler (DECUS ftp links now appear to be dead...sorry)
    http://www.decus.org/libcatalog/description_html/v00281.html
    ftp://ftp.encompassus.org/lib/

    npath - C Source Complexity Measures
    http://www.geonius.com/software/tools/npath.html

    Check: A unit test framework for C
    http://check.sourceforge.net/

    CTool Library (call-graph generator, source transformations)
    http://ctool.sourceforge.net/

    Cproto automatically generates C function prototypes
    http://cproto.sourceforge.net/

    JSCPP - a C preprocessor + parser with special modes
    http://www.die-schoens.de/prg/

    CXREF C language cross referencing program
    in volume1 of comp.sources.unix:
    http://ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.unix/

    CSur Le projet Csur (in French)
    An analyzer of C code that detects common program execution errors
    http://www.lsv.ens-cachan.fr/~goubault/Csur/csur.html

    Chico State Mini-C Compiler (CSMCC) is a student training load-and-go
    compiler (incomplete, teaching tool)
    http://www.ecst.csuchico.edu/~sameerg/compproj.html
    http://www.ecst.csuchico.edu/~hilzer/csci250/proj/

    Edward Willink's C++ grammars:
    http://www.computing.surrey.ac.uk/research/dsrg/fog/
    (some of the links contain an extra '/v'; just delete it)

    ISO C/C++ grammars version 1.2 (c-c++-grammars-1.2.tar.gz)
    http://www.sigala.it/sandro/download.php

    A C99 Parser, a recursive descent parser
    http://www.mazumdar.demon.co.uk/c_parser.html

    Ctags generates an index (or tag) file of language objects
    http://ctags.sourceforge.net/

    Cdecl English<->C translator for C declarations
    cdecl in volume6 of comp.sources.unix:
    cdecl2 in volume14 of comp.sources.unix:
    http://ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.unix/


    Rod Pemberton
    Rod Pemberton, Apr 9, 2006
    #2

  3. (Jamie Andrews) wrote:
    > For a research project, we're looking for a reliable parser for C
    > that will take an ANSI C program and yield a tree representation of
    > the program (as a Java or C++ object). ...


    I haven't done this myself, but if I had to solve the same problem, my
    first step would be to see whether a C compiler can "dump" its tree in a
    readable format. For example, gcc accepts options of the form
    -fdump-tree-xxxx.

    It could work, ... or not.
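
    A minimal sketch of trying this from a script follows. It assumes gcc is
    installed and on the PATH, and it picks "original" as one concrete value
    for the xxxx placeholder. The helper names gcc_dump_command and
    run_gcc_dump are hypothetical. The dump is written by gcc to a file next
    to the output; its name and format vary between gcc versions and are
    meant for compiler debugging, so treat it as something to inspect rather
    than a stable interface.

```python
import subprocess
from typing import List


def gcc_dump_command(source_file: str, dump: str = "original") -> List[str]:
    """Build a gcc command line that compiles source_file and requests a tree dump."""
    # -c compiles without linking; -fdump-tree-<dump> asks gcc to write the dump file.
    return ["gcc", "-c", "-fdump-tree-" + dump, source_file]


def run_gcc_dump(source_file: str, dump: str = "original") -> None:
    """Run gcc with the dump option; raises CalledProcessError if gcc fails."""
    subprocess.run(gcc_dump_command(source_file, dump), check=True)
```

    For example, run_gcc_dump("foo.c") runs gcc on a hypothetical foo.c; the
    dump file then has to be located and read separately.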
    tmp123, Apr 9, 2006
    #3
  4. (Jamie Andrews) wrote:

    > For a research project, we're looking for a reliable parser for C
    > that will take an ANSI C program and yield a tree representation of
    > the program (as a Java or C++ object). Of course a grammar e.g. in
    > jflex/jbison that will yield the same thing is fine too. We have been
    > able to find some grammars and parsers, of unknown reliability, that
    > don't yield a syntax tree; we want to avoid starting with a flaky
    > parser and/or adding the syntax tree code.


    While searching for a C++ parser that yields an AST, I tried two
    parsers that look fine for C, although neither can parse all C++
    constructs.

    - C or C++ grammar for ANTLR (http://www.antlr.org/grammar/list)

    - ELSA/Elkhound (http://www.cs.berkeley.edu/~smcpeak/elkhound/)

    I am currently using ELSA, hoping that the few remaining bugs
    (with respect to C++) will be fixed at some point.

    Cheers,
    Arndt
    Arndt Muehlenfeld, Apr 13, 2006
    #4
  5. "Jamie Andrews" <> wrote in message
    > For a research project, we're looking for a reliable parser for C
    > that will take an ANSI C program and yield a tree representation of
    > the program (as a Java or C++ object). ...


    The DMS Software Reengineering Toolkit provides a full ANSI C front
    end, with preprocessor, builds ASTs and symbol table information, and
    provides facilities for constructing custom analyzers and
    source-to-source transformations. See
    http://www.semdesigns.com/Products/FrontEnds/CFrontEnd.html



    --
    Ira Baxter, CTO
    www.semanticdesigns.com
    Ira Baxter, Apr 13, 2006
    #5
  6. On Wednesday 12 April 2006 22:48, Arndt Muehlenfeld wrote:

    > (Jamie Andrews) wrote:
    >> For a research project, we're looking for a reliable parser for C
    >> that will take an ANSI C program and yield a tree representation of
    >> the program (as a Java or C++ object). Of course a grammar e.g. in
    >> jflex/jbison that will yield the same thing is fine too. We have been
    >> able to find some grammars and parsers, of unknown reliability, that
    >> don't yield a syntax tree; we want to avoid starting with a flaky
    >> parser and/or adding the syntax tree code.


    Consider ROSE
    http://www.llnl.gov/CASC/rose/

    I understand that another version is due within a month or two.

    -paul-
    --
    Paul E. Black ()
    Paul E. Black, Apr 14, 2006
    #6
  7. > > For a research project, we're looking for a reliable parser for C
    > > that will take an ANSI C program and yield a tree representation of
    > > the program (as a Java or C++ object). ...


    > I don't know which if any of these may fulfill your needs, but they may be
    > worth a look. I also noticed some of the links are bad as I posted, but
    > they may still help you to track them down.


    These additional links may be of some use. ASTRÉE appears to be great
    but I don't see any code release...

    CCURED memory safe C transformations (for CIL)
    http://manju.cs.berkeley.edu/ccured/

    C Code Checker (for CIL)
    http://www.drugphish.ch/~jonny/cca.html

    PScan Scan C files for format string overflows
    http://www.striker.ottawa.on.ca/~aland/pscan/

    CQUAL C checking through extended type qualifiers
    http://www.cs.umd.edu/~jfoster/cqual/

    Smatch - Source Matcher, C source checker for Linux Kernel
    http://smatch.sourceforge.net/

    SPLint Secure Programming Lint error detection
    http://www.splint.org

    BOON Buffer Overrun detectiON
    http://www.cs.berkeley.edu/~daw/boon/

    CZECH, project pedantic error detection
    http://pedantic.sourceforge.net/

    Flawfinder for C (in Python)
    http://www.dwheeler.com/flawfinder/

    ASTRÉE determines absence of runtime errors (in OCAML)
    http://www.astree.ens.fr/
    "In Nov. 2003, ASTRÉE was able to prove completely automatically the absence
    of any RTE in the primary flight control software of the Airbus A340
    fly-by-wire system, a program of 132,000 lines of C"


    Rod Pemberton
    Rod Pemberton, Apr 16, 2006
    #7
  8. (Jamie Andrews) wrote:
    > For a research project, we're looking for a reliable parser for C
    > that will take an ANSI C program and yield a tree representation of
    > the program (as a Java or C++ object). Of course a grammar e.g. in
    > jflex/jbison that will yield the same thing is fine too. We have been
    > able to find some grammars and parsers, of unknown reliability, that
    > don't yield a syntax tree; we want to avoid starting with a flaky
    > parser and/or adding the syntax tree code.
    >
    > Preferably the tokens in the tree will contain information
    > on the line number and character number of the token, but if it
    > is sufficiently easy to add that code, then we can do that too.
    >
    > Thanks for any info you can give.


    Linus Torvalds of Linux fame once wrote one of these called "sparse".
    See http://freshmeat.net/projects/sparse/

    I think the latest version is at
    http://www.kernel.org/pub/scm/devel/sparse/ but unfortunately you need
    the "git" version control system to download it.

    It doesn't produce C++ or Java objects as output. But as you'll find,
    there is not much to be gained by treating a whole tree as an object;
    the important thing is that it has a single root.
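
    To illustrate the single-root point, here is a small sketch in which the
    tree is plain nested data with no dedicated tree class, and one function
    reaches every node from the root. The node layout (kind, text, children)
    and the name preorder are hypothetical and have nothing to do with how
    sparse represents its trees.

```python
from typing import Iterator, Tuple

# Hypothetical tree for `x = a + 1`: each node is (kind, text, children).
tree = ("assign", "=", [
    ("ident", "x", []),
    ("binop", "+", [
        ("ident", "a", []),
        ("const", "1", []),
    ]),
])


def preorder(root) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, kind, text) for every node reachable from root."""
    stack = [(0, root)]
    while stack:
        depth, (kind, text, children) = stack.pop()
        yield depth, kind, text
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(children):
            stack.append((depth + 1, child))
```

    Any analysis can then be written as a loop over preorder(tree), whatever
    the surrounding container looks like.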

    I have no idea how good this parser is; you'll have to find that out
    for yourself. It's hard to tell how good any C parser is if it doesn't
    get heavy use, and hard enough to tell even if it does.
    , Apr 22, 2006
    #8
