Choosing the right parser for parsing C headers

Discussion in 'Python' started by Jean de Largentaye, Feb 8, 2005.

  1. Hi,

    I need to parse a subset of C (a header file), and generate some unit
    tests for the functions listed in it. I thus need to parse the code,
    then rewrite function calls with wrong parameters. What I call "shaking
    the broken tree" :)
    I chose to make my UT-generator in Python 2.4. However, I am now
    encountering problems in choosing the right parser for the job. I
    struggle in choosing between the inappropriate, the out-of-date, the
    alpha, or the too-big-for-the task...
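A minimal sketch of the "shaking the broken tree" step. It is written for current Python 3 rather than 2.4 and assumes the prototypes have already been extracted as a function name plus a list of C parameter types. The BAD_VALUES table, the classify() heuristic and the argN placeholder names are all hypothetical. Each generated call replaces exactly one argument with a suspicious value and leaves the others as names that the generated test fixture would have to declare.

```python
# Hypothetical table of "wrong" argument values per rough C type category.
BAD_VALUES = {
    "pointer": ["NULL"],
    "int": ["-1", "INT_MIN", "INT_MAX"],
    "unsigned": ["0", "UINT_MAX"],
    "char": ["'\\0'", "CHAR_MIN"],
}


def classify(ctype):
    """Very rough mapping from a C type string to a BAD_VALUES category."""
    t = ctype.strip()
    if t.endswith("*"):
        return "pointer"
    if t.startswith("unsigned"):
        return "unsigned"
    if "char" in t.split():
        return "char"
    if t.split()[-1] in ("int", "long", "short"):
        return "int"
    return None  # unknown types (structs, floats, typedefs) are left alone


def shake(name, param_types):
    """Return C call strings in which exactly one argument is a bad value."""
    calls = []
    for i, ctype in enumerate(param_types):
        for bad in BAD_VALUES.get(classify(ctype), []):
            # argN are placeholders the generated test must declare itself.
            args = ["arg%d" % j for j in range(len(param_types))]
            args[i] = bad
            calls.append("%s(%s);" % (name, ", ".join(args)))
    return calls


for call in shake("copy", ["char *", "int"]):
    print(call)
```

For shake("copy", ["char *", "int"]) this yields four calls: one passing NULL for the pointer, and three passing -1, INT_MIN and INT_MAX for the int.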

    So far I've identified 9(!) potential candidates (mostly taken from
    the http://www.python.org/moin/LanguageParsing page):

    - Plex:
    Only a lexical analyser as far as I understand. Kinda RE++, no syntax
    processing
    - ply:
    Lex / Yacc for Python! Tackle the Beast! Syntax processing looks
    complex...
    - Pyggy:
    Lex / Yacc -styled too. More recent, but will a 0.4 version be good
    enough?
    - PyLR:
    fast parser with core functions in C... hasn't moved since '97
    - Pyparsing:
    quick and easy parser... but I don't think it does more than lexical
    analysis
    - spark:
    Here's some wood. Now build your house.
    - yapps2 :
    yapps2+ (I hesitate to call it yapps3):
    chosen by http://www.python.org/sigs/parser-sig/towards-standard.html.
    Is the choice up-to-date?
    But will it do for parsing C?
    - TPG (Toy Parser Generator):
    looks cool
    - ANTLR (latest version from Jan 28 produces Python code) :
    Seems powerful and has a lot of support, but I don't want to have to
    use an external Java tool. Furthermore, does it let me control what
    happens at each stage easily, or does it just make me a compiler?
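The difference between a lexical analyser (Plex, "RE++") and a parser matters here. A lexer only turns the header into a flat stream of tokens; recognising that `int copy ( ... ) ;` is a function declaration is the parser's job. As an illustration, here is a minimal sketch of the lexing half using only the standard library's re module. The token names are made up, comments and preprocessor lines are simply discarded, and backslash-continued macros and string literals are not handled.

```python
import re

# Ordered token specification; earlier alternatives win at the same position.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("PREPROC", r"#[^\n]*"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"\.\.\.|[(){}\[\];,*=]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC), re.DOTALL)


def tokenize(text):
    """Return (kind, text) pairs, dropping whitespace, comments and #-lines."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind in ("SKIP", "COMMENT", "PREPROC"):
            continue
        if kind == "ERROR":
            raise SyntaxError("unexpected %r at offset %d" % (m.group(), m.start()))
        tokens.append((kind, m.group()))
    return tokens
```

The result is still a flat list of tokens; grouping them into declarations is the extra step that the yacc-style tools in the list (or a hand-written parser) provide.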

    I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
    (webpage?), DParser, and SimpleParse (I don't want the extra
    dependency).

    I was hoping for a quick and easy choice, but got caught in the tar pit
    of Too Much Information. Parsing is a large and complex field. As an
    added handicap, I'm new to the dark minefield of parsers... I've had
    some experience with Lex/Yacc, and have some knowledge of parser
    theory, through a course on compilers. I am thus used to EBNF-style
    grammars.
    I was disappointed to see that Parser-SIG has died out.
    Would you have any ideas on which parser is best suited for the task?

    John
     
    Jean de Largentaye, Feb 8, 2005
    #1

  2. Jean de Largentaye wrote:

    > I need to parse a subset of C (a header file), and generate some unit
    > tests for the functions listed in it. I thus need to parse the code,
    > then rewrite function calls with wrong parameters. What I call "shaking
    > the broken tree" :)
    >
    > I chose to make my UT-generator in Python 2.4. However, I am now
    > encountering problems in choosing the right parser for the job. I
    > struggle in choosing between the inappropriate, the out-of-date, the
    > alpha, or the too-big-for-the task...


    why not use a real compiler?

    http://www.boost.org/libs/python/pyste/
    http://www.gccxml.org/HTML/Index.html

    </F>
     
    Fredrik Lundh, Feb 8, 2005
    #2

  3. "Jean de Largentaye" <> writes:

    > Hi,
    >
    > I need to parse a subset of C (a header file), and generate some unit
    > tests for the functions listed in it. I thus need to parse the code,
    > then rewrite function calls with wrong parameters. What I call "shaking
    > the broken tree" :)


    IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

    Thomas
     
    Thomas Heller, Feb 8, 2005
    #3
  4. Hello Jean,

    > - ply:
    > Lex / Yacc for python! Tackle the Beast! Syntax processing looks

    mini_c is a C compiler written using ply. You can just use it as is.
    http://people.cs.uchicago.edu/~varmaa/mini_c/

    HTH.
    --
    ------------------------------------------------------------------------
    Miki Tebeka <>
    http://tebeka.bizhat.com
    The only difference between children and adults is the price of the toys
     
    Miki Tebeka, Feb 8, 2005
    #4
  5. Thomas Heller wrote:

    > IMO, for parsing 'real-world' C header files, nothing can beat gccxml.


    no free tool, at least. if a budget is involved, I'd recommend checking
    out the Edison Design Group stuff.

    </F>
     
    Fredrik Lundh, Feb 8, 2005
    #5
  6. GCC-XML looks like a very interesting alternative, as Python includes
    tools to parse XML.
    The mini-C compiler looks like a step in the right direction for me.
    I'm going to look into that.
    I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.
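Since GCC-XML writes the compiler's view of the header as XML, the standard library's xml.etree.ElementTree is enough to pull function signatures back out. The sample below is a hand-written, simplified approximation of GCC-XML output (Function, Argument, FundamentalType, PointerType and CvQualifiedType elements linked by id/type/returns attributes). Treat that structure as an assumption and check it against what your gccxml version actually emits.

```python
import xml.etree.ElementTree as ET

# Hand-written approximation of GCC-XML output for: int copy(const char *dst, int n);
SAMPLE = """<GCC_XML>
  <Function id="_3" name="copy" returns="_5" file="f0">
    <Argument name="dst" type="_6"/>
    <Argument name="n" type="_5"/>
  </Function>
  <FundamentalType id="_5" name="int"/>
  <FundamentalType id="_7" name="char"/>
  <PointerType id="_6" type="_8"/>
  <CvQualifiedType id="_8" type="_7" const="1"/>
</GCC_XML>"""


def type_name(by_id, type_id):
    """Rebuild a C type string by following the id references."""
    node = by_id[type_id]
    if node.tag == "PointerType":
        return type_name(by_id, node.get("type")) + " *"
    if node.tag == "CvQualifiedType":
        inner = type_name(by_id, node.get("type"))
        return "const " + inner if node.get("const") == "1" else inner
    return node.get("name")  # FundamentalType, Typedef, Struct, ...


def functions(xml_text):
    """Return (name, return_type, [(arg_name, arg_type), ...]) per Function."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root if el.get("id")}
    result = []
    for fn in root.iter("Function"):
        args = [(a.get("name"), type_name(by_id, a.get("type")))
                for a in fn.iter("Argument")]
        result.append((fn.get("name"), type_name(by_id, fn.get("returns")), args))
    return result


print(functions(SAMPLE))
```

Resolving types through the id references, rather than reading names directly, is what lets pointers and const qualifiers be reassembled; element types the sketch does not handle and that carry no name attribute come back as None.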

    Thanks for the information guys, you've been quite helpful!

    John
     
    Jean de Largentaye, Feb 8, 2005
    #6
  7. Jean de Largentaye wrote:

    > GCC-XML looks like a very interesting alternative, as Python includes
    > tools to parse XML.
    > The mini-C compiler looks like a step in the right direction for me.
    > I'm going to look into that.
    > I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.


    to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
    not be something that you can use out of the box for your project, but it's definitely
    something you should study, and perhaps borrow implementation ideas from.

    </F>
     
    Fredrik Lundh, Feb 8, 2005
    #7
  8. try http://sourceforge.net/projects/pygccxml
    There are a few examples and nice (for me) documentation.

    Roman

    On Tue, 8 Feb 2005 13:35:57 +0100, Fredrik Lundh <> wrote:
    > Jean de Largentaye wrote:
    >
    > > GCC-XML looks like a very interesting alternative, as Python includes
    > > tools to parse XML.
    > > The mini-C compiler looks like a step in the right direction for me.
    > > I'm going to look into that.
    > > I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.

    >
    > to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
    > not be something that you can use out of the box for your project, but it's definitely
    > something you should study, and perhaps borrow implementation ideas from.
    >
    > </F>
     
    Roman Yakovenko, Feb 8, 2005
    #8
  9. That looks cool, Roman. However, I'm behind a corporate firewall; is
    there any chance you could send me a CVS snapshot?

    John
     
    Jean de Largentaye, Feb 8, 2005
    #9
  10. Jean de Largentaye wrote:
    > Hi,
    >
    > I need to parse a subset of C (a header file), and generate some unit
    > tests for the functions listed in it. I thus need to parse the code,
    > then rewrite function calls with wrong parameters. What I call "shaking
    > the broken tree" :)
    > I chose to make my UT-generator in Python 2.4. However, I am now
    > encountering problems in choosing the right parser for the job. I
    > struggle in choosing between the inappropriate, the out-of-date, the
    > alpha, or the too-big-for-the task...


    Why not see if the output from a tags file generator such as ctags or
    etags will do what you want.

    I often find that some simpler tools do 95% of the work and it is easier
    to treat the other five percent as broken-input.

    try http://ctags.sourceforge.net/
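A minimal sketch of that approach, assuming Exuberant Ctags was run with prototype tags and routine signatures enabled, for example `ctags --c-kinds=+p --fields=+S -f - foo.h` (check the options against your ctags version). It reads the tab-separated tags format and keeps only prototype entries; lines it cannot understand are skipped, in the spirit of treating the last five percent as broken input. The sample tags text is made up for illustration.

```python
def parse_tags(text):
    """Return (name, file, signature) for every prototype ("p") tag."""
    protos = []
    for line in text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # blank line or pseudo-tag header
        head, sep, tail = line.partition(';"\t')
        if not sep:
            continue  # no extension fields: treat as broken input and skip
        parts = head.split("\t", 2)  # name, file, ex command (may hold tabs)
        if len(parts) < 3:
            continue
        name, filename = parts[0], parts[1]
        kind, fields = None, {}
        for field in tail.split("\t"):
            key, colon, value = field.partition(":")
            if colon:
                fields[key] = value
            else:
                kind = field  # a bare letter is the tag kind
        kind = fields.get("kind", kind)  # long form if "kind:" was written
        if kind in ("p", "prototype"):
            protos.append((name, filename, fields.get("signature")))
    return protos


SAMPLE_TAGS = (
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n"
    'copy\tfoo.h\t/^int copy(const char *dst, int n);$/;"\tp\tsignature:(const char *dst, int n)\n'
    'MAX_LEN\tfoo.h\t3;"\td\n'
)

print(parse_tags(SAMPLE_TAGS))
```

The signature string still has to be split into parameters, but for simple headers a comma split is often enough to drive the test generator.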


    - Paddy.
     
    Paddy McCarthy, Feb 8, 2005
    #10
  11. Jean de Largentaye wrote:
    > Hi,
    >
    > I need to parse a subset of C (a header file), and generate some unit
    > tests for the functions listed in it. I thus need to parse the code,
    > then rewrite function calls with wrong parameters. What I call "shaking
    > the broken tree" :)


    I was thinking "cdecl", and googling brought up this:

    http://arrowtheory.com/software/python/

    Another option, which I used recently when I had to parse a whole bunch
    of Oracle 'create table' scripts [with semi-structured comments which
    had to be mined for additional info]: write a recursive descent parser
    -- but maybe the grammar of C function declarations is too complicated
    for this.
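For the restricted case of simple prototypes, a hand-written recursive descent parser stays short: one method per grammar rule, each consuming tokens from a shared cursor. A minimal sketch under strong assumptions: comments, preprocessor lines, function pointers, arrays and `...` are not handled, and an unnamed parameter is recognised only when its last word is in the hypothetical TYPE_WORDS set (so an unnamed `const size_t` parameter would be misread as type `const` named `size_t`).

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|([(),;*]))")
TYPE_WORDS = {"void", "char", "short", "int", "long", "float", "double",
              "signed", "unsigned", "const", "volatile"}


def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError("cannot tokenize %r" % text[pos:pos + 10])
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class PrototypeParser:
    """Grammar, one method per rule:
    prototype := decl "(" params ")" ";"
    params    := "void" | decl ("," decl)* | empty
    decl      := (IDENT | "*")+
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self, offset=0):
        j = self.i + offset
        return self.tokens[j] if j < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, got %r" % (tok, self.peek()))
        self.i += 1

    def prototype(self):
        ret_type, name = self.decl()
        if name is None:
            raise SyntaxError("function name missing")
        self.expect("(")
        params = self.params()
        self.expect(")")
        self.expect(";")
        return name, ret_type, params

    def params(self):
        if self.peek() == ")":
            return []
        if self.peek() == "void" and self.peek(1) == ")":
            self.i += 1  # "(void)" means no parameters
            return []
        result = [self.decl()]
        while self.peek() == ",":
            self.i += 1
            result.append(self.decl())
        return result

    def decl(self):
        words = []
        while self.peek() is not None and (self.peek() == "*" or IDENT_RE.fullmatch(self.peek())):
            words.append(self.peek())
            self.i += 1
        if not words:
            raise SyntaxError("expected a declaration, got %r" % self.peek())
        name = None
        # The trailing identifier is the declared name unless it is a type word.
        if len(words) > 1 and words[-1] != "*" and words[-1] not in TYPE_WORDS:
            name = words.pop()
        type_str = ""
        for w in words:
            if w == "*":
                type_str += "*" if type_str.endswith("*") else " *"
            else:
                type_str += (" " if type_str else "") + w
        return type_str, name


print(PrototypeParser("char *strdup(const char *s);").prototype())
```

This prints ('strdup', 'char *', [('const char *', 's')]). Declarator forms such as function pointers are where C's grammar stops being friendly to this approach, which is why a real compiler front end keeps coming up in this thread.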

    HTH,
    John
     
    John Machin, Feb 8, 2005
    #11
  12. Jean, Paddy

    I use "pym" to extract bits of Pascal out of Delphi code for documentation
    purposes. You have to add some things to the Delphi code (in your case, the
    C header), but these additions go inside comment blocks, and the interesting
    thing is that you add Python code(!) as a kind of dynamic markup which pym
    executes while parsing the file.

    In other words, you can write python code within a comment block in your
    C-header to generate unit-tests into other files, and get that code
    executed with pym.
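The general idea (Python code embedded in C comments and executed while the header is scanned) can be sketched with the standard library alone. This is not pym's actual syntax or API: the `/*py ... */` marker and the emit() helper below are hypothetical, and exec() runs whatever the header contains, so the approach is only suitable for headers you trust.

```python
import re
import textwrap

# Hypothetical marker: a C comment whose first word is "py".
EMBEDDED_RE = re.compile(r"/\*py\b(.*?)\*/", re.DOTALL)


def run_embedded(header_text):
    """Execute every /*py ... */ block and collect the lines it emits."""
    output = []
    namespace = {"emit": output.append}  # shared across blocks, like one script
    for m in EMBEDDED_RE.finditer(header_text):
        exec(textwrap.dedent(m.group(1)), namespace)
    return output


HEADER = '''
/*py
for n in (-1, 0):
    emit("assert(copy(buf, %d) < 0);" % n)
*/
int copy(char *dst, int n);
'''

print("\n".join(run_embedded(HEADER)))
```

Ordinary comments are ignored, so the header still compiles unchanged; the generated lines would then be written into a separate test file.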

    Keep well
    Caleb


    On Tue, 08 Feb 2005 19:58:33 GMT, Paddy McCarthy <>
    wrote:

    > Jean de Largentaye wrote:
    >> Hi,
    >> I need to parse a subset of C (a header file), and generate some unit
    >> tests for the functions listed in it. I thus need to parse the code,
    >> then rewrite function calls with wrong parameters. What I call "shaking
    >> the broken tree" :)
    >> I chose to make my UT-generator in Python 2.4. However, I am now
    >> encountering problems in choosing the right parser for the job. I
    >> struggle in choosing between the inappropriate, the out-of-date, the
    >> alpha, or the too-big-for-the task...

    >
    > Why not see if the output from a tags file generator such as ctags or
    > etags will do what you want.
    >
    > I often find that some simpler tools do 95% of the work and it is easier
    > to treat the other five percent as broken-input.
    >
    > try http://ctags.sourceforge.net/
    >
    >
    > - Paddy.
     
    Caleb Hattingh, Feb 9, 2005
    #12

Similar Threads
  1. Mark Siffer

    advice on choosing right control

    Mark Siffer, Jun 16, 2004, in forum: ASP .Net
    Replies: 1; Views: 315
    Trevor Benedict R, Jun 17, 2004
  2. shesh

    Choosing the right platform

    shesh, Jul 18, 2005, in forum: ASP .Net
    Replies: 2; Views: 353
    jasonkester, Jul 18, 2005
  3. Thomas Weholt ( PRIVAT )

    Choosing the right database-system for a small project

    Thomas Weholt ( PRIVAT ), Jun 24, 2003, in forum: Python
    Replies: 2; Views: 1,071
    Thomas Weholt ( PRIVAT ), Jun 24, 2003
  4. Carsten Gehling

    Choosing the right framework

    Carsten Gehling, Jul 16, 2003, in forum: Python
    Replies: 9; Views: 301
    Ville Vainio, Nov 27, 2003
  5. dont bother
    Replies: 0; Views: 810
    dont bother, Mar 3, 2004