Choosing the right parser for parsing C headers

  • Thread starter Jean de Largentaye

Jean de Largentaye

Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the task...

So far I've identified 9(!) potential candidates (mostly taken from
the http://www.python.org/moin/LanguageParsing page):

- Plex:
Only a lexical analyser as far as I understand. Kinda RE++, no syntax
processing
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks
complex..
- Pyggy:
Lex / Yacc -styled too. More recent, but will a 0.4 version be good
enough?
- PyLR:
fast parser with core functions in C... hasn't moved since '97
- Pyparsing:
quick and easy parser... but I don't think it does more than lexical
analysis
- spark:
Here's some wood. Now build your house.
- yapps2 :
yapps2+ (I hesitate to call it yapps3):
chosen by http://www.python.org/sigs/parser-sig/towards-standard.html.
Is the choice up-to-date?
But will it do for parsing C?
- TPG (Toy Parser Generator):
looks cool
- ANTLR (latest version from Jan 28 produces Python code) :
Seems powerful and has a lot of support, but I don't want to have to
use an exterior Java tool. Furthermore, does it let me control what
happens at each stage easily, or does it just make me a compiler?

I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
(webpage?), DParser, and SimpleParse (I don't want the extra
dependency).

I was hoping for a quick and easy choice, but got caught in the tar pit
of Too Much Information. Parsing is a large and complex field. As an
added handicap, I'm new to the dark minefield of parsers... I've had
some experience with Lex/Yacc, and have some knowledge of parser
theory, through a course on compilers. I am thus used to EBNF-style
grammar.
I was disappointed to see that Parser-SIG has died out.
Would you have any ideas on which parser is best suited for the task?

John
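
A minimal sketch of the "rewrite function calls with wrong parameters" step, assuming the header has already been parsed into a function name plus a list of parameter types. The BAD_VALUES table, the bad_calls helper and the copy_name example are all hypothetical and only illustrate the idea; the code is written for a current Python 3 rather than 2.4.

```python
# Hypothetical table of deliberately wrong values per C parameter type.
BAD_VALUES = {
    "int": ["-1", "0", "INT_MAX"],
    "char *": ["NULL", '""'],
    "size_t": ["0", "(size_t)-1"],
}

def bad_calls(name, param_types, good_args):
    """Yield C call statements in which exactly one argument is replaced
    by a bad value, leaving the other arguments at their known-good value."""
    for i, ctype in enumerate(param_types):
        for bad in BAD_VALUES.get(ctype, []):
            args = list(good_args)
            args[i] = bad
            yield "%s(%s);" % (name, ", ".join(args))

# Hypothetical prototype: int copy_name(char *dst, size_t len);
calls = list(bad_calls("copy_name", ["char *", "size_t"], ["buf", "sizeof buf"]))
for call in calls:
    print(call)
```

Changing one argument at a time keeps each generated test attributable to a single bad input.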
 

Fredrik Lundh

Jean said:
I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)

I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the task...

why not use a real compiler?

http://www.boost.org/libs/python/pyste/
http://www.gccxml.org/HTML/Index.html

</F>
 

Thomas Heller

Jean de Largentaye said:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)

IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

Thomas
 

Fredrik Lundh

Thomas said:
IMO, for parsing 'real-world' C header files, nothing can beat gccxml.

no free tool, at least. if a budget is involved, I'd recommend checking
out the Edison Design Group stuff.

</F>
 

Jean de Largentaye

GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.

Thanks for the information guys, you've been quite helpful!

John
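
A rough sketch of reading GCC-XML output with the XML tools in Python's standard library. The SAMPLE document is hand-written for illustration and only assumes the general shape of GCC-XML output (Function elements with Argument children, and types referenced by id); real output contains many more node kinds (typedefs, qualifiers, structs) that this does not handle. The type_name and functions helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general style of GCC-XML output (assumption).
SAMPLE = """\
<GCC_XML>
  <Function id="_3" name="copy_name" returns="_5">
    <Argument name="dst" type="_6"/>
    <Argument name="len" type="_7"/>
  </Function>
  <FundamentalType id="_5" name="int"/>
  <PointerType id="_6" type="_8"/>
  <FundamentalType id="_7" name="unsigned int"/>
  <FundamentalType id="_8" name="char"/>
</GCC_XML>
"""

def type_name(by_id, type_id):
    """Resolve a type id to a readable C type, following pointer nodes."""
    node = by_id[type_id]
    if node.tag == "PointerType":
        return type_name(by_id, node.get("type")) + " *"
    return node.get("name")

def functions(xml_text):
    """Return (name, return type, [(arg name, arg type), ...]) per function."""
    root = ET.fromstring(xml_text)
    by_id = {node.get("id"): node for node in root}
    result = []
    for fn in root.findall("Function"):
        params = [(arg.get("name"), type_name(by_id, arg.get("type")))
                  for arg in fn.findall("Argument")]
        result.append((fn.get("name"),
                       type_name(by_id, fn.get("returns")),
                       params))
    return result

protos = functions(SAMPLE)
print(protos)
```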
 

Fredrik Lundh

Jean said:
GCC-XML looks like a very interesting alternative, as Python includes
tools to parse XML.
The mini-C compiler looks like a step in the right direction for me.
I'm going to look into that.
I'm not comfortable with C++ yet, and am not sure how I'd use Pyste.

to clarify, Pyste is a Python tool that uses GCCXML to generate bindings; it might
not be something that you can use out of the box for your project, but it's definitely
something you should study, and perhaps borrow implementation ideas from.

</F>
 

Jean de Largentaye

That looks cool Roman, however, I'm behind a Corporate Firewall, is
there any chance you could send me a cvs snapshot?

John
 

Paddy McCarthy

Jean said:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the task...

Why not see if the output from a tags file generator such as ctags or
etags will do what you want.

I often find that some simpler tools do 95% of the work and it is easier
to treat the other five percent as broken-input.

try http://ctags.sourceforge.net/


- Paddy.
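
A small sketch of the ctags route, assuming Exuberant Ctags is run over the header with prototypes and signatures enabled (something like `ctags --c-kinds=+p --fields=+S names.h`; check the options against your version). The SAMPLE_TAGS text is hand-written in the usual tab-separated tags layout, and names.h, copy_name and the prototypes helper are hypothetical.

```python
# Hand-written sample in the tab-separated tags file layout (assumption):
# name <TAB> file <TAB> search pattern;" <TAB> kind <TAB> extra fields
SAMPLE_TAGS = (
    'copy_name\tnames.h\t/^int copy_name(char *dst, size_t len);$/;"'
    '\tp\tsignature:(char *dst, size_t len)\n'
    'MAX_NAME\tnames.h\t/^#define MAX_NAME 32$/;"\td\n'
)

def prototypes(tags_text):
    """Return (name, signature) for every prototype or function tag."""
    found = []
    for line in tags_text.splitlines():
        if line.startswith("!_TAG_"):       # tags file header lines
            continue
        head, sep, ext = line.partition(';"\t')
        if not sep:                         # no extension fields on this line
            continue
        name = head.split("\t", 1)[0]
        fields = ext.split("\t")
        if fields[0] not in ("p", "f"):     # p = prototype, f = definition
            continue
        signature = None
        for field in fields[1:]:
            if field.startswith("signature:"):
                signature = field[len("signature:"):]
        found.append((name, signature))
    return found

tags = prototypes(SAMPLE_TAGS)
print(tags)
```

The signature is still a raw string here, so the parameter list would need a further split before types can be looked up.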
 

John Machin

Jean said:
Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)

I was thinking "cdecl", and googling brought up this:

http://arrowtheory.com/software/python/

Another option, which I used recently when I had to parse a whole bunch
of Oracle 'create table' scripts [with semi-structured comments which
had to be mined for additional info]: write a recursive descent parser
-- but maybe the grammar of C function declarations is too complicated
for this.

HTH,
John
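
For a sense of scale, here is a minimal hand-written parser in the recursive-descent style (one method per grammar rule) for very simple prototypes. It is only a sketch under strong assumptions: the input is already preprocessed, and there are no function pointers, arrays, struct bodies or variadic arguments, which is exactly where C declarations get complicated. All names (tokenize, Parser, parse_prototype) are hypothetical.

```python
import re

# Identifiers and the few punctuation marks this tiny grammar needs.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[*(),;])")
STOP = (None, "(", ")", ",", ";")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError("expected %r, got %r" % (token, self.peek()))
        self.pos += 1

    def typed_name(self):
        # typed_name := (IDENT | '*')+ ; a trailing identifier is the name
        words = []
        while self.peek() not in STOP:
            words.append(self.peek())
            self.pos += 1
        if not words:
            raise SyntaxError("expected a type, got %r" % self.peek())
        if len(words) == 1 or words[-1] == "*":
            return " ".join(words), None    # unnamed parameter or bare type
        return " ".join(words[:-1]), words[-1]

    def params(self):
        # params := empty | 'void' | typed_name (',' typed_name)*
        result = []
        if self.peek() == ")":
            return result
        while True:
            result.append(self.typed_name())
            if self.peek() != ",":
                break
            self.pos += 1
        return [] if result == [("void", None)] else result

    def prototype(self):
        # prototype := typed_name '(' params ')' ';'
        return_type, name = self.typed_name()
        self.expect("(")
        params = self.params()
        self.expect(")")
        self.expect(";")
        return name, return_type, params

def parse_prototype(text):
    return Parser(tokenize(text)).prototype()

parsed = parse_prototype("int copy_name(char *dst, size_t len);")
print(parsed)
```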
 

Caleb Hattingh

Jean, Paddy

I use "pym" to extract bits of pascal out of delphi code for documentation
purposes. You have to add some stuff to the delphi code (in your case, C
header), but these are added within comment blocks, and the interesting
thing is that you add python code(!) as a kind of dynamic markup which pym
executes while parsing the file.

In other words, you can write python code within a comment block in your
C-header to generate unit-tests into other files, and get that code
executed with pym.

Keep well
Caleb
 
