SeeGramWrap, a C parser

Edward C. Jones · Mar 5, 2004

I have uploaded "SeeGramWrap-03.02.2004.tgz" to my webpage at
"http://members.tripod.com/~edcjones/pycode.html". SeeGramWrap parses a
piece of C code and the resulting parse tree is output in man and
machine readable form. The result can be used for program
transformations. Since a particular trnsformation algorithm may not
require all the information present in the tree, the user can select
what to output.

This program has been written and tested only under linux.

Thanks to John Mitchell and Monty Zukowski for "cgram.tgz". Every parser
generator need to have a good C grammar. Also thanks to Terrence Parr
for ANTLR (http://www.antlr.org/).

Note: I have also uploaded
pydocs.tar.gz which searchs the Python documentation.
linenum.py which prints a linenumber and a message. Useful for debugging.

========================================================================
CONTEXT

A number of large C libraries have been wrapped so they can be called by
Python. The wrapping code is repetitive and there may be a lot of it so
methods have been developed for automated wrapping.

The best-known approach is SWIG (http://www.swig.org/). For complex
wrappings, SWIG requires the writing of "typemaps", an unintuitive
process where pieces of C code you write are spliced into the wrapper
code generated by SWIG.

Another wrapper related approach is Pyrex which is found at
http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Pyrex has its own repetitive boilerplate that has to be written. But the
Pyrex boilerplate is so straightforward that it can be taught
algorithmically. See "Michael's Quick Guide to Pyrex" at
"http://ldots.org/pyrex-guide/".

I think that the Pyrex boilerplate is _so_ straightforward that it can
be machine generated. Therefore I have been sporatically developing
software to do this. A thoroughly buggy version of this is on my web
page, "http://members.tripod.com/~edcjones". It is called "cgram.tar.gz"
(The name will be changed). Look at it but don't use it. "SeeGramWrap"
is a major revision of the front end of "cgram.tr.gz".

I think the automatic-wrapper program can be made to work. It might be
easier to use than SWIG. It is still a lot of work to prepare complex C
header files. What we have is really a "program transformation" or "tree
transformation" problem.

I think some of the issues are:

1. Since parser generators have a long and steep learning curve, I
prefer to use them as black boxes which generate parsers which output
results that I can analyze using Python. The parser created by a parser
generator should output trees in two formats: one easy to look at and
another that a program can easily read. For examples, see below.

2. I find trees very easy to work with. I want the trees to be front and
center and highly visible. I prefer to "manipulate a tree" rather than
"fire a rule".

3. The most common type of C macro has a type as one of its arguments:

#define CAST(x, type) (type *) x

How can these be automatically wrapped for Python which is a dynamically
typed language?

888888888888888888888888888888888888888888888888888888888888888888888888
TECHNICAL OVERVIEW

I use some C grammars associated with ANTLR. The grammar package is
called "cgram". See "http://www.antlr.org/resources.html".

In "cgram" there is a java program "TestThrough.java" which parses C
code into an AST then runs a tree grammar on the AST and outputs the
original code. The tree grammar is named "GnuCEmitter.g". I work with
this grammar because the terminal tokens are printed in the correct
order. I modified the grammar turning it into a template. A piece of the
original "GnuCEmitter.g" is:

----
typeQualifier
: a:"const" { print( a ); }
| b:"volatile" { print( b ); }
;
----

The modified version is:

----
typeQualifier
: a:"const" { <@ a @> }
| b:"volatile" { <@ b @> }
;
----

In this template, strings of the form "<@ ... @>" will each be replaced
by a set of print statements. Moreover the entire rule will be wrapped
by prints. The template is used in "emitter/insert_prints.py". If
"insert_prints.py" is run the result is:

----
typeQualifier
{ if ( inputState.guessing==0 ) {
print(Open);
print("typeQualifier");
}
}
: (
a:"const" { print(Open);
print("typeQualifier.0"); print( a ); print(Close); }
| b:"volatile" { print(Open);
print("typeQualifier.1"); print( b ); print(Close); }
)
{ currentOutput.print(Close + MyTokenSep); }
;
----

If the original C program , "temp2.c", is

char* s = "ab";

The output of the modified emitter grammar is "temp2.c.data":

----
<<OPEN>> <<OPEN>> <<OPEN>>
externalList declarator expr
<<OPEN>> <<OPEN>> <<OPEN>>
externalDef pointerGroup primaryExpr
<<OPEN>> <<OPEN>> <<OPEN>>
declaration pointerGroup.0 stringConst
<<OPEN>> * <<OPEN>>
declSpecifiers <<CLOSE>> stringConst.0
<<OPEN>> <<CLOSE>> "ab"
typeSpecifier <<OPEN>> <<CLOSE>>
<<OPEN>> declarator.0 <<CLOSE>>
typeSpecifier.1 s <<CLOSE>>
char <<CLOSE>> <<CLOSE>>
<<CLOSE>> <<CLOSE>> <<CLOSE>>
<<CLOSE>> <<OPEN>> <<CLOSE>>
<<CLOSE>> initDecl.0 <<CLOSE>>
<<OPEN>> = ;
initDeclList <<CLOSE>> <<CLOSE>>
<<OPEN>> <<OPEN>> <<CLOSE>>
initDecl initializer <<CLOSE>>
----

This output can be processed by "tree.py" to produce "temp2.c.nest"

----
(externalList,
(externalDef,
(declaration,
(declSpecifiers,
(typeSpecifier,
(typeSpecifier.1, |char|))),
(initDeclList,
(initDecl,
(declarator,
(pointerGroup,
(pointerGroup.0, |*|)),
(declarator.0, |s|)),
(initDecl.0, |=|),
(initializer,
(expr,
(primaryExpr,
(stringConst,
(stringConst.0, |"ab"|))))))), |;|)))
----

or "temp2.c.src":

char * s = "ab" ;

If "temp2.c.src" is put through the entire process itself we get
"temp2.c.src.src" which is identical to "temp2.c.src". This test is done
by "docheck.py".

In the ".data" or ".nest" files the tokens from the original C code are
in the correct order. It is easy to recover

('char', '*', 's', '=', '"ab"', ';')

simple ElementTree based parser that allows entity definition map	0	Dec 4, 2013
Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023
How to write a language parser ?	5	Feb 22, 2013
Lexical Analysis on C++	1	Oct 31, 2023
HELP:function at c returning (null)	4	Mar 21, 2024
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Become a C++ programmer	5	Feb 21, 2023
Peasy: an easy but powerful parser	0	Aug 26, 2013

SeeGramWrap, a C parser

Edward C. Jones

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads