Parsing C header files with Python

Ian McConnell

I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using Python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.

It seems like someone should have done something like this before, but
googling for python, header file and protoize just gives me information on
compiling python. If there isn't anything I'll have a go with regexps.

The reason for parsing the header file is that I want to generate (using
Python) a wrapper that allows the library to be called from a different
language. I've only got to generate this wrapper once, so the Python doesn't
have to be efficient.

Thanks,
Ian
 
Ville Vainio

Ian> I've got a header file which lists a whole load of C functions of the form
Ian> int func1(float *arr, int len, double arg1);
Ian> int func2(float **arr, float *arr2, int len, double arg1, double arg2);

Ian> It's a numerical library so all functions return an int and
Ian> accept varying combinations of float pointers, ints and
Ian> doubles.

Ian> What's the easiest way of breaking down this header file into a
Ian> list of functions and their arguments using Python? Is there

Well, what comes immediately to mind (I might be overlooking
something) is that the function name is immediately before '(', and
arguments come after it separated by ','. Start with regexps and work
from there...
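
As a rough illustration of that regexp starting point, here is a minimal sketch using only the standard `re` module. The names `DECL_RE` and `parse_prototypes` are made up for this example, and it assumes exactly the shape shown in the question: an `int` return type, no nested parentheses, and each argument written as a single-word type, optional stars, and a name.

```python
import re

# Matches "int name(args);" where args contains no parentheses.
# \s also matches newlines, so declarations that span lines still match.
DECL_RE = re.compile(r"\bint\s+(\w+)\s*\(([^()]*)\)\s*;")


def parse_prototypes(text):
    """Return a list of (function_name, [(type, name), ...]) tuples."""
    functions = []
    for match in DECL_RE.finditer(text):
        func_name, arg_text = match.groups()
        args = []
        for arg in arg_text.split(","):
            arg = arg.strip()
            if not arg or arg == "void":
                continue
            # Split "float **arr" into base type, stars and identifier.
            m = re.fullmatch(r"(\w+)\s*(\**)\s*(\w+)", arg)
            if m is None:
                raise ValueError(f"unsupported argument: {arg!r}")
            base, stars, name = m.groups()
            args.append((base + stars, name))
        functions.append((func_name, args))
    return functions


header = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

for func_name, args in parse_prototypes(header):
    print(func_name)
    for arg_type, arg_name in args:
        print(" -", arg_type, arg_name)
```

Multi-word types such as `unsigned int` raise `ValueError` rather than being parsed incorrectly, and comments are not handled at all.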
 
Paul McGuire

Ian McConnell said:
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


------------------------
from pyparsing import *

testdata = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# A C identifier: a letter followed by letters, digits or underscores.
ident = Word(alphas, alphanums + "_")
# A base type plus optional pointer stars, merged into one token ("float**").
vartype = Combine(oneOf("float double int") + Optional(Word("*")),
                  adjacent=False)
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")

# scanString yields (tokens, start, end) for each match in the input.
for fn, s, e in functionCall.scanString(testdata):
    print(fn.name)
    for a in fn.args:
        print(" -", a.type, a.name)

------------------------
gives the following output:

func1
 - float* arr
 - int len
 - double arg1
func2
 - float** arr
 - float* arr2
 - int len
 - double arg1
 - double arg2
 
Paddy McCarthy

Ian McConnell said:
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using Python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.
Thanks,
Ian

Would this suffice:

import re
from pprint import pprint

header = '''int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
'''

# Map each function name to its flat list of argument tokens.
func2args = {}
for line in header.splitlines():
    # Split on whitespace, commas, semicolons and parentheses.
    line = [word for word in re.split(r'[\s,;()]+', line) if word]
    if len(line) > 2:
        func2args[line[1]] = line[2:]

pprint(func2args)

which prints:

{'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
 'func2': ['float',
           '**arr',
           'float',
           '*arr2',
           'int',
           'len',
           'double',
           'arg1',
           'double',
           'arg2']}
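
The flat token lists above still mix types and names. A small follow-up sketch, assuming every argument is exactly one type token followed by one name token with any stars attached to the name (as in the sample header), could pair them up. The helper name `pair_arguments` is hypothetical.

```python
def pair_arguments(tokens):
    """Turn ['float', '*arr', 'int', 'len'] into [('float*', 'arr'), ('int', 'len')]."""
    pairs = []
    # Walk the list two items at a time: a type followed by a name.
    for base, name in zip(tokens[0::2], tokens[1::2]):
        # Count leading stars on the name and move them onto the type.
        stars = len(name) - len(name.lstrip("*"))
        pairs.append((base + "*" * stars, name.lstrip("*")))
    return pairs


func2args = {
    "func1": ["float", "*arr", "int", "len", "double", "arg1"],
}
for func_name, tokens in func2args.items():
    print(func_name, pair_arguments(tokens))
```

This breaks down as soon as a type spans more than one word (for example `unsigned int`) or the star is written next to the type with a space before the name.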
 
Ian McConnell

Paul McGuire said:
If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.
 
Paul McGuire

Ian McConnell said:
Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.

Ian -

It is just at this kind of one-off parsing job that I think pyparsing really
shines. I am sure that you could have accomplished this with regexp's, but
a) it would have taken at least a bit longer
b) it would have required more whitespace handling (such as function decls
that span linebreaks)
c) it would have been trickier to add other unanticipated changes (support
for other arg data types (such as char, long), embedded comments, etc.)

BTW, all it takes to make this grammar comment-immune is to add the
following statement before calling scanString():

functionCall.ignore( cStyleComment )

cStyleComment is predefined in the pyparsing module to recognize /* ... */
comments. Adding this will properly handle (i.e., skip over) definitions
like:

/*
int commentedOutFunc(float arg1, float arg2);
*/

Try that with regexp's!

-- Paul
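
For comparison, a regexp-based approach would typically deal with comments in a separate pass before matching declarations. The sketch below uses only the standard `re` module; `C_COMMENT_RE` and `strip_c_comments` are names invented for this example, and it assumes that `/*` never appears inside a string literal, which is reasonable for a header containing only prototypes.

```python
import re

# Non-greedy match of /* ... */; DOTALL lets "." cross line breaks.
C_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_c_comments(text):
    """Replace each /* ... */ comment with a single space."""
    return C_COMMENT_RE.sub(" ", text)


header = """
int func1(float *arr, int len, double arg1);
/*
int commentedOutFunc(float arg1, float arg2);
*/
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

cleaned = strip_c_comments(header)
# Collect the names of the functions that survive comment removal.
names = re.findall(r"\bint\s+(\w+)\s*\(", cleaned)
print(names)
```

Only `func1` and `func2` remain, since the commented-out declaration is removed before any matching happens.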
 
Miki Tebeka

Hello Ian,

Ian McConnell said:
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using Python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.

There is an ANSI-C parser in ply (http://systems.cs.uchicago.edu/ply/)
which you can use.

Bye.
 
