Parsing C Preprocessor files

Bram Stolk

Hi there,

What could I use to parse CPP macros in Python?
I tried the Parnassus Vaults, and python lib docs, but could not
find a suitable module.

Thanks,

Bram


--
------------------------------------------------------------------------------
Bram Stolk, VR Engineer.
SARA Academic Computing Services Amsterdam, PO Box 94613, 1090 GP AMSTERDAM
email: (e-mail address removed) Phone +31-20-5923059 Fax +31-20-6683167

"Software is math. Math is not patentable."
OR
"Software is literature. Literature is not patentable." -- slashdot comment
------------------------------------------------------------------------------
 
Peter Hansen

Bram said:
What could I use to parse CPP macros in Python?
I tried the Parnassus Vaults, and python lib docs, but could not
find a suitable module.

Does it really need to be in Python? There are probably
dozens of free and adequate macro preprocessors out there
already.

(You might also want to clarify what you mean by "parse"
in this case... do you mean actually running the whole
preprocessor over an input file and expanding all macros,
or do you mean something else?)

-Peter
 
Bram Stolk

Peter Hansen said:
Does it really need to be in Python? There are probably
dozens of free and adequate macro preprocessors out there
already.

I want to trigger Python actions for certain nodes or states in the
parse tree. I want to traverse this tree and be able to take
intelligent actions. For this, I want to use Python.

Peter Hansen said:
(You might also want to clarify what you mean by "parse"
in this case... do you mean actually running the whole
preprocessor over an input file and expanding all macros,
or do you mean something else?)

Roughly speaking, I want to be able to identify sections that are
guarded with #ifdef FOO.
Because conditionals can be nested, you would have to count the
ifs/endifs, and additionally, the conditional values may depend on other
preprocessor commands, e.g. values may have been defined in included
files.

If I can traverse the #if/#endif tree in Python, a preprocessor file
becomes much more manageable.
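
The counting of ifs/endifs described above can be sketched with a small stack of open blocks. This is only a minimal Python 3 illustration: the function name guarded_sections and the sample text are made up for the example, it looks at directive keywords only, and it neither evaluates conditions nor follows #include.

```python
def guarded_sections(lines):
    """Yield (start, end, directive) for every #if/#ifdef/#ifndef block.

    Line numbers are 1-based. Conditions are not evaluated and #include
    is not followed; only the directive keywords are inspected.
    """
    open_blocks = []  # (line number, directive text) of blocks not yet closed
    for number, line in enumerate(lines, 1):
        stripped = line.strip()
        if stripped.startswith("#if"):  # also matches #ifdef and #ifndef
            open_blocks.append((number, stripped))
        elif stripped.startswith("#endif"):
            if not open_blocks:
                raise ValueError("unmatched #endif on line %d" % number)
            start, directive = open_blocks.pop()
            yield start, number, directive
    if open_blocks:
        raise ValueError("unterminated %s" % open_blocks[-1][1])


sample = """top
#ifdef FOO
a
#if BAR
b
#endif
#endif
""".splitlines()

# Inner blocks are reported first, because they close first.
sections = list(guarded_sections(sample))
```

Each result gives the line range of one guarded section, so nested sections show up as ranges contained in other ranges.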

Bram


Paul McGuire

Bram Stolk said:
Hi there,

What could I use to parse CPP macros in Python?
I tried the Parnassus Vaults, and python lib docs, but could not
find a suitable module.

Thanks,

Bram

Try pyparsing, at http://pyparsing.sourceforge.net . The examples include a
file scanExamples.py, that does some simple C macro parsing. This should be
pretty straightforward to adapt to matching #ifdef's and #endif's.

-- Paul
(I'm sure pyparsing is listed in Vaults of Parnassus. Why did you think it
would not be applicable?)
 
Bram Stolk

Paul McGuire said:
(I'm sure pyparsing is listed in Vaults of Parnassus. Why did you think it
would not be applicable?)

Because I searched for "parser", "macro", "preprocessor", "cpp", and none
of those searches came up with "pyparsing". I should have searched for
"parsing" I guess.

Bram


Bram Stolk

pyHi(),

I would like to thank the people who responded to my question about
preprocessor parsing. However, I think I will just roll my own, as I
found out that it takes a mere 16 lines of code to create an #ifdef tree.

I simply used a combination of lists and tuples. A tuple denotes a #if
block (startline,body,endline). A body is a list of lines/tuples.

This will parse the following text:

Top level line
#if foo
on foo level
#if bar
on bar level
#endif
#endif
#ifdef bla
on bla level
#ifdef q
q
#endif
#if r
r
#endif
#endif

into:

['Top level line\n', ('#if foo\n', ['on foo level\n', ('#if bar\n', ['on bar level\n'], '#endif\n')], '#endif\n'), ('#ifdef bla\n', ['on bla level\n', ('#ifdef q\n', ['q\n'], '#endif\n'), ('#if r\n', ['r\n'], '#endif\n')], '#endif\n')]

Which is very suitable for me.

Code is:

def parse_block(lines) :
    retval = []
    while lines :
        line = lines.pop(0)
        if line.find("#if") != -1 :
            headline = line
            b=parse_block(lines)
            endline = lines.pop(0)   # the #endif left behind by the recursive call
            retval.append( (headline, b, endline) )
        else :
            if line.find("#endif") != -1 :
                lines.insert(0, line)   # push it back for the caller
                return retval
            else :
                retval.append(line)
    return retval

And pretty-printing with indentation is easy:

def traverse_block(block, indent) :
    while block:
        i = block.pop(0)   # note: this empties the tree as it prints
        if isinstance(i, tuple) :
            print indent*"\t"+i[0],
            traverse_block(i[1], indent+1)
            print indent*"\t"+i[2],
        else :
            print indent*"\t"+i,
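
Since the traversal above consumes the tree while printing it, a non-destructive walk may be handy when the tree has to be visited more than once. Below is a minimal Python 3 sketch, assuming the same (startline, body, endline) tuples; the name walk is hypothetical.

```python
def walk(block, depth=0):
    """Yield (depth, line) for every line of a list/tuple tree, leaving it intact."""
    for item in block:
        if isinstance(item, tuple):
            head, body, tail = item
            yield depth, head
            yield from walk(body, depth + 1)  # the body is one level deeper
            yield depth, tail
        else:
            yield depth, item


tree = ['Top level line\n',
        ('#if foo\n', ['on foo level\n'], '#endif\n')]

rows = list(walk(tree))
# Rebuild the indented text from the (depth, line) pairs.
text = "".join("\t" * depth + line for depth, line in rows)
```

Because walk only reads the tree, it can be called again, for instance to trigger an action on every node at a given depth.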

I think extending it with '#else' is trivial. Handling includes and
expressions is much harder of course, but not immediately required for me.
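
One possible way to add '#else' is sketched below in Python 3. It rests on an assumption that is not part of the original design: a block with an else branch becomes a five-element tuple (startline, body, elseline, elsebody, endline). '#elif' is not handled, and an unterminated block raises IndexError.

```python
def parse_block(lines):
    """Parse a list of lines (consumed in place) into nested lists and tuples."""
    retval = []
    while lines:
        line = lines.pop(0)
        words = line.split()
        word = words[0] if words else ""
        if word in ("#if", "#ifdef", "#ifndef"):
            parts = [line, parse_block(lines)]
            # Each #else contributes its own line plus a new body.
            while lines and lines[0].split()[:1] == ["#else"]:
                parts.append(lines.pop(0))
                parts.append(parse_block(lines))
            parts.append(lines.pop(0))  # the closing #endif
            retval.append(tuple(parts))
        elif word in ("#else", "#endif"):
            lines.insert(0, line)  # leave it for the caller to consume
            return retval
        else:
            retval.append(line)
    return retval


source = ["#if foo\n", "a\n", "#else\n", "b\n", "#endif\n", "c\n"]
tree = parse_block(list(source))  # pass a copy, since the list is consumed
```

Blocks without '#else' still come out as three-element tuples, so code written for the original structure keeps working on them.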

Bram

Pierre-Frédéric Caillaud

Nice and simple algorithm, but you should use an iterator to iterate over
your lines, or else shifting your big array of lines with pop() is gonna
be very slow.

Instead of :
line = lines.pop(0)

Try :

lines = iter( some line array )

Or just pass the file handle ; python will split the lines for you.

You can replace your "while lines" with a "for" on this iterator. You'll
need to avoid pushing data in the array (think about it)...

Also, "#if" in line is prettier than line.find("#if") != -1.
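
A minimal Python 3 sketch of the iterator idea follows. To avoid pushing the #endif line back, it assumes a small change in the interface: the function returns a (body, terminator) pair instead of just the body. That return convention is my assumption, not something from the original code.

```python
def parse_block(lines):
    """Parse an iterator of lines; return (body, terminator).

    terminator is the #endif line that closed the block, or None when the
    input ran out before an #endif was seen.
    """
    lines = iter(lines)  # a no-op for iterators; all recursion levels share it
    body = []
    for line in lines:
        if "#endif" in line:
            return body, line
        if "#if" in line:
            inner, endline = parse_block(lines)
            body.append((line, inner, endline))
        else:
            body.append(line)
    return body, None


text = "top\n#if foo\nin foo\n#ifdef bar\nin bar\n#endif\n#endif\nbottom\n"
tree, leftover = parse_block(text.splitlines(True))
```

No line is ever removed from the front of a list, so the cost per line stays constant. An open file can be passed directly, since a file object is its own line iterator.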

Another way to do it is without recursion : have an array which is your
stack, advance one level when you get a #if, go back one level at #endif ;
no more recursion.


Have fun !
 
Pierre-Frédéric Caillaud

I thought about it and...

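Here's a version without recursion (it keeps its own explicit stack) that
handles #include and #else as well as #if. 20 minutes in the making...
You'll need a pen and paper to figure out how the stack works though :) but
it's fun.
It relies on the fact that lists are shared by reference...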


file1 = """Top level line
#if foo
on foo level
#if bar
on bar level
#endif
re foo level
#include file2
#else
not foo
#endif
top level
#ifdef bla
on bla level
#ifdef q
q
#else
not q
#endif
check
#if r
r
#endif
#endif"""

file2 = """included file:
#ifdef stuff
stuff level
#endif
"""

# simple class to process included files
class myreader( object ):
    def __init__(self):
        self.queue = [] # queue of iterables to be played; the last one is read first

    def __iter__(self):
        return self

    # insert an iterable into the current flow
    def insert( self, iterator ):
        self.queue.append( iterator )

    def next(self):
        while self.queue:
            try:
                return self.queue[-1].next()
            except StopIteration:
                self.queue.pop() # this iterable is finished, throw it away
        raise StopIteration

reader = myreader()
reader.insert( iter( file1.split("\n") ))

# recursion-free parser, using an explicit stack !
result = []
stack = [result]
stacktop = stack[-1]

for line in reader:
    ls = line.strip()
    if ls.startswith( "#" ): # factor all # cases for speed
        keyword = ls.split(None, 1)[0] # split on any whitespace
        if keyword in ("#if", "#ifdef", "#ifndef"):
            body = []
            stacktop.append( [line, body] )
            stack.append( body )
            stacktop = body
        elif keyword == "#else":
            stack.pop()
            stack[-1][-1].append(line)
            body = []
            stack[-1][-1].append( body )
            stack.append( body )
            stacktop = body
        elif keyword == "#endif":
            stack.pop()
            stack[-1][-1] = tuple( stack[-1][-1] + [line] )
            stacktop = stack[-1] # go back to the enclosing body
        elif keyword == "#include":
            # I don't parse the filename... replace the iter() below by
            # something like open(filename)
            reader.insert( iter(file2.split("\n")) )
        else:
            stacktop.append(line) # keep other directives as plain lines
    else:
        stacktop.append(line)

def printblock(block, indent=0) :
    ind = "\t"*indent
    for elem in block:
        if isinstance( elem, list ):
            printblock( elem, indent+1 )    # a body: one level deeper
        elif isinstance( elem, tuple ):
            printblock( elem, indent )      # a whole #if ... #endif block
        else:
            print ind, elem

print result
printblock(result)
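
The reader class above relies on the Python 2 iterator protocol (a next() method). For anyone trying this under Python 3, here is a hedged sketch of the same idea using __next__; the class name IncludeReader and the tiny sample input are made up for illustration.

```python
class IncludeReader:
    """Iterate over lines, letting the caller splice in included lines."""

    def __init__(self):
        self.pending = []  # stack of iterators; the last one is read first

    def __iter__(self):
        return self

    def insert(self, iterable):
        """Read from this iterable before resuming the current one."""
        self.pending.append(iter(iterable))

    def __next__(self):
        while self.pending:
            try:
                return next(self.pending[-1])
            except StopIteration:
                self.pending.pop()  # this iterator is finished, drop it
        raise StopIteration


reader = IncludeReader()
reader.insert(["a", "#include x", "b"])

seen = []
for line in reader:
    if line.startswith("#include"):
        # Stand-in for opening the named file (the file name is not parsed here).
        reader.insert(["x1", "x2"])
    else:
        seen.append(line)
```

The included lines are delivered right after the #include line, and reading then resumes in the including input where it left off.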
 
