Read C++ enum in python

Ludo

Hello,

I work on a very large project that contains both C++ packages and pieces of
Python code.

I've been googling for days, but everything I find seems far too
complicated for what I want to do.

My goal is to read, in Python, the enum definitions provided by the
header file of a C++ package.
Of course I could open the .h file, read the enum and transcribe it by
hand into a .py file, but the package is updated regularly and so is
the enum.

My question is simple. Is there:
- either a simple way in Python to read the .h file, retrieve the C++
enum and make it accessible in my Python script,
- or a simple tool (in the long term it would be run automatically
when the C++ package is compiled) that generates from the .h file a .py file
containing the Python definition of the enums?

Thank you for any suggestion.
 
MRAB

Ludo said:
[snip]

Speaking personally, I'd parse the .h file using a regular expression
(re module) and generate a .py file. Compilers typically have a way of
letting you run external scripts (e.g. batch files on Windows or, in this
case, a Python script) when an application is compiled.
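
A minimal sketch of that regex-and-generate approach, not a definitive implementation. The header text, the ENUMNAME_MEMBER naming scheme and the function names (strip_comments, parse_enums, generate_module) are assumptions made for illustration. It handles only plain named enums whose explicit values are integer literals; anonymous enums, `enum class` and values written as expressions are not covered.

```python
import re

# Hypothetical header contents, used only for illustration.
HEADER = """
// colours used by the widget
enum Colour { RED, GREEN = 5, BLUE };
/* status codes */
enum Status {
    OK = 0,
    FAILED = 0x10,
    UNKNOWN
};
"""

# "enum", a name, then everything up to the first closing brace.
ENUM_RE = re.compile(r'\benum\s+(\w+)\s*\{([^}]*)\}')


def strip_comments(text):
    """Remove /* ... */ (non-greedy, may span lines) and // comments."""
    text = re.sub(r'/\*.*?\*/', ' ', text, flags=re.DOTALL)
    return re.sub(r'//[^\n]*', ' ', text)


def parse_enums(text):
    """Return {enum_name: [(member_name, value), ...]}."""
    enums = {}
    for name, body in ENUM_RE.findall(strip_comments(text)):
        members, next_value = [], 0
        for item in body.split(','):
            item = item.strip()
            if not item:          # tolerate a trailing comma
                continue
            if '=' in item:
                item, value = item.split('=', 1)
                item = item.strip()
                next_value = int(value.strip(), 0)   # base 0 accepts 0x.. too
            members.append((item, next_value))
            next_value += 1
        enums[name] = members
    return enums


def generate_module(text):
    """Return the source text of a .py module defining the constants."""
    lines = ['# Generated file: do not edit by hand.', '']
    for enum_name, members in parse_enums(text).items():
        for member, value in members:
            lines.append('%s_%s = %d' % (enum_name.upper(), member.upper(), value))
        lines.append('')
    return '\n'.join(lines)


if __name__ == '__main__':
    print(generate_module(HEADER))
```

Run as a build step, the script's output could be redirected to a .py file next to the Python code that needs the constants.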
 
AggieDan04

Ludo said:
[snip]
My question is simple. Is there:
- either a simple way in Python to read the .h file, retrieve the C++
enum and make it accessible in my Python script

Try something like this:



import re

# 'filename' is the path to the C++ header to scan.
file_data = open(filename).read()
# Remove // comments and preprocessor directives
file_data = ' '.join(line.split('//')[0].split('#')[0]
                     for line in file_data.splitlines())
# Remove /* ... */ comments
file_data = ' '.join(re.split(r'\/\*.*\*\/', file_data))
# Look for enums: in the first { } block after the keyword "enum"
enums = [text.split('{')[1].split('}')[0]
         for text in re.split(r'\benum\b', file_data)[1:]]

for enum in enums:
    last_value = -1
    for enum_name in enum.split(','):
        if '=' in enum_name:
            enum_name, enum_value = enum_name.split('=')
            enum_value = int(enum_value, 0)
        else:
            enum_value = last_value + 1
        last_value = enum_value
        enum_name = enum_name.strip()
        print('%s = %d' % (enum_name, enum_value))
    print('')
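
On Python 3.4 or later, name/value pairs like the ones printed above could also be turned into an enum type at run time with the standard enum module's functional API, which gives the script attribute-style access instead of loose constants. A small sketch, where the pairs and the name Hello are made up for illustration:

```python
import enum

# Hypothetical (name, value) pairs, as a header parser might produce them.
pairs = [('Zero', 0), ('One', 1), ('Five', 5), ('Six', 6)]

# Functional API: build an IntEnum class from the pairs at run time.
Hello = enum.IntEnum('Hello', pairs)

print(Hello.Five.value)   # value of a member looked up by name
print(Hello(6).name)      # name of a member looked up by value
print(Hello.One + 1)      # IntEnum members behave like ints
```

IntEnum is used here because C++ enum values are usually passed around as plain integers; enum.Enum would work the same way if integer behaviour is not wanted.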
 
Mark Tolonen

MRAB said:
Speaking personally, I'd parse the .h file using a regular expression
(re module) and generate a .py file. Compilers typically have a way of
letting you run external scripts (e.g. batch files on Windows or, in this
case, a Python script) when an application is compiled.

This is what the third-party library pyparsing is great for:

--------begin code----------
from pyparsing import *

# sample string with enums and other stuff
sample = '''
stuff before

enum hello {
Zero,
One,
Two,
Three,
Five=5,
Six,
Ten=10
}

in the middle

enum blah
{
alpha,
beta,
gamma = 10 ,
zeta = 50
}

at the end
'''

# syntax we don't want to see in the final parse tree
_lcurl = Suppress('{')
_rcurl = Suppress('}')
_equal = Suppress('=')
_comma = Suppress(',')
_enum = Suppress('enum')

identifier = Word(alphas,alphanums+'_')
integer = Word(nums)

enumValue = Group(identifier('name') + Optional(_equal + integer('value')))
enumList = Group(enumValue + ZeroOrMore(_comma + enumValue))
enum = _enum + identifier('enum') + _lcurl + enumList('list') + _rcurl

# find instances of enums ignoring other syntax
for item, start, stop in enum.scanString(sample):
    id = 0
    for entry in item.list:
        if entry.value != '':
            id = int(entry.value)
        print('%s_%s = %d' % (item.enum.upper(), entry.name.upper(), id))
        id += 1
--------------end code------------

Output:
HELLO_ZERO = 0
HELLO_ONE = 1
HELLO_TWO = 2
HELLO_THREE = 3
HELLO_FIVE = 5
HELLO_SIX = 6
HELLO_TEN = 10
BLAH_ALPHA = 0
BLAH_BETA = 1
BLAH_GAMMA = 10
BLAH_ZETA = 50

-Mark
 
Neil Hodgson

AggieDan04:
file_data = open(filename).read()
# Remove comments and preprocessor directives
file_data = ' '.join(line.split('//')[0].split('#')[0] for line in
file_data.splitlines())
file_data = ' '.join(re.split(r'\/\*.*\*\/', file_data))

For some headers I tried it didn't work until the .* was changed to a
non-greedy .*? to avoid removing from the start of the first comment to
the end of the last comment.

file_data = ' '.join(re.split(r'\/\*.*?\*\/', file_data))
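
The difference is easy to reproduce on a short string. In this sketch the header text is a made-up example, already joined onto one line as in the code above:

```python
import re

# Hypothetical header text with two block comments on a single line.
text = "/* first */ enum A { X, Y }; /* second */ enum B { Z };"

# Greedy: one match runs from the first "/*" to the last "*/",
# so the enum between the two comments is removed as well.
greedy = ' '.join(re.split(r'/\*.*\*/', text))

# Non-greedy: each comment is matched on its own.
non_greedy = ' '.join(re.split(r'/\*.*?\*/', text))

print(repr(greedy))
print(repr(non_greedy))
```

With the greedy pattern only enum B survives; with the non-greedy pattern both enums are kept and both comments are gone.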

Neil
 
Bill Davy

Mark Tolonen said:
[snip]


Python and pythoneers are amazing!
 
Ludo

Neil Hodgson wrote:
For some headers I tried it didn't work until the .* was changed to a
non-greedy .*? to avoid removing from the start of the first comment to
the end of the last comment.

file_data = ' '.join(re.split(r'\/\*.*?\*\/', file_data))

Thank you! I'll adopt it!

Cheers.
 
Mark Tolonen

[snip]

Paul McGuire (pyparsing author) reminded me that:

enum.ignore(cppStyleComment)

before calling scanString will skip commented-out sections as well.

-Mark
 
