Extracting attributes from compiled python code or parse trees

Matteo

Hello-
I am trying to get Python to extract attributes in full dotted form
from a compiled expression. For instance, if I have the following:

param = compile('a.x + a.y','','single')

then I would like to retrieve the list consisting of ['a.x','a.y'].
I have tried using inspect to look at 'co_names', but when I do that,
I get:
('co_names', ('a', 'x', 'y'))

with no way to determine that 'x' and 'y' are both attributes of 'a'.
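
The flattening described here can be reproduced by reading co_names straight off the code object. This is only a minimal sketch; the variable names are illustrative, and the inspect module is not needed for it:

```python
# Compile the formula exactly as in the post and read the name table.
code = compile('a.x + a.y', '', 'single')

# co_names is a flat tuple of every name the bytecode refers to;
# it does not record which names were used as attributes of which object.
names = code.co_names
print(names)
```

Because the tuple is flat, 'x' and 'y' cannot be tied back to 'a' from co_names alone.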

The reason I am attempting this is to try and automatically determine
data dependencies in a user-supplied formula (in order to build a
dataflow network). I would prefer not to have to write my own parser
just yet.

Alternatively, I've looked at the parser module, but I am experiencing
some difficulties in that the symbol list does not seem to match that
listed in the python grammar reference (not surprising, since I am
using python2.5, and the docs seem a bit dated)

In particular:

[258,
 [326,
  [303,
   [304,
    [305,
     [306,
      [307,
       [309,
        [310,
         [311,
          [312,
           [313,
            [314,
             [315,
              [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
 [4, ''],
 [0, '']]
print symbol.sym_name[316]
power

Thus, for some reason, 'a.x' seems to be interpreted as a power
expression, and not an 'attributeref' as I would have anticipated (in
fact, the symbol module does not seem to contain an 'attributeref'
symbol)

(for the curious, here is the relevant part of the AST for "a**x":
[316,
 [317, [1, 'a']],
 [36, '**'],
 [315, [316, [317, [1, 'x']]]]]
)

Anyway, I could write an AST analyzer that searches for the correct
pattern, but it would be relying on undocumented behavior, and I'm
hoping there is a better way.

(By the way, I realize that malicious users could almost certainly
subvert my proposed dependency mechanism, but for this project, I'm
guarding against Murphy, not Machiavelli)

Thanks,
-matt
 
Gabriel Genellina

Matteo said:
> I am trying to get Python to extract attributes in full dotted form
> from a compiled expression. For instance, if I have the following:
>
> param = compile('a.x + a.y','','single')
>
> then I would like to retrieve the list consisting of ['a.x','a.y'].
>
> The reason I am attempting this is to try and automatically determine
> data dependencies in a user-supplied formula (in order to build a
> dataflow network). I would prefer not to have to write my own parser
> just yet.

If it is an expression, I think you should use "eval" instead of "single"
as the third argument to compile.

> Alternatively, I've looked at the parser module, but I am experiencing
> some difficulties in that the symbol list does not seem to match that
> listed in the python grammar reference (not surprising, since I am
> using python2.5, and the docs seem a bit dated)

Yes, the grammar.txt in the docs is a bit outdated (or perhaps it's a
simplified one), see the Grammar/Grammar file in the Python source
distribution.

> In particular:
>
> [258,
>  [326,
>   [303,
>    [304,
>     [305,
>      [306,
>       [307,
>        [309,
>         [310,
>          [311,
>           [312,
>            [313,
>             [314,
>              [315,
>               [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
>  [4, ''],
>  [0, '']]
> print symbol.sym_name[316]
> power
>
> Thus, for some reason, 'a.x' seems to be interpreted as a power
> expression, and not an 'attributeref' as I would have anticipated (in
> fact, the symbol module does not seem to contain an 'attributeref'
> symbol)

Using this little helper function to translate symbols and tokens:

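import symbol
import token

# Map both grammar symbol numbers and token numbers to their names.
names = symbol.sym_name.copy()
names.update(token.tok_name)

def human_readable(lst):
    # Replace the numeric code at the head of each nested list, in place.
    lst[0] = names[lst[0]]
    for item in lst[1:]:
        if isinstance(item, list):
            human_readable(item)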

the same tree becomes:

['eval_input',
 ['testlist',
  ['test',
   ['or_test',
    ['and_test',
     ['not_test',
      ['comparison',
       ['expr',
        ['xor_expr',
         ['and_expr',
          ['shift_expr',
           ['arith_expr',
            ['term',
             ['factor',
              ['power',
               ['atom', ['NAME', 'a']],
               ['trailer', ['DOT', '.'], ['NAME', 'x']]]]]]]]]]]]]]]],
 ['NEWLINE', ''],
 ['ENDMARKER', '']]

which is correct if you look at the symbols in the (right) Grammar file.

But if you are only interested in things like a.x, maybe it's a lot
simpler to use the tokenize module, looking for the NAME and OP tokens as
they appear in the source expression.
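
A minimal sketch of the tokenizer idea might look like the following. It is written for Python 3, which is an assumption, since the thread itself targets Python 2.5. The function name dotted_names_from_tokens is hypothetical, and the sketch only joins chains that start at a plain name:

```python
import io
import keyword
import tokenize

def dotted_names_from_tokens(source):
    """Return plain and dotted names in the order they appear in source."""
    found = []
    parts = []         # pieces of the dotted name currently being built
    after_dot = False  # True if the previous token was a '.' operator

    def flush():
        # Emit the chain collected so far, if any, and start afresh.
        if parts:
            found.append(".".join(parts))
            del parts[:]

    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        is_name = tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        if is_name and after_dot:
            # Extend a.b to a.b.c; skip attributes of non-name
            # expressions such as (c + d).e, where no chain is active.
            if parts:
                parts.append(tok.string)
            after_dot = False
        elif is_name:
            flush()
            parts.append(tok.string)
        elif tok.type == tokenize.OP and tok.string == ".":
            after_dot = True
        else:
            # Any other token (operator, parenthesis, keyword) ends the chain.
            flush()
            after_dot = False
    flush()
    return found

print(dotted_names_from_tokens("a.x + a.y"))
```

Working on tokens keeps the code short, but it knows nothing about the grammar, so anything more subtle than simple dotted names is better handled on a parse tree.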
 
Peter Otten

Matteo said:
> I am trying to get Python to extract attributes in full dotted form
> from a compiled expression. For instance, if I have the following:
>
> param = compile('a.x + a.y','','single')
>
> then I would like to retrieve the list consisting of ['a.x','a.y'].
> I have tried using inspect to look at 'co_names', but when I do that,

You can have a look at the compiler package. A very limited example:

import compiler
import compiler.ast
import sys

class Visitor:
    def __init__(self):
        self.names = []

    def visitName(self, node):
        # A bare name such as a or sin.
        self.names.append(node.name)

    def visitGetattr(self, node):
        # Walk down a chain like x.y.z, collecting attribute names
        # from the outermost (z) to the innermost (y).
        dotted = []
        n = node
        while isinstance(n, compiler.ast.Getattr):
            dotted.append(n.attrname)
            n = n.expr
        try:
            # The chain must end in a plain name (x).
            dotted.append(n.name)
        except AttributeError:
            # e.g. (c + d).e: the base is an expression, not a name.
            print >> sys.stderr, "ignoring", node
        else:
            self.names.append(".".join(reversed(dotted)))


if __name__ == "__main__":
    expr = " ".join(sys.argv[1:])
    visitor = Visitor()
    compiler.walk(compiler.parse(expr), visitor)
    print "\n".join(visitor.names)

Output:

$ python dotted_names.py "a + b * (c + sin(d.e) + x.y.z)"
a
b
c
sin
d.e
x.y.z

$ python dotted_names.py "a + b * ((c + d).e + x.y.z)"
ignoring Getattr(Add((Name('c'), Name('d'))), 'e')
a
b
x.y.z
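
The compiler package was removed in Python 3. On a modern interpreter, roughly the same visitor can be sketched with the standard ast module. This is an illustration under that assumption, not the code from the post above, and the names DottedNames and dotted_names_from_ast are hypothetical. Unlike the version above, it descends into a non-name base such as (c + d) instead of reporting it as ignored:

```python
import ast

class DottedNames(ast.NodeVisitor):
    """Collect plain names and dotted attribute chains from an expression."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # A bare name such as a or sin.
        self.names.append(node.id)

    def visit_Attribute(self, node):
        # Walk down a chain like x.y.z from the outermost attribute inward.
        parts = []
        n = node
        while isinstance(n, ast.Attribute):
            parts.append(n.attr)
            n = n.value
        if isinstance(n, ast.Name):
            parts.append(n.id)
            self.names.append(".".join(reversed(parts)))
        else:
            # The base is an expression, e.g. (c + d).e; drop the
            # attribute part and collect the names inside the base.
            self.visit(n)

def dotted_names_from_ast(expr):
    visitor = DottedNames()
    visitor.visit(ast.parse(expr, mode="eval"))
    return visitor.names

print("\n".join(dotted_names_from_ast("a + b * (c + sin(d.e) + x.y.z)")))
```

Parsing with mode="eval" restricts the input to a single expression, which seems to fit the user-supplied formula case discussed in this thread.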

Peter
 
