Extracting attributes from compiled python code or parse trees

Discussion in 'Python' started by Matteo, Jul 23, 2007.

  1. Matteo

    Matteo Guest

    Hello-
    I am trying to get Python to extract attributes in full dotted form
    from a compiled expression. For instance, if I have the following:

    param = compile('a.x + a.y','','single')

    then I would like to retrieve the list consisting of ['a.x','a.y'].
    I have tried using inspect to look at 'co_names', but when I do that,
    I get:

    >>> inspect.getmembers(param)[23]

    ('co_names', ('a', 'x', 'y'))

    with no way to determine that 'x' and 'y' are both attributes of 'a'.

    The reason I am attempting this is to try and automatically determine
    data dependencies in a user-supplied formula (in order to build a
    dataflow network). I would prefer not to have to write my own parser
    just yet.

    Alternatively, I've looked at the parser module, but I am experiencing
    some difficulties in that the symbol list does not seem to match that
    listed in the python grammar reference (not surprising, since I am
    using python2.5, and the docs seem a bit dated)

    In particular:

    >>> import parser
    >>> import pprint
    >>> import symbol
    >>> tl=parser.expr("a.x").tolist()
    >>> pprint.pprint(tl)


    [258,
    [326,
    [303,
    [304,
    [305,
    [306,
    [307,
    [309,
    [310,
    [311,
    [312,
    [313,
    [314,
    [315,
    [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
    [4, ''],
    [0, '']]

    >>> print symbol.sym_name[316]

    power

    Thus, for some reason, 'a.x' seems to be interpreted as a power
    expression, and not an 'attributeref' as I would have anticipated (in
    fact, the symbol module does not seem to contain an 'attributeref'
    symbol)

    (for the curious, here is the relevant part of the AST for "a**x":
    [316,
    [317, [1, 'a']],
    [36, '**'],
    [315, [316, [317, [1, 'x']]]]]
    )

    Anyway, I could write an AST analyzer that searches for the correct
    pattern, but it would be relying on undocumented behavior, and I'm
    hoping there is a better way.

    (By the way, I realize that malicious users could almost certainly
    subvert my proposed dependency mechanism, but for this project, I'm
    guarding against Murphy, not Machiavelli)

    Thanks,
    -matt
    Matteo, Jul 23, 2007
    #1

  2. On Mon, 23 Jul 2007 18:13:05 -0300, Matteo <> wrote:

    > I am trying to get Python to extract attributes in full dotted form
    > from compiled expression. For instance, if I have the following:
    >
    > param = compile('a.x + a.y','','single')
    >
    > then I would like to retrieve the list consisting of ['a.x','a.y'].
    >
    > The reason I am attempting this is to try and automatically determine
    > data dependencies in a user-supplied formula (in order to build a
    > dataflow network). I would prefer not to have to write my own parser
    > just yet.


    If it is an expression, I think you should use "eval" instead of "single"
    as the third argument to compile.
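A small sketch of the difference, assuming a current Python 3 interpreter rather than the Python 2.5 used in this thread. The `SimpleNamespace` object standing in for `a` and the `<formula>` filename are only illustrations. With "eval" the code object hands back the value of the expression, but `co_names` is still a flat tuple:

```python
from types import SimpleNamespace

source = 'a.x + a.y'
code = compile(source, '<formula>', 'eval')

# co_names is flat: it does not record that x and y hang off a.
names = code.co_names

# Hypothetical stand-in for the user's data object.
a = SimpleNamespace(x=1, y=2)

# In 'eval' mode, eval() returns the value of the expression.
result = eval(code, {'a': a})
print(names, result)
```

So switching the mode makes the compiled formula usable as a value, but it does not by itself solve the dotted-name problem.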

    > Alternatively, I've looked at the parser module, but I am experiencing
    > some difficulties in that the symbol list does not seem to match that
    > listed in the python grammar reference (not surprising, since I am
    > using python2.5, and the docs seem a bit dated)


    Yes, the grammar.txt in the docs is a bit outdated (or perhaps it's a
    simplified one), see the Grammar/Grammar file in the Python source
    distribution.

    > In particular:
    >
    > >>> import parser
    > >>> import pprint
    > >>> import symbol
    > >>> tl=parser.expr("a.x").tolist()
    > >>> pprint.pprint(tl)

    >
    > [258,
    > [326,
    > [303,
    > [304,
    > [305,
    > [306,
    > [307,
    > [309,
    > [310,
    > [311,
    > [312,
    > [313,
    > [314,
    > [315,
    > [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
    > [4, ''],
    > [0, '']]
    >
    > >>> print symbol.sym_name[316]

    > power
    >
    > Thus, for some reason, 'a.x' seems to be interpreted as a power
    > expression, and not an 'attributeref' as I would have anticipated (in
    > fact, the symbol module does not seem to contain an 'attributeref'
    > symbol)


    Using this little helper function to translate symbols and tokens:

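    import symbol
    import token

    # Map numeric node ids (grammar symbols and tokens) to their names.
    names = symbol.sym_name.copy()
    names.update(token.tok_name)

    def human_readable(lst):
        # Replace the numeric id at the head of each nested list, in place.
        lst[0] = names[lst[0]]
        for item in lst[1:]:
            if isinstance(item, list):
                human_readable(item)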

    the same tree becomes:

    ['eval_input',
    ['testlist',
    ['test',
    ['or_test',
    ['and_test',
    ['not_test',
    ['comparison',
    ['expr',
    ['xor_expr',
    ['and_expr',
    ['shift_expr',
    ['arith_expr',
    ['term',
    ['factor',
    ['power',
    ['atom', ['NAME', 'a']],
    ['trailer', ['DOT', '.'], ['NAME', 'x']]]]]]]]]]]]]]]],
    ['NEWLINE', ''],
    ['ENDMARKER', '']]

    which is correct if you look at the symbols in the (right) Grammar file.

    But if you are only interested in things like a.x, maybe it's a lot
    simpler to use the tokenize module, looking for the NAME and OP tokens as
    they appear in the source expression.
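A rough sketch of that token-based idea, assuming Python 3's `tokenize` module (not the 2.5 version discussed here). The function name `dotted_names` is made up for the example. It glues together NAME tokens separated by '.' and ignores an attribute whose base is not a plain name:

```python
import io
import keyword
import tokenize

def dotted_names(expr):
    """Collect names such as 'a.x' from the token stream of expr."""
    found = []
    parts = []         # pieces of the dotted name currently being read
    after_dot = False  # True when the previous token was a '.'
    for tok in tokenize.generate_tokens(io.StringIO(expr).readline):
        is_dot = tok.type == tokenize.OP and tok.string == "."
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if after_dot:
                # Extend the current name; if there is none, the dot
                # followed something like ')' and the attribute is skipped.
                if parts:
                    parts.append(tok.string)
            else:
                if parts:
                    found.append(".".join(parts))
                parts = [tok.string]
        elif not is_dot:
            # Any other token ends the dotted name being read.
            if parts:
                found.append(".".join(parts))
            parts = []
        after_dot = is_dot
    if parts:
        found.append(".".join(parts))
    return found

print(dotted_names("a.x + a.y"))
```

This works purely on the surface of the source text, so it knows nothing about scoping, calls or subscripts. For a plain formula language that may be enough.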


    --
    Gabriel Genellina
    Gabriel Genellina, Jul 23, 2007
    #2

  3. Matteo

    Peter Otten Guest

    Matteo wrote:

    > I am trying to get Python to extract attributes in full dotted form
    > from compiled expression. For instance, if I have the following:
    >
    > param = compile('a.x + a.y','','single')
    >
    > then I would like to retrieve the list consisting of ['a.x','a.y'].
    > I have tried using inspect to look at 'co_names', but when I do that,


    You can have a look at the compiler package. A very limited example:

    import compiler
    import compiler.ast
    import sys

    class Visitor:
        def __init__(self):
            self.names = []

        def visitName(self, node):
            # A bare name such as "a".
            self.names.append(node.name)

        def visitGetattr(self, node):
            # Walk down a chain like x.y.z, collecting attribute names
            # from the outermost attribute inwards.
            dotted = []
            n = node
            while isinstance(n, compiler.ast.Getattr):
                dotted.append(n.attrname)
                n = n.expr
            try:
                dotted.append(n.name)
            except AttributeError:
                # The base is not a plain name, e.g. (c + d).e
                print >> sys.stderr, "ignoring", node
            else:
                self.names.append(".".join(reversed(dotted)))


    if __name__ == "__main__":
        expr = " ".join(sys.argv[1:])
        visitor = Visitor()
        compiler.walk(compiler.parse(expr), visitor)
        print "\n".join(visitor.names)

    Output:

    $ python dotted_names.py "a + b * (c + sin(d.e) + x.y.z)"
    a
    b
    c
    sin
    d.e
    x.y.z

    $ python dotted_names.py "a + b * ((c + d).e + x.y.z)"
    ignoring Getattr(Add((Name('c'), Name('d'))), 'e')
    a
    b
    x.y.z
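For readers on Python 3, where the compiler package no longer exists, roughly the same visitor can be sketched with the standard `ast` module. This is an adaptation under that assumption, not the code above. The names `DottedNames` and `dotted_names` are made up for the example. One deliberate difference: when the base of an attribute is not a plain name, this version descends into it instead of reporting it as ignored.

```python
import ast

class DottedNames(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # A bare name such as "a".
        self.names.append(node.id)

    def visit_Attribute(self, node):
        # Walk down a chain like x.y.z, outermost attribute first.
        parts = []
        n = node
        while isinstance(n, ast.Attribute):
            parts.append(n.attr)
            n = n.value
        if isinstance(n, ast.Name):
            parts.append(n.id)
            self.names.append(".".join(reversed(parts)))
        else:
            # Base is an expression such as (c + d); collect the
            # names inside it and drop the trailing attribute.
            self.visit(n)

def dotted_names(expr):
    visitor = DottedNames()
    visitor.visit(ast.parse(expr, mode="eval"))
    return visitor.names

print("\n".join(dotted_names("a + b * (c + sin(d.e) + x.y.z)")))
```

Because the tree comes from the real parser, this does not depend on undocumented grammar symbol numbers the way a hand-written pattern match on the `parser` output would.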

    Peter
    Peter Otten, Jul 24, 2007
    #3
