parallel class structures for AST-based objects

Steve Howell · Nov 21, 2009

I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Then I want to write different modules that work with those expression
objects. In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code. Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc. And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Here is the code:

#######
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

class Sum(BinaryOp):
pass

class Product(BinaryOp):
pass

########

class EvalNode:
def __init__(self, node):
self.node = node

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

#######

class PrettyPrintNode:
def __init__(self, node):
self.node = node

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

##############
# Not sure where this method really "wants to be" structurally,
# or what it should be named, but it reduces some duplication

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)

expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

evaluator = EvalNode.factory(expression)
print evaluator.eval()

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()

MRAB · Nov 22, 2009

Steve said:
I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Then I want to write different modules that work with those expression
objects. In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code. Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc. And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Here is the code:

#######
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

class Sum(BinaryOp):
pass

class Product(BinaryOp):
pass

########

class EvalNode:
def __init__(self, node):
self.node = node

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

#######

class PrettyPrintNode:
def __init__(self, node):
self.node = node

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

##############
# Not sure where this method really "wants to be" structurally,
# or what it should be named, but it reduces some duplication

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)

expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

evaluator = EvalNode.factory(expression)
print evaluator.eval()

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?

Steve Howell · Nov 22, 2009

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?

That's a good question, and it's the crux of my design dilemma. If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest. But what happens when
somebody wants to extend capability? Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done? What if they want to store additional state for
nodes?

Richard Thomas · Nov 22, 2009

Steve said:
Steve said:

I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Click to expand...

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Click to expand...

Then I want to write different modules that work with those expression
objects. In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

Click to expand...

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code. Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc. And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Click to expand...

Here is the code:

Click to expand...

#######
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

Click to expand...

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

Click to expand...

class Sum(BinaryOp):
pass

Click to expand...

class Product(BinaryOp):
pass

Click to expand...

########

Click to expand...

class EvalNode:
def __init__(self, node):
self.node = node

Click to expand...

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

Click to expand...

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

Click to expand...

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

Click to expand...

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

Click to expand...

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

Click to expand...

#######

Click to expand...

class PrettyPrintNode:
def __init__(self, node):
self.node = node

Click to expand...

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

Click to expand...

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

Click to expand...

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

Click to expand...

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

Click to expand...

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

Click to expand...

##############
# Not sure where this method really "wants to be" structurally,
# or what it should be named, but it reduces some duplication

Click to expand...

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)

Click to expand...

expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

Click to expand...

evaluator = EvalNode.factory(expression)
print evaluator.eval()

Click to expand...

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()

Click to expand...

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?

This looks more structurally sound:

class Node(object):
def eval(self):
raise NotImplementedError
def pprint(self):
raise NotImplementedError

class BinaryOperatorNode(Node):
operator = None
def __init__(self, first, second):
self.first = first
self.second = second
def eval(self):
return self.operator(self.first.eval(), self.second.eval())
def pprint(self):
"%s(%s, %s)" % (type(self).__name__, self.first.pprint(),
self.second.pprint())

class Sum(BinaryOperatorNode):
operator = lambda x, y: x + y

class Product(BinaryOperatorNode):
operator = lambda x, y: x * y

I don't know what you're doing exactly but if all you need is to be
able to parse and evaluate expressions then you can get very decent
mileage out of overriding operators, to the extent that the whole
thing you are trying to do could be a single class:

class Expression(object):
def __init__(self, func):
self.func = func
def __call__(self, **context):
while isinstance(self, Expression):
self = self.func(context)
return self
def __add__(self, other):
return Expression(lambda context: self.func(context) + other)
def __mul__(self, other):
return Expression(lambda context: self.func(context) * other)
def __radd__(self, other):
return Expression(lambda context: other + self.func(context))
def __rmul__(self, other):
return Expression(lambda context: other * self.func(context))
# ... and so forth ...

def integer(value):
return Expression(lambda context: value)

def variable(name, default):
return Expression(lambda context: context.get(name, default))

X = Expression("X", 0)
expr = 2 * X + 1
assert expr(X=3) == 7

But maybe that's not what you need. No need to overengineer if it is
though, keep it simple, simple is better than complex.

Steve Howell · Nov 22, 2009

This looks more structurally sound:

class Node(object):
def eval(self):
raise NotImplementedError
def pprint(self):
raise NotImplementedError

My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict, and which (pscouples every
subclass of Node to eval() and pprint(), even though those subclasses
might not care about either operation. They might be reimplementing
only one of those operations, or some completely different operation.

class Sum(BinaryOperatorNode):
operator = lambda x, y: x + y

class Product(BinaryOperatorNode):
operator = lambda x, y: x * y

I do like the notion of driving up Sum/Product abstractions like
"operator" into BinaryOperatorNode, but that is sort of an artifact of
my toy example, not my main design dilemma.

I don't know what you're doing exactly but if all you need is to be
able to parse and evaluate expressions then you can get very decent
mileage out of overriding operators, to the extent that the whole
thing you are trying to do could be a single class...

class Expression(object):
def __init__(self, func):
self.func = func
def __call__(self, **context):
while isinstance(self, Expression):
self = self.func(context)
return self
def __add__(self, other):
return Expression(lambda context: self.func(context) + other)
def __mul__(self, other):
[snipped]

It is my fault for muddying the conversation with a toy example that
evaluates arithmetic expressions. My particular problem involves a
mini-language that describes how you want to descend Python data
structures.

But maybe that's not what you need. No need to overengineer if it is
though, keep it simple, simple is better than complex.

Yep! I am trying to keep things simple, but my wish to extend is not
speculative at this point; it is real. I have an expression syntax,
which I call pyDTL, that I want these operations on:

* generate two different versions of Python code that expresses the
operation
* eagerly evaluate the expression on some object
* pretty-print the expression itself
* use the expression as a prototype for stub objects to determine
what operations they allow
* etc....

All of those requirements create the need to somehow create new
objects or functions that correspond to the same AST.

I describe pyDTL here:

http://showellonprogramming.blogspot.com/2009/11/mini-ddl-for-python.html

Here is a simple example:

echo '{.foo, .bar(){.spam, .eggs} }' | python dtl/parse.py
dict(
foo = obj.foo,
bar = (lambda bar:
dict(
spam = bar.spam,
eggs = bar.eggs,
))(obj.bar()),
)

###

Gregory Ewing · Nov 22, 2009

Steve said:
My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict

I think you have the right idea with a mapping from node
classes to implementations of operations, but your
intermediate classes SumPrettyPrintNode etc. are not
necessary. Just put functions implementing the operations
directly into the mapping.

Another thing is that the mappings should be somewhere
global that can be extended, perhaps through a registration
interface. Putting the mapping dicts inside the node classes
doesn't gain you anything over simply using methods. All
you've done is reimplement method dispatch.

A third thing is that instead of just looking up the node
class in the mapping, you may want to walk up the MRO and
try each of the base classes until you find a match. That
way, nodes will effectively inherit implementations from
their base classes for any operations that they don't
explicitly implement.

This inheritance behaviour becomes important if you have
two developers A and B, where A adds new classes and B
adds new operations. If A and B don't talk to each other,
you necessarily end up with gaps in the matrix: A's classes
won't have implementations for B's operations.

The best you can hope for is that the missing operations
will somehow fall back gracefully on default implementations.
Inheritance allows this to be achieved: if A derives all
his classes from existing node classes, and B provides
default implementations of all his operations for the root
Node class, then some implementation will always be found
for every class/operation combination.

Now, there's actually a very easy way of implementing all
this. Your registration interface could simply be

def register_operation(klass, operation_name, function):
setattr(klass, operation_name, function)

In other words, monkey-patch the classes to add the new
functions directly as methods. Since it's so simple, you
could even do away with the registration function altogether
and just have developers insert methods directly into the
classes.

Some people might consider this an ugly hack, but the
end result is almost exactly the same, it's very simple,
and it's very efficient, making use of Python's existing
method dispatch machinery instead of reinventing it
yourself.

One reason you might want to keep the registration function,
even if all it's doing is modifying the classes, is to
make it easier to find where operations are being defined,
by searching for calls to register_operation().

Diez B. Roggisch · Nov 22, 2009

Steve said:
That's a good question, and it's the crux of my design dilemma. If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest. But what happens when
somebody wants to extend capability? Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done? What if they want to store additional state for
nodes?

What's usually done is to create visitors/matchers. Those traverse the
AST, and either only visit, or return transformed versions of it (think
e.g. algebraic optimization)

A visitor will roughly look like this:

class ASTVisitor(object):

def visit(self, node):
name = node.__class__.__name__.lower()
if hasattr(self, "visit_%s" % name):
getattr(self, "visit_%s" % name)(node)
for child in node:
self.visit(child)

You can of course chose another type of dispatch, using e.g. a generic
method.

Then you create Visitors for specific tasks - pretty-printing,
evaluation, rewriting. Those don't have the overhead of your current
design with all those factory-mapping stuff, and yet you aren't forced
to put logic into AST you don't want there.

Diez

Simon Forman · Nov 22, 2009

What's usually done is to create visitors/matchers. Those traverse the AST,
and either only visit, or return transformed versions of it (think e.g.
algebraic optimization)

A visitor will roughly look like this:

class ASTVisitor(object):

def visit(self, node):
name = node.__class__.__name__.lower()
if hasattr(self, "visit_%s" % name):
getattr(self, "visit_%s" % name)(node)
for child in node:
self.visit(child)

You can of course chose another type of dispatch, using e.g. a generic
method.

Then you create Visitors for specific tasks - pretty-printing, evaluation,
rewriting. Those don't have the overhead of your current design with all
those factory-mapping stuff, and yet you aren't forced to put logic into AST
you don't want there.

FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
Kit) for this sort of thing. It has a GenericASTTraversal which "is a
Visitor pattern according to Design Patterns."

http://pages.cpsc.ucalgary.ca/~aycock/spark/

It's apparently distributed with the python source, but it's not in
the standard library, more's the pity IMO.

There's a bit of a learning curve but it's well worth it.

~Simon

Steve Howell · Nov 22, 2009

FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
Kit) for this sort of thing. It has a GenericASTTraversal which "is a
Visitor pattern according to Design Patterns."

http://pages.cpsc.ucalgary.ca/~aycock/spark/

It's apparently distributed with the python source, but it's not in
the standard library, more's the pity IMO.

There's a bit of a learning curve but it's well worth it.

Thanks, Simon, I think something like GenericASTTraversal should work
for me in most cases.

A couple of folks have suggested an idea that is in his paper, which
is to use a single class for something PrettyPrinter, and then use
reflection to find methods on PrettyPrinter to pretty-print sums,
products, integers, etc.

Modify code with AST	2	Jun 12, 2013
Variable class name in python	0	Apr 25, 2022
ast manipulation	2	Nov 17, 2009
Idea for pure-python templates using AST.	9	Aug 16, 2011
Class decorator to capture the creation and deletion of objects	0	Feb 25, 2014
simple ElementTree based parser that allows entity definition map	0	Dec 4, 2013
Recursive class \| can you modify self directly?	10	Jul 9, 2013
Making special method names, work with __getattr__	3	Apr 23, 2010

parallel class structures for AST-based objects

Steve Howell

MRAB

Steve Howell

Richard Thomas

Steve Howell

Gregory Ewing

Diez B. Roggisch

Simon Forman

Steve Howell

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads