Pretty Scheme, ??? Python

Neil Cerutti · Jul 2, 2007

The following simple language parser and interpreter is from
chapter 2 of the textbook: _Programming Languages: Application
and Interpretation_, by Shriram Krishnamurthi.

http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/

First, I'll show the Scheme version of the program, which uses
the cool define-type and type-case macros (I'd never heard of
them before reading this book, which is too bad). I think it is
beautiful.

; Our grammar:
; <WAE> ::=
; <num>
; | { + <WAE> <WAE> }
; | { - <WAE> <WAE> }
; | {with {<id> <WAE>} <WAE>}
; | <id>
(define-type WAE
[num (n number?)]
[add (lhs WAE?)
(rhs WAE?)]
[sub (lhs WAE?)
(rhs WAE?)]
[id (name symbol?)]
[with (name symbol?) (named-expr WAE?) (named-body WAE?)])

;; parse : sexp --> WAE
;; To convert s-expressions into WAEs
(define (parse sexp)
(cond
[(number? sexp) (num sexp)]
[(symbol? sexp) (id sexp)]
[(list? sexp)
(case (first sexp)
[(+) (if (= (length sexp) 3)
(add (parse (second sexp))
(parse (third sexp)))
(error 'parse "add: expected 2 args"))]
[(-) (if (= (length sexp) 3)
(sub (parse (second sexp))
(parse (third sexp)))
(error 'parse "sub: expected 2 args"))]
[(with) (with (first (second sexp))
(parse (second (second sexp)))
(parse (third sexp)))]
[else (error 'parse "invalid expression")])]
[else (error 'parse "invalid expression")]))

;; subst : WAE symbol WAE --> WAE
;; substitutes second argument with third argument in first argument,
;; as per the rules of substitution; the resulting expression contains
;; no free instances of the second argument.
(define (subst expr sub-id val)
(type-case WAE expr
[num (n) expr]
[add (lhs rhs) (add (subst lhs sub-id val)
(subst rhs sub-id val))]
[sub (lhs rhs) (sub (subst lhs sub-id val)
(subst rhs sub-id val))]
[with (bound-id named-expr bound-body)
(if (symbol=? bound-id sub-id)
(with bound-id
(subst named-expr sub-id val)
bound-body)
(with bound-id
(subst named-expr sub-id val)
(subst bound-body sub-id val)))]
[id (v) (if (symbol=? v sub-id) val expr)]))

;; calc : WAE --> number
;; consumes a WAE and computes the corresponding number
(define (calc an-ae)
(type-case WAE an-ae
[num (n) n]
[add (lhs rhs) (+ (calc lhs) (calc rhs))]
[sub (lhs rhs) (- (calc lhs) (calc rhs))]
[with (bound-id named-expr bound-body)
(calc (subst bound-body
bound-id
(num (calc named-expr))))]
[id (v) (error 'calc "free identifier")]))

(test (calc (parse '3)) 3)
(test (calc (parse '{+ 3 4})) 7)
(test (calc (parse '{+ {- 3 4} 7})) 6)
; etc... the rest of the tests omited

The following is a translation of the above program into Python,
as best as I could. I have not included the simple s-expression
parser that I had to write as a front-end, but I'll post if if
anyone's interested. The type-case is replaced with a class
hierarchy, resulting in some extra syntax. Most of the docstrings
and tests have been omited as irrelevant.

import sexp
import operator

class Wae(object):
""" An abstract base class for representing the abstract syntax tree of a
Wae expression.

"""
def calc(self):
raise NotImplementedError

class Num(Wae):
""" This subclass represents numbers.

"""
def __init__(self, number):
self.number = number
def __repr__(self):
return '<Num %d>' % self.number
def subst_(self, sub_id, value):
return self
def calc(self):
return self.number

class Id(Wae):
""" This subclass represents an identifier.

"""
def __init__(self, name):
self.name = name
def __repr__(self):
return '<Id \'%s\'>' % self.name
def subst_(self, sub_id, value):
if self.name == sub_id:
return value
return self
def calc(self):
raise SyntaxError('Free identifier '+self.name)

class BinOp(Wae):
""" This abstract class represents binary operations.

"""
def __init__(self, lhs, rhs):
self.lhs = lhs
self.rhs = rhs
def __repr__(self):
return '<%s %r %r>' % (self.__class__.__name__, self.lhs, self.rhs)
def subst_(self, sub_id, value):
return self.__class__(self.lhs.subst_(sub_id, value),
self.rhs.subst_(sub_id, value))
def calc(self):
return self.op(self.lhs.calc(), self.rhs.calc())

class Add(BinOp):
""" This subclass represents an addition expression.

"""
def __init__(self, lhs, rhs):
super(Add, self).__init__(lhs, rhs)
self.op = operator.add

class Sub(BinOp):
""" This subclass represents a substraction expression.

"""
def __init__(self, lhs, rhs):
super(Sub, self).__init__(lhs, rhs)
self.op = operator.sub

class With(Wae):
""" This subclass represents the Wae binding expression, 'with'.

"""
def __init__(self, bound_id, named_expr, bound_body):
self.bound_id = bound_id
self.named_expr = named_expr
self.bound_body = bound_body
def __repr__(self):
return '<With \'%s\' %r %r>' % (self.bound_id,
self.named_expr, self.bound_body)
def calc(self):
return self.bound_body.subst_(
self.bound_id, Num(self.named_expr.calc())).calc()
def subst_(self, sub_id, value):
if sub_id == self.bound_id:
return With(self.bound_id,
self.named_expr.subst_(sub_id, value),
self.bound_body)
else:
return With(self.bound_id,
self.named_expr.subst_(sub_id, value),
self.bound_body.subst_(sub_id, value))

# parse : sexp --> Wae

# Wae grammar:
# Wae ::= Num
# Wae ::= ( + <Wae> <Wae> )
# Wae ::= ( - <Wae> <Wae> )
# Wae ::= ( with ( Id <Wae> ) <Wae> )
# Wae ::= Id
def parse(expr):
""" Compile a string representing an WAE expression into an object that
represents the abstract syntax tree of that expression. The AST provides a
calc method that returns the result of evaluating the expression.
Traceback (most recent call last):
...
SyntaxError: Ill-formed expression
Traceback (most recent call last):
...
SyntaxError: Ill-formed expression
3
7
20
Traceback (most recent call last):
...
SyntaxError: Free identifier x

"""
def parse_ast(ast):
if isinstance(ast, int):
return Num(ast)
elif isinstance(ast, str):
return Id(ast)
elif isinstance(ast, tuple):
op = ast[0]
if op in ['+', '-'] and len(ast) == 3:
if op == '+':
return Add(parse_ast(ast[1]), parse_ast(ast[2]))
elif op == '-':
return Sub(parse_ast(ast[1]), parse_ast(ast[2]))
elif op == 'with' and len(ast) == 3 and len(ast[1]) == 2:
return With(ast[1][0], parse_ast(ast[1][1]), parse_ast(ast[2]))
raise SyntaxError("Ill-formed expression")
return parse_ast(sexp.read(expr))

The sexp parser I wrote returns a tuple that represents the parse
tree of an s-expression, and recognizes only s-expressions,
strings and integers.

How can I make the Python more idiomatic Python?

How can I make it more "beautiful"? A type hierarchy seems
over-engineered in comparison to Scheme's type-case, but I
liked a cascade of isinstance calls (as in parse) even less.
The type hierarchy did allow me to factor out the code
duplication in the (sub ...) and (add ...) types of Scheme, and
seems like a nice benefit over type-case.

Laurent Pointal · Jul 2, 2007

Neil Cerutti wrote:
....

How can I make the Python more idiomatic Python?

Have you taken a look at pyparsing ?

http://pyparsing.wikispaces.com/

Neil Cerutti · Jul 2, 2007

Have you taken a look at pyparsing ?

Yes, I have it. PyParsing has, well, so many convenience features
they seem to shout down whatever the core features are, and I
don't know quite how to get started as a result.

Hardest of all was modifying a working PyParsing program.

As a result, I've found writing my own recursive descent parsers
much easier.

I'm probably wrong, though.

Paul McGuire · Jul 2, 2007

Yes, I have it. PyParsing has, well, so many convenience features
they seem to shout down whatever the core features are, and I
don't know quite how to get started as a result.

Hardest of all was modifying a working PyParsing program.

As a result, I've found writing my own recursive descent parsers
much easier.

I'm probably wrong, though.

from pyparsing import *

"""
Neil -

Ok, here is the step-by-step, beginning with your posted BNF. (Based
on your test cases, I think the '{}'s are really supposed to be
'()'s.)

; <WAE> ::=
; <num>
; | { + <WAE> <WAE> }
; | { - <WAE> <WAE> }
; | {with {<id> <WAE>} <WAE>}
; | <id>

The most basic building blocks in pyparsing are Literal and Word.
With these, you compose "compound" elements using And and MatchFirst,
which are bound to the operators '+' and '|' (on occasion, Or is
required, bound to operator '^', but not for this simple parser).
Since you have a recursive grammar, you will also need Forward.
Whitespace is skipped implicitly.

Only slightly more advanced is the Group class, which will impart
hierarchy and structure to the results - otherwise, everything just
comes out as one flat list of tokens. You may be able to remove these
in the final parser, depending on your results after steps 1 and 2 in
the "left for the student" part below, but they are here to help show
structure of the parsed tokens.

As convenience functions go, I think the most common are oneOf and
delimitedList. oneOf might be useful here if you want to express id
as a single-char variable; otherwise, just use Word(alphas).

At this point you should be able to write a parser for this WAE
grammar. Like the following 9-liner:
"""

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()

wae = Forward()
num = Word(nums)
id = oneOf( list(alphas) )
addwae = Group( LPAR + "+" + wae + wae + RPAR )
subwae = Group( LPAR + "-" + wae + wae + RPAR )
withwae = Group( LPAR + "with" + LPAR + id + wae + RPAR + wae + RPAR )

wae << (addwae | subwae | withwae | num | id)

tests = """\
3
(+ 3 4)
(with (x (+ 5 5)) (+ x x))""".splitlines()

for t in tests:
print t
waeTree = wae.parseString(t)
print waeTree.asList()
print

"""
If you extract and run this script, here are the results:
3
['3']

(+ 3 4)
[['+', '3', '4']]

(with (x (+ 5 5)) (+ x x))
[['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]

Left as an exercise for the student:
1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose
__init__ methods take a ParseResults object named tokens (which you
can treat as a list of tokens), and each with a calc() method to
evaluate them accordingly.
2. Hook each class to the appropriate WAE class using setParseAction.
Hint: here is one done for you: num.setParseAction(NumWAE)
3. Modify the test loop to insert an evaluation of the parsed tree.

Extra credit: why is id last in the set of alternatives defined for
the wae expression?

-- Paul
"""

Paul McGuire · Jul 2, 2007

Yes, I have it. PyParsing has, well, so many convenience features
they seem to shout down whatever the core features are, and I
don't know quite how to get started as a result.

Click to expand...

Hardest of all was modifying a working PyParsing program.

Click to expand...

As a result, I've found writing my own recursive descent parsers
much easier.

Click to expand...

I'm probably wrong, though.

Click to expand...

from pyparsing import *

"""
Neil -

Ok, here is the step-by-step, beginning with your posted BNF. (Based
on your test cases, I think the '{}'s are really supposed to be
'()'s.)

; <WAE> ::=
; <num>
; | { + <WAE> <WAE> }
; | { - <WAE> <WAE> }
; | {with {<id> <WAE>} <WAE>}
; | <id>

The most basic building blocks in pyparsing are Literal and Word.
With these, you compose "compound" elements using And and MatchFirst,
which are bound to the operators '+' and '|' (on occasion, Or is
required, bound to operator '^', but not for this simple parser).
Since you have a recursive grammar, you will also need Forward.
Whitespace is skipped implicitly.

Only slightly more advanced is the Group class, which will impart
hierarchy and structure to the results - otherwise, everything just
comes out as one flat list of tokens. You may be able to remove these
in the final parser, depending on your results after steps 1 and 2 in
the "left for the student" part below, but they are here to help show
structure of the parsed tokens.

As convenience functions go, I think the most common are oneOf and
delimitedList. oneOf might be useful here if you want to express id
as a single-char variable; otherwise, just use Word(alphas).

At this point you should be able to write a parser for this WAE
grammar. Like the following 9-liner:
"""

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()

wae = Forward()
num = Word(nums)
id = oneOf( list(alphas) )
addwae = Group( LPAR + "+" + wae + wae + RPAR )
subwae = Group( LPAR + "-" + wae + wae + RPAR )
withwae = Group( LPAR + "with" + LPAR + id + wae + RPAR + wae + RPAR )

wae << (addwae | subwae | withwae | num | id)

tests = """\
3
(+ 3 4)
(with (x (+ 5 5)) (+ x x))""".splitlines()

for t in tests:
print t
waeTree = wae.parseString(t)
print waeTree.asList()
print

"""
If you extract and run this script, here are the results:
3
['3']

(+ 3 4)
[['+', '3', '4']]

(with (x (+ 5 5)) (+ x x))
[['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]

Left as an exercise for the student:
1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose
__init__ methods take a ParseResults object named tokens (which you
can treat as a list of tokens), and each with a calc() method to
evaluate them accordingly.
2. Hook each class to the appropriate WAE class using setParseAction.
Hint: here is one done for you: num.setParseAction(NumWAE)
3. Modify the test loop to insert an evaluation of the parsed tree.

Extra credit: why is id last in the set of alternatives defined for
the wae expression?

-- Paul
"""- Hide quoted text -

- Show quoted text -

Oops, that should be a 10-liner - I forgot the "from pyparsing import
*" line.

-- Paul

Neil Cerutti · Jul 3, 2007

from pyparsing import *

It's always good when your messages start like that.

"""
Ok, here is the step-by-step, beginning with your posted BNF.
(Based on your test cases, I think the '{}'s are really
supposed to be '()'s.)

Yeah, the Scheme version of WAE uses curlies in examples just so
it looks slightly different from Scheme, although since the
Scheme version is built on the Scheme read function, it actually
accepts several different kinds of delimiters.

Python didn't need to do that, I thought.

My first impulse when programming this exercise was to ape the
Scheme strategy, going with a syntax analogous to Python's, using
Python's code or AST modules. But it turns out I'm not a clever
enough language designer.

Moreover, the fun of the book I mentioned is in designing the
semantics of the programs. The book basically punts parsing,
leaving it up to the 'read' function. So basically, Python gets
up to speed (except for the define-type and type-case macros)
simply by implementing a read with enough functionality for each
mini-langauge.

; <WAE> ::=
; <num>
; | { + <WAE> <WAE> }
; | { - <WAE> <WAE> }
; | {with {<id> <WAE>} <WAE>}
; | <id>

The most basic building blocks in pyparsing are Literal and
Word. With these, you compose "compound" elements using And and
MatchFirst, which are bound to the operators '+' and '|' (on
occasion, Or is required, bound to operator '^', but not for
this simple parser). Since you have a recursive grammar, you
will also need Forward. Whitespace is skipped implicitly.

Only slightly more advanced is the Group class, which will
impart hierarchy and structure to the results - otherwise,
everything just comes out as one flat list of tokens. You may
be able to remove these in the final parser, depending on your
results after steps 1 and 2 in the "left for the student" part
below, but they are here to help show structure of the parsed
tokens.

As convenience functions go, I think the most common are oneOf
and delimitedList. oneOf might be useful here if you want to
express id as a single-char variable; otherwise, just use
Word(alphas).

At this point you should be able to write a parser for this WAE
grammar. Like the following 9-liner:
"""

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()

wae = Forward()
num = Word(nums)
id = oneOf( list(alphas) )

The above shadows 'id'; I suppose 'ident' would be better.

addwae = Group( LPAR + "+" + wae + wae + RPAR )
subwae = Group( LPAR + "-" + wae + wae + RPAR )
withwae = Group( LPAR + "with" + LPAR + id + wae + RPAR + wae + RPAR )

wae << (addwae | subwae | withwae | num | id)

tests = """\
3
(+ 3 4)
(with (x (+ 5 5)) (+ x x))""".splitlines()

for t in tests:
print t
waeTree = wae.parseString(t)
print waeTree.asList()
print

"""
If you extract and run this script, here are the results:
3
['3']

(+ 3 4)
[['+', '3', '4']]

(with (x (+ 5 5)) (+ x x))
[['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]

How can I make it barf for testcases like '(+ 2 3))'? It doesn't
seem to expect an Eof.

Left as an exercise for the student:
1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose
__init__ methods take a ParseResults object named tokens (which you
can treat as a list of tokens), and each with a calc() method to
evaluate them accordingly.
2. Hook each class to the appropriate WAE class using setParseAction.
Hint: here is one done for you: num.setParseAction(NumWAE)
3. Modify the test loop to insert an evaluation of the parsed tree.

I use doctest, so it looks quite different. On the other hand, it
actually checks that the results are correct.

Extra credit: why is id last in the set of alternatives defined
for the wae expression?

I'm not sure. When I moved it to first all my valid tests still
passed, but one of the deliberately erroneous ones caused
a different exception, e.g., '(+ 2)'. Writing my own parser does
make error handling more predictable, but probably PyParsing can
be configured to do what I want.

My AST's from the first version of the program translated easily
into your program, with almost no muss or fuss. The muss is that,
since all the __init__ functions now expect a token list instead
of named arguments, they are now cryptic, and creating AST's
manually became annoying. The fuss is that I do have to create
one in With's calc function. It should be unnecessary for the AST
objects to be so dependent upon the grammar to work correctly.
I suppose a solution would be to add another layer of abstraction
in between.

Read it and weep. The program hasn't changed much. It's still
uglier than the Scheme. Attached at the end is my original sexp
and WAE parser, for comparison with the PyParsing portion of the
program. This time I've included all the tests, but no other
useful docstrings.

""" A PyParsing solution for WAE, composed by Paul MaGuire, with
semantics by Neil.
3
Traceback (most recent call last):
...
ParseException: Expected "(" (at char 4), (line:1, col:5)
48
1
8
20
7
Traceback (most recent call last):
...
SyntaxError: Free identifier

"""
import operator
from pyparsing import *

class Num(object):
def __init__(self, tokens):
self.number = int(tokens[0])
def __repr__(self):
return '<Num %d>' % self.number
def subst(self, sub_id, value):
return self
def calc(self):
return self.number

class Id(object):
def __init__(self, tokens):
self.name = tokens[0]
def __repr__(self):
return '<Id %s>' % self.name
def __eq__(self, an_id):
return an_id.name == self.name
def subst(self, sub_id, value):
if sub_id == self:
return value
else:
return self
def calc(self):
raise SyntaxError('Free identifier')

class BinOp(object):
def __init__(self, tokens):
self.op, self.name = {'+': (operator.add, 'Add'),
'-': (operator.sub, 'Sub')}[tokens[0][0]]
self.lhs = tokens[0][1]
self.rhs = tokens[0][2]
def __repr__(self):
return '<%s %r %r>' % (self.name, self.lhs, self.rhs)
def subst(self, sub_id, value):
self.lhs = self.lhs.subst(sub_id, value)
self.rhs = self.rhs.subst(sub_id, value)
return self
def calc(self):
return self.op(self.lhs.calc(), self.rhs.calc())

class With(object):
def __init__(self, tokens):
self.bound_id = tokens[0][1]
self.named_expr = tokens[0][2]
self.bound_body = tokens[0][3]
def __repr__(self):
return '<With %s %r %r>' % (self.bound_id,
self.named_expr, self.bound_body)
def subst(self, sub_id, value):
self.named_expr = self.named_expr.subst(sub_id, value)
if sub_id != self.bound_id:
self.bound_body = self.bound_body.subst(sub_id, value)
return self
def calc(self):
tokens = [self.named_expr.calc()]
return self.bound_body.subst(self.bound_id, Num(tokens)).calc()

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()

wae = Forward()
num = Word(nums).setParseAction(Num)
ident = Word(alphas).setParseAction(Id)

addwae = Group( LPAR + "+" + wae + wae + RPAR ).setParseAction(BinOp)
subwae = Group( LPAR + "-" + wae + wae + RPAR ).setParseAction(BinOp)
withwae = Group( LPAR + "with" + LPAR + ident +
wae + RPAR + wae + RPAR ).setParseAction(With)

wae << (addwae | subwae | withwae | num | ident)

def parse(s):
return wae.parseString(s).asList()[0]

if __name__ == '__main__':
import doctest
doctest.testmod()

Below is my simple sexp parser and wae adapter (most tests and
some docs removed for brevity). The hand-written scanner is a
result of the horrible performance and readability of the
regex-based scanners I experimented with.

""" sexp.py

A compiler of s-expressions into tuples.

Functions provided:

read(sexp) Convert a Python str that represents one s-expression into a
tuple.

For example,
('+', 4, 5)
('+', 4, 5, ('-', 8))

Only s-expressions, symbols, and integers are recognized.

Only one expression may appear in the string. Trailing
characters result in an exception.
Traceback (most recent call last):
...
ParseError: Expected EOF, got (

EBNF:
goal -> [ <expr> ] EOF
sexp -> '(' { <expr> } ')'
expr -> <sexp> | INTEGER |SYMBOL

"""

def read(sexp):
""" Read an s-expression (in the form of a string), and return a list
representing its parse tree.
400

This parser readly only one expression at a time, unlike a
scheme reader.
Traceback (most recent call last):
...
ParseError: Expected EOF, got SYMBOL
()
Traceback (most recent call last):
...
ParseError: Expected ), got EOF
('+', 4, 5)

"""
return Parser(scan(sexp)).parse()

class ParseError(SyntaxError):
pass

class Parser(object):
""" Parser for simple s-expressions.

BNF:
goal ::= <opt_expr> EOF
opt_expr ::= <expr>
opt_expr ::= ~
expr ::= <sexp>
expr ::= INTEGER
expr ::= SYMBOL
sexp ::= '(' <opt_expr_list> ')'
opt_expr_list ::= <expr_list>
opt_expr_list ::= ~
expr_list ::= <expr> <expr_tail>
expr_tail ::= <expr> <expr_tail>
expr_tail ::= ~

"""
def __init__(self, iterable):
iterator = iter(iterable)
self.token, self.value = iterator.next()
self.next = iterator.next

def match(self, want):
if want != self.token:
raise ParseError("Expected %s, got %s"
% (want, self.token))
value = self.value
if self.token != 'EOF':
self.token, self.value = self.next()
return value

def parse(self):
return self.goal()

def goal(self):
ast = self.opt_expr()
self.match('EOF')
return ast

def opt_expr(self):
if self.token == 'EOF':
return None
else:
return self.expr()

def opt_expr_list(self):
if self.token in ['(', 'INTEGER', 'SYMBOL']:
return self.expr_list()
else:
return ()

def expr_list(self):
return (self.expr(),) + self.expr_tail()

def expr_tail(self):
if self.token in ['(', 'INTEGER', 'SYMBOL']:
return (self.expr(),) + self.expr_tail()
else:
return ()

def expr(self):
if self.token == '(':
return self.sexp()
elif self.token == 'INTEGER':
return self.match('INTEGER')
elif self.token == 'SYMBOL':
return self.match('SYMBOL')

def sexp(self):
self.match('(')
ast = self.opt_expr_list()
self.match(')')
return ast

# EOF is the end of the input stream.
# INTEGER is '[0-9]+'
# SYMBOL is '[^0-9][^ ]*'

def scan(sexp):
""" Yield the tokens in sexp, in the order they are encountered.
... print token
('(', '(')
(')', ')')
('EOF', 'EOF')
... print token
('(', '(')
('SYMBOL', '+')
('INTEGER', 5)
('INTEGER', 3)
(')', ')')
('EOF', 'EOF')
... print token
Traceback (most recent call last):
...
ParseError: Ill-formed INTEGER

"""
ix = 0
length = len(sexp)
while ix < length:
if sexp[ix] in '()':
yield (sexp[ix], sexp[ix])
ix += 1
elif sexp[ix] in '0123456789':
jx = ix+1
while jx < length and sexp[jx] in '0123456789':
jx += 1
if jx < length and sexp[jx] not in ' \t\n()':
raise ParseError("Ill-formed INTEGER")
yield ('INTEGER', int(sexp[ix:jx]))
ix = jx
elif sexp[ix] not in ' \t\n':
jx = ix+1
while jx < length and sexp[jx] not in ' \t\n)(':
jx += 1
yield ('SYMBOL', sexp[ix:jx])
ix = jx
else:
ix += 1
yield ('EOF', 'EOF')

if __name__ == '__main__':
import doctest
doctest.testmod()

And finally, the WAE adapter, which is not necessarily required
by the PyParsing solution.

def parse(expr):
""" Compile a string representing an WAE expression into an object that
represents the abstract syntax tree of that expression. The AST provides a
calc method that returns the result of evaluating the expression.
<Add <Num 4> <Num 5>>

"""
def parse_ast(ast):
if isinstance(ast, int):
return Num(ast)
elif isinstance(ast, str):
return Id(ast)
elif isinstance(ast, tuple):
op = ast[0]
if op in ['+', '-'] and len(ast) == 3:
if op == '+':
return Add(parse_ast(ast[1]), parse_ast(ast[2]))
elif op == '-':
return Sub(parse_ast(ast[1]), parse_ast(ast[2]))
elif op == 'with' and len(ast) == 3 and len(ast[1]) == 2:
return With(ast[1][0], parse_ast(ast[1][1]), parse_ast(ast[2]))
raise SyntaxError("Ill-formed expression")
return parse_ast(sexp.read(expr))

The isinstance calls are quite unfortunate compared to type-case.
For example, type-case will signal an error if I haven't handled
all possible cases.

Paul McGuire · Jul 3, 2007

Neil -
Doh! I found the id() shadowing later, changed my var to id_ so as
not to stray from your BNF too much.
To force parsing to the end of string, add a StringEnd instance
where you expect there to be the end of the input string. Change:
waeTree = wae.parseString(t)
to:
waeTree = (wae + StringEnd()).parseString(t)
I agree 1000%. The pyparsing method for this is to use
setResultsName. Here is the grammar, with results names defined
to match those in your original. And you are absolutely correct,
using named fields like this makes your code MUCH more robust, and
less dependent on the grammar.

num = Combine( Optional("-") + Word(nums) ).setResultsName("n")
id_ = oneOf( list(alphas) ).setResultsName("v")
addwae = Group( LPAR + "+" + wae.setResultsName("lhs") +
wae.setResultsName("rhs") + RPAR )
subwae = Group( LPAR + "-" + wae.setResultsName("lhs") +
wae.setResultsName("rhs") + RPAR )
withwae = Group( LPAR + "with" + LPAR +
id_.setResultsName("bound_id") +
wae.setResultsName("named_expr") + RPAR +
wae.setResultsName("bound_body") + RPAR )

Now your calc methods can refer to them using:
self.tokens.lhs
self.tokens.bound_id
etc.

Here is my alternative solution (not using results names). I used
the base WAE class to accept the tokens as the initialization var,
then unpack the list into variables in each respective calc()
method. I did not see the need for a subst() method. There is a
complete s-exp parser at the pyparsing wiki examples page:
http://pyparsing.wikispaces.com/space/showimage/sexpParser.py

-- Paul

class WAE(object):
ids = {}
def __init__(self,tokens):
# need to deref element 0 because of Groups
self.tokens = tokens[0]

class NumWAE(WAE):
def calc(self):
return int(self.tokens)

class IdWAE(WAE):
def getId(self):
return self.tokens
def calc(self):
return WAE.ids[self.getId()][-1]

class BinOpWAE(WAE):
def calc(self):
op,a,b = self.tokens
return self.opfn(a.calc(), b.calc())

class AddWAE(BinOpWAE):
opfn = staticmethod(lambda a,b:a+b)

class SubWAE(BinOpWAE):
opfn = staticmethod(lambda a,b:a-b)

class WithWAE(WAE):
def calc(self):
op,varid,varexpr,withexpr = self.tokens
varname = varid.getId()
if varname not in WAE.ids:
WAE.ids[varname] = []
WAE.ids[varname].append( varexpr.calc() )
ret = withexpr.calc()
WAE.ids[varname].pop()
return ret

for expr,cls in zip((num,id_,addwae,subwae,withwae),
(NumWAE,IdWAE,AddWAE,SubWAE,WithWAE)):
expr.setParseAction(cls)

for t in tests:
print t
waeTree = wae.parseString(t)[0]
print waeTree.calc()
print

Neil Cerutti · Jul 3, 2007

Here is my alternative solution (not using results names). I used
the base WAE class to accept the tokens as the initialization var,
then unpack the list into variables in each respective calc()
method. I did not see the need for a subst() method.

I used recursion to effect substitution of id's with their
values, whereas you chose to use a dictionary of stacks. I prefer
the functional solution for being simpler, and state-less.

Thanks for the nice example. I think I've seen enough to create
something satisfying.

There seems to be a bug in my current grammar. The factory for
BinOp is not getting the correct named results. Can you see
something I've messed up?

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()

wae = Forward()
num = Word(nums).setResultsName('number')
id_ = Word(alphas).setResultsName('name')

binop = Group( LPAR + oneOf("+ -").setResultsName('op') +
wae.setResultsName('lhs') + wae.setResultsName('rhs') + RPAR )
with_ = Group( LPAR + "with" + LPAR + id_.setResultsName('bound_id') +
wae.setResultsName('named_expr') + RPAR +
wae.setResultsName('bound_body') + RPAR )

wae << (binop | with_ | num | id_)

num.setParseAction(lambda t: Num(int(t.number)))
id_.setParseAction(lambda t: Id(t.name))
binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs))
with_.setParseAction(lambda t: With(t.bound_id, t.named_expr, t.bound_body))

def parse(s):
return (wae + StringEnd()).parseString(s).asList()[0]

For test case:

'(+ 3 45)'

I get:

C:\WINNT\system32\cmd.exe /c python wae.py
**********************************************************************
File "wae.py", line 6, in __main__
Failed example:
parse('(+ 3 45)')
Exception raised:
Traceback (most recent call last):
File "c:\edconn32\python25\lib\doctest.py", line 1212, in __run
compileflags, 1) in test.globs
File "<doctest __main__[1]>", line 1, in <module>
parse('(+ 3 45)')
File "wae.py", line 122, in parse
return (wae + StringEnd()).parseString(s).asList()[0]
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 906, in p
arseString
loc, tokens = self._parse( instring.expandtabs(), 0 )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 1961, in
parseImpl
loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPr
eParse=False )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 2204, in
parseImpl
return self.expr._parse( instring, loc, doActions, callPreParse=False )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 2070, in
parseImpl
ret = e._parse( instring, loc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 810, in _
parseNoCache
tokens = fn( instring, tokensStart, retTokens )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 658, in t
mp
return f(t)
File "wae.py", line 118, in <lambda>
binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs))
File "wae.py", line 73, in __init__
'-': (operator.sub, 'Sub')}[op]
KeyError: ''
**********************************************************************

op ought to be '+' or '-'. In fact, testing showed than none of
the result names for binop are being set correctly.

Paul McGuire · Jul 3, 2007

C:\WINNT\system32\cmd.exe /c python wae.py
**********************************************************************
File "wae.py", line 6, in __main__
Failed example:
parse('(+ 3 45)')
Exception raised:
Traceback (most recent call last):
File "c:\edconn32\python25\lib\doctest.py", line 1212, in __run
compileflags, 1) in test.globs
File "<doctest __main__[1]>", line 1, in <module>
parse('(+ 3 45)')
File "wae.py", line 122, in parse
return (wae + StringEnd()).parseString(s).asList()[0]
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 906, in p
arseString
loc, tokens = self._parse( instring.expandtabs(), 0 )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 1961, in
parseImpl
loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPr
eParse=False )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 2204, in
parseImpl
return self.expr._parse( instring, loc, doActions, callPreParse=False )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 784, in _
parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 2070, in
parseImpl
ret = e._parse( instring, loc, doActions )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 810, in _
parseNoCache
tokens = fn( instring, tokensStart, retTokens )
File "C:\edconn32\Python25\Lib\site-packages\pyparsing.py", line 658, in t
mp
return f(t)
File "wae.py", line 118, in <lambda>
binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs))
File "wae.py", line 73, in __init__
'-': (operator.sub, 'Sub')}[op]
KeyError: ''
**********************************************************************

op ought to be '+' or '-'. In fact, testing showed than none of
the result names for binop are being set correctly.

I think the problem is with your Groups, that they are wrapping the
ParseResults into a single-element list. Try either of the following
(but not both!):
1. Remove Group from the definitions of binop and with_.

binop = ( LPAR + oneOf("+ -").setResultsName('op') +
wae.setResultsName('lhs') + wae.setResultsName('rhs') +
RPAR )
with_ = ( LPAR + "with" + LPAR + id_.setResultsName('bound_id') +
wae.setResultsName('named_expr') + RPAR +
wae.setResultsName('bound_body') + RPAR )

2. Change the parse actions to deref the 0'th element of t for binop
and with_.

num.setParseAction(lambda t: Num(int(t.number)))
id_.setParseAction(lambda t: Id(t.name))
binop.setParseAction(lambda t: BinOp(t[0].op, t[0].lhs, t[0].rhs))
with_.setParseAction(lambda t: With(t[0].bound_id, t[0].named_expr,
t[0].bound_body))

As a troubleshooting measure, you can also create an explicit method,
and then decorate it with pyparsing's built-in @traceParseAction
decorator.

@traceParseAction
def test(t):
return BinOp(t.op,t.lhs,t.rhs)

This should print out information on the arguments being passed to
test, and the results being returned.

-- Paul

Paul McGuire · Jul 3, 2007

By the way, the next release of pyparsing will allow you to shortcut
all of these .setResultsName calls, using this notation instead:

binop = ( LPAR +
oneOf("+ -")('op') +
wae('lhs') +
wae('rhs') + RPAR )
with_ = ( LPAR + "with" + LPAR +
id_('bound_id') +
wae('named_expr') + RPAR +
wae('bound_body') + RPAR )

This is implemented in the latest version in the SF SVN repository.

-- Paul

Neil Cerutti · Jul 3, 2007

2. Change the parse actions to deref the 0'th element of t for
binop and with_.

num.setParseAction(lambda t: Num(int(t.number)))
id_.setParseAction(lambda t: Id(t.name))
binop.setParseAction(lambda t: BinOp(t[0].op, t[0].lhs, t[0].rhs))
with_.setParseAction(lambda t: With(t[0].bound_id, t[0].named_expr,
t[0].bound_body))

Thanks, that solves it.

Now to get on with this exercise and find out which version turns
out to be easier to modify and extend as the book progresses
through all the different programming language features it
covers. Scheme is bound to have an unfair advantage, though, as
the book was composed for Scheme.

I plan to see if I can create a class factory that behaves like
the define-type macro, though. That should prove rather useful.
The most tedious thing about the Python version is the cruft
required by the class hierarchy.

Recursive class \| can you modify self directly?	10	Jul 9, 2013
Python point location of intersect between two lines	0	Feb 28, 2018
Mapping of node numbers	0	May 6, 2014
dictionary of operators	2	Feb 14, 2008
python multiprocessing proxy	0	Oct 5, 2009
idea for testing tools	5	Feb 7, 2007
EAFP gone wrong	4	Feb 9, 2010
ast manipulation	2	Nov 17, 2009

Pretty Scheme, ??? Python

Neil Cerutti

Laurent Pointal

Neil Cerutti

Paul McGuire

Paul McGuire

Neil Cerutti

Paul McGuire

Neil Cerutti

Paul McGuire

Paul McGuire

Neil Cerutti

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads