derivation: sick Python trick of the week

R

Robert Brewer

WARNING:

The following is NOT all Pythonic. However, it is Python. This is just
fun--nothing I'm going to put into production. I don't have any
questions or problems--just thought I'd share.


BACKGROUND:

I've been playing around for a couple of days with parsing and
derivation ("unparsing"). My goal was and is to take snippets of Python
code and turn them into "equivalent" SQL statements. For example, the
Python snippet:

x.Group > 3 or x.Name == 'fumanchu'

....should equate to an SQL WHERE clause like:

Group > 3 or Name = 'fumanchu'

I tore into the compiler package, and hacked together a solution which
does just that. One invokes such a task with code like:

sqlstmt = derive.ADODeriver().evaluate("x.Group > 3 or x.Name ==
'fumanchu'")

However, it's parsing the output of a compiler.Transformer, which itself
is parsing an AST (lots of nested tuples), so it's not very quick. 1000
repetitions of the above example take over 1/2 second to run on my
laptop.


IT'S ALIVE!!!

In a fit of mad-scientist genius (get out the pitchforks and torches), I
wondered if it might be faster to let Python do *all* the parsing, and
just watch it work and take notes. That's what the code below does. The
sick and twisted part is how one invokes this technique:
['a', 3, <built-in function eq>, 'b', 1, <built-in function gt>, 'b',
[/QUOTE]
'((a == 3) and ((b > 1) or (b < -10)))'

Yes, in line two we run comparisons and boolean operations on
non-existent attributes of x, and discard the results! Sick. However, we
were keeping track (as the repr of x shows on line 3); we built a
postfix list of the comparisons we performed. When we call
Deriver().derive(x), the Deriver traverses that list and rebuilds the
Python code. I made a similar ADODeriver which builds SQL code (too
complex to include here; it also used a different Adapter for the
object-to-DB mapper).

I had to replace boolean 'and' and 'or' with binary calls in order to
override them, and 'not' with 'neg'. That makes it even sicker. And it's
*far* from obvious that 'x.a' should reduce to just 'a'.

But it's about 7 times as fast as the first solution. ;)


So for a host of reasons, don't ever use this. I won't. But it was
interesting.


Robert Brewer
MIS
(e-mail address removed)


---- sick_derive.py ----

import operator


def containedby(a, b):
"""Return the outcome of the test a in b. Note the operand order."""
return a in b

def startswith(a, b):
"""Return True if string starts with the prefix, otherwise return
False."""
return a.startswith(b)

def endswith(a, b):
"""Return True if the string ends with the suffix, otherwise return
False."""
return a.endswith(b)


class Operand(object):
"""Push atoms onto a postfix expression stack."""
def __init__(self, expr, name=''):
self.expr = expr
self.name = name

def __neg__(self):
self.expr.append(operator.not_)
return Operand(self.expr)

def __lt__(self, other):
self.expr.extend([self.name, other, operator.lt])
return Operand(self.expr)

def __le__(self, other):
self.expr.extend([self.name, other, operator.le])
return Operand(self.expr)

def __eq__(self, other):
if self.name:
self.expr.extend([self.name, other, operator.eq])
else:
self.expr.extend([other, operator.eq])
return Operand(self.expr)

def __ne__(self, other):
self.expr.extend([self.name, other, operator.ne])
return Operand(self.expr)

def __ge__(self, other):
self.expr.extend([self.name, other, operator.ge])
return Operand(self.expr)

def __gt__(self, other):
self.expr.extend([self.name, other, operator.gt])
return Operand(self.expr)

def __and__(self, other):
self.expr.append(operator.and_)
return Operand(self.expr)

def __or__(self, other):
self.expr.append(operator.or_)
return Operand(self.expr)

def __contains__(self, other):
self.expr.extend([self.name, other, operator.contains])
return Operand(self.expr)

def contains(self, other):
self.expr.extend([self.name, other, operator.contains])
return Operand(self.expr)

def containedby(self, other):
self.expr.extend([self.name, other, containedby])
return Operand(self.expr)

def startswith(self, other):
self.expr.extend([self.name, other, startswith])
return Operand(self.expr)

def endswith(self, other):
self.expr.extend([self.name, other, endswith])
return Operand(self.expr)


class Expression(list):
"""A postfix recorder and evaluator for operations on arbitrary
objects."""

unaries = (operator.not_, )
binaries = (operator.and_,
operator.or_,
operator.lt,
operator.le,
operator.eq,
operator.ne,
operator.gt,
operator.ge,
operator.contains,
containedby,
startswith,
endswith,
)

def __getattr__(self, name):
return Operand(self, name)

def evaluate(self, obj, ifEmpty=True):
"""Evaluate an object against self.
Names will be looked up in the object's attribute dictionary.
"""
stack = self[:]
if not stack:
return ifEmpty

def operate():
op = stack.pop()
if op in self.unaries:
a = operate()
return op(a)
elif op in self.binaries:
b = operate()
a = operate()
if a not in (True, False):
a = getattr(obj, a)
return op(a, b)
else:
return op
return operate()


class Adapter(object):
"""Transform values according to their type."""

def __init__(self):
self.default_processor = repr
self.processors = {}

def process(self, value):
try:
xform = self.processors[type(value)]
except KeyError:
xform = self.default_processor
return xform(value)


class Deriver(object):
"""Derive an Expression according to a grammar."""

def __init__(self, adapter=Adapter()):
f = adapter.process
self.unaries = {operator.not_: lambda x: "(not %s)" % x}
self.binaries = {operator.and_: lambda x, y: "(%s and %s)" % (x,
y),
operator.or_: lambda x, y: "(%s or %s)" % (x,
y),
operator.lt: lambda x, y: "(%s < %s)" % (x,
f(y)),
operator.le: lambda x, y: "(%s <= %s)" % (x,
f(y)),
operator.eq: lambda x, y: "(%s == %s)" % (x,
f(y)),
operator.ne: lambda x, y: "(%s != %s)" % (x,
f(y)),
operator.gt: lambda x, y: "(%s > %s)" % (x,
f(y)),
operator.ge: lambda x, y: "(%s >= %s)" % (x,
f(y)),
operator.contains: lambda x, y: "(%s in %s)" %
(f(y), x),
containedby: lambda x, y: "(%s in %s)" % (x,
f(y)),
startswith: lambda x, y: "(%s.startswith(%s))"
% (x, f(y)),
endswith: lambda x, y: "(%s.endswith(%s))" %
(x, f(y)),
}
self.ifEmpty = ''

def derive(self, expr):
try:
stack = expr[:]
except TypeError, x:
x.args += (type(expr), )
raise x

if not stack:
return self.ifEmpty

def operate():
op = stack.pop()
if op in self.unaries:
a = operate()
return self.unaries[op](a)
elif op in self.binaries:
b = operate()
a = operate()
return self.binaries[op](a, b)
else:
return op
return operate()
 
G

Greg Ewing (using news.cis.dfn.de)

Robert said:
In a fit of mad-scientist genius (get out the pitchforks and torches), I
wondered if it might be faster to let Python do *all* the parsing, and
just watch it work and take notes. That's what the code below does. The
sick and twisted part is how one invokes this technique:

It's not really all that sick, or at least it's not a new
idea. I'm sure there's at least one DB module out there
somewhere that uses this technique.
Yes, in line two we run comparisons and boolean operations on
non-existent attributes of x, and discard the results! Sick.

That part is perhaps a bit too twisted. It's not really
necessary, you could just as well design it so you say

expr = (x.a == 3) & ((x.b > 1) | (x.b < -10))
> And it's
> *far* from obvious that 'x.a' should reduce to just 'a'.

In the case of database queries, you can make this seem
much more natural by having 'x' represent some meaningful
object such as a table.

db = open_database("favourite_fruits")
fruits = db.fruits
query = (fruits.kind == "apple") && (fruits.tastiness >= 3)
I had to replace boolean 'and' and 'or' with binary calls in order to
override them, and 'not' with 'neg'. That makes it even sicker.

Yes, that's the main nuisance with this technique. I have
a PEP idea floating around to make and/or/not overridable,
for this very purpose. Hmmm... does that make me sick?-)
 
M

Michael Hudson

Greg Ewing (using news.cis.dfn.de) said:
Yes, that's the main nuisance with this technique. I have
a PEP idea floating around to make and/or/not overridable,
for this very purpose. Hmmm... does that make me sick?-)

I'd say so.

Cheers,
mwh
 
H

has

Greg Ewing (using news.cis.dfn.de) said:
It's not really all that sick, or at least it's not a new
idea.

And a lot less sick that hacking the language compiler in my
opinion/experience... (I came to Python from a certain language whose
use of clever-clever compiler hackery makes it ludicrously tetchy
about how you structure your code. ("No, no! You must write it THIS
way!") Such contextual confusion and pain in the brain that you
wouldn't believe, the moment you try writing anything but the most
trivial code in the most trivial fashion.)

I'm sure there's at least one DB module out there
somewhere that uses this technique.

Can't speak for DB modules, but must confess to using a close
variation of this technique in my current project - a
MacPython-to-Apple Event Manager bridge - where it's used to assemble
AEM references and test expressions.

Main difference in my design to Robert's is that call capture objects
don't share mutable state. This means users can safely define new
references/test expressions by extending existing ones if they like,
rather than having to create a new one from scratch each time; the
original object won't be changed by this. eg:

import sick_derive

x = sick_derive.Expression()

e1 = ((x.b > 1) | (x.b < -10))
e2 = (e1 == 4)

print sick_deriver.Deriver().derive(e2.expr)
--> (((b > 1) or (b < -10)) == 4)

print sick_derive.Deriver().derive(e1.expr)
--> (((b > 1) or (b < -10)) == 4) # !!! e1 has also changed!


vs:

from appscript import its

e3 = its.name_extension.isin(['jpg', 'jpeg'])
e4 = e3 .OR (its.file_type == 'JPEG')

print e4
# --> OR(its.name_extension.isin(['jpg', 'jpeg']), its.file_type
== 'JPEG')

print e3
# --> its.name_extension.isin(['jpg', 'jpeg'])



In the case of database queries, you can make this seem
much more natural by having 'x' represent some meaningful
object such as a table.

db = open_database("favourite_fruits")
fruits = db.fruits
query = (fruits.kind == "apple") && (fruits.tastiness >= 3)

Once you solve the mutating state problem, you can simply have your
module define a single (generic) 'root' object that users can use to
build any expression. (Appscript stores this object in its global
'its' variable, allowing it to be exported.)

Yes, that's the main nuisance with this technique.

Yup. (Trouble with these sorts of exercises is sooner or later you
start knocking against the boundaries of what the language can do,
and'll just tie yourself in knots with outrageous acrobatics if you're
not careful.)

I have
a PEP idea floating around to make and/or/not overridable,
for this very purpose. Hmmm... does that make me sick?-)

Raised this idea on the PythonMac SIG last week; it died a quick death
once someone pointed out that boolean operators also perform flow
control (lazy evaluation of operands). Implementing a suitable
override mechanism would be a non-trivial exercise (and seems unlikely
the BDFL would support it).
 
G

Greg Ewing (using news.cis.dfn.de)

has said:
Raised this idea on the PythonMac SIG last week; it died a quick death
once someone pointed out that boolean operators also perform flow
control

I do have a plan for handling that. Yes, it'll be non-trivial
(although not too complicated, I hope), and yes, Guido probably
won't like it much, but you never know 'till you try...
 
H

has

Greg Ewing (using news.cis.dfn.de) said:
I do have a plan for handling that. Yes, it'll be non-trivial
(although not too complicated, I hope), and yes, Guido probably
won't like it much, but you never know 'till you try...

You certainly have my interest...

has
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top