Help improve program for parsing simple rules

pruebauno · Apr 16, 2009

Another interesting task for those that are looking for some
interesting problem:
I inherited some rule system that checks for programmers program
outputs that to be ported: given some simple rules and the values it
has to determine if the program is still working correctly and give
the details of what the values are. If you have a better idea of how
to do this kind of parsing please chime in. I am using tokenize but
that might be more complex than it needs to be. This is what I have
come up so far:

rules=[
'( A - B ) = 0',
'(A + B + C + D + E + F + G + H + I) = J',
'(A + B + C + D + E + F + G + H) = I',
'(A + B + C + D + E + F) = G',
'(A + B + C + D + E) = (F + G + H + I + J)',
'(A + B + C + D + E) = (F + G + H + I)',
'(A + B + C + D + E) = F',
'(A + B + C + D) = (E + F + G + H)',
'(A + B + C) = (D + E + F)',
'(A + B) = (C + D + E + F)',
'(A + B) = (C + D)',
'(A + B) = (C - D + E - F - G + H + I + J)',
'(A + B) = C',
'(A + B) = 0',
'(A+B+C+D+E) = (F+G+H+I+J)',
'(A+B+C+D) = (E+F+G+H)',
'(A+B+C+D)=(E+F+G+H)',
'(A+B+C)=(D+E+F)',
'(A+B)=(C+D)',
'(A+B)=C',
'(A-B)=C',
'(A/(B+C))',
'(G + H) = I',
'-0.99 LE ((A+B+C)-(D+E+F+G)) LE 0.99',
'-0.99 LE (A-(B+C)) LE 0.99',
'-1000.00 LE A LE 0.00',
'-5000.00 LE A LE 0.00',
'A < B',
'A < 7000',
'A = -(B)',
'A = C',
'A = 0',
'A GT 0',
'A GT 0.00',
'A GT 7.00',
'A LE B',
'A LT -1000.00',
'A LT -5000',
'A LT 0',
'A=(B+C+D)',
'A=B',
'I = (G + H)',
'0.00 LE A LE 4.00',
'4.00 LT A LE 7.00'
]
vars_={'A': 0, 'B': 1.1, 'C': 2.2, 'D': 3.3, 'E': 4.4, 'F': 5.5, 'G':
6.6, 'H':7.7, 'I':8.8, 'J':9.9}

import tokenize as TOK
import StringIO as SIO

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'==',
'=>':'=>', '=<':'=<'}

def get_tokens(string):
return [x[:2] for x in TOK.generate_tokens(SIO.StringIO
(string).readline)][:-1]

def replace_comps(toks,repl):
return [(TOK.OP, repl[x[1]]) if x[1] in repl else x for x in toks]

def replace_names(norm,vars_):
return [(TOK.NUMBER,str(vars_.get(x[1],x[1]))) if x[0]==TOK.NAME
else x for x in norm]

def split_seccions(subs,comp_split):
l=[]
g=[]
for type_,value in subs:
if value in comp_split:
g.append((l,value))
l=[]
else:
l.append((type_,value))
g.append((l,''))
return g

def all_seccions(grps):
return [(TOK.untokenize(lst),comper) for lst,comper in grps]

def calc_seccions(rep):
return [(str(eval(comp,{},{})),comper) for comp,comper in rep]

def calc_deltas(calc):
return [eval(calc[0]+'-'+calc[i+1][0],{},{}) for i in range(len
(calc)-1)]

def main():
for cur_rule in rules[20:26]:
tokens=get_tokens(cur_rule)
normal=replace_comps(tokens,COMP_REPLACERS)
subst=replace_names(normal,vars_)
groups=split_seccions(subst,COMP_REPLACERS.values())
rep=all_seccions(groups)
rep_out=''.join(x[0]+x[1] for x in rep)
calc=calc_seccions(rep)
calc_out=''.join(x[0]+x[1] for x in calc)
deltas=calc_deltas(calc)
result=eval(calc_out,{},{})
print
print 'Values:',', '.join(str(key)+': '+str(val) for key,val
in sorted(vars_.iteritems()))
print 'Read rule: ',cur_rule
print 'Used rule: ',TOK.untokenize(normal)
print 'Substitution:',rep_out
print 'Calculation: ',calc_out
print 'Result: ','Successful' if result else 'Failed'
if not result and '==' in calc_out:
print 'Difference: ',', '.join(map(str,deltas))
print '='*40

if __name__=='__main__': main()

Aaron Brady · Apr 16, 2009

Another interesting task for those that are looking for some
interesting problem:
I inherited some rule system that checks for programmers program
outputs that to be ported: given some simple rules and the values it
has to determine if the program is still working correctly and give
the details of what the values are. If you have a better idea of how
to do this kind of parsing please chime in. I am using tokenize but
that might be more complex than it needs to be. This is what I have
come up so far:

rules=[
'( A - B ) = 0',
'(A + B + C + D + E + F + G + H + I) = J',
'(A + B + C + D + E + F + G + H) = I',
'(A + B + C + D + E + F) = G',
'(A + B + C + D + E) = (F + G + H + I + J)',
'(A + B + C + D + E) = (F + G + H + I)',
'(A + B + C + D + E) = F',
'(A + B + C + D) = (E + F + G + H)',
'(A + B + C) = (D + E + F)',
'(A + B) = (C + D + E + F)',
'(A + B) = (C + D)',
'(A + B) = (C - D + E - F - G + H + I + J)',
'(A + B) = C',
'(A + B) = 0',
'(A+B+C+D+E) = (F+G+H+I+J)',
'(A+B+C+D) = (E+F+G+H)',
'(A+B+C+D)=(E+F+G+H)',
'(A+B+C)=(D+E+F)',
'(A+B)=(C+D)',
'(A+B)=C',
'(A-B)=C',
'(A/(B+C))',
'(G + H) = I',
'-0.99 LE ((A+B+C)-(D+E+F+G)) LE 0.99',
'-0.99 LE (A-(B+C)) LE 0.99',
'-1000.00 LE A LE 0.00', snip
def main():
for cur_rule in rules[20:26]:
tokens=get_tokens(cur_rule)
normal=replace_comps(tokens,COMP_REPLACERS)
subst=replace_names(normal,vars_)
groups=split_seccions(subst,COMP_REPLACERS.values())
rep=all_seccions(groups)
rep_out=''.join(x[0]+x[1] for x in rep)
calc=calc_seccions(rep)
calc_out=''.join(x[0]+x[1] for x in calc)
deltas=calc_deltas(calc)
result=eval(calc_out,{},{})

snip

You are using 'eval' which isn't safe if its inputs are dangerous. If
you are controlling the inputs, you might be interested in the
optional arguments to 'eval'.
False

The syntax is slightly different for Python 2. For the replacement of
'LE', I assume you require spaces on both sides.
False

If you need something more flexible, the 'ast' module gives you more
options. The string has to be a valid Python module to start with.

FYI, have you checked order of precedence in your custom rule set to
match Python's?

John Machin · Apr 17, 2009

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'==',
'=>':'=>', '=<':'=<'}

What do the '=>' and '=<' represent? Why are you replacing each by
itself?

pruebauno · Apr 17, 2009

Another interesting task for those that are looking for some
interesting problem:
I inherited some rule system that checks for programmers program
outputs that to be ported: given some simple rules and the values it
has to determine if the program is still working correctly and give
the details of what the values are. If you have a better idea of how
to do this kind of parsing please chime in. I am using tokenize but
that might be more complex than it needs to be. This is what I have
come up so far:

Click to expand...

rules=[
'( A - B ) = 0',
'(A + B + C + D + E + F + G + H + I) = J',
'(A + B + C + D + E + F + G + H) = I',
'(A + B + C + D + E + F) = G',
'(A + B + C + D + E) = (F + G + H + I + J)',
'(A + B + C + D + E) = (F + G + H + I)',
'(A + B + C + D + E) = F',
'(A + B + C + D) = (E + F + G + H)',
'(A + B + C) = (D + E + F)',
'(A + B) = (C + D + E + F)',
'(A + B) = (C + D)',
'(A + B) = (C - D + E - F - G + H + I + J)',
'(A + B) = C',
'(A + B) = 0',
'(A+B+C+D+E) = (F+G+H+I+J)',
'(A+B+C+D) = (E+F+G+H)',
'(A+B+C+D)=(E+F+G+H)',
'(A+B+C)=(D+E+F)',
'(A+B)=(C+D)',
'(A+B)=C',
'(A-B)=C',
'(A/(B+C))',
'(G + H) = I',
'-0.99 LE ((A+B+C)-(D+E+F+G)) LE 0.99',
'-0.99 LE (A-(B+C)) LE 0.99',
'-1000.00 LE A LE 0.00', snip
def main():
for cur_rule in rules[20:26]:
tokens=get_tokens(cur_rule)
normal=replace_comps(tokens,COMP_REPLACERS)
subst=replace_names(normal,vars_)
groups=split_seccions(subst,COMP_REPLACERS.values())
rep=all_seccions(groups)
rep_out=''.join(x[0]+x[1] for x in rep)
calc=calc_seccions(rep)
calc_out=''.join(x[0]+x[1] for x in calc)
deltas=calc_deltas(calc)
result=eval(calc_out,{},{})

Click to expand...

snip

You are using 'eval' which isn't safe if its inputs are dangerous. If
you are controlling the inputs, you might be interested in the
optional arguments to 'eval'.

False

The syntax is slightly different for Python 2. For the replacement of
'LE', I assume you require spaces on both sides.

False

If you need something more flexible, the 'ast' module gives you more
options. The string has to be a valid Python module to start with.

FYI, have you checked order of precedence in your custom rule set to
match Python's?

I know about evals implication of safety. Rules are defined by the
programmers so I don't worry too much about it at this point. They
should know better than messing up their application server. Unless
there is some easier way to do it I am willing to take the risk.
Precedence is standard math, we can always add some extra parenthesis
to the rules, I don't thing the old system would mind.

I thought about using eval with a locals dictionary, but they want
output of the intermediate values. I want to avoid the situation where
the intermediate output does not match what eval is doing.

pruebauno · Apr 17, 2009

What do the '=>' and '=<' represent? Why are you replacing each by
itself?

because of this:
groups=split_seccions(subst,COMP_REPLACERS.values())

I didn't want to create a separate variable. Those are dubious anyway,
I haven't seen one in all of our current rules and it generates a
syntax error in Python so I took them out:

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'=='}

And before somebody else points it out. This line:

(TOK.NUMBER,str(vars_.get(x[1],x[1]))) if x[0]==TOK.NAME

should really be:

(TOK.NUMBER,str(vars_[x[1]])) if (x[0]==TOK.NAME and x[1] in vars_)

Paul McGuire · Apr 17, 2009

Another interesting task for those that are looking for some
interesting problem:
I inherited some rule system that checks for programmers program
outputs that to be ported: given some simple rules and the values it
has to determine if the program is still working correctly and give
the details of what the values are. If you have a better idea of how
to do this kind of parsing please chime in. I am using tokenize but
that might be more complex than it needs to be. This is what I have
come up so far:

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens. Here is the
online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly). I took the
liberty of including your rule set as a list of embedded test cases.

-- Paul

John Machin · Apr 17, 2009

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens. Here is the
online version,http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly). I took the
liberty of including your rule set as a list of embedded test cases.

Hi Paul,

I don't see how it can handle the chained relop in the last two
testcases e. g. '0.00 LE A LE 4.00' -- unless relops are chained by
default in your parser.

Cheers,
John

Paul McGuire · Apr 17, 2009

I don't see how it can handle the chained relop in the last two
testcases e. g. '0.00 LE A LE 4.00' -- unless relops are chained by
default in your parser.

John -

First of all, to respect precedence of operations, higher level
precedences are parsed and grouped first. If you left off the parse
actions and just printed out the parse tree created by the example
(using asList()), for "A + B * C" you would get ['A', '+', [ 'B', '*',
'C' ]]. If you expand that test case to "A + B * C + D", you would
get ['A', '+', [ 'B', '*', 'C' ], '+', 'D' ]. This is counter to the
conventional infix parser that would create [['A', '+', [ 'B', '*',
'C' ]], '+', 'D' ], in which binary operators typicaly return
'operand' 'operator' 'operand' triples, and either operand might be a
nested parse tree.

As it happens, when using pyparsing's operatorPrecedence helper, *all*
binary operators at the same precedence level are actually parsed in a
single chain.

This is why you see this logic in EvalAddOp.eval:

def eval(self):
sum = self.value[0].eval()
for op,val in operatorOperands(self.value[1:]):
if op == '+':
sum += val.eval()
if op == '-':
sum -= val.eval()
return sum

operatorOperands is a little generator that returns operator-operand
pairs, beginning at the second (that is, the "1th") token in the
list. You can't just do the simple evaluation of "operand1 operator
operand2", you have to build up the sum by first evaluating operand1,
and then iterating over the operator-operand pairs in the rest of the
list. Same thing for the muliplication operators.

For the comparison operators, things are a little more involved.
"operand1 operator1 operand2 operator2 operand3" (as in "0.00 LE A LE
4.00") has to evaluate as

op1 operator1 op2 AND op2 operator2 op3

So EvalComparisonOp's eval method looks like:

def eval(self):
val1 = self.value[0].eval()
ret = True
for op,val in operatorOperands(self.value[1:]):
fn = EvalComparisonOp.opMap[op]
val2 = val.eval()
ret = ret and fn(val1,val2)
val1 = val2
return ret

The first term is evaluated and stored in val1. Then each
comparisonop-operand pair is extracted, the operand is eval()'ed and
stored in val2, and the comparison method that is mapped to the
comparisonop is called using val1 and val2. Then, to move on to do
the next comparison, val2 is stored into val1, and the we iterate to
the next comparison-operand pair. In fact, not only does this handle
"0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE
E > D". (I see that I should actually do some short-circuiting here -
if ret is false after calling fn(val1,val2), I should just break out
at that point. I'll have that fixed in the online version shortly.)

-- Paul

Aaron Brady · Apr 17, 2009

On Apr 16, 10:57 am, (e-mail address removed) wrote:

Another interesting task for those that are looking for some
interesting problem:
I inherited some rule system that checks for programmers program
outputs that to be ported: given some simple rules and the values it
has to determine if the program is still working correctly and give
the details of what the values are. If you have a better idea of how
to do this kind of parsing please chime in. I am using tokenize but
that might be more complex than it needs to be. This is what I have
come up so far:

Click to expand...

snip

def main():
for cur_rule in rules[20:26]:
tokens=get_tokens(cur_rule)
normal=replace_comps(tokens,COMP_REPLACERS)
subst=replace_names(normal,vars_)
groups=split_seccions(subst,COMP_REPLACERS.values())
rep=all_seccions(groups)
rep_out=''.join(x[0]+x[1] for x in rep)
calc=calc_seccions(rep)
calc_out=''.join(x[0]+x[1] for x in calc)
deltas=calc_deltas(calc)
result=eval(calc_out,{},{})

Click to expand...

snip

Click to expand...

snip

a= '-1000.00 < A < 0.00'
eval( a, { 'A': -100 } ) snip
a= '-1000.00 LE A LE 0.00'
b= a.replace( ' LE ', ' <= ' )

Click to expand...

Click to expand...

snip

I know about evals implication of safety. Rules are defined by the
programmers so I don't worry too much about it at this point. They
should know better than messing up their application server. Unless
there is some easier way to do it I am willing to take the risk.
Precedence is standard math, we can always add some extra parenthesis
to the rules, I don't thing the old system would mind.

I thought about using eval with a locals dictionary, but they want
output of the intermediate values. I want to avoid the situation where
the intermediate output does not match what eval is doing.

I take you to need the operands to the individual comparison
operators; that is, each side of each comparison. There might be a
way using abstract syntax trees. Or, a basic 're' split can simplify
it.

import re
a= '-1000.00 LE A LE 0.00'
b= re.split( r'\s(LT|GT|LE|GE|=|<|>|<=|>=)\s', a )
b ['-1000.00', 'LE', 'A', 'LE', '0.00']
COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'=='}
c= [ COMP_REPLACERS.get( x, x ) for x in b ]
c ['-1000.00', '<=', 'A', '<=', '0.00']
vars_={'A': 0, 'B': 1.1, 'C': 2.2, 'D': 3.3, 'E': 4.4, 'F': 5.5, 'G': .... 6.6, 'H':7.7, 'I':8.8, 'J':9.9}
d= [ str( vars_.get( x, x ) ) for x in c ]
d ['-1000.00', '<=', '0', '<=', '0.00']
eval( ''.join( d ) )

Click to expand...

Click to expand...

True

The 'dict.get( x, x )' expression returns the value of the entry for
x, or x itself if it is not present.

No promises. I didn't think it all the way through.

Aaron Brady · Apr 17, 2009

not only does this handle
"0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE
E > D". (I see that I should actually do some short-circuiting here -
if ret is false after calling fn(val1,val2), I should just break out
at that point. I'll have that fixed in the online version shortly.)

Hi, not to offend; I don't know your background. One thing I like
about Python is it and the docs are careful about short-circuiting
conditions. ISTR that C left some of those details up to the compiler
at one point.
.... print( 'in f' )
.... return 10
....in f
Truein f
in f
True

Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
operator2 op3' is not equivalent to 'op1 optor1 op2 optor2 op3'.

Paul McGuire · Apr 17, 2009

Hi, not to offend; I don't know your background.

Courtesy on Usenet!!! I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy. People have even called my baby ugly,
and I manage to keep my blood pressure under control.

One thing I like
about Python is it and the docs are careful about short-circuiting
conditions. ISTR that C left some of those details up to the compiler
at one point.

... print( 'in f' )
... return 10
...>>> 0<f()<20

in f
True>>> 0<f() and f()<20

in f
in f
True

Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
operator2 op3' is not equivalent to 'op1 optor1 op2 optor2 op3'.

Interesting point, but I don't remember that "A < B < C" is valid C
syntax, are you perhaps thinking of a different language?

By luck, my implementation of EvalComparisonOp.eval does in fact
capture the post-eval value of op2, so that if its evaluation caused
any side effects, they would not be repeated.

-- Paul

pruebauno · Apr 17, 2009

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens. Here is the
online version,http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly). I took the
liberty of including your rule set as a list of embedded test cases.

-- Paul

That is fine with me. I don't know how feasible it is for me to use
pyparsing for this project considering I don't have admin access on
the box that is eventually going to run this. To add insult to injury
Python is in the version 2->3 transition (I really would like to push
the admins to install 3.1 by the end of the year before the amount of
code written by us gets any bigger) meaning that any third party
library is an additional burden on the future upgrade. I can't
remember if pyparsing is pure Python. If it is I might be able to
include it alongside my code if it is not too big.

Aaron Brady · Apr 17, 2009

Courtesy on Usenet!!! I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy. People have even called my baby ugly,
and I manage to keep my blood pressure under control.

Interesting point, but I don't remember that "A < B < C" is valid C
syntax, are you perhaps thinking of a different language?

By luck, my implementation of EvalComparisonOp.eval does in fact
capture the post-eval value of op2, so that if its evaluation caused
any side effects, they would not be repeated.

-- Paul

It was very hazy in my memory and is easily solved by a web search.
Perhaps I was thinking of the short-circuiting of 'and', and mixing it
up with the warning about parentheses in macros.

-Macros? NO, Batman.
-I'm afraid so Robin. Macros.

Paul McGuire · Apr 17, 2009

That is fine with me. I don't know how feasible it is for me to use
pyparsing for this project considering I don't have admin access on
the box that is eventually going to run this. To add insult to injury
Python is in the version 2->3 transition (I really would like to push
the admins to install 3.1 by the end of the year before the amount of
code written by us gets any bigger) meaning that any third party
library is an additional burden on the future upgrade. I can't
remember if pyparsing is pure Python. If it is I might be able to
include it alongside my code if it is not too big.- Hide quoted text -

- Show quoted text -

It *is* pure Python, and consists of a single source file for the very
purpose of ease-of-inclusion. A number of projects include their own
versions of pyparsing for version compatibility management, matplotlib
is one that comes to mind.

The upcoming version 1.5.2 download includes a pyparsing_py3.py file
for Python 3 compatibility, I should have that ready for users to
download *VERY SOON NOW*!

-- Paul

pruebauno · Apr 20, 2009

It *is* pure Python, and consists of a single source file for the very
purpose of ease-of-inclusion. A number of projects include their own
versions of pyparsing for version compatibility management, matplotlib
is one that comes to mind.

The upcoming version 1.5.2 download includes a pyparsing_py3.py file
for Python 3 compatibility, I should have that ready for users to
download *VERY SOON NOW*!

-- Paul

Thanks,
I will consider it. I have to admit that although it looks like it is
a very good solution, it is also longer and more complex than my
current code. Having to explicitly define standard python evaluation
and semantics is a bit overkill.

Blue J Ciphertext Program	2	Nov 22, 2023
Need Help: Program to Accept 2 Matrices and Show their Sum	0	Aug 21, 2022
My Status, Ciphertext	2	Nov 28, 2023
Python3 string slicing	2	Dec 11, 2023
help me to improve this program please	1	May 30, 2014
Help for my project in the last minute	0	Apr 23, 2022
Python battle game help	2	Feb 23, 2023
An empty initializer is invalid for an array with unspecified bound	0	Jul 1, 2020

Help improve program for parsing simple rules

pruebauno

Aaron Brady

John Machin

pruebauno

pruebauno

Paul McGuire

John Machin

Paul McGuire

Aaron Brady

Aaron Brady

Paul McGuire

pruebauno

Aaron Brady

Paul McGuire

pruebauno

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads