Problem with Lexical Scope

Discussion in 'Python' started by jslowery@gmail.com, Dec 12, 2005.

  1. Guest

    I am not completely knowledgeable about the status of lexical scoping in
    Python, but it was my understanding that this was added quite a while
    ago, around Python 2.1-2.2.

    I am using python2.4 and the following code complains that the "status"
    variable is not found in the inner-most function, even when I try to
    "global" it.

    def collect(fields, reducer):
        def rule(record):
            status = True
            def _(x, y):
                cstat = reducer(x, y)
                if status and not cstat:
                    status = False
                return y
            return reduce(_, [record[field] for field in fields])
        return rule

    What gives?
     
    , Dec 12, 2005
    #1

  2. Guest

    Well, I think I found a nasty little hack to get around it, but I still
    don't see why it doesn't work in the regular way.

    def collect(fields, reducer):
        def rule(record):
            # Nasty hack b/c can't get lexical scoping of status to work
            status = [True]
            def _(x, y, s=status):
                cstat = reducer(x, y)
                if s[0] and not cstat:
                    s[0] = False
                return y
            reduce(_, [record[field] for field in fields])
            return status[0]
        return rule
     
    , Dec 12, 2005
    #2

  3. Steve Holden Guest

    wrote:
    > I am not completely knowledgable about the status of lexical scoping in
    > Python, but it was my understanding that this was added in a long time
    > ago around python2.1-python2.2
    >
    > I am using python2.4 and the following code throws a "status variable"
    > not found in the inner-most function, even when I try to "global" it.
    >
    > def collect(fields, reducer):
    >     def rule(record):
    >         status = True
    >         def _(x, y):
    >             cstat = reducer(x, y)
    >             if status and not cstat:
    >                 status = False
    >             return y
    >         return reduce(_, [record[field] for field in fields])
    >     return rule
    >
    > What gives?
    >

    What's happening is that the interpreter, when it compiles the inner
    function _(), sees an assignment to status and so assumes it is local to
    the _() function. Consequently, since you reference it inside _() before
    assignment you get (I presume) an exception reporting an unbound local
    variable.

    The scoping rules do work when you obey them:

    >>> def f1(a, b):
    ...     s = a+b
    ...     def _(x):
    ...         return s+x
    ...     return _
    ...
    >>> func = f1(12, 13)
    >>> func(10)
    35
    >>>


    Here the nested lexical scopes rule isn't that helpful given the
    overriding nature of assignment within an inner scope. Using global will
    simply put the status variable at *module* scope, which also isn't what
    you want.
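
    (A side note for readers on later versions: Python 3.0 added the
    nonlocal statement, which lets an inner function rebind a name in an
    enclosing function's scope directly. A sketch of the original code
    using it, returning the status flag since that appears to be what was
    intended:)

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

def collect(fields, reducer):
    def rule(record):
        status = True
        def _(x, y):
            nonlocal status              # rebind status in rule()'s scope
            if status and not reducer(x, y):
                status = False
            return y
        reduce(_, [record[field] for field in fields])
        return status
    return rule
```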

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
     
    Steve Holden, Dec 12, 2005
    #3
  4. Guest

    That does make sense. So there is no way to modify a variable in an
    outer lexical scope? Is the use of a mutable data type the only way?

    I'm trying to think of a case that would create semantic ambiguity when
    being able to modify a variable reference in an outer scope, but I
    cannot (which is probably short-sighted). Is this behavior correct
    because of performance or perhaps because of a certain design of the
    interpreter?
     
    , Dec 12, 2005
    #4
  5. Duncan Booth Guest

    wrote:

    > I am using python2.4 and the following code throws a "status variable"
    > not found in the inner-most function, even when I try to "global" it.
    >
    > def collect(fields, reducer):
    >     def rule(record):
    >         status = True
    >         def _(x, y):
    >             cstat = reducer(x, y)
    >             if status and not cstat:
    >                 status = False
    >             return y
    >         return reduce(_, [record[field] for field in fields])
    >     return rule
    >
    > What gives?


    You rebind status in the inner function. Binding to a name makes it local
    to that function (unless declared global). A name in an enclosing function
    is neither local nor global so you cannot rebind it.

    Ways round this include storing the status in a mutable value, making it an
    attribute of a class, or rewriting the code to not require setting an outer
    variable in the first place.

    The last of these is the easiest in this case: status is never actually
    used or returned anywhere so you could just remove all references to it
    with no impact on the code. In fact, so long as reducer has no side
    effects, you could just do this:

    def collect(fields, reducer):
        def rule(record):
            if fields:
                return record[fields[-1]]
        return rule

    Alternatively, if I assume you actually wanted rule to return the status as
    well as y, then the outer assignment disappears quite easily:

    def collect(fields, reducer):
        def rule(record):
            def _((x, status), y):
                cstat = reducer(x, y)
                return (y, status and cstat)
            return reduce(_, [record[field] for field in fields], (0, True))
        return rule

    If reducer has no side effects this can be further reduced:

    def collect(fields, reducer):
        def rule(record):
            def _((x, status), y):
                return (y, status and reducer(x, y))
            return reduce(_, [record[field] for field in fields], (0, True))
        return rule

    Given that the final y returned is simply the last record[field] maybe you
    only wanted the status value? In that case expanding the reduce and _ into
    inline code is likely to make things clearer:

    def collect(fields, reducer):
        def rule(record):
            prev = 0
            for field in fields:
                if not reducer(prev, record[field]):
                    return False
                prev = record[field]
            return True
        return rule
     
    Duncan Booth, Dec 12, 2005
    #5
  6. Guest

    Thanks for the example code. Definitely provided a few different ways
    of doing the construction.

    Actually, the status variable was the only thing that mattered for the
    inner function. The first code post had a bug b/c I was first just
    trying to use reduce. However, I realized that the boolean reducer
    ended up using the boolean result instead of the next field for
    subsequent operations.

    The effect I was going for was a logical AND of all the returned
    values from reducer. However, the inner function is supposed to use the
    same type of algorithm as the built-in reduce function.

    reducer has no side effects, so I suppose short-circuiting it would
    be the best thing. I think the only thing about the last example is
    that it starts things off with a zero. I think that would boink it.

    The way that this thing works is it's a general algorithm for boolean
    operations that are applied to fields in a dict.

    def lt(*fields):
        return collect(fields, lambda x, y: x < y)

    data = {
        'birth_date' : 19740201,
        'hire_date' : 19840721,
        'fire_date' : 19850123
    }

    rule = lt('birth_date', 'hire_date')
    assert rule(data) == True
     
    , Dec 12, 2005
    #6
  7. Guest

    wrote:
    > That does make sense. So there is no way to modify a variable in an
    > outer lexical scope? Is the use of a mutable data type the only way?
    >
    > I'm trying to think of a case that would create semantic ambiguity when
    > being able to modify a variable reference in an outer scope, but I
    > cannot (which is probably short-sighted). Is this behavior correct
    > because of performance or perhaps because of a certain design of the
    > interpreter?


    That is by design and IMO a good thing, as you don't need to declare
    things (the '=' sort of does it for you) in Python. I think you can do
    something like globals()['s'] but that is ugly and should be
    discouraged.

    Just use a mutable object like s=[1] or a Ref class or whatever similar
    implementation if you need this "shared write" functionality. Just
    reading is perfectly fine.
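
    (For concreteness, a minimal sketch of that Ref idea: a tiny mutable
    holder that the inner function writes through, so no rebinding of an
    outer name is ever needed.)

```python
class Ref(object):
    def __init__(self, value):
        self.value = value

def make_counter():
    n = Ref(0)                # shared cell, visible to the closure
    def bump():
        n.value += 1          # mutates the cell; the name n is only read
        return n.value
    return bump

bump = make_counter()
assert bump() == 1            # first call
assert bump() == 2            # the cell kept its state between calls
```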
     
    , Dec 12, 2005
    #7
  8. Guest

    wrote:
    > Thanks for the example code. Definately provided a few different ways
    > of doing the construction.
    >
    > Actually, the status variable was the only thing that mattered for the
    > inner function. The first code post had a bug b/c I was first just
    > trying to use reduce. However, I realized that the boolean reducer
    > ended up using the boolean result instead of the next field for
    > subsequent operations.
    >
    > The effect I was going for was was a logical AND of all the returned
    > values from reducer. However the inner function is supposed to use the
    > same type algorithm as the built-in reduce function.
    >
    > reducer does have no side effects so I suppose short-circuting it would
    > be the best thing. I think the only thing about the last example is
    > that it starts things off with a zero. I think that would boink it.
    >
    > The way that this thing works is it's a general algorithm for boolean
    > operations that are applied to fields in a dict.
    >
    > def lt(*fields):
    >     return collect(fields, lambda x, y: x < y)
    >
    > data = {
    >     'birth_date' : 19740201,
    >     'hire_date' : 19840721,
    >     'fire_date' : 19850123
    > }
    >
    > rule = lt('birth_date', 'hire_date')
    > assert rule(data) == True


    I usually solve this kind of thing by giving reduce a tuple (or
    n-tuple) as the accumulator, then factoring out only the needed final
    result sans these "temp variables".
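
    (For instance, a sketch of that tuple-accumulator approach applied to
    the collect function from this thread, seeding reduce with the first
    field's value so no 0 sentinel is needed:)

```python
from functools import reduce  # built in on Python 2; in functools on 3

def collect(fields, reducer):
    def rule(record):
        values = [record[field] for field in fields]
        def step(acc, y):
            prev, ok = acc                      # unpack the accumulator pair
            return (y, ok and reducer(prev, y))
        # seed with (first value, True); only the boolean survives the fold
        return reduce(step, values[1:], (values[0], True))[1]
    return rule

data = {'birth_date': 19740201, 'hire_date': 19840721, 'fire_date': 19850123}
rule = collect(('birth_date', 'hire_date'), lambda x, y: x < y)
```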
     
    , Dec 12, 2005
    #8
  9. On Mon, 12 Dec 2005 08:26:59 +0000, Steve Holden <> wrote:
    [...]
    >
    >The scoping rules do work when you obey them:
    >
    > >>> def f1(a, b):
    > ...     s = a+b
    > ...     def _(x):
    > ...         return s+x
    > ...     return _
    > ...
    > >>> func = f1(12, 13)
    > >>> func(10)
    > 35
    > >>>

    >
    >Here the nested lexical scopes rule isn't that helpful given the
    >overriding nature of assignment within an inner scope. Using global will
    >simply put the status variable at *module* scope, which also isn't what
    >you want.
    >

    Is external rebindability being considered at all?

    what if it were a new builtin that could do it in C, e.g. rebind(targetname, expr)
    where target could be in any enclosing scope (including globals(),
    but not extending to __builtins__) and it would be a NameError if target
    didn't exist. Thus this would be possible (untested and currently impossible ;-)

    def mkbumper(initval, incr):
        def _():
            rebind('initval', initval+incr)  # vs initval := initval+incr
            return initval
        return _

    bump3 = mkbumper(8, 3)
    bump3() => 11
    bump3() => 14

    I wonder if a function would fly better than a punctuation tweak on
    bare name assignment ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Dec 12, 2005
    #9
  10. Duncan Booth Guest

    wrote:

    > reducer does have no side effects so I suppose short-circuting it would
    > be the best thing. I think the only thing about the last example is
    > that it starts things off with a zero. I think that would boink it.


    In that case, and assuming that fields contains at least one entry:

    def collect(fields, reducer):
        def rule(record):
            f = iter(fields)
            prev = record[f.next()]
            for field in f:
                if not reducer(prev, record[field]):
                    return False
                prev = record[field]
            return True
        return rule
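
    (For modern readers: f.next() became next(f) in Python 3, so the same
    idea today reads, as a sketch, with the thread's sample data attached:)

```python
def collect(fields, reducer):
    def rule(record):
        f = iter(fields)
        prev = record[next(f)]            # next(f) replaces Python 2's f.next()
        for field in f:
            if not reducer(prev, record[field]):
                return False              # short-circuits on the first failure
            prev = record[field]
        return True
    return rule

data = {'birth_date': 19740201, 'hire_date': 19840721, 'fire_date': 19850123}
rule = collect(('birth_date', 'hire_date', 'fire_date'), lambda x, y: x < y)
```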
     
    Duncan Booth, Dec 12, 2005
    #10
  11. On 12 Dec 2005 01:23:48 -0800, "" <> wrote:

    >Thanks for the example code. Definately provided a few different ways
    >of doing the construction.
    >
    >Actually, the status variable was the only thing that mattered for the
    >inner function. The first code post had a bug b/c I was first just
    >trying to use reduce. However, I realized that the boolean reducer
    >ended up using the boolean result instead of the next field for
    >subsequent operations.
    >
    >The effect I was going for was was a logical AND of all the returned
    >values from reducer. However the inner function is supposed to use the
    >same type algorithm as the built-in reduce function.
    >
    >reducer does have no side effects so I suppose short-circuting it would
    >be the best thing. I think the only thing about the last example is
    >that it starts things off with a zero. I think that would boink it.
    >
    >The way that this thing works is it's a general algorithm for boolean
    >operations that are applied to fields in a dict.
    >
    >def lt(*fields):
    >    return collect(fields, lambda x, y: x < y)
    >
    >data = {
    >    'birth_date' : 19740201,
    >    'hire_date' : 19840721,
    >    'fire_date' : 19850123
    >}
    >
    >rule = lt('birth_date', 'hire_date')
    >assert rule(data) == True
    >

    I suspect you are trying to win an obfuscation contest ;-)
    It appears that in this case you want lt to have the same effect
    as if it were written

    >>> def lt(*fields):
    ...     def _(dct):
    ...         return dct[fields[0]] < dct[fields[1]]
    ...     return _
    ...
    >>> data = {
    ...     'birth_date' : 19740201,
    ...     'hire_date' : 19840721,
    ...     'fire_date' : 19850123
    ...     }
    >>>
    >>> rule = lt('birth_date', 'hire_date')
    >>> rule
    <function _ at 0x02F986BC>
    >>> rule(data)
    True
    >>> assert rule(data)
    >>> assert rule(data) == True


    But why do you need to "collect" with reduce and all that cryptic cruft?
    I think we need a use case where you have other than two fields to operate on.
    But in that case, what would reduce do? Something seems bogus with that.

    I'm guessing you want to loop through a bunch of "records" that are extracted from
    a database in the form of dicts, and you want to check them in various ways.
    So you want to extract fields in given order and apply various checks, something like (untested)

    for data in dbrecordsupply:
        for fields, check, msg in record_supply_checklist:
            assert check(*[data[field] for field in fields]), msg

    where record_supply_checklist is something like (untested) (and eschewing lambdas for a change ;-)

    def before(x, y): return x<y
    def before_now(x): return x < time.time()
    def during_company_existence(x): return company_founding_time <= x < time.time()

    record_supply_checklist = [
        (['birth_date', 'hire_date'], before, 'born at or after hire'),
        (['birth_date'], before_now, 'not yet born'),
        (['hire_date'], during_company_existence, 'bogus hire date'),
        # etc ...
    ]

    I suppose different checklists would go with different recordsupply's.
    Is this too simple? Or have I misread between the lines? ;-)
    (Of course you might want to log stuff instead of blowing up on the first assert.)

    Regards,
    Bengt Richter
     
    Bengt Richter, Dec 12, 2005
    #11
  12. Guest

    Well, the comparison operations are just a special case really. I
    don't know about the obfuscation contest. I've attempted to make an
    extensible library.

    def lt(*fields):
        return collect(fields, lambda x, y: x < y)

    def gt(*fields):
        return collect(fields, lambda x, y: x > y)

    def gte(*fields):
        """ gte(field, ...) -> rule
        """
        return collect(fields, lambda x, y: x >= y)
    etc...

    The purpose of writing the collect function was just to be able to
    easily apply the logic to whatever function was wanted.

    The added ability is to be able to take as many arguments as it wants.

    hire_rule = lt('birth_date', 'hire_date', 'fire_date')
    cube_rule = eq('height', 'width', 'depth')

    The reason that it's written like this is because I'm attempting to
    make things as declarative as possible. I could use classes and
    override __special__ operators and things like that but just applying
    rules using prefix notation seems to work alright.

    I've basically got a simple call signature...takes dict returns bool
    that plugs into the little validation routine.

    It turns out that invoking the rules is most easily expressed with a
    closure, though I might switch to types if it needs that added
    complexity.

    But your assumption is correct, it's for dictionaries of data to be
    validated. I haven't decided if I want to take advantage of the dict's
    mutability to include formatting as well.

    Here's a bit from my unit test.

    rule = when(all(have('length'), have('width')),
                check(['length', 'width'], lambda x, y: x == y))
    assert rule({'length' : '2', 'width' : '2'}) == True
    assert rule({'length' : '2', 'width' : '1'}) == False
    assert rule({'length' : '1', 'width' : '2'}) == False

    I've also got a "collectable" equality function, so that can be written
    as:

    box_rule = when(all(have('length'), have('width')),
                    eq('length', 'width'))

    Which basically means, if we have both a length and a width, then be
    sure they are equal.

    Of course, there is a little function that performs a conjunction of a
    complete list of rules on a dict and returns the rules that failed.

    I've also got a little adapter, called match, that translates a
    function that takes a string and returns bool into one that fits the
    call signature.

    match(is_ssn, 'social-security_number')
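
    (For illustration, a sketch of what such an adapter could look like;
    the regex-based is_ssn here is a hypothetical stand-in, not the
    library's actual predicate:)

```python
import re

def match(test, field):
    # Adapt a str -> bool predicate into the dict -> bool rule signature.
    def rule(record):
        return test(record[field])
    return rule

def is_ssn(s):  # hypothetical stand-in predicate
    return re.match(r'\d{3}-\d{2}-\d{4}$', s) is not None

ssn_rule = match(is_ssn, 'ssn')
```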

    Like I said, it may be considered more readable if using operator
    overloading so that it uses python's native syntax. Considering the
    added complexity, I don't know if it would be worth it. I'd probably
    put a simple little declarative language on top of it to translate the
    function calls before that.
     
    , Dec 16, 2005
    #12
  13. On 15 Dec 2005 18:45:17 -0800, "" <> wrote:

    >Well, the the comparison operations are just a special case really.I
    >don't know about the obfuscation contest. I've attempted to make an
    >extensible library.

    Sorry about "obfuscation contest," I just balked at the reduce code, which seemed
    like premature overgeneralization ;-)
    >
    >def lt(*fields):
    >    return collect(fields, lambda x, y: x < y)
    >
    >def gt(*fields):
    >    return collect(fields, lambda x, y: x > y)
    >
    >def gte(*fields):
    >    """ gte(field, ...) -> rule
    >    """
    >    return collect(fields, lambda x, y: x >= y)
    >
    >etc...

    DRY ? ;-)

    >
    >The purpose of writing the collect function was just to be able to
    >easily apply the logic to whatever function was wanted.
    >
    >the added ability is to be able to take as many arguments as it wants.
    >

    Yes, but does collect(fields, fun) always know that fun takes two args? And that
    it makes sense to apply fun pairwise as a boolean relation to reduce to a single boolean value?
    You could say that by definition it does. In that case, will your future library extender need
    to add some other kind of collect-like function?

    >hire_rule = lt('birth_date', 'hire_date', 'fire_date')
    >cube_rule = eq('height', 'width', 'depth')


    I see where the idea of using reduce would occur ;-)

    >
    >The reason that it's written like this is because I'm attempting to
    >make things as declarative as possible. I could use classes and
    >override __special__ operators and things like that but just applying
    >rules using prefix notation seems to work alright.

    Applying or generating? But yeah, prefix should work fine. Or infix.

    >
    >I've basically got a simple call signature...takes dict returns bool
    >that plugs into the little validation routine.

    Simple is good.
    >
    >It turns out that envoking the rules is easistly expressed with a
    >closure, though I might switch to types if it needs that added
    >complexity.

    It's good that you have a syntax that can be implemented either way.

    >
    >But your assumption is correct, it's for dictionaries of data to be
    >validated. I haven't decided if I want to take advantage of the dict's
    >mutability to include formating as well.

    You might consider putting formatting in dict subclasses that have appropriate
    __str__ and/or __repr__ methods, and doing print Fmtxxx(record_dict) or such?

    >
    >Here's a bit from my unit test.

    Looks nice. And the syntax is regular enough that you can probably
    write an optimizing rule generator when/if you need it.

    >
    >rule = when(all(have('length'), have('width')),
    >            check(['length', 'width'], lambda x, y: x == y))
    >assert rule({'length' : '2', 'width' : '2'}) == True
    >assert rule({'length' : '2', 'width' : '1'}) == False
    >assert rule({'length' : '1', 'width' : '2'}) == False
    >

    But what about when the "when" clause says the rule does not apply?
    Maybe return NotImplemented, (which passes as True in an if test) e.g.,

    assert rule({'length' : '1', 'height' : '2'}) is NotImplemented

    >I've also got a "collectable" equality function so that can be shown
    >as.
    >
    >box_rule = when(all(have('length'), have('width')),
    >            eq('length', 'width'))
    >
    >Which basically means, if we have both a length and a width, then be
    >sure they are equal.

    Either way, I think it would be better to give tests names, i.e., instead
    of the lambda, pass a function def eq(x, y): return x==y and then also
    change check to have signature def check(testfun, *fields): ...
    But "collectable" has the possibility of inline code generation (see below ;-)

    >
    >Of course, there is a little function that performs a conjunction of a
    >complete list of rules on a dict and returns the rules that failed.
    >
    >I've also got a little adapter that translates functions that take a
    >string and returns bool into one that fits the call signature called
    >match.
    >
    >match(is_ssn, 'social-security_number')
    >
    >Like I said, it may be considered more readable if using operator
    >overloading so that it uses python's native syntax. Considering the
    >added complexity, I don't know if it would be worth it. I'd probably
    >put a simple little declarative language on top of it to translate the
    >function calls before that.
    >


    Note that you could treat your when, all, have, and check as code source generators
    and compile a rule as the last part of "when" processing, e.g.,
    (fresh off the griddle, only tested as far as you see ;-)

    ----< jslowery.py >------------------------------------------
    def all(*tests):
        return '(' + ' and '.join(tests) + ')'
    def have(*fields):
        return '(' + ' and '.join('"%s" in record' % field for field in fields) + ')'
    def check(tfun, *fields):
        return '%s(%s)' % (tfun.func_name, ', '.join('record[%r]' % field for field in fields))
    def when(cond, test):
        g = globals()
        d = dict((name, g[name]) for name in __testfuns__)
        src = """\
    def rule(record):
        if not (%s): return NotImplemented
        return (%s)
    """ % (cond, test)
        print src # XXX debug
        exec src in d
        return d['rule']

    def eq(*fields): # "collectable"
        return '(' + ' == '.join('record[%r]' % (field,) for field in fields) + ')'

    def eq_test(x, y): # not "collectable" in itself, use with check
        return x == y

    __testfuns__ = ['eq_test']

    #rule = when(all(have('length'), have('width')),
    #            check(['length', 'width'], lambda x, y: x == y))

    # rewritten with changed check, differentiating eq_test from "collectable" eq
    rule = when(all(have('length'), have('width')),
                check(eq_test, 'length', 'width'))
    box_rule = when(all(have('length'), have('width')), eq('length', 'width'))
    -------------------------------------------------------------

    Then: (I left the debug print in, which prints the rule source code, with name "rule"
    all the time. You could always change that)

    >>> import jslowery

    def rule(record):
        if not ((("length" in record) and ("width" in record))): return NotImplemented
        return (eq_test(record['length'], record['width']))

    def rule(record):
        if not ((("length" in record) and ("width" in record))): return NotImplemented
        return ((record['length'] == record['width']))

    So the "collectable" eq for box_rule (second debug output) generated in-line code,
    which will run faster.

    >>> jslowery.rule({'length':2, 'width':2})
    True
    >>> jslowery.rule({'length':2, 'width':1})
    False
    >>> jslowery.rule({'height':2, 'width':1})
    NotImplemented
    >>> jslowery.box_rule({'height':2, 'width':1})
    NotImplemented
    >>> jslowery.box_rule({'length':2, 'width':1})
    False
    >>> jslowery.box_rule({'length':2, 'width':2})
    True

    Looks like I didn't need all the parens ;-)

    Anyway, this is not well tested, but rules constructed this way should run
    a lot faster than doing all the nested calling at run time. There are always
    security issues in exec-ing or eval-ing, so I am not recommending the above
    as secure from malicious rule-writers, obviously.

    Regards,
    Bengt Richter
     
    Bengt Richter, Dec 16, 2005
    #13
  14. Guest

    >Sorry about "obfuscation contest," I just balked at the reduce code, which seemed
    >like premature overgeneralization ;-)


    It's a personal preference, but the only thing I consider premature is
    optimization, not generalization. I prefer to try and capture any
    concept I can as its own abstracted procedure. In practice this usually
    means using a lot of lazy data structures at the end of the day :)

    >Yes, but does collect(fields, fun) always know that fun takes two args? And that
    >it makes sense to apply fun pairwise as a boolean relation to reduce to a single boolean value?
    >You could say that by definition it does. In that case, will your
    >future library extender need
    >to add some other kind of collect-like function?


    The collect does assume that the function takes 2 parameters and the
    logic is to apply to the result of each call and the next item in the
    list.

    The only more general way I can think of making this would be to do
    inspection on the function to determine the number. To make a general
    algorithm that does something like:

    let n = number of elements a given function f takes

    apply n elements from the list to the function
    given a result of the function r, apply [r, *(n-1)] recursively to the
    function.

    def f(a, b, c):
        return (b+c)*a

    map_col([1, 2, 3, 4, 5], f) => 45
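
    (The recursion described above can be sketched like this, using arity
    inspection; this is only an illustration of the idea, not part of the
    library:)

```python
import inspect

def map_col(items, f):
    # Number of arguments f takes (inspect.getargspec on older Pythons)
    n = len(inspect.signature(f).parameters)
    r = f(*items[:n])                  # first call consumes n items
    i = n
    while i < len(items):
        r = f(r, *items[i:i + n - 1])  # feed result back with n-1 fresh items
        i += n - 1
    return r

def f(a, b, c):
    return (b + c) * a

map_col([1, 2, 3, 4, 5], f)  # => 45, as in the example above
```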

    However, I don't know how applicable that would be in this context,
    since this is a boolean system and in the simple binary setup, it makes
    sense to return the last operand to the beginning of the next
    evaluation. In a more general case like this, the value of the
    evalution would be the most useful, not the boolean truth value.

    I do have one other generic routine besides the booleans and collect
    collectors, which is called check:

    def check(fields, test):
        """ check(list fields, func test) -> rule
        """
        def rule(record):
            values = [record[field] for field in fields]
            return test(*values)
        return rule

    It basically lets you pull out whatever fields you want and pass them
    to the provided function.

    def do_complicated_test(a, b, tolerance=0.001):
        return a - ((a+b)/b) < a*tolerance

    rule = check(('x', 'y'), do_complicated_test)

    Now that I think of it, I should probably swap this to be
    check(do_complicated_test, 'x', 'y')

    When I first wrote this thing it was 3-stage construction, not 2. I
    wanted to abstract out specific tests and then JUST apply fields to
    them in the actual rules for the business code. I then figured out that
    if I standardized having the fields as the last arguments on the check
    routines, then I could use a simple curry technique to build the
    abstraction.

    from functional import curry

    comp_rule = curry(check)(do_complicated_test)

    Then in the actual business code, can use it like this:

    comp_rule(('a', 'b'))

    Or:

    ssn_rule = curry(match)(is_ssn)

    ssn_rule('my-social-security-field')
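
    (For what it's worth, the standard library's functools.partial gives
    the same effect without the third-party functional module; the check
    here is the swapped-signature sketch from above:)

```python
from functools import partial

def check(test, *fields):
    # Swapped signature: test first, field names last, so partial can bind it.
    def rule(record):
        return test(*[record[field] for field in fields])
    return rule

def do_complicated_test(a, b, tolerance=0.001):
    return a - ((a + b) / b) < a * tolerance

comp_rule = partial(check, do_complicated_test)  # like curry(check)(do_complicated_test)
rule = comp_rule('x', 'y')
```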

    For this to work properly and easily for the business logic, I do need
    to get down some more solid rule invocation standards.


    >>rule = when(all(have('length'), have('width')),
    >>            check(['length', 'width'], lambda x, y: x == y))
    >>assert rule({'length' : '2', 'width' : '2'}) == True
    >>assert rule({'length' : '2', 'width' : '1'}) == False
    >>assert rule({'length' : '1', 'width' : '2'}) == False

    >
    >But what about when the "when" clause says the rule does not apply?
    >Maybe return NotImplemented, (which passes as True in an if test) e.g.,


    I don't really see why a special exception needs to be here, because
    "when" easily falls into the logic. When is basically material
    implication, "if A then B": when(A, B) => NOT A OR B

    def when(predicate, expr):
        """ when(rule predicate, rule expr) -> rule
        """
        def rule(record):
            if predicate(record):
                return expr(record)
            else:
                return True
        return rule

    So if the "test" fails, then the expression is True OR X, which is
    always True.

    Or, from another POV, the entire system is based on conjunction.
    Returning "true" means that it doesn't force the entire validation
    False. When the predicate fails on the When expression, then the
    "therefore" expression does not matter because the predicate failed.
    Therefore, the expression is true no matter what.

    Trying not to be condescending, I just think that the typical
    propositional logic works well in this case.
    Because this fits well into the system, can you give me an example why
    this should make an exception to the boolean logic and raise an
    exception?

    The inline code generation is an interesting concept. It would
    definitely run faster. Hmm, I did this little test.

    This one does arbitrary recursion calls

    This one does arbitrary recursion calls

    >>> def doit(v, i):
    ...     if not i:
    ...         return v
    ...     else:
    ...         return doit(v, i-1)

    # Simple adding while stressing function calls
    >>> def run(times, depth):
    ...     for i in range(times):
    ...         doit(i, depth/2) + doit(i, depth/2)

    # benchmark
    >>> def ezrun(times):
    ...     for i in range(times):
    ...         i + i

    # timer
    >>> def efunc(f, *a, **k):
    ...     s = time.time()
    ...     f(*a, **k)
    ...     return time.time() - s
    ...
    >>> efunc(run, 10000, 8)
    0.061595916748046875
    >>> efunc(run, 10000, 4)
    0.038624048233032227
    >>> efunc(ezrun, 10000)
    0.0024700164794921875
    >>> efunc(run, 1000, 4)
    0.0038750171661376953
    >>> efunc(ezrun, 1000)
    0.00024008750915527344
    >>> efunc(run, 1000, 2)
    0.002780914306640625
    >>> efunc(ezrun, 1000)
    0.00023913383483886719
    >>>


    Even in the best case, where there are 2 function calls per evaluation,
    it's at least 10x slower.

    However, given a more realistic example with the kind of domain I'm
    going to apply this to:

    Hmm, I'd say that for practical application a record with 500 different
    fields with an average of 3 rules per field is a common-case upper
    bound. Using the product of that for a profile:

    >>> efunc(run, 1500, 4)
    0.0058319568634033203
    >>> efunc(ezrun, 1500)
    0.00034809112548828125
    >>>


    I can probably live with those results for now, though optimization in
    the future is always an option :)

    >>>> jslowery.rule({'length':2, 'width':2})
    >True
    >>>> jslowery.rule({'length':2, 'width':1})
    >False
    >>>> jslowery.rule({'height':2, 'width':1})
    >NotImplemented
    >>>> jslowery.box_rule({'height':2, 'width':1})
    >NotImplemented
    >>>> jslowery.box_rule({'length':2, 'width':1})
    >False
    >>>> jslowery.box_rule({'length':2, 'width':2})
    >True


    You know, the first implementation of this thing I did had the fields
    separated from the complete rules. When I realized that this was just a
    basic logic system, it became a problem having the fields separated
    from the rules.

    In practice, I'm looking to use this for easily collecting many rules
    together and applying them to a data set.

    def test_validator2(self):
        from strlib import is_ssn
        rules = [
            have('first_name'),
            have('last_name'),
            all(have('ssn'), match(is_ssn, 'ssn')),
            when(all(have('birth_date'), have('hire_date')),
                 lt('birth_date', 'hire_date'))
        ]
        v = validator(rules)
        data = {'first_name' : 'John',
                'last_name' : 'Smith',
                'ssn' : '123-34-2343',
                'birth_date' : 0, 'hire_date' : 0}

        assert v(data) == []

        data = {'first_name' : 'John',
                'last_name' : 'Smith',
                'ssn' : '12-34-2343',
                'birth_date' : 0, 'hire_date' : 0}
        assert v(data) == [rules[2]]
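    For reference, a minimal sketch of what `validator` might look like (my guess at the shape, not the poster's actual implementation): it applies every rule to a record and returns the list of rules that failed.

```python
def validator(rules):
    # Collect the rules that came back strictly False; rules returning
    # NotImplemented (not applicable) count as passing.
    def validate(record):
        return [rule for rule in rules if rule(record) is False]
    return validate

# Usage with a trivial stand-in for have():
have = lambda field: (lambda record: bool(record.get(field)))
rules = [have('first_name'), have('ssn')]
v = validator(rules)
assert v({'first_name': 'John', 'ssn': 'x'}) == []
assert v({'first_name': 'John'}) == [rules[1]]
```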


    The thing I'm wrestling with now is the identity of rules and fields.
    I'd like to fully support localization, making a localization library
    that provides string tables for "error messages". Positions could be
    used to do this.

    rules = [
        have('first_name'),
        have('last_name'),
        all(have('ssn'), match(is_ssn, 'ssn')),
        when(all(have('birth_date'), have('hire_date')),
             lt('birth_date', 'hire_date'))
    ]

    msgs = [
        locale('MISSING_VALUE', 'FIRST_NAME'),
        locale('MISSING_VALUE', 'LAST_NAME'),
        locale('INVALID_SSN'),
        locale('INVALID_BIRTH_HIRE_DATE'),
    ]

    # Could use a functional construction to get all of the locale references out of here if wanted

    and then use a generic routine to join them:

    error_map = assoc_list(rules, msgs)

    errors = validate(record)

    for e in errors:
        print locale.lookup(error_map[e])
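    A minimal `assoc_list` could just key the message table by the rule objects themselves (a sketch of one possible spelling, not the poster's code; it assumes rules are hashable, which plain closures are):

```python
def assoc_list(keys, values):
    # Pair rules with their message keys by position.
    return dict(zip(keys, values))

# Hypothetical usage with stand-in rules and message keys:
rule_a = lambda record: 'first_name' in record
rule_b = lambda record: 'last_name' in record
error_map = assoc_list([rule_a, rule_b],
                       ['MISSING_FIRST_NAME', 'MISSING_LAST_NAME'])
assert error_map[rule_b] == 'MISSING_LAST_NAME'
```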
     
    , Dec 17, 2005
    #14
  15. Guest

    >> def lt(*fields):
    >>     return collect(fields, lambda x, y: x < y)

    >> def gt(*fields):
    >>     return collect(fields, lambda x, y: x > y)

    >> def gte(*fields):
    >>     """ gte(field, ...) -> rule
    >>     """
    >>     return collect(fields, lambda x, y: x >= y)

    >> etc...


    >DRY ? ;-)


    Do you mean by the doc string or the fact that I'm having to redeclare
    built-in operations?
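    One way to dry this up (an alternative sketch, not from the thread) is to generate the comparison factories from the operator module instead of re-spelling each lambda. The `collect` below is a simplified stand-in for the thread's reduce-based version: the rule holds when the comparison is true for every adjacent pair of field values.

```python
import operator

# Stand-in sketch of the thread's collect().
def collect(fields, reducer):
    def rule(record):
        values = [record[f] for f in fields]
        return all(reducer(x, y) for x, y in zip(values, values[1:]))
    return rule

# Build lt/gt/gte-style factories from operator functions.
def comparison(op):
    def factory(*fields):
        return collect(fields, op)
    return factory

lt = comparison(operator.lt)
gt = comparison(operator.gt)
gte = comparison(operator.ge)

assert lt('a', 'b')({'a': 1, 'b': 2}) is True
assert gte('a', 'b')({'a': 1, 'b': 2}) is False
```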
     
    , Dec 17, 2005
    #15
  16. On 16 Dec 2005 16:55:52 -0800, "" <> wrote:
    [...]

    >When I first wrote this thing it was 3 stage construction, not 2. I
    >wanted to abstract out specific tests and then JUST apply fields to
    >them in the actual rules for the business code. I then figured out if I
    >standardize having the fields as the last arguments on the check
    >routines, then I could use a simple curry technique to build the
    >abstraction.
    >
    >from functional import curry

    I'm not familiar with that module, but I wrote a byte-code-munging curry
    as a decorator that actually modified the decorated function to eliminate
    parameter(s) from the signature and preset the parameter values inside the code.
    I mention this because pure-python currying that I've seen typically creates a wrapper
    function that calls the original function, and slows things down with the extra calling
    instead of speeding things up, at least until some future version of python.
    Looks nice on the surface though.
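    For comparison, the usual pure-python spelling of that currying is functools.partial, which is exactly the wrapper approach described: it stores the preset arguments and forwards them on each call, so it adds a call layer rather than removing one. A sketch, with a simplified stand-in for the thread's check():

```python
from functools import partial

# Simplified stand-in: apply the named fields of a record to the test.
def check(test, fields):
    def rule(record):
        return test(*[record[f] for f in fields])
    return rule

# Currying by wrapping: preset the test, supply fields later.
equal_rule = partial(check, lambda x, y: x == y)
rule = equal_rule(['length', 'width'])
assert rule({'length': 2, 'width': 2}) is True
assert rule({'length': 2, 'width': 1}) is False
```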

    >
    >comp_rule = curry(check)(do_complicated_test)
    >
    >Then in the actual business code, can use it like this:
    >
    >comp_rule(('a', 'b'))
    >
    >Or:
    >
    >ssn_rule = curry(match)(is_ssn)
    >
    >ssn_rule('my-social-security-field')
    >
    >For this to work properly and easily for the business logic. I do need
    >to get down some more solid rule invocation standards.
    >
    >
    >>>(have('length'), have('width')),
    >>> check(['length', 'width'], lambda x, y: x == y))
    >>>assert rule({'length' : '2', 'width' : '2'}) == True
    >>>assert rule({'length' : '2', 'width' : '1'}) == False
    >>>assert rule({'length' : '1', 'width' : '2'}) == False

    >>
    >>But what about when the "when" clause says the rule does not apply?
    >>Maybe return NotImplemented, (which passes as True in an if test) e.g.,

    >
    >I don't really see why a special exception needs to be here because
    >"when" easily falls into the logic. "when" is basically the implication
    >truth function, "if A then B", and when(A, B) => NOT A OR B
    >

    I was suggesting a distinction between testing and production failure detection.
    The latter of course can treat not-applicable as logically the same as
    applicable-and-did-not-fail.

    Note that NotImplemented is not the same as NotImplementedError -- and I wasn't suggesting
    raising an exception, just returning a distinguishable "True" value, so that
    a test suite (which I think you said the above was from) can test that the "when"
    guard logic is working vs just passing back a True which can't distinguish whether
    the overall test passed or was inapplicable.

    Of course, to assert a did-not-fail condition, bool(NotImplemented) tests as True,
    so the assertion (if that's what you want to do) would be written

    assert rule(...), 'You only see this if rule was applicable and did not pass'
    # since assert NotImplemented succeeds and assert True succeeds
    # w/o raising AssertionError, and False means rule was applicable and failed.

    for testing purposes you can distinguish results:

    assert rule(...) is True, 'You only see this if rule was not applicable or test failed'
    assert rule(...) is False, 'You only see this if rule was not applicable or test succeeded'
    assert rule(...) is NotImplemented, 'You see this if rule WAS applicable, and either passed or failed'

    OTOH, I can see returning strictly True or False. It's a matter of defining the semantics.
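    A quick runnable illustration of that three-way result (my sketch of the variant Bengt is describing, not the thread's actual `when`), using `is` checks so the not-applicable case stays distinguishable:

```python
def when(predicate, expr):
    # Variant sketch: return NotImplemented when the guard fails,
    # instead of collapsing not-applicable into True.
    def rule(record):
        if predicate(record):
            return expr(record)
        return NotImplemented
    return rule

has_width = lambda r: 'width' in r
positive_width = lambda r: r['width'] > 0
rule = when(has_width, positive_width)

assert rule({'width': 2}) is True    # applicable, passed
assert rule({'width': 0}) is False   # applicable, failed
assert rule({}) is NotImplemented    # guard failed: not applicable
```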

    >If A is false, then the expression is always true.
    >
    >def when(predicate, expr):
    > """ when(rule predicate, rule expr) -> rule
    > """
    > def rule(record):
    > if predicate(record):
    > return expr(record)
    > else:
    > return True
    > return rule
    >
    >So if the "test" fails, then the expression is True OR X, which is
    >always True.

    Sure.

    >
    >Or, from another POV, the entire system is based on conjunction.
    >Returning "true" means that it doesn't force the entire validation
    >False. When the predicate fails on the When expression, then the
    >"therefore" expression does not matter because the predicate failed.
    >Therefore, the expression is true no matter what.
    >
    >Trying not to be condescending, I just think that the typical
    >propositional logic works well in this case.
    >Because this fits well into the system, can you give me an example why
    >this should make an exception to the boolean logic and raise an
    >exception?

    First, I didn't suggest raising an exception, I suggested returning an alternate
    value (NotImplemented) that python also considers logically true in an "assert value"
    or "if value" context, so it could be distinguished, if desired, in a test suite (as I believe
    you said your snippets were from).

    Of course if you want to write "assert expr == True" as opposed to
    just "assert expr", you are writing a narrower assertion. Note:

    >>> assert NotImplemented, 'assertion failed'
    >>> assert True, 'assertion failed'
    >>> assert False, 'assertion failed'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AssertionError: assertion failed

    If your rule was returning NotImplemented as a "true" value, you wouldn't use
    the narrowed assertion in a production context, you'd just use the normal
    bare python-truth assertion. I.e., (using the dummy rule to illustrate)

    >>> def rulereturning(x): return x

    ...

    you would not write

    >>> assert rulereturning(NotImplemented)==True, 'assertion failed'

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    AssertionError: assertion failed

    You would write
    >>> assert rulereturning(NotImplemented), 'assertion failed'


    (That passed)

    BTW, note that unless you use 'is', expr==True is not as narrow as you might think:
    >>> assert 1.0==True, 'assertion failed'

    or
    >>> assert rulereturning(1.0)==True, 'assertion failed'


    (both passed sans exception)


    >The inline code generation is an interesting concept. It would
    >definitely run faster. Hmm, I did this little test.
    >

    [...]
    BTW, is this data coming from an actual DBMS system? If so, wouldn't
    there be native facilities to do all this validation? Or is Python
    just too much more fun than SQL? ;-)
    [...]
    >
    >The thing I'm wrestling with now is the identity of rules and fields.
    >I'd like to fully support localization. Making a localization library
    >that will provide string tables for "error messages". Could use
    >positions to do this.

    IWT dict keys would be more robust. Maybe just use the english string as the
    key for looking up alternates? BTW strings could also name data to be
    formatted, as in '%(this)s and %(that_num)d' % mappingobject,
    where mappingobject can be anything that looks up values per
    mappingobject.__getitem__(name) where name would be 'this' or 'that' here.
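    The name-based %-formatting Bengt mentions works with any object exposing `__getitem__`, not just a dict. A small sketch:

```python
class RecordLookup:
    # Any mapping-like object works on the right of '%' with named fields;
    # Python calls __getitem__ for each %(name)s in the format string.
    def __init__(self, record):
        self.record = record
    def __getitem__(self, name):
        return self.record[name]

record = {'this': 'birth_date', 'that_num': 3}
msg = '%(this)s and %(that_num)d' % RecordLookup(record)
assert msg == 'birth_date and 3'
```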

    >
    >rules = [
    >    have('first_name'),
    >    have('last_name'),
    >    all(have('ssn'), match(is_ssn, 'ssn')),
    >    when(all(have('birth_date'), have('hire_date')),
    >         lt('birth_date', 'hire_date'))
    >]
    >
    >msgs = [
    >    locale('MISSING_VALUE', 'FIRST_NAME'),
    >    locale('MISSING_VALUE', 'LAST_NAME'),
    >    locale('INVALID_SSN'),
    >    locale('INVALID_BIRTH_HIRE_DATE'),
    >]

    Not to distract you, but I'm wondering what your rules would look like represented in simple
    rule-per-line text, if you assume that all names are data base field names
    unless otherwise declared, and just e.g. "name!" would be short for "have(name)",
    then just use some format like (assuming 'rules' is a name for a particular group of rules)

    functions:: "is_ssn" # predeclare user-defined function names
    operators:: "and", "<" # might be implicit

    rules: first_name!; last_name!; ssn! and is_ssn(ssn); birth_date! and hire_date! and (birth_date < hire_date)

    or with indentation, and messages after a separating comma a la assert

    rules:
        first_name!, 'Missing first name'  # or ('MISSING_VALUE', 'FIRST_NAME') tuple as args for your locale thing
        last_name!, 'Missing last name'
        ssn! and is_ssn(ssn), 'Invalid ssn'  # => 'ssn' in record and is_ssn(record['ssn'])
        birth_date! and hire_date! and (birth_date < hire_date)  # => 'birth_date' in record and 'hire_date' in record
                                                                 #    and record['birth_date'] < record['hire_date']
    more_rules:
        field_name!, 'field name is missing'
        etc ...

    Seems like it would be fairly easy to translate to code in a function per rule group,
    without any fancy parsing.
    I don't like XML that much, but that might be a possible representation too.
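    For what it's worth, here is a rough sketch of such a translation in plain Python. Everything here is hypothetical illustration, not thread code: `compile_rule`, its regex, and the `functions` mapping are my inventions. Bare names become record lookups, `name!` becomes a presence test, and predeclared helpers like `is_ssn` pass through:

```python
import re

def compile_rule(line, functions=None):
    # Translate one mini-language rule line into a record -> bool function.
    functions = functions or {}
    keywords = {'and', 'or', 'not'}

    def translate(m):
        name, bang = m.group(1), m.group(2)
        if name in keywords or name in functions:
            return name
        if bang:
            return "bool(record.get(%r))" % name   # name! -> field present
        return "record.get(%r)" % name             # bare name -> field value

    expr = re.sub(r'([A-Za-z_]\w*)(!?)', translate, line)
    code = compile(expr, '<rule>', 'eval')

    def rule(record):
        return bool(eval(code, dict(functions), {'record': record}))
    return rule

# One of the example rules from above, compiled from text:
is_ssn = lambda s: len(s) == 11
rule = compile_rule("ssn! and is_ssn(ssn)", {'is_ssn': is_ssn})
assert rule({'ssn': '123-45-6789'}) is True
assert rule({'ssn': '12-45-6789'}) is False
assert rule({}) is False
```

    Using eval() like this is only reasonable for trusted rule text, of course; a real version would want a proper tokenizer.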

    Regards,
    Bengt Richter
     
    Bengt Richter, Dec 17, 2005
    #16
  17. Guest

    >Note that NotImplemented is not the same as NotImplementedError -- and I wasn't suggesting
    >raising an exception, just returning a distinguishable "True" value, so that
    >a test suite (which I think you said the above was from) can test that the "when"
    >guard logic is working vs just passing back a True which can't distinguish whether
    >the overall test passed or was inapplicable.



    Ahh, sorry about that. I got ya. that's not a bad idea.

    >BTW, is this data coming from an actual DBMS system? If so, wouldn't
    >there be native facilities to do all this validation? Or is Python
    >just too much more fun than SQL? ;-)


    Well, the "first usage" is going to come from a RDBMS, yeap. We use
    MySQL 4.1.x right now. I know with Oracle/MSSQL, it's easy to put all
    kinds of fancy constraints on data. I'm aware of no such facilities in
    MySQL though I would love it if there was one to apply these kinds of
    flexible validation routines.

    Worst case scenario I can use it to do testing of what is pulled out of
    the database :)

    >Not to distract you, but I'm wondering what your rules would look like represented in simple
    >rule-per-line text, if you assume that all names are data base field names
    >unless otherwise declared, and just e.g. "name!" would be short for "have(name)",
    >then just use some format like (assuming 'rules' is a name for a particular group of rules)


    > functions:: "is_ssn" # predeclare user-defined function names
    > operators:: "and", "<" # might be implicit


    > rules: first_name!; last_name!; ssn! and is_ssn(ssn); birth_date! and hire_date! and (birth_date < hire_date)


    >or with indentation, and messages after a separating comma a la assert


    That's really a great idea!

    When I was trying to figure out how to combine these rules and figure
    out what kind of flexibility I wanted, I actually did write some
    declarative strings. It was a lot more verbose than what you wrote, but
    it helped with the concepts. Your approach is good.

    My little pseduo-lang was like:

    REQUIRE last_name or first_name
    WHEN HAVE last_name, surname THEN REQUIRE last_name != surname



    Well, I didn't go on and work with the error strings. I decided that
    this thing wouldn't be very useful without being able to work with
    hierarchical data.

    So I abstracted out the idea of referencing the record by key and
    implemented a little "data resolution" protocol using PLY. I like the
    protocol because I think it will be something that can be reused fairly
    often, especially for mini-langs.

    def testNestedData(self):
        data = {
            'first_name' : 'John',
            'last_name' : 'Smith',
            'surname' : '',
            'phone' : '205-442-0841',
            'addresses' : {
                'home' : {
                    'street' : '123 Elm St',
                    'city' : 'Birmingham',
                    'state' : 'AL',
                    'postal' : '35434'
                },
                'work' : {
                    'street' : '101 Century Square',
                    'city' : 'Centre',
                    'state' : 'AL',
                    'postal' : '35344'
                }
            },
            'aliases' : [
                'John Smith',
                'Frank Beanz'
            ]
        }

        rule = when(have('addresses.work'), all(
            have('addresses.work.city'),
            have('addresses.work.state'),
            have('addresses.work.postal')))

        assert repr(rule(data))
        data['addresses']['work']['city'] = ''
        assert rule(data) == False
        data['addresses']['work']['city'] = 'Centre'


    It works the same as the rules do:

    def match(test, field):
        """ match uses a function that takes a single string and returns
        a boolean, and converts it into a validation rule.
        """
        field = drep(field)
        def rule(record):
            value = field(record)
            if value:
                return bool(test(value))
            else:
                return True
        return rule

    def check(test, fields):
        """ check uses a function that takes variable arguments and
        applies the given fields to that function for the test.
        """
        fields = [drep(field) for field in fields]
        def rule(record):
            values = [field(record) for field in fields]
            return test(*values)
        return rule


    The data resolution algorithm uses dots and square braces. Dots
    resolve on dicts and square braces resolve on lists. It doesn't support
    slicing, but it easily could. There is a special [:] notation that
    turns the return into a list; all rules after the [:] are then applied
    successively to each list entry.
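    The poster's version is built on a PLY lexer/parser, but the protocol as described can be sketched in a few lines (hypothetical reimplementation, mine not theirs): dots resolve dict keys, [n] indexes a list, and [:] maps the rest of the path over every element.

```python
import re

def drep(path):
    # Tokenize into names, [n] indexes, and the [:] fan-out marker.
    tokens = re.findall(r'[^.\[\]]+|\[\d+\]|\[:\]', path)

    def walk(value, toks):
        for i, tok in enumerate(toks):
            if tok == '[:]':
                # Fan out: apply the remaining path to each element.
                return [walk(item, toks[i + 1:]) for item in value]
            elif tok.startswith('['):
                value = value[int(tok[1:-1])]   # list index
            else:
                value = value[tok]              # dict key
        return value

    return lambda record: walk(record, tokens)

assert drep('b.x')({'b': {'x': 'xylophone'}}) == 'xylophone'
assert drep('b.A[1]')({'b': {'A': [1, 2, 3, 4]}}) == 2
assert drep('foo[:].bar')({'foo': [{'bar': 1}, {'bar': 2}, {'bar': 3}]}) == [1, 2, 3]
```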

    def testDeep(self):
        data = {
            'a' : 'eh-yah',
            'b' : {'x' : 'xylophone',
                   'y' : 'yo-yo',
                   'z' : 'zebra',
                   'A' : [1, 2, 3, 4]}
        }
        assert drep('b')(data) == {'y' : 'yo-yo', 'x' : 'xylophone',
                                   'z' : 'zebra', 'A' : [1, 2, 3, 4]}
        assert drep('b.x')(data) == 'xylophone'
        assert drep('b.A[:]')(data) == [1, 2, 3, 4]
        assert drep('b.A')(data) == [1, 2, 3, 4]
        assert drep('b.A[1]')(data) == 2


    def testListResolution(self):
        data = {"foo" : [
            {"bar" : 1},
            {"bar" : 2},
            {"bar" : 3}
        ]}
        assert drep("foo[:].bar")(data) == [1, 2, 3]
        assert drep("foo[2].bar")(data) == 3

    I know the code COULD also use some funky thing to break apart dicts as
    well, but I didn't think of that when I was writing the lexer/parser.

    Combining this with a little mini-language like the one you described
    would be nice.
     
    , Dec 17, 2005
    #17
  18. Guest

    >>from functional import curry

    >I'm not familiar with that module, but I wrote a byte-code-munging curry
    >as a decorator that actually modified the decorated function to eliminate
    >parameter(s) from the signature and preset the parameter values inside the code.
    >I mention this because pure-python currying that I've seen typically creates a wrapper
    >function that calls the original function, and slows things down with the extra calling
    >instead of speeding things up, at least until some future version of python.
    >Looks nice on the surface though.


    Sounds very useful. Is this open source? All of the curry functions
    I've seen in the wild use a wrapper as well. I don't care for this
    either, mainly because it hides away the function attributes, and when I
    use decorators on functions that munge attributes, it doesn't work.
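    Incidentally, the attribute-hiding problem with wrapper-style currying can be softened with functools.wraps, which copies `__name__`, `__doc__`, and the attribute dict onto the wrapper. A sketch (my illustration, not the thread's code; `curry_first` and `rule_kind` are made-up names):

```python
from functools import wraps

def curry_first(func, preset):
    # Wrapper-style curry that preserves the wrapped function's
    # attributes so later decorators can still see them.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(preset, *args, **kwargs)
    return wrapper

def match(test, field):
    """Apply `test` to one field."""
    return (test, field)  # placeholder body for the demo
match.rule_kind = 'match'  # attribute a later decorator might read

ssn_rule = curry_first(match, str.isdigit)
assert ssn_rule.__name__ == 'match'
assert ssn_rule.rule_kind == 'match'
```

    It still pays the extra call per invocation, so it addresses the attribute problem but not the speed one.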
     
    , Dec 17, 2005
    #18
  19. On 17 Dec 2005 01:16:31 -0800, "" <> wrote:

    >>>from functional import curry

    >
    >>I'm not familiar with that module, but I wrote a byte-code-munging curry
    >>as a decorator that actually modified the decorated function to eliminate
    >>parameter(s) from the signature and preset the parameter values inside the code.
    >>I mention this because pure-python currying that I've seen typically creates a wrapper
    >>function that calls the original function, and slows things down with the extra calling
    >>instead of speeding things up, at least until some future version of python.
    >>Looks nice on the surface though.

    >
    >Sounds very useful. Is this open source? All of the curry functions
    >I've seen in the wild use a wrapper as well. I don't care for this
    >either mainly because it hides away the function attributes and when I
    >use decorators on functions that munge attributes, it doesn't work.
    >

    The curry part works like:
    (first unmodified)

    >>> import inspect
    >>> def foo(x, y): return x*y
    ...
    >>> import dis
    >>> dis.dis(foo)
      1           0 LOAD_FAST                0 (x)
                  3 LOAD_FAST                1 (y)
                  6 BINARY_MULTIPLY
                  7 RETURN_VALUE
    >>> inspect.getargspec(foo)
    (['x', 'y'], None, None, None)

    (modified)

    >>> from ut.presets import presets, curry
    >>> @curry(y=111)
    ... def foo(x, y): return x*y
    ...
    >>> dis.dis(foo)
      1           0 LOAD_CONST               1 (111)
                  3 STORE_FAST               1 (y)

      3           6 LOAD_FAST                0 (x)
                  9 LOAD_FAST                1 (y)
                 12 BINARY_MULTIPLY
                 13 RETURN_VALUE
    >>> inspect.getargspec(foo)
    (['x'], None, None, None)
    The presets decorator does the same except doesn't modify the signature
    (it came first ;-)

    >>> def foo():
    ...     return now()
    ...
    >>> dis.dis(foo)
      2           0 LOAD_GLOBAL              0 (now)
                  3 CALL_FUNCTION            0
                  6 RETURN_VALUE
    >>> import time
    >>> @presets(now=time.ctime)
    ... def foo():
    ...     return now()
    ...
    >>> dis.dis(foo)
      1           0 LOAD_CONST               1 (<built-in function ctime>)
                  3 STORE_FAST               0 (now)

      3           6 LOAD_FAST                0 (now)
                  9 CALL_FUNCTION            0
                 12 RETURN_VALUE
    >>> foo()
    'Sat Dec 17 02:49:06 2005'

    Humph. Good night ;-/

    I googled and found a post of mine, which I think may be my last version, at

    http://groups.google.com/group/comp.lang.python/msg/7d27a53385e89924

    where you can look to see if it's interesting. I think it's probably more fruitful
    to put work into doing it by way of rewriting the AST, which would potentially
    allow folding constant expressions that may be formed when parameters
    become constants, etc. So I haven't pursued byte code munging much further.
    The version above only had minor testing, and only on python 2.4. IOW, basically
    a proof of concept.

    I based my presets/curry decorator on the pattern of a decorator Raymond Hettinger wrote,
    which optimizes global accesses and other stuff. No warranties, mind! ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Dec 17, 2005
    #19
