Lazy evaluation: overloading the assignment operator?

Discussion in 'Python' started by sturlamolden, May 2, 2007.

  1. sturlamolden

    sturlamolden Guest

    Python allows the binding behaviour to be defined for descriptors,
    using the __set__ and __get__ methods. I think it would be a major
    advantage if this could be generalized to any object, by allowing the
    assignment operator (=) to be overloaded.

    One particular use for this would be to implement "lazy evaluation".
    For example it would allow us to get rid of all the temporary arrays
    produced by NumPy.

    For example, consider the expression:

    y = a * b + c * d

    If this expression is evaluated bya Fortran 90/95 compiler, it will
    automatically generate code like

    do i = 1,n
    y(i) = a(i) * b(i) + c(i) * d(i)
    enddo

    On the other hand, conventional use of overloaded binary operators
    would result in something like this:

    allocate(tmp1,n)
    do i = 1,n
    tmp1(i) = a(i) * b(i)
    enddo
    allocate(tmp2,n)
    do i = 1,n
    tmp2(i) = c(i) * d(i)
    enddo
    allocate(tmp3,n)
    do i = 1,n
    tmp3(i) = tmp1(i) + tmp2(i)
    enddo
    deallocate(tmp1)
    deallocate(tmp2)
    do i = 1,n
    y(i) = tmp3(i)
    enddo
    deallocate(tmp3)

    Traversing memory is one of the most expensive thing a CPU can do.
    This approach is therefore extremely inefficient compared with what a
    Fortran compiler can do.

    However, if we could overload the assignment operator, there would be
    an efficient solution to this problem. Instead of constructing
    temporary temporary arrays, one could replace those with objects
    containing lazy expressions "to be evaluated sometime in the future".
    A statement like

    y = a * b + c * d

    would then result in something like this:

    tmp1 = LazyExpr('__mul__',a,b) # symbolic representation of "a * b"
    tmp2 = LazyExpr('__mul__',c,d) # symbolic representation of "c * d"
    tmp3 = LazyExpr('__add__',tmp1,tmp1) # symbolic "a * b + c * d"
    del tmp1
    del tmp2
    y = tmp3 # tmp3 gets evaluated as assignment is overloaded


    Should there be a PEP to overload the assignment operator? In terms of
    syntax, it would not be any worse than the current descriptor objects
    - but it would make lazy evaluation idioms a lot easier to implement.



    Sturla Molden
     
    sturlamolden, May 2, 2007
    #1
    1. Advertising

  2. sturlamolden

    Stargaming Guest

    sturlamolden wrote:
    > Python allows the binding behaviour to be defined for descriptors,
    > using the __set__ and __get__ methods.


    AFAIK, __getattribute__ calls them *explicitly*.

    > I think it would be a major
    > advantage if this could be generalized to any object, by allowing the
    > assignment operator (=) to be overloaded.


    >
    > One particular use for this would be to implement "lazy evaluation".
    > For example it would allow us to get rid of all the temporary arrays
    > produced by NumPy.
    >
    > For example, consider the expression:
    >

    [snip]
    >
    > y = a * b + c * d
    >
    > would then result in something like this:
    >
    > tmp1 = LazyExpr('__mul__',a,b) # symbolic representation of "a * b"
    > tmp2 = LazyExpr('__mul__',c,d) # symbolic representation of "c * d"
    > tmp3 = LazyExpr('__add__',tmp1,tmp1) # symbolic "a * b + c * d"
    > del tmp1
    > del tmp2
    > y = tmp3 # tmp3 gets evaluated as assignment is overloaded
    >

    To allow lazy evaluation, you need overloading of the assignment
    operator? Where should you overload it? y is less than None when you do
    that assignment. I don't really see the need for overloading here.
    Following the binding rules, __mul__ would (even without any hackery) be
    evaluated before __add__.

    >
    > Should there be a PEP to overload the assignment operator?


    If -- after this discussion -- community seems to like this feature, you
    could try to come up with some patch and a PEP. But not yet.

    > In terms of
    > syntax, it would not be any worse than the current descriptor objects
    > - but it would make lazy evaluation idioms a lot easier to implement.


    --
    Stargaming
     
    Stargaming, May 2, 2007
    #2
    1. Advertising

  3. sturlamolden schrieb:
    > Python allows the binding behaviour to be defined for descriptors,
    > using the __set__ and __get__ methods. I think it would be a major
    > advantage if this could be generalized to any object, by allowing the
    > assignment operator (=) to be overloaded.
    >
    > One particular use for this would be to implement "lazy evaluation".
    > For example it would allow us to get rid of all the temporary arrays
    > produced by NumPy.
    >
    > For example, consider the expression:
    >
    > y = a * b + c * d
    >
    > If this expression is evaluated bya Fortran 90/95 compiler, it will
    > automatically generate code like
    >
    > do i = 1,n
    > y(i) = a(i) * b(i) + c(i) * d(i)
    > enddo
    >
    > On the other hand, conventional use of overloaded binary operators
    > would result in something like this:
    >
    > allocate(tmp1,n)
    > do i = 1,n
    > tmp1(i) = a(i) * b(i)
    > enddo
    > allocate(tmp2,n)
    > do i = 1,n
    > tmp2(i) = c(i) * d(i)
    > enddo
    > allocate(tmp3,n)
    > do i = 1,n
    > tmp3(i) = tmp1(i) + tmp2(i)
    > enddo
    > deallocate(tmp1)
    > deallocate(tmp2)
    > do i = 1,n
    > y(i) = tmp3(i)
    > enddo
    > deallocate(tmp3)
    >
    > Traversing memory is one of the most expensive thing a CPU can do.
    > This approach is therefore extremely inefficient compared with what a
    > Fortran compiler can do.


    I fail to see where laziness has anything to do with this. In C++, this
    problem can be remedied with the so called temporary base class idiom.

    But this has nothing to do with laziness, which does not reduce the
    amount of code to execute, but instead defers the point of execution of
    that code.

    And AFAIK the general overhead of laziness versus eager evaluation does
    not pay off - haskell is a tad slower than e.g. an ML dialect AFAIK.

    Diez
     
    Diez B. Roggisch, May 2, 2007
    #3
  4. sturlamolden

    sturlamolden Guest

    On May 2, 9:46 pm, Stargaming <> wrote:

    > > del tmp2


    > > y = tmp3 # tmp3 gets evaluated as assignment is overloaded

    >
    > To allow lazy evaluation, you need overloading of the assignment
    > operator?


    No I don't. I could for example delay evaluation until some data from
    y is requested, say when

    x = y[5]

    is executed. However, that can allow mutual dependencies between
    unevaluated expressions. These can be complicated to resolve, and is
    an issue similar to to that of cyclic dependencies in reference
    counting. With an overloaded assignment operator, we would avoid this
    as assignments are natural places to flush lazy evaluations. No
    dangling unevaluated expression would be produced, and thus there
    would be no strange bugs of this sort.


    > Where should you overload it? y is less than None when you do
    > that assignment.


    In this case, I am not suggesting overloading

    y =

    but rather overloading

    = tmp3

    That is, when a variable is bound to an object, a method is called
    (e.g. __get__) and the variable gets the return value output from that
    function instead. It is analogous to the __get__ method of
    descriptors. COnsider happens when we call

    a = someObject.someProperty

    and someProperty has a __get__ method. Even if a is less than None, we
    still get a call to __get__.

    On the other hand, if y had been bound to a value before hand, it
    would be meaningful to call a method called __set__ on y when

    y = tmp3

    is executed. Just like

    someObject.someProperty = value

    would call __set__ on someProperty if it had one. Obviously if
    someObject.someProperty had been unbound, there would have been no
    call to __set__.

    So I am suggesting generalising __set__ and __get__ to overload the
    assignment operator.

    This would be an example:


    class Foo(object):

    def __init__(self):
    self.value = None

    def __set__(self,value):
    ''' overloads bar = value after bar = Foo()'''
    self.value = value

    def __get__(self):
    ''' overloads obj = bar after bar = Foo()'''
    return self.value


    So it is just a generalization of the already existing descriptors. It
    even makes descriptors and properties easier to understand.


    > I don't really see the need for overloading here.


    > Following the binding rules, __mul__ would (even without any hackery) be
    > evaluated before __add__.


    Yes, but at the cost of generating several temporary arrays and
    looping over the same memory several times. If the assignment operator
    could be overloaded, we could avoid the temporary objects and only
    have one loop. For numerical code, this can mean speed ups in the
    order of several magnitudes.



    Sturla Molden











    >
    >
    > > Should there be a PEP to overload the assignment operator?

    >
    > If -- after this discussion -- community seems to like this feature, you
    > could try to come up with some patch and a PEP. But not yet.
    >
    > > In terms of
    > > syntax, it would not be any worse than the current descriptor objects
    > > - but it would make lazy evaluation idioms a lot easier to implement.

    >
    > --
    > Stargaming
     
    sturlamolden, May 2, 2007
    #4
  5. sturlamolden

    sturlamolden Guest

    On May 2, 11:08 pm, "Diez B. Roggisch" <> wrote:

    > And AFAIK the general overhead of laziness versus eager evaluation does
    > not pay off - haskell is a tad slower than e.g. an ML dialect AFAIK.


    In the numerical Python community there is already a prototype
    compiler called 'numexpr' which can provide efficient evaluation of
    expressions like y = a*b + c*d. But as long as there is no way of
    overloading an assignment, it cannot be seamlessly integrated in an
    array framework. One will e.g. have to type up Python expressions as
    strings and calling eval() on the string instead of working directly
    with Python expressions.

    In numerical work we all know how Fortran compares with C++. Fortran
    knows about arrays and can generate efficient code. C++ doesn't and
    have to resort to temporaries returned from overloaded operators. The
    only case where C++ can compare to Fortran is libraries like Blitz++,
    where for small fixes-sized arrays the temporary objects and loops can
    be removed using template meta-programming and optimizing compilers.
    NumPy has to generate a lot of temporary arrays and traverse memory
    more than necessary. This is a tremendous slow down when arrays are
    too large to fit in the CPU cache. Numexpr deals with this, but Python
    cannot integrate it seamlessly. I think it is really a matter of what
    you are trying to do. Some times lazy evaluation pays off, some times
    it doesn't.

    But overloaded assignment operators have more use than lazy
    evaluation. It can be used and abused in numerous ways. For example
    one can have classes where every assignment results in the creation of
    a copy, which may seem to totally change the semantics of Python code
    (except that it doesn't, it's just an illusion).


    Sturla Molden
     
    sturlamolden, May 2, 2007
    #5
  6. sturlamolden

    Terry Reedy Guest

    "sturlamolden" <> wrote in message
    news:...
    |
    | Python allows the binding behaviour to be defined for descriptors,
    | using the __set__ and __get__ methods. I think it would be a major
    | advantage if this could be generalized to any object, by allowing the
    | assignment operator (=) to be overloaded.

    Conceptually, assignment is *not* an operator. Binary operators take two
    values (of the same type) and produce a third (usually of either the input
    type or a boolean) that usually depends on both inputs. Assignment does
    nothing of the sort.

    In Python, the equal sign is *not* an operator: it is a grammatical symbol.
    One use of the operator fiction in C is to enable simple chained
    assignment: a=b=2. Python does this directly without the fiction. C's
    store-and-test usage can be implemented in Python with a 'purse' class.

    | One particular use for this would be to implement "lazy evaluation".

    Since (in Python, at least) operands are evaluated *before* the
    operator/function is called, I do not see how.

    | Should there be a PEP to overload the assignment operator?

    You mean a PEP to make assignment an (pseudo)operation and hence
    overloadable (as all operators are). That has been proposed and rejected
    before, more than once, but I don't have a reference handy.

    Terry Jan Reedy
     
    Terry Reedy, May 3, 2007
    #6
  7. Diez B. Roggisch wrote:
    > I fail to see where laziness has anything to do with this.
    > In C++, this problem can be remedied with the so called
    > temporary base class idiom.


    I have seen this referred to as lazy evaluation in C++,
    so I suspect that Diez and Sturia are using "Lazy evaluation"
    in different contexts with different meaning.

    > But this has nothing to do with laziness, which does not
    > reduce the amount of code to execute, but instead defers the
    > point of execution of that code.


    But that is precisely what Sturia is suggesting, defer
    (for a few nanoseconds) the evaluation of the multiplications
    and addition until the assignment occurs. Admittedly a big
    difference to the lazy evaluation implied by python's yield
    statement, but still a version of lazy evaluation and (at
    least sometimes) referred to as such in a C++ context.

    I am a python newbie (about one month) but I think
    some of what Sturia wants could be achieved by partially
    following what is usually done in C++ to achieve what he
    wants. It would involve a replacement array class (possibly
    derived from NumPy's arrays) and a proxy class.

    + Addition, multiplication, etc of arrays and proxy
    arrays does not return the result array, but returns
    a proxy which stores the arguments and the
    operation.

    + Array indexing of the proxy objects results in
    the indexing methods of the arguments being
    called and the operation being carried out and
    returned. In C++ this is normally very efficient
    as the operations are all defined inline and
    expanded by the compiler.

    + If necessary, define an additional method to evaluate
    the entire array and return it.

    I think this would allow code like (if the new array type is
    XArray)

    a = Xarray(...)
    b = Xarray(...)
    c = Xarray(...)
    d = Xarray(...)

    y = a*b+c*d # Returns a proxy object

    x = y[4] # Computes x = a[4]*b[4] + c[4]*d[4]

    v = y.eval() # Evaluates all elements, returning Xarray

    z = ((a+b)*(c+d)).eval() # Also evaluates all elements

    Whether it would be any faster is doubtful, but it would eliminate
    the temporaries.

    Charles
     
    Charles Sanders, May 3, 2007
    #7
  8. On 2007-05-03, Terry Reedy <> wrote:
    >
    > "sturlamolden" <> wrote in message
    > news:...
    >|
    >| Python allows the binding behaviour to be defined for descriptors,
    >| using the __set__ and __get__ methods. I think it would be a major
    >| advantage if this could be generalized to any object, by allowing the
    >| assignment operator (=) to be overloaded.
    >
    > Conceptually, assignment is *not* an operator. Binary operators take two
    > values (of the same type) and produce a third (usually of either the input
    > type or a boolean) that usually depends on both inputs. Assignment does
    > nothing of the sort.
    >
    > In Python, the equal sign is *not* an operator: it is a grammatical symbol.
    > One use of the operator fiction in C is to enable simple chained
    > assignment: a=b=2. Python does this directly without the fiction. C's
    > store-and-test usage can be implemented in Python with a 'purse' class.
    >
    >| One particular use for this would be to implement "lazy evaluation".
    >
    > Since (in Python, at least) operands are evaluated *before* the
    > operator/function is called, I do not see how.


    But they could evaluate to an expression tree instead of the actual
    result. This tree could then be evaluate at the moment of assignment.

    This is an idea I have been playing with myself in an other context.
    You have a class of symbolic names. e.g. First, Last ... You can use
    the normal operators to these names, the result will be an expression
    tree. So Last - 2 will evaluate to something like

    sub
    / \
    Last 2

    I want to use this in the context of a table (a list like structure
    but with arbitrary start index, which can be negative, so tab[-1]
    can't refer to the last element).

    So I can use this as follows:

    el = tab[Last - 2]

    to access the element two places before the last, because the evaluation
    of the tree happens in the __getitem__ method.

    I could even write something like:

    el = tab[(First + Last) / 2]

    To get at the midle element.

    --
    Antoon Pardon
     
    Antoon Pardon, May 3, 2007
    #8
  9. sturlamolden

    sturlamolden Guest

    On May 3, 6:22 am, Charles Sanders <>
    wrote:

    > y = a*b+c*d # Returns a proxy object
    >
    > x = y[4] # Computes x = a[4]*b[4] + c[4]*d[4]
    >
    > v = y.eval() # Evaluates all elements, returning Xarray
    >
    > z = ((a+b)*(c+d)).eval() # Also evaluates all elements


    When I suggested this on the NumPy mailing list, I too suggested using
    the indexing operator to trigger the computations. But I am worried
    that if an expression like

    y = a*b+c*d

    returns a proxy, it is somehow possible to mess things up by creating
    cyclically dependent proxies. I may be wrong about this, in which case
    __getitem__ et al. will do the job.


    > Whether it would be any faster is doubtful, but it would eliminate
    > the temporaries.


    The Numexpr compiler in SciPy suggests that it can. It parses an
    expression like 'y = a*b+c*d' and evaluates it. Numexpr is only a
    slow prototype written in pure Python, but still it can sometimes give
    dramatical speed-ups. Here we do not even need all the machinery of
    Numexpr, as Python creates the parse tree on the fly.

    Inefficiency of binary operators that return temporary arrays is
    mainly an issue when the arrays in the expression is too large to fit
    in cache. RAM access can be very expensive, but cache access is
    usually quite cheap. One also avoids unnecessary allocation and
    deallocation of buffers to hold temporary arrays. Again, it is mainly
    an issue when arrays are large, as malloc and free can be rather
    efficient for small objects.
     
    sturlamolden, May 3, 2007
    #9
  10. On 2007-05-03, sturlamolden <> wrote:
    > On May 3, 6:22 am, Charles Sanders <>
    > wrote:
    >
    >> y = a*b+c*d # Returns a proxy object
    >>
    >> x = y[4] # Computes x = a[4]*b[4] + c[4]*d[4]
    >>
    >> v = y.eval() # Evaluates all elements, returning Xarray
    >>
    >> z = ((a+b)*(c+d)).eval() # Also evaluates all elements

    >
    > When I suggested this on the NumPy mailing list, I too suggested using
    > the indexing operator to trigger the computations. But I am worried
    > that if an expression like
    >
    > y = a*b+c*d
    >
    > returns a proxy, it is somehow possible to mess things up by creating
    > cyclically dependent proxies. I may be wrong about this, in which case
    > __getitem__ et al. will do the job.


    How do you expect to handle the following kind of situation:

    while <condition>:
    x = y
    a = ...
    b = ...
    y = a * x + b

    --
    Antoon Pardon
     
    Antoon Pardon, May 3, 2007
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    3
    Views:
    334
  2. Boltar

    Lazy evaluation question

    Boltar, Jan 5, 2008, in forum: C Programming
    Replies:
    40
    Views:
    1,118
    Peter Nilsson
    Jan 10, 2008
  3. Ken Pu
    Replies:
    3
    Views:
    679
    Steven D'Aprano
    Jan 16, 2009
  4. Boris Borcic
    Replies:
    0
    Views:
    559
    Boris Borcic
    Jan 16, 2009
  5. Boris Borcic
    Replies:
    0
    Views:
    552
    Boris Borcic
    Jan 16, 2009
Loading...

Share This Page