Rethinking Memoization

Discussion in 'Ruby' started by James Edward Gray II, Jan 24, 2006.

  1. Daniel Berger and I have been having an off-list discussion about his
    memoize.rb library. You can find that library at:

    http://raa.ruby-lang.org/project/memoize/

    In the course of the discussion, I ended up building a library of my
    own, to show Daniel what I was talking about. Daniel thought it
    might be worth moving the discussion of my new library here, to get
    community feedback.

    The primary difference between our two approaches is that Daniel's
    library is built to memoize individual objects, while mine is
    intended to link all instance method calls for a class of objects to
    a single cache.

    I wanted this behavior because I felt it would increase cache hits
    and make memoizing methods that much more worthwhile. Daniel pointed
    out though that a finer grained control can be important, to avoid
    exhausting memory with the cache. Luckily, Ruby's syntax makes it
    trivial to use my library to alter a unique object as well.

    Here's an example of using my library to memoize an instance method,
    the intended usage:

    #!/usr/local/bin/ruby -w

    # instance_methods.rb
    #
    # Created by James Edward Gray II on 2006-01-23.
    # Copyright 2006 Gray Productions. All rights reserved.

    require "memoizable"

    class Fibonacci
    extend Memoizable

    def fib( num )
    return num if num < 2
    fib(num - 1) + fib(num - 2)
    end
    memoize :fib
    end

    puts "This method is memoized, and will run very fast..."
    start = Time.now
    puts "fib(100): #{Fibonacci.new.fib(100)}"
    puts "Run time: #{Time.now - start} seconds"

    puts
    puts "All objects share a cache, so this call, is even faster..."
    start = Time.now
    puts "fib(100): #{Fibonacci.new.fib(100)}" # simple cache hit
    puts "Run time: #{Time.now - start} seconds"

    __END__

    Also, here is how you would use the library to affect individual
    objects:

    #!/usr/local/bin/ruby -w

    # singleton_objects.rb
    #
    # Created by James Edward Gray II on 2006-01-23.
    # Copyright 2006 Gray Productions. All rights reserved.

    require "memoizable"

    class Fibonacci
    def fib( num )
    return num if num < 2
    fib(num - 1) + fib(num - 2)
    end
    end

    slow = Fibonacci.new

    puts "This method is not memoized and thus slow..."
    start = Time.now
    puts "slow.fib(30): #{slow.fib(30)}"
    puts " Run time: #{Time.now - start} seconds"

    fast = Fibonacci.new
    class << fast # memoize just this object
    extend Memoizable
    memoize :fib
    end

    puts
    puts "We can fix that for a unique object..."
    start = Time.now
    puts "fast.fib(30): #{fast.fib(30)}"
    puts " Run time: #{Time.now - start} seconds"

    puts
    puts "But the original is still slow..."
    start = Time.now
    puts "slow.fib(30): #{slow.fib(30)}"
    puts " Run time: #{Time.now - start} seconds"

    __END__

    My library also works for class/module methods and even top-level
    methods, though I will spare you those examples.

    The other difference between our libraries is that Daniel's supports
    using a file for persistent caching, while my library supports using
    a custom cache object. That means that it's a little more work to
    cache to a file using mine, but you can do other kinds of caching as
    well. Here's a file cache example:

    #!/usr/local/bin/ruby -w

    # file_persistance.rb
    #
    # Created by James Edward Gray II on 2006-01-23.
    # Copyright 2006 Gray Productions. All rights reserved.

    require "memoizable"

    #
    # A trivial implementation of a custom cache. This cache uses disk
    storage,
    # instead of a Hash in memory. Access is slower than using an in-
    memory cache,
    # though still much faster than a non-memoized method, but persistant
    between
    # program runs.
    #
    # *WARNING*: This implementation is not thread-safe!
    #
    class FileCache
    def initialize( path )
    @path = path
    end

    def []( key )
    if File.exist? @path
    File.foreach(@path) do |entry|
    return entry.split(" ").last.to_i if entry =~ /\A#{key}: /
    end
    end
    nil
    end

    def []=( key, value )
    File.open(@path, "a") { |cache| cache.puts "#{key}: #{value}" }
    end
    end

    class Fibonacci
    extend Memoizable

    def fib( num )
    return num if num < 2
    fib(num - 1) + fib(num - 2)
    end
    memoize :fib, FileCache.new("fib_cache.txt")
    end

    puts "This method is memoized using a file-based cache. See
    fib_cache.txt..."
    start = Time.now
    puts "fib(100): #{Fibonacci.new.fib(100)}"
    puts "Run time: #{Time.now - start} seconds"

    puts
    puts "Run again to see the file cache at work."

    __END__

    You can find an example using weak references and the actual library
    code in the "Memoization" section of the following article from my blog:

    http://blog.grayproductions.net/articles/2006/01/20/caching-and-
    memoization

    The point of posting all this here is to give people a chance to
    express concerns over my implementation. Daniel was avoiding going
    down this road because of issues raised by this community. Raise
    away. ;)

    If there is any interest, and we don't prove the library horribly
    broken, I would be happy to package it up.

    Thanks.

    James Edward Gray II
     
    James Edward Gray II, Jan 24, 2006
    #1
    1. Advertising

  2. James Gray wrote:
    > Daniel Berger and I have been having an off-list discussion about his
    > memoize.rb library. You can find that library at:
    >
    > http://raa.ruby-lang.org/project/memoize/
    >
    > In the course of the discussion, I ended up building a library of my
    > own, to show Daniel what I was talking about. Daniel thought it
    > might be worth moving the discussion of my new library here, to get
    > community feedback.
    >
    > < snip feature discussion due to RForum />


    Good stuff! The only idea I have just from an end-user
    perspective is that you could hook into Module#append_features
    to enable using #include instead of #extend at the class
    level (this should also allow just using #extend on individual
    objects without having to explicitly open the singleton class).

    > Thanks.
    >
    > James Edward Gray II



    E

    --
    Posted via http://www.ruby-forum.com/.
     
    Eero Saynatkari, Jan 24, 2006
    #2
    1. Advertising

  3. On Wed, Jan 25, 2006 at 01:27:25AM +0900, James Edward Gray II wrote:
    > Daniel Berger and I have been having an off-list discussion about his
    > memoize.rb library.

    [...]
    > The primary difference between our two approaches is that Daniel's
    > library is built to memoize individual objects, while mine is
    > intended to link all instance method calls for a class of objects to
    > a single cache.


    I also generalized Daniel's memoize.rb to support class-level
    memoization some time ago[1]:

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/173587

    Should I also release it? ;)

    > class Fibonacci
    > extend Memoizable
    >
    > def fib( num )
    > return num if num < 2
    > fib(num - 1) + fib(num - 2)
    > end
    > memoize :fib, FileCache.new("fib_cache.txt")
    > end


    I kept memoize.rb's interface, but I think I prefer this approach; maybe
    special-casing for strings could make sense though...

    > You can find an example using weak references and the actual library
    > code in the "Memoization" section of the following article from my blog:
    >
    > http://blog.grayproductions.net/articles/2006/01/20/caching-and-memoization
    >
    > The point of posting all this here is to give people a chance to
    > express concerns over my implementation. Daniel was avoiding going
    > down this road because of issues raised by this community. Raise
    > away. ;)


    I'm not sure about the
    ([Class, Module].include?(self.class) ? self : self.class)
    part; that way, you cannot memoize singleton methods from Module/Class
    objects. In my implementation, I distinguished between
    Module#instance_memoize and Object#memoize (after including Memoize).

    Also,
    original = "_#{name}"
    would need a longer prefix IMO.

    Finally, not that it really matters, but the WeakCache is fairly
    inefficient (one lambda per key); see that other thread (WeakRef hash)
    or
    http://eigenclass.org/hiki.rb?weakhash and weakref
    for other implementations.

    [1] IIRC persistent caching in Daniel Berger's memoize.rb was inspired by
    something I wrote, I did a sort of rewrite of memoize.rb and now see my name
    mentioned in your blog; it seems memoize.rb and I are bound by the string of
    destiny (*^_^*)

    --
    Mauricio Fernandez
     
    Mauricio Fernandez, Jan 24, 2006
    #3
  4. On Jan 24, 2006, at 2:21 PM, Eero Saynatkari wrote:

    > James Gray wrote:
    >> Daniel Berger and I have been having an off-list discussion about his
    >> memoize.rb library. You can find that library at:
    >>
    >> http://raa.ruby-lang.org/project/memoize/
    >>
    >> In the course of the discussion, I ended up building a library of my
    >> own, to show Daniel what I was talking about. Daniel thought it
    >> might be worth moving the discussion of my new library here, to get
    >> community feedback.
    >>
    >> < snip feature discussion due to RForum />

    >
    > Good stuff! The only idea I have just from an end-user
    > perspective is that you could hook into Module#append_features
    > to enable using #include instead of #extend at the class
    > level (this should also allow just using #extend on individual
    > objects without having to explicitly open the singleton class).


    That's an interesting idea. It does seem this is what extend() is
    for though and it's certainly been used like this in the past (see
    Forwardable).

    I do like the idea of not having to use the singleton class on
    individual objects though. Hmm, wait, doesn't that work now...
    <checks> Yep, seems to.

    What do others think, is extend() okay in the classes?

    James Edward Gray II
     
    James Edward Gray II, Jan 25, 2006
    #4
  5. 2006/1/24, James Edward Gray II <>:

    Going back to the original question which approach to favour...

    > The primary difference between our two approaches is that Daniel's
    > library is built to memoize individual objects, while mine is
    > intended to link all instance method calls for a class of objects to
    > a single cache.
    >
    > I wanted this behavior because I felt it would increase cache hits
    > and make memoizing methods that much more worthwhile. Daniel pointed
    > out though that a finer grained control can be important, to avoid
    > exhausting memory with the cache. Luckily, Ruby's syntax makes it
    > trivial to use my library to alter a unique object as well.


    I may have missed something but from what I understand I see a
    complete different problem: if you memoize for all objects of a class
    then you have two options: either use only method name and args as key
    (which I believe you do because you want better cache hits) *or* store
    add something that identifies the instance (object id, the object
    itself) which clearly would lead to cache hit ratios like in Daniel's
    approach.

    If you have a single cache for all (first alternative) you risk that
    cached results are wrong because they may depend on individiual
    object's state which you do not reflect in the cache. If you choose
    the second approach you make things more complicated than necessary
    (think of GC for example and how the cache gets notified if content is
    invalidated etc.).

    So, basically I prefer Daniel's simpler approach - especially since
    you can achieve the same: just create an instance that contains all
    code (either directly or by delegation) that is expected to be slow
    (and thus should be memoized) and memoize on that class.

    > My library also works for class/module methods and even top-level
    > methods, though I will spare you those examples.
    >
    > The other difference between our libraries is that Daniel's supports
    > using a file for persistent caching, while my library supports using
    > a custom cache object. That means that it's a little more work to
    > cache to a file using mine, but you can do other kinds of caching as
    > well. Here's a file cache example:


    I like the idea with a generalized cache. Default is a hash but it
    can be replaced by some file handling object that adhers to the same
    interface (even multilel caching with a LRU cache in mem and the whole
    cache on disk can be done... :)

    Kind regards

    robert
    --
    Have a look: http://www.flickr.com/photos/fussel-foto/
     
    Robert Klemme, Jan 25, 2006
    #5
  6. On Jan 25, 2006, at 12:55 PM, Robert Klemme wrote:

    > So, basically I prefer Daniel's simpler approach - especially since
    > you can achieve the same: just create an instance that contains all
    > code (either directly or by delegation) that is expected to be slow
    > (and thus should be memoized) and memoize on that class.


    Can you show an example or two here? Using the current memoize.rb,
    how do we get objects A, B, and C using the same cache? I guess I'm
    having trouble getting my head around it.

    Thanks.

    James Edward Gray II
     
    James Edward Gray II, Jan 25, 2006
    #6
  7. James Edward Gray II wrote:
    > On Jan 25, 2006, at 12:55 PM, Robert Klemme wrote:
    >
    >> So, basically I prefer Daniel's simpler approach - especially since
    >> you can achieve the same: just create an instance that contains all
    >> code (either directly or by delegation) that is expected to be slow
    >> (and thus should be memoized) and memoize on that class.

    >
    > Can you show an example or two here? Using the current memoize.rb,
    > how do we get objects A, B, and C using the same cache? I guess I'm
    > having trouble getting my head around it.


    class SillySomething
    extend Memoizable

    def initialize()
    @a,@b,@c = A.new, B.new, C.new
    # or other initialization
    end

    def calculate_foo(quirk, quark, quork)
    x = @a.foo(quirk) + @b.doit(quark)
    if x > 10
    x += @c.doh(quork, @a)
    else
    x - 2
    end
    x
    end

    memoize :calculate_foo
    end

    The idea is to group relevant parts of your object model in a single class
    and use an instance of that for memoization.

    Maybe I can add an explanation to make it more clear: if you have a class
    X with method Y that calculates something results of invoking Y on several
    instances of X can only be memoized in the same cache if (i) either they
    are not influenced by state of X instances or (ii) memoization somehow
    uses this state to control lookup.

    If (i) I don't see the point in having several objects that do exactly the
    same calculation - methods might even be class methods (i.e. singleton
    methods of the *single* class instance X). So there is actually no need
    for a cache that covers several instances.

    If (ii) you gain nothing by having a centralized cache because you'll have
    to keep memoized values from different instances separate; in fact you add
    a level of lookup (first object, then method and args - or do it in one
    request with a larger key) and separate instance state from the instance.
    IMHO that's inefficient, memory leaking and error prone.

    Kind regards

    robert
     
    Robert Klemme, Jan 26, 2006
    #7
  8. On Jan 26, 2006, at 7:08 AM, Robert Klemme wrote:

    > class SillySomething
    > extend Memoizable
    >
    > def initialize()
    > @a,@b,@c = A.new, B.new, C.new
    > # or other initialization
    > end
    >
    > def calculate_foo(quirk, quark, quork)
    > x = @a.foo(quirk) + @b.doit(quark)
    > if x > 10
    > x += @c.doh(quork, @a)
    > else
    > x - 2
    > end
    > x
    > end
    >
    > memoize :calculate_foo
    > end


    This example runs as written with the module discussed in this
    thread. It requires multiple changes to work with the current
    memoize.rb library.

    > If (i) I don't see the point in having several objects that do
    > exactly the
    > same calculation - methods might even be class methods (i.e. singleton
    > methods of the *single* class instance X). So there is actually no
    > need
    > for a cache that covers several instances.
    >
    > If (ii) you gain nothing by having a centralized cache because
    > you'll have
    > to keep memoized values from different instances separate; in fact
    > you add
    > a level of lookup (first object, then method and args - or do it in
    > one
    > request with a larger key) and separate instance state from the
    > instance.
    > IMHO that's inefficient, memory leaking and error prone.


    I understand know. You're just against this feature. Fair enough.

    Thanks for taking the time to explain it to me.

    James Edward Gray II
     
    James Edward Gray II, Jan 26, 2006
    #8
  9. James Edward Gray II wrote:
    > On Jan 26, 2006, at 7:08 AM, Robert Klemme wrote:
    >
    >> class SillySomething
    >> extend Memoizable
    >>
    >> def initialize()
    >> @a,@b,@c = A.new, B.new, C.new
    >> # or other initialization
    >> end
    >>
    >> def calculate_foo(quirk, quark, quork)
    >> x = @a.foo(quirk) + @b.doit(quark)
    >> if x > 10
    >> x += @c.doh(quork, @a)
    >> else
    >> x - 2
    >> end
    >> x
    >> end
    >>
    >> memoize :calculate_foo
    >> end

    >
    > This example runs as written with the module discussed in this
    > thread. It requires multiple changes to work with the current
    > memoize.rb library.


    It was typed from memory so I may have mixed up something. The point was
    the grouping of instances (which could have been demonstrated completely
    without memoization).

    >> If (i) I don't see the point in having several objects that do
    >> exactly the
    >> same calculation - methods might even be class methods (i.e.
    >> singleton methods of the *single* class instance X). So there is
    >> actually no need
    >> for a cache that covers several instances.
    >>
    >> If (ii) you gain nothing by having a centralized cache because
    >> you'll have
    >> to keep memoized values from different instances separate; in fact
    >> you add
    >> a level of lookup (first object, then method and args - or do it in
    >> one
    >> request with a larger key) and separate instance state from the
    >> instance.
    >> IMHO that's inefficient, memory leaking and error prone.

    >
    > I understand know. You're just against this feature. Fair enough.


    Hm, that sounds like it was a matter of taste. Actually I tried to
    *argue* against this based on reflection about consequences of this
    design. IMHO having a cache cover only one instance is a superior
    approach with regard to code quality, maintainability and runtime
    properties (lookups, GC).

    > Thanks for taking the time to explain it to me.


    You're welcome.

    Kind regards

    robert
     
    Robert Klemme, Jan 27, 2006
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Magnus Lycka

    Rethinking the Python tutorial

    Magnus Lycka, Feb 9, 2006, in forum: Python
    Replies:
    11
    Views:
    560
    Magnus Lycka
    Feb 15, 2006
  2. stork
    Replies:
    6
    Views:
    371
    Michael Ashton
    Dec 18, 2006
  3. dorayme
    Replies:
    0
    Views:
    390
    dorayme
    May 4, 2008
  4. Steven D'Aprano
    Replies:
    138
    Views:
    2,115
  5. John Carter
    Replies:
    3
    Views:
    516
    Robert Klemme
    Mar 30, 2007
Loading...

Share This Page