Memoize and GC

I

Intransition

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.
 
R

Robert Klemme

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

I would live with polluting the instance var space. Btw, I usually do
not use memoize but rather an explicit Hash with default_proc. That way
I have control over where I place the Hash and hence how long it lives.
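The default_proc approach Robert mentions can be sketched like this (the squaring is just a stand-in for an expensive computation):

```ruby
# Memoization with a Hash and a default_proc: the block runs only on a
# cache miss, stores the result, and returns it. The hash lives wherever
# you put it, so it is garbage-collected together with its owner.
expensive = Hash.new do |cache, key|
  cache[key] = key * key  # stand-in for an expensive computation
end

expensive[3]  # computes and caches 9
expensive[3]  # served from the cache
```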

Of course you can hack something together using #object_id and an
external Hash but then you also need to create a finalizer and soon
things get overly complicated. If you have issues with polluting the
instance variable namespace you can a) put all memoize containers into a
single instance variable (typically a Hash) and / or b) generate
instance variable names which are likely to be conflict free.
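For completeness, the object_id-plus-finalizer hack Robert warns against might look roughly like this (a sketch only; `FOO`, `ExternalMemo`, and `compute_foo` are illustrative names, and the complexity he predicts is visible already):

```ruby
# Sketch of memoizing into an external Hash keyed by object_id, with a
# finalizer to drop the entry when the owning object is collected.
# NOTE: the finalizer proc must not capture `self`, or the object can
# never be GC'd at all.
FOO = {}

module ExternalMemo
  def foo
    FOO.fetch(object_id) do
      id = object_id
      ObjectSpace.define_finalizer(self, proc { FOO.delete(id) })
      FOO[id] = compute_foo
    end
  end
end
```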

Kind regards

robert
 
J

Jesús Gabriel y Galán

I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do
irb(main):024:3* a
irb(main):025:3> end
irb(main):026:2> a
irb(main):027:2> end
irb(main):028:1> end
=> nil
irb(main):029:0> obj = A.new
=> #<A:0xb7c9ef20>
irb(main):030:0> obj.a
expensive calculation of a...
=> "complex value"
irb(main):031:0> obj.a
=> "complex value"

Jesus.
 
C

Caleb Clausen

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO=Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.
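A bare-bones version of such a class can lean on the fact that Ruby's Hash is insertion-ordered (guaranteed since 1.9). This is a sketch only (no thread safety, and a stored nil is indistinguishable from a miss); `TinyLRU` is a made-up name, not the lrucache gem:

```ruby
# Minimal size-bounded LRU cache using Hash insertion order.
class TinyLRU
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)  # re-insert: key becomes most recent
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.keys.first) if @store.size > @max_size  # evict LRU
  end

  def size
    @store.size
  end
end
```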
 
I

Intransition

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO = Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?
 
R

Robert Klemme

2010/4/13 Intransition said:
will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

I have to assume that you meant FOO to be a hash... tho an array will
work here... sorta.

But rather than either an array or hash, what if you could write:

FOO = Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

What else do you want it to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Kind regards

robert

[1] http://github.com/rklemme/muppet-laboratories/blob/master/lib/lruhash.rb


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
R

Robert Dober

2010/4/13 Jesús Gabriel y Galán said:
I have a quick question. I just re-read:

http://www.engineyard.com/blog/2010/memoization-and-id2ref/

I am wondering about garbage collection. Am I right to think that
using

def foo
  @foo ||= something
end

will mean that @foo will be GC'd when the object is GC'd, but if I do

FOO = []

def foo
  FOO[object_id] ||= something
end

Then the memoized entry will just hang around forever?

If so, then how do we do it without polluting the instance var space
(as in example 2), but still get the garbage collection (as in example
1)?

Thanks.

One way could be using a closure that redefines the method:

irb(main):019:0> class A
irb(main):020:1> def a
irb(main):021:2> puts "expensive calculation of a..."
irb(main):022:2> a = "complex value"
irb(main):023:2> self.class.send(:define_method, :a) do
I like the closure idea, but I guess the memoizing got lost a little bit ;).
Thus I would write

def a
  _a = nil
  self.class.module_eval do
    define_method :a do _a ||= "expensive calculation" end
    define_method :_invalidate_a do _a = nil end
  end
  a
end
Cheers
R.

--
"Learning without thought is labor lost; thought without learning is perilous."
--- Confucius
 
I

Intransition

What else do you want it to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.
 
R

Robert Klemme

2010/4/14 Intransition said:
What else do you want it to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.

Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it,
but I would not combine the two in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key, val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack though withou
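In Ruby the same modular idea might be fleshed out roughly as follows (a sketch under the interface above; `SizePolicy` and `PolicyCache` are made-up names):

```ruby
# A size-based eviction policy implementing the DeletionPolicy sketch.
class SizePolicy
  def initialize(max_size)
    @max_size = max_size
    @keys = []  # least recently used first
  end

  def insert(key, _val)
    @keys.delete(key)
    @keys << key
  end

  def remove(key, _val)
    @keys.delete(key)
  end

  def access(key, _val)
    @keys.delete(key)
    @keys << key
  end

  def next_delete
    yield @keys.first if @keys.size > @max_size  # yields the key to drop
  end
end

# A container that delegates all eviction decisions to its policy.
class PolicyCache
  def initialize(policy)
    @policy = policy
    @store = {}
  end

  def []=(key, val)
    @store[key] = val
    @policy.insert(key, val)
    @policy.next_delete do |k|
      @policy.remove(k, @store[k])
      @store.delete(k)
    end
  end

  def [](key)
    @policy.access(key, @store[key]) if @store.key?(key)
    @store[key]
  end
end
```

Swapping in a time-based policy would then only mean writing another object with the same four methods.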

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
I

Intransition

2010/4/14 Intransition <[email protected]>:


What else do you want it to handle? My implementation [1] also only
considers size, but I have also seen implementations that consider
insertion or access time. Personally I dislike those solutions
because they might throw away stuff too early and make the
implementation more complex. The main point (resource control) is
already achieved by limiting the size of an LRU map.
Fair enough, I just thought it would be nice to have the *option* of a
time-based cache.

Definitely! But this would be a different beast for me. One might
be able to implement this by using an LRUHash or by inheriting from it,
but I would not combine the two in a single class. Another option could
be to have a modular approach where the container gets a
DeletionPolicy that is responsible for deciding when or what to
delete. The interface could look like

interface DeletionPolicy
  def insert(key, val)
  def remove(key, val)
  def access(key, val)
  def next_delete # yields key
end

Just a quick hack though withou

I kind of like that idea.
 
C

Caleb Clausen

Ah, very good thought! An LRUCache would do pretty well here.

Well then, that leads me to another project. My implementation of
LRUCache (lrucache gem) kind of sucks in that it only handles max
size, so I am looking for a new one to replace it. Any
recommendations?

Last time I went looking for a cache class, I found several written in
ruby, but I really needed something that would be super-fast, so it
had to be written in c. I did write about half of a ruby hash class in
c before giving up.... For most use cases, tho, presumably including
this one, a ruby implementation would be fine. Having a customizable
age-out policy is the only other nice-to-have I can think of.
 
P

Phrogz

But rather than either an array or hash, what if you could write:

  FOO = Cache.new

That is, FOO would behave mostly like a hash, but unused entries would
age out after a while. Now if only someone would invent an appropriate
Cache class.

Undocumented and without a test framework, I offer you a class I wrote
a while ago that's still live and in use a couple of years later. It's
not exactly what you're asking for, but close. (It returns string-based
keys for accessing the objects because my usage has it running
in a separate process from the main app.) I reasonably hate the
decision I made to return values versus arrays based on number of
arguments, and it's not as DRY as it could be, but it's working :)


# Example usage
#   class SituationServer < Phrogz::Librarian
#     ...
#   end
#
#   CLEANING_SCHEDULE = 60*60  # Clean up once an hour
#   MAX_STALENESS     = 60*60  # Remove items not used in the last hour
#
#   @server = SituationServer.new
#   Thread.new{
#     loop {
#       sleep CLEANING_SCHEDULE
#       removed = @server.remove_stale( MAX_STALENESS )
#       puts "Just removed #{removed} stale item#{:s unless removed==1}..."
#     }
#   }

module Phrogz; end

class Phrogz::Librarian
  def initialize( key_size=4 )
    @key_size   = key_size
    @max_keys   = 16 ** key_size
    @serializer = "%0#{key_size}x"
    @library = {}
    @stored  = {}
    @access  = {}
  end

  def store( *objects )
    keys = objects.map{ |object|
      key,obj = @library.find{ |k,o| o==object }
      unless key
        # FIXME: bail if @library.size >= MAX_KEYS, or else this will infinite loop
        # TODO: come up with a better strategy for finding a free key
        begin
          key = @serializer % rand( @max_keys )
        end while @library.has_key?( key )
        @library[ key ] = object
        @stored[ key ]  = Time.now
      end
      key
    }
    keys.length == 1 ? keys.first : keys
  end
  alias_method :<<, :store

  def fetch( *keys )
    now = Time.now # So that multiple accesses are exactly synchronized
    objects = keys.map{ |key|
      if @library.has_key?( key )
        @access[ key ] = now
        @library[ key ]
      end
    }
    objects.length == 1 ? objects.first : objects
  end

  def size
    @library.size
  end

  def storage_time( *keys )
    results = @stored.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def age( *keys )
    results = @stored.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def last_access( *keys )
    results = @access.values_at( *keys )
    results.length == 1 ? results.first : results
  end

  def staleness( *keys )
    results = @access.values_at( *keys ).map{ |t| Time.now - t }
    results.length == 1 ? results.first : results
  end

  def discard( *keys )
    results = keys.map{ |key|
      @stored.delete(key)
      @access.delete(key)
      @library.delete(key)
    }
    results.length == 1 ? results.first : results
  end

  def remove_stale( max_staleness )
    now = Time.now
    stale_keys = @access.select{ |key,time| (now-time) > max_staleness }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end

  def remove_old( max_age )
    now = Time.now
    stale_keys = @stored.select{ |key,time| (now-time) > max_age }.keys
    stale_keys.each{ |key|
      @stored.delete( key )
      @access.delete( key )
      object = @library.delete( key )
      yield object if block_given?
    }
    stale_keys.length
  end
end
 
