WeakRef and caches

T

Tim Bates

Hi all,
I'm trying to build a caching mechanism into a library I'm writing. I
want to do it using weakrefs. It will look something like this:

require 'weakref'

class Cache
class << self
@cache_hash = {}
def insert(obj, id)
@cache_hash[id] = WeakRef.new(obj)
end

def retrieve(id)
o = @cache_hash[id]
# If this object hasn't been GC'd yet
if o.weakref_alive?
# Return a strong ref to it.
o.__getobj__
else
# Regenerate and return object
# o = MyClass.fetch(id)
insert(o, id)
o
end
end
end
end

Unfortunately, this doesn't work. The idea is that the cache always
returns strong-refs, so that the user doesn't have to worry about them
getting garbage-collected - to the user, they're just ordinary objects.
However, once they go outside the user's scope, the only reference to
them left is the weakref in the cache, and they can be garbage
collected. Then, if the user requests it again, it can be returned if
it's not GC'd yet, or regenerated and returned if it has.

It seems, though, that the WeakRef implementation is such that if you
refer to an object other than via its weakref, the weakref-ness is
destroyed and the object never gets GC'd. Can anyone explain why and/or
offer a way around this?

Tim Bates
 
R

Ryan Pavlik

Hi all,
I'm trying to build a caching mechanism into a library I'm writing. I
want to do it using weakrefs. It will look something like this:
<snip>

Funny, I was just working on this. ;-)

It seems, though, that the WeakRef implementation is such that if you
refer to an object other than via its weakref, the weakref-ness is
destroyed and the object never gets GC'd. Can anyone explain why and/or
offer a way around this?

Here's what I did. The idea was that the system would be addressing an
indeterminate number of persistant objects, in a transparent (to the
user/developer) manner. Thus, the problem was twofold:

* We need one unique reference for each object, so reloading objects
from the object store doesn't create copies and we have sync
problems

* Reading from the disk all the time is slow

Keeping everything in memory is also out, since the system needs to
scale.

This gives you an idea what the cache was _for_. To accomplish this, I
keep _two_ caches: a "strong" cache and a "weak" cache. The "strong"
holds objects until it's determined that they've expired---the strong
cache limit is reached, and they're written to disk and removed from the
strong cache.

When an object is referred to, first I check the strong cache, then I
check the weak cache. If it's found in the strong cache, I return it
normally. If it's in the weak cache, I put it in the strong cache, and
return it. (This solves the multiple-references problem.) If it's not
found in either, load it from disk and put it in both.

The garbage collector does its stuff to keep things within limits.

Unfortunately this isn't perfect---I'd like to expire stuff from the
cache based on size, so I could set a memory limit, but Ruby has no way
for querying object size (*). Also, there's some further evil magic
required to make sure things get written to the disk as appropriate.
(For instance, if an object which has a reference changes the object
after it's expired from the strong cache, it needs written again.) Not
having destructors makes this slightly painful, and finalizers don't
work here.

Depending on what you're doing, this scheme ought to work---I'm not sure
how you're adding and removing objects, but it's probably similar if
you're wanting to cache them---just make sure you've got all the cases
covered. ;)

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)

hth,

--
Ryan Pavlik <[email protected]>

"The fact that you believe in something called a
'Babedar' *alone* leads me to conclude that it's
always wrong." - 8BT
 
T

Tim Bates

(If you need a fully-functional transparently persistant object system
with this sort of thing already implemented, lemme know, I can give you
the code.)

I'm mostly done writing my own, except mine is db-based rather than
disk-based - although the eventual plan is to make it support multiple
persistence mechanisms. It has a cache already (to make sure you don't
get multiple instances of the same database row) and I simply wanted to
make it automatically expire once the user has finished with the
strong-ref (ie it goes out of scope). But WeakRef doesn't like you
using strong-refs, even if they subsequently go out of scope it doesn't
let the object be GC'd.

Tim Bates
 
R

Ryan Pavlik

I'm mostly done writing my own, except mine is db-based rather than
disk-based - although the eventual plan is to make it support multiple
persistence mechanisms.

Actually I use DBs at the moment myself. (Basically, you just need to
write a Storage driver, but I only have MySQL and SQLite drivers at the
moment; more planned.)
It has a cache already (to make sure you don't
get multiple instances of the same database row) and I simply wanted to
make it automatically expire once the user has finished with the
strong-ref (ie it goes out of scope). But WeakRef doesn't like you
using strong-refs, even if they subsequently go out of scope it doesn't
let the object be GC'd.

Well, ask yourself exactly what you want the cache to be used for. In
my case, it's speeding up access to commonly-used objects; so even if
one client gets done with a set of objects, another might use them.
Having them in memory makes things a _lot_ faster. Plus, it's easier to
do. ;-)

--
Ryan Pavlik <[email protected]>

"The fact that you believe in something called a
'Babedar' *alone* leads me to conclude that it's
always wrong." - 8BT
 
T

ts

T> I'm trying to build a caching mechanism into a library I'm writing. I
T> want to do it using weakrefs. It will look something like this:

Perhaps best to don't use weakref. Something like (*this is just an
example*)

class WeakPool
def initialize
@obj = Hash.new
end

def final(a)
lambda { @obj.delete(a) }
end

def []=(a, b)
ObjectSpace.define_finalizer b, final(a)
@obj[a] = b.id
end

def [](a)
b = @obj[a]
b = ObjectSpace._id2ref(b) unless b.nil?
b
end
end


Guy Decoux
 
R

Robert Klemme

Tim Bates said:
Unfortunately, this doesn't work. The idea is that the cache always
returns strong-refs, so that the user doesn't have to worry about them
getting garbage-collected - to the user, they're just ordinary objects.
However, once they go outside the user's scope, the only reference to
them left is the weakref in the cache, and they can be garbage
collected. Then, if the user requests it again, it can be returned if
it's not GC'd yet, or regenerated and returned if it has.

It seems, though, that the WeakRef implementation is such that if you
refer to an object other than via its weakref, the weakref-ness is
destroyed and the object never gets GC'd. Can anyone explain why and/or
offer a way around this?

That's the expected behavior: the weakest element in the strongest path
determines the availability of an instance. So an instance won't be GC'ed
as long as there is at least one strong ref to it. This has to be so, in
order to not change the semantics of strong refs. If they could be cleared
out of nothing many programs would not work as expected any more.

Your cache implementation is exactly as it should be, but the code outside
should only hold on to an instance as long as it is needed, typically a
method invocation's duration. The only thing the outside world should store
is the id, which is needed to access the instance via the cache. A typical
usage pattern looks like this

def doit(arg)
obj = @cache[@id]
obj.mathod1
obj.mathod2
obj.mathod3(arg)
# obj cleared here
end

Regards

robert


PS: There's a nice article about Java reference types which are quite
similar:
http://developer.java.sun.com/developer/technicalArticles/ALT/RefObj/
 
A

Alexander Bokovoy

Actually I use DBs at the moment myself. (Basically, you just need to
write a Storage driver, but I only have MySQL and SQLite drivers at the
moment; more planned.)
Wow. Are you planning to have this opened, under some OSS license like
LGPL?
 
R

Ryan Pavlik

On Mon, 11 Aug 2003 22:10:48 +0900

Wow. Are you planning to have this opened, under some OSS license like
LGPL?

GPL, yes... you can download it now if you really want, but I need a
good week or so to do a formal release. (Site, documentation, a few
updates I've been meaning to do, etc.)
 
A

Alexander Bokovoy

On Mon, 11 Aug 2003 22:10:48 +0900



GPL, yes... you can download it now if you really want, but I need a
good week or so to do a formal release. (Site, documentation, a few
updates I've been meaning to do, etc.)
Ok. I can wait for a week ;) more overload at work right now.
 
R

Robert Klemme

Kent Dahl said:
Tim said:
Your cache implementation is exactly as it should be, but the code outside
should only hold on to an instance as long as it is needed, typically a
method invocation's duration. The only thing the outside world should store
is the id, which is needed to access the instance via the cache. A typical
usage pattern looks like this

Yes, except that it doesn't work:

[tim@zaphod:3 ~/ruby]$ cat weakref_test.rb
require 'weakref'

o = Object.new
q = o
o = WeakRef.new(o)
ObjectSpace.garbage_collect
p o
p q
q = nil # o is now the only reference to the object.
ObjectSpace.garbage_collect
puts o

Playing around I noticed this:

require 'weakref'
o = WeakRef.new(Object.new)
puts o.__getobj__ # A
puts o.__getobj__.to_s # B
ObjectSpace.garbage_collect
puts o

Comment out line A and B in turn. With only B, o gets GC'ed, but not
with line A.

A wild guess: Parameters to method calls are pushed onto the stack and
(naturally for efficiency) not cleaned away after the method call has
ended. Thus the current call frame contains a reference to the
"unreferenced" object, and you cannot be sure that it will disappear
until after the call frame has been "popped" from the call stack.

Does that make any sense to someone with knowledge of the Ruby innards?

A WeakRef does not guarantee that an instance is GC'ed as soon as there is
no strong ref any more. It's the other way round: *if* there is no strong
ref any more, then the instance *might* be GC'ed during a GC cycle.
Introducing another cycle leads to this:

irb(main):001:0> require 'weakref'
=> true
irb(main):002:0>
irb(main):003:0* o = WeakRef.new(Object.new)
=> #<Object:0x2801ca0>
irb(main):004:0> o.__getobj__ # A
=> #<Object:0x2801ca0>
irb(main):005:0> # o.__getobj__.to_s # B
irb(main):006:0* ObjectSpace.garbage_collect
=> nil
irb(main):007:0> o
=> #<Object:0x2801ca0>
irb(main):008:0> ObjectSpace.garbage_collect
=> nil
irb(main):009:0> o
WeakRef::RefError: Illegal Reference - probably recycled
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:292:in
`output_value'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:149:in `eval_input'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:146:in
`signal_status'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:146:in `eval_input'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:144:in
`each_top_level_
statement'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:219:in
`loop'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:247:in
`each_t
op_level_statement'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:218:in
`catch'

from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb/ruby-lex.rb:218:in
`each_t
op_level_statement'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:144:in `eval_input'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:70:in `start'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:69:in `catch'
from C:/Programme/Ruby-1.8.0/lib/ruby/1.8/irb.rb:69:in `start'
from C:/Programme/Ruby-1.8.0/bin/irb:13
Maybe IRB bug!!
irb(main):010:0>

Of course another option is that you might have found a bug in WeakRef / GC,
but I don't assume so in this case.

Regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,348
Messages
2,571,451
Members
48,795
Latest member
Lonell Lee

Latest Threads

Top