Implementing caches that may be cleared if heap space shrinks away


Tassilo Horn

Hi all,

I have a client that queries a server via RMI and uses some caches for
the results to speed things up. Unfortunately, these caches can grow
until the heap is completely filled up and I get an exception.

So now my cache classes don't use Map<S,T> anymore but
SoftReference<Map<S,T>>. But with that approach I always have to check
if the soft reference has already been cleared before accessing the
referenced Map, i.e. in each of my cache classes I have something like
this:

--8<---------------cut here---------------start------------->8---
private void initializeCache() {
    words = new SoftReference<HashMap<String, WordInfo>>(
            new HashMap<String, WordInfo>());
}

WordInfo getWordInfo(String word) throws RemoteException, JGWNLException {
    if (words.get() == null) {
        initializeCache();
    } else if (words.get().containsKey(word)) {
        return words.get().get(word);
    }
    WordInfo wi = jgwnl.lookupWord(word);
    for (SynsetInfo si : wi.getSynsetInfos()) {
        synsetCache.updateSynsetInfo(si.getUid(), si);
    }
    words.get().put(word, wi);
    return wi;
}
--8<---------------cut here---------------end--------------->8---

And there's still the possibility that the soft reference gets cleared
after the check but before I access it...

One poor man's alternative would be to poll Runtime.freeMemory() in
another thread and clear the caches when it falls below a certain
limit.
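
Roughly something like the following (just a sketch; the limit and the
clearAllCaches() hook are placeholders):

--8<---------------cut here---------------start------------->8---
import java.util.concurrent.TimeUnit;

// Sketch of the freeMemory() polling idea; meant to run as a daemon thread.
// clearAllCaches() is a placeholder for whatever drops the cached maps.
public class CacheScavenger implements Runnable {

    private static final long MIN_FREE_BYTES = 16 * 1024 * 1024; // arbitrary limit

    @Override
    public void run() {
        Runtime rt = Runtime.getRuntime();
        while (!Thread.currentThread().isInterrupted()) {
            // freeMemory() only covers the heap allocated so far, so also
            // count the headroom up to maxMemory().
            long potentiallyFree =
                    rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
            if (potentiallyFree < MIN_FREE_BYTES) {
                clearAllCaches();
            }
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void clearAllCaches() {
        // placeholder: drop the word and synset caches here
    }
}
--8<---------------cut here---------------end--------------->8---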

The other alternative I've thought about is deriving the cache classes
directly from SoftReference and overriding the clear() method. But its
docs state that the garbage collector doesn't call clear() and instead
clears the referent directly. So that's no solution either.

Basically, what I want is an interface Clearable which defines a clear()
method that's called automatically if heap space shrinks away, but I
couldn't find anything like that.

So what's the right way to implement a cache in Java?

Thanks for any pointers,
Tassilo
 

Roedy Green

So what's the right way to implement a cache in Java?

Have you had a look at how WeakHashMap works?
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Don’t worry about the world coming to an end today.
It is already tomorrow in Australia."
~ Charles Schulz
 

Tassilo Horn

Hi Matt,
I don't think this is true. From the javadocs on SoftReference--- "As
long as the referent of a soft reference is strongly reachable, that
is, is actually in use, the soft reference will not be cleared." At
the time you retrieve the soft reference it is either null (already
cleared) or it refers to the object. If it refers to the object it is
now strongly reachable via the local variable and will not be cleared.
This guarantee is atomic so there's no need to double check. Just
retrieve the reference and if it's not present reload it. It will
remain as long as you have a reference to it.

Ah, ok. Now I got it. Thanks for enlightening me!
Although I've not implemented caches this way, I'm puzzled as to why
the entire map is under a SoftReference rather than the individual
entries; this is likely just a characteristic of your particular
design.

Yes, both keys and values are of small memory footprint, but there may
be millions of them.

Basically, what I want is an interface Clearable which defines a
clear() method that's called automatically if heap space shrinks
away, but I couldn't find anything like that.

That is similar to what I've done (appropriate for my system). I use
direct LRU / LFU with a scavenger thread so that each entry has a
last-used-at timestamp or frequency. The scavenger invokes "cleanup"
which scans the table and drops items not being used.
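
Something like this, I guess? (Just my rough sketch of that idea; the
names are made up.)

--8<---------------cut here---------------start------------->8---
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an LRU-style cache with a last-used-at timestamp per entry.
// cleanup() would be invoked periodically by a scavenger thread.
public class ScavengedCache<K, V> {

    private static final long MAX_IDLE_MILLIS = 5 * 60 * 1000; // arbitrary

    private static final class Entry<V> {
        final V value;
        volatile long lastUsedAt;

        Entry(V value) {
            this.value = value;
            this.lastUsedAt = System.currentTimeMillis();
        }
    }

    private final Map<K, Entry<V>> table = new ConcurrentHashMap<K, Entry<V>>();

    public V get(K key) {
        Entry<V> e = table.get(key);
        if (e == null) {
            return null;
        }
        e.lastUsedAt = System.currentTimeMillis();
        return e.value;
    }

    public void put(K key, V value) {
        table.put(key, new Entry<V>(value));
    }

    // Drop everything that hasn't been used for a while.
    public void cleanup() {
        long cutoff = System.currentTimeMillis() - MAX_IDLE_MILLIS;
        for (Iterator<Entry<V>> it = table.values().iterator(); it.hasNext();) {
            if (it.next().lastUsedAt < cutoff) {
                it.remove();
            }
        }
    }
}
--8<---------------cut here---------------end--------------->8---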

But when do you invoke cleanup? Polling Runtime.freeMemory()?

Bye,
Tassilo
 

Tassilo Horn

Hi Roedy,
Have you had a look at how WeakHashMap works?

Hm, there may be value objects that refer to their own keys. The docs
state that this will prevent those entries from ever being discarded.
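
For example (a contrived sketch; this WordInfo is just a stand-in for
my value type, which keeps a strong reference to its key):

--8<---------------cut here---------------start------------->8---
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyDemo {

    // Stand-in value type that strongly references its own key.
    static class WordInfo {
        final String word;

        WordInfo(String word) {
            this.word = word;
        }
    }

    public static void main(String[] args) {
        Map<String, WordInfo> cache = new WeakHashMap<String, WordInfo>();
        String key = new String("dog");
        cache.put(key, new WordInfo(key));
        key = null; // drop our own strong reference to the key
        // The key is still strongly reachable *through the value*, so the
        // garbage collector can never discard this entry.
        System.gc();
        System.out.println(cache.size()); // still 1
    }
}
--8<---------------cut here---------------end--------------->8---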

Bye,
Tassilo
 

lord.zoltar

But when do you invoke cleanup?  Polling Runtime.freeMemory()?

Bye,
Tassilo

Apache Commons Collections (http://commons.apache.org/collections/)
has a number of classes that may help, including an LRU cache. It will
keep adding to the cache until it hits the limit you specify and then
will remove the least recently used items. Of course, if you get a
cache miss, you'll have to have some other method of retrieving the
data. It's worked pretty well for me, but I usually know a good
ballpark figure of # items in cache so cache misses are not too
frequent.
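
Usage is roughly like this (my sketch, using LRUMap from Commons
Collections; the size is just a ballpark):

import java.util.Map;
import org.apache.commons.collections.map.LRUMap;

// Sketch: a bounded cache that evicts the least recently used entry
// once it holds more than maxSize items. LRUMap itself is not
// thread-safe, hence the synchronized accessors.
public class BoundedCache<K, V> {

    private final Map<K, V> map;

    @SuppressWarnings("unchecked")
    public BoundedCache(int maxSize) {
        map = new LRUMap(maxSize);
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }
}

For the word cache that would be something like
new BoundedCache<String, WordInfo>(10000).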
 

Tassilo Horn

Hi Steven,
But there is no local variable in the code above. Having tested
words.get() and found it to be not null, there's no hard reference, so
there's a slim chance of it becoming null before the next 'if'. I
think the first thing you need to do with any Reference is to get() it
into a local variable, and use that from there:

That's what I'm doing now, and Matt's response pointed me in that
direction, although his argument that I already had a strong reference
wasn't quite right.
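
For reference, this is roughly what the lookup looks like now (a
sketch, reusing the field names from my first post):

--8<---------------cut here---------------start------------->8---
WordInfo getWordInfo(String word) throws RemoteException, JGWNLException {
    // Fetch the map through the soft reference exactly once. As long as
    // this local variable exists, the map is strongly reachable and
    // cannot be cleared underneath us.
    HashMap<String, WordInfo> map = words.get();
    if (map == null) {
        map = new HashMap<String, WordInfo>();
        words = new SoftReference<HashMap<String, WordInfo>>(map);
    }
    WordInfo wi = map.get(word);
    if (wi != null) {
        return wi;
    }
    wi = jgwnl.lookupWord(word);
    for (SynsetInfo si : wi.getSynsetInfos()) {
        synsetCache.updateSynsetInfo(si.getUid(), si);
    }
    map.put(word, wi);
    return wi;
}
--8<---------------cut here---------------end--------------->8---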

Bye,
Tassilo
 

Tassilo Horn

Hi Matt,
get() returns a strong reference (if there is one), which your code
discarded. Sorry for the confusion -- I was describing what you should
do, not what you had done.

Yes, and I got it right. So thanks again for the pointer.

Bye,
Tassilo
 

Tassilo Horn

Hi Matt,
Having millions of keys is probably your main design consideration.
Are they accessed somewhat randomly or is there some pattern?

To make the cache efficient you have to have a reasonable guess at the
usage pattern. (i.e. How many re-reads are you expecting to save?
What's the cost of each one? How often are writes performed?)

To make a long story short: There's a server hosting a graph that was
built using the WordNet taxonomy. It offers a method to compare two
words via RMI.

The clients are used to compare some other graphs modeling requirements,
and on the lowest level that includes comparing words. For that, it
simply asks the server. Of course, many requirements use equal words,
so the similarity for a pair is cached.

So I'd say keeping the most often used word pairs and deleting the least
often used would be the best caching policy. But currently this is not
the bottleneck, so I'll stay with the current delete-all approach till
optimization on this side is needed.

Bye,
Tassilo
 

Alan Gutierrez

The other alternative I've thought about is deriving the cache classes
directly from SoftReference and overriding the clear() method. But its
docs state that the garbage collector doesn't call clear() and instead
clears the referent directly. So that's no solution either.

Basically, what I want is an interface Clearable which defines a clear()
method that's called automatically if heap space shrinks away, but I
couldn't find anything like that.

So what's the right way to implement a cache in Java?

Read through the ReferenceMap in Google Collections to see a map
implementation that supports keys and/or values wrapped in soft or
weak references.

http://code.google.com/p/google-col...m/google/common/collect/ReferenceMap.java?r=4

ReferenceMap is gone in the latest Google Collections, replaced by...

Map<String, WordInfo> cache =
    new MapMaker().weakKeys().softValues().makeMap();

I've implemented caches that hold weak or soft references to the values
by creating a subclass of SoftReference that adds the value's key as a
property of the reference, registering the references with a reference
queue. Whenever get or put is called on the cache, I walk through the
reference queue, removing the collected values from the cache using
the key kept in the reference. You can see the collect method in this
cache class as a place to start to see the implementation.

http://github.com/bigeasy/sheaf/blo...c/main/java/com/goodworkalan/sheaf/Sheaf.java
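
Stripped down to a sketch, the idea looks roughly like this
(hypothetical names, not the actual Sheaf code):

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// A cache whose values are softly referenced; each reference remembers
// its key so the entry can be removed once the value is collected.
public class SoftValueCache<K, V> {

    private static final class KeyedSoftReference<K, V> extends SoftReference<V> {
        final K key;

        KeyedSoftReference(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ConcurrentMap<K, KeyedSoftReference<K, V>> map =
            new ConcurrentHashMap<K, KeyedSoftReference<K, V>>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<V>();

    public V get(K key) {
        collect();
        KeyedSoftReference<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    public void put(K key, V value) {
        collect();
        map.put(key, new KeyedSoftReference<K, V>(key, value, queue));
    }

    // Drain the reference queue and drop entries whose values the
    // garbage collector has already cleared.
    @SuppressWarnings("unchecked")
    private void collect() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            KeyedSoftReference<K, V> keyed = (KeyedSoftReference<K, V>) ref;
            map.remove(keyed.key, keyed);
        }
    }
}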

Alan Gutierrez - (e-mail address removed) - http://blogometer.com/
 

Tassilo Horn

Hi Alan,
ReferenceMap is gone in the latest Google Collections, replaced by...

Map<String, WordInfo> cache =
    new MapMaker().weakKeys().softValues().makeMap();

I've implemented caches that hold weak or soft references to the values
by creating a subclass of SoftReference that adds the value's key as a
property of the reference, registering the references with a reference
queue. Whenever get or put is called on the cache, I walk through the
reference queue, removing the collected values from the cache using
the key kept in the reference. You can see the collect method in this
cache class as a place to start to see the implementation.

http://github.com/bigeasy/sheaf/blo...c/main/java/com/goodworkalan/sheaf/Sheaf.java

Thanks, that looks interesting. I'll have a look at it.

Bye,
Tassilo
 
