Weak references - I must be missing something...


Mark M

I have read a lot of material on weak/soft/phantom references, but
they do not seem to solve a common memory problem we have. I must be
missing something about how weak references work. Here is the
problem:

We have large in-memory objects that cannot be recreated from the
original source. They can however, be persisted to disk and
reconstructed, but it is very expensive to do so.

We need to have the JVM collect these objects when memory runs low,
but we don't want to pay the cost of putting them to disk unless it is
necessary.

Weak references do not seem to solve this problem because they seem to
assume that the object can be re-created at any time. We must know
ahead of time that the object is going to be disposed so we can save
it to disk first. E.g. the weak reference (via the ReferenceQueue)
only tells us *after* the object's finalizer has been run and the
object has been made un-referenceable (is that a word?). At this point
we cannot get a (hard) reference to the object to persist it... it is
too late. The get() method on the Reference object will always return
null at this point.

We also cannot do the write-to-disk in the object's finalizer
because the object does not know why it is being finalized (if due to
a weak reference being cleared, it should persist itself to disk, if
due to being unreferenced, it should do nothing because it is leaving
the JVM forever).

How can we get control over when to persist the object to disk before
it gets collected? We cannot afford to persist every such object to
disk with the idea that sometime it might need to be reconstructed.

Thanks for any ideas...
-Mark
 

xarax

Mark M said:
I have read a lot of material on weak/soft/phantom references, but
they do not seem to solve a common memory problem we have. I must be
missing something about how weak references work.
/snip/
How can we get control over when to persist the object to disk before
it gets collected? We cannot afford to persist every such object to
disk with the idea that sometime it might need to be reconstructed.

You cannot maintain a strong reference anywhere for
the object that is wrapped by a SoftReference or a
WeakReference.

A SoftReference is eligible to be put on its ReferenceQueue
when there are no more reachable strong references. A (daemon)
thread that is blocked on the ReferenceQueue will awaken
after the SoftReference is placed on that ReferenceQueue.
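
For example, a minimal sketch of such a drain thread (the class and queue
names are made up; what you do with each dequeued reference is the
interesting part, see below):

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;

    public class QueueDrainDemo {
        public static void main(String[] args) {
            final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
            Thread drainer = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (;;) {
                            // remove() blocks until the GC enqueues a reference
                            Reference<?> ref = queue.remove();
                            System.out.println("enqueued: " + ref);
                            // ... react to the enqueued reference here ...
                        }
                    } catch (InterruptedException e) {
                        // interrupted: let the daemon thread exit
                    }
                }
            });
            drainer.setDaemon(true);
            drainer.start();
            // ... elsewhere, create SoftReferences registered with "queue" ...
        }
    }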

When the thread pulls the SoftReference off of the
ReferenceQueue, it can call get() to retrieve the referent
object (which makes it a strong reference again). The thread
must clear() the SoftReference, because the GC will not clear
a SoftReference (or a WeakReference) when it is placed on a
ReferenceQueue. GC only clears the referent when there is no
associated ReferenceQueue.

The thread writes the referent out to disk, then nullifies
explicitly all references to the object. Use a SoftReference,
rather than a WeakReference, because a SoftReference is
intended to stay around a little longer than a WeakReference.

Be sure to clear() the Reference in either case. A WeakReference
can disappear at any time. A SoftReference is "encouraged" to
disappear only during a full GC cycle (as opposed to an
incremental GC cycle) when there is insufficient free heap
space to accommodate an allocation. However, the Sun JVM
doesn't abide by Sun's own recommendation. Eligible SoftReference
objects are processed by age (elapsed time), regardless
of whether there is any memory stress. The Sun JVM will eventually
process all eligible SoftReference objects, even for an idle
application.

Please remember that the point of either a SoftReference
or WeakReference is to allow GC to reclaim the object when
available memory gets low. If you make the referent strong or
delay clearing the SoftReference, then you are asking for an
OutOfMemoryError. So, be very quick about nullifying all references
to the referent object. The Reference API is designed to allow an
application to reclaim *related* resources for a reclaimed object,
not for persisting the (about-to-be) reclaimed object.

2 cents worth. Your mileage may vary.


--
----------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS!
Are ISV upgrade fees too high? Check our custom product development!
 

Mike Schilling

Mark M said:
I have read a lot of material on weak/soft/phantom references, but
they do not seem to solve a common memory problem we have. I must be
missing something about how weak references work.
/snip/
How can we get control over when to persist the object to disk before
it gets collected? We cannot afford to persist every such object to
disk with the idea that sometime it might need to be reconstructed.

Here's a pattern that was described to me once, though I've never used it.
To explain it, I'll make some assertions that might or might not be valid,
but they don't affect the utility of the basic idea for other cases.

Assertions: The large object is called a LargeObject. It's identified by
an instance of the LargeObjectID class. The goal is to notice when a
LargeObject is no longer referenced, and persist it to disk so that it can
be fetched next time the LargeObject with that ID is required.

Solution:

1. Create a class ClientLargeObject. It has one field, a private one that
points to a LargeObject. It implements the entire public interface of
LargeObject by delegation. Client code never sees LargeObjects, only
ClientLargeObjects.

2. Create a class ClientLORef, a subclass of WeakReference. It has one
field, a LargeObjectID.

3. Create a class LargeObjectCache. It has three fields:

1. memMap, a weak hash map from LargeObjectID to ClientLargeObjects, for
memory-resident LargeObjects
2. loMap, a hash map from LargeObjectID to LargeObject, for
memory-resident LargeObjects
3. diskMap, a hash map from LargeObjectID to disk files, for
disk-resident LargeObjects.

Its behavior is:

A. When asked to return a ClientLargeObject for a given ID, it checks
the maps. If the ClientLargeObject is in memMap, it's been found. If not,
but it's in diskMap, the LargeObject is read from disk and a
ClientLargeObject is created to hold it and put in memMap. If neither, the
LargeObject is created fresh, and a ClientLargeObject is created to hold it
and put in memMap. Now the ClientLargeObject is returned.
B. Whenever a LargeObject is created (fresh or from disk), it's put
into loMap.
C. Whenever a ClientLargeObject is created, a ClientLORef is created
that points to it, has the same ID, and is registered with a reference queue.
D. When the ClientLORef is retrieved from the reference queue, that
means the LargeObject with the same ID is no longer in use. Save it to
disk, remove it from loMap and memMap, and enter the disk file in diskMap.

That is, the ClientLargeObject, which contains no real information, is used
to track usage of the LargeObject. The fact that memMap is a WeakHashMap
means that it won't keep otherwise unreferenced ClientLargeObjects in memory.
We keep a hard reference to the LargeObject until it's safely written to
disk. The fact that WeakReference can be subclassed allows us to see the
key (LargeObjectID) even after the ClientLargeObject has been collected.
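
A rough sketch of steps 2 and D, assuming the LargeObject, LargeObjectID and
ClientLargeObject classes above, plus a hypothetical persist() that does the
actual writing:

    import java.io.File;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Map;

    // Step 2: a WeakReference that remembers which LargeObject it stood for.
    class ClientLORef extends WeakReference<ClientLargeObject> {
        final LargeObjectID id;
        ClientLORef(ClientLargeObject referent, LargeObjectID id,
                    ReferenceQueue<ClientLargeObject> queue) {
            super(referent, queue);    // registers this reference with the queue
            this.id = id;
        }
    }

    // Step D, as a method on LargeObjectCache: the ClientLargeObject is gone,
    // but its id, and the LargeObject still held in loMap, are not.
    void drainOne(ReferenceQueue<ClientLargeObject> queue,
                  Map<LargeObjectID, LargeObject> loMap,
                  Map<LargeObjectID, File> diskMap) throws InterruptedException {
        ClientLORef ref = (ClientLORef) queue.remove();  // blocks until GC enqueues one
        LargeObject lo = loMap.remove(ref.id);           // hard reference: safe to persist
        File file = persist(ref.id, lo);                 // hypothetical write-to-disk call
        diskMap.put(ref.id, file);                       // (memMap cleanup omitted here)
    }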
 

nos

So you can't recreate it and you won't save it to disk.
Sounds like you care what happens to it. Just hope
for a power fail.
 

Chris Uppal

Mark said:
We have large in-memory objects that cannot be recreated from the
original source. They can however, be persisted to disk and
reconstructed, but it is very expensive to do so.

We need to have the JVM collect these objects when memory runs low,
but we don't want to pay the cost of putting them to disk unless it is
necessary.

You are right that Java's weak/etc references are not of direct help in solving
this problem.

I think you need to approach it from a different direction. Here's how I see
it:

I'm assuming that the objects you talk about really are (more or less) single
objects from the POV of the users of those objects -- rather than being
presented as complex webs of objects. If that's not true then I'm pretty sure
you'll have to redesign so that it *is* true :-(

So, you have a class of objects called BigObject which contains within it some
large amount of data which can be persisted to file. So, for simplicity, we
have a BigObject containing fields m_data and m_filename -- m_filename is set
to the name of the file where the data has been persisted (if it has been) and
m_data is either set to a large data object or to null.

You can tell a BigObject to purge() itself which will write the data to file
(if it hasn't been already) and set the m_data to null. You can also tell it
to restore() itself, which will read the data from file and assign it to m_data
(if it isn't restored already). The question is, when to do these things.
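
To make that concrete, here's a minimal sketch, with a byte[] and a temp file
standing in for the real data and the real serialisation:

    import java.io.*;

    class BigObject {
        private byte[] m_data;     // stand-in for the expensive payload
        private File m_file;       // null until the data has been persisted

        BigObject(byte[] data) { m_data = data; }

        synchronized void purge() throws IOException {
            if (m_data == null) return;            // already purged
            if (m_file == null) {                  // write once, reuse the file later
                m_file = File.createTempFile("bigobject", ".dat");
                FileOutputStream out = new FileOutputStream(m_file);
                try { out.write(m_data); } finally { out.close(); }
            }
            m_data = null;                         // let GC reclaim the payload
        }

        synchronized void restore() throws IOException {
            if (m_data != null) return;            // already resident
            byte[] buf = new byte[(int) m_file.length()];
            DataInputStream in = new DataInputStream(new FileInputStream(m_file));
            try { in.readFully(buf); } finally { in.close(); }
            m_data = buf;
        }
    }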

[BTW, I'm sorry if I'm going into more detail than you want or need, but the
replies I've seen so far seem to me to miss the point, so I thought it might be
better to take this in small steps.]

The use of soft references won't give you the features that you need -- that
isn't what it was designed to do. So you need to use a different technique.
That technique is necessarily going to be a heuristic. Ideally it'll be one
that you can easily monitor and control (and tweak).

You will need to keep track of all the referenced BigObjects. Using a WeakSet
(or similar) as a static member of the class will do that. BigObjects are
added to that set as part of their constructor, and are removed from it by the
system.

When a new BigObject is created, the class checks the list of already existing
BigObjects and if the number exceeds a certain threshold (or you could use
their total size for a better estimate) it will tell some of them to purge()
themselves. You will have to set the threshold by experimentation or analysis
or luck.
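
For instance (just a sketch -- here the "WeakSet" is a WeakHashMap whose keys
are the BigObjects, purge() is the method sketched above, and the threshold is
a number you'd have to tune):

    import java.util.Map;
    import java.util.WeakHashMap;

    class BigObjectRegistry {
        // Holds every live BigObject weakly; entries vanish once an instance is unreachable.
        private static final Map<BigObject, Boolean> live =
            new WeakHashMap<BigObject, Boolean>();
        private static final int THRESHOLD = 16;   // tune by experiment (or luck)

        // Called from BigObject's constructor.
        static synchronized void register(BigObject b) {
            if (live.size() >= THRESHOLD) {
                // crude: purges every pre-existing BigObject; see below for smarter choices
                for (BigObject old : live.keySet()) {
                    try { old.purge(); } catch (java.io.IOException e) { /* log it */ }
                }
            }
            live.put(b, Boolean.TRUE);
        }
    }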

You might also (depending on the structure of your application) be able to trap
out-of-memory events and use them to cause some BigObjects to purge()
themselves. Possibly you could try to detect such events "early" by
temporarily creating a large byte[] array. That could improve the accuracy of
the heuristic, but I don't think that it could be made reliable enough, or
convenient enough, to replace the use of a limit on the space taken by existing
BigObjects (not unless your app has a very specific structure, anyway).

How to choose which BigObjects to purge() ? If these cases are rare (as I'd
guess is likely) then you could probably get away with just unconditionally
purge()-ing all the pre-existing BigObjects. Or you could choose some to
sacrifice randomly. OTOH, you might maintain a least recently used list (LRU)
of BigObjects, and purge() the ones that have gone "idle". That is, of course,
quite a bit more work since (a) you'd have to make each operation on a
BigObject update the LRU list, and (b) you'd have to implement the LRU list as
a weak collection of some kind. Perhaps it would be easier just to keep a
timestamp in each BigObject and purge() the oldest ones.

How to get BigObjects to restore() themselves ? Two patterns suggest
themselves. One is for each BigObject to restore() itself automatically before
each use (a nullop if it hasn't been purge()ed). That runs the risk that the
system will "thrash" with BigObjects being purge()ed only to restore()
themselves almost instantly afterwards (that's an inherent risk in any such
system -- it would apply even if you could use soft references -- but at least
you are in control of the run-time parameters and can easily monitor/modify
what's going on.) The other pattern would be to have explicit
lock()/unlock() calls that the client code is required to call before/after
each block of operations. The lock() call would lock the BigObject into memory
(make it un-purge()-able), the unlock call would make it eligible for
purge()ing. The problem with that is that there's a maintenance headache in
ensuring that lock() and unlock() were always called correctly (you could use
finalisation as a debugging aid to catch cases where BigObjects died while
still locked). Note that using an explicit lock()/unlock() protocol would
simplify the LRU implementation (if you used one) because it would only be the
unlock() call that updated the LRU list.
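
A pin count is one simple way to implement the lock()/unlock() idea (again a
sketch; how the data actually gets written out is as in the earlier BigObject
sketch):

    class PinnableBigObject {
        private int m_pins;             // > 0 means the object must stay resident
        private long m_lastUnlocked;    // feeds the timestamp/LRU heuristic

        synchronized void lock()   { ++m_pins; }

        synchronized void unlock() {
            if (--m_pins == 0) m_lastUnlocked = System.currentTimeMillis();
        }

        synchronized boolean purgeIfIdle() {
            if (m_pins > 0) return false;      // locked into memory, not eligible
            // ... write the data out and null it, as in the earlier sketch ...
            return true;
        }

        synchronized long lastUnlocked() { return m_lastUnlocked; }
    }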

Of course, there are many ways of setting up such a system. I'd be inclined to
use small Handle objects as the public face of the BigObjects, each of which
would have a reference to either a real BigObject (memory resident) or a
PersistedBigObject (basically just the name of the file where the data was
written). I think that would better separate the logic of the BigObject's real
role(s) from the independent logic of controlling their resource usage.

There'd also be issues with thread-safety, ensuring that persisted files got
cleaned up, tools for monitoring the behaviour of the system, etc. But that's
just code...

-- chris
 

Mark M

Jeffrey, thanks for the input. You describe how I *thought* soft
references should work, but according to the API docs, they don't.
See below.
You cannot maintain a strong reference anywhere for
the object that is wrapped by a SoftReference or a
WeakReference.
Understood.


A SoftReference is eligible to be put on its ReferenceQueue
when there are no more reachable strong references. A (daemon)
thread that is blocked on the ReferenceQueue will awaken
after the SoftReference is placed on that ReferenceQueue.
OK.


When the thread pulls the SoftReference off of the
ReferenceQueue, it can call get() to retrieve the referent
object (which makes it a strong reference again). The thread
must clear() the SoftReference, because the GC will not clear
a SoftReference (or a WeakReference) when it is placed on a
ReferenceQueue. GC only clears the referent when there is no
associated ReferenceQueue.

This is what I thought too, but according to the Java API
documentation, this will not work. From the java.lang.ref package
description: "Soft and weak references are automatically cleared by
the collector before being added to the queues with which they are
registered, if any.". Therefore, when the thread pulls the
SoftReference off the ReferenceQueue, calling get() will, by
definition, return NULL! The reference object has already been
cleared, you cannot access the referent. The subclass descriptions of
WeakReference and SoftReference also state that the reference is
cleared before the reference object is queued.
The thread writes the referent out to disk, then nullifies
explicitly all references to the object. Use a SoftReference,
rather than a WeakReference, because a SoftReference is
intended to stay around a little longer than a WeakReference.

But you cannot write out the referent because you cannot access it.
Be sure to clear() the Reference in either case.

According to the docs, it will already be cleared.
 

xarax

Mike Schilling said:
/snip/
The fact that memMap is a WeakHashMap
means that it won't keep otherwise unreferenced CientLargeObjects in memory.
We keep a hard reference to the LargeObject until it's safely written to
disk. The fact that WeakReference can be subclassed allows us to see the
key (LargeObjectID) even after the ClientLargeObject has been collected.

A WeakHashMap uses a WeakReference for the *KEYS*, not the values.
 

xarax

Mark M said:
Jeffrey, thanks for the input. You describe how I *thought* soft
references should work, but according to the API docs, they don't.
See below.


This is what I thought too, but according to the Java API
documentation, this will not work. From the java.lang.ref package
description: "Soft and weak references are automatically cleared by
the collector before being added to the queues with which they are
registered, if any.". Therefore, when the thread pulls the
SoftReference off the ReferenceQueue, calling get() will, by
definition, return NULL! The reference object has already been
cleared, you cannot access the referent. The subclass descriptions of
WeakReference and SoftReference also state that the reference is
cleared before the reference object is queued.

DOH! We're screwed.
But you cannot write out the referent because you cannot access it.

What the heck was I reading that said it was not cleared
when placed on a ReferenceQueue? oh, well... :(
According to the docs, it will already be cleared.

Well, another poster had a solution of wrapping the LargeObject
as a private field of an instance of a ClientLargeObject. All
threads that are using the large object are given a reference
to the ClientLargeObject, which has delegation methods to access
the LargeObject. All clients only have references to ClientLargeObject,
not directly to the LargeObject.

Now, create a subclass of SoftReference that has a key field
for a HashMap. The map translates the key to the LargeObject.
The referent of the SoftReference is the ClientLargeObject.
The ClientLargeObject has another private field that contains
the key.

When GC decides there are no more strong references to ClientLargeObject,
it places the SoftReference subclass onto the ReferenceQueue. You cannot
get the ClientLargeObject referent, but you *can* get the key field from
the subclass. Use the key field to retrieve the LargeObject from the
HashMap and write it to disk. Be sure to remove the key from the HashMap
and to nullify all references to the SoftReference subclass.

You may need other tracking information (Maps) to reconstitute the
LargeObject and its corresponding ClientLargeObject from disk.
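
Something along these lines (sketch only; persist() stands in for the real
write-to-disk code, and keyToLarge is the HashMap from key to LargeObject
described above):

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;
    import java.util.Map;

    // SoftReference subclass that carries the key for the HashMap.
    class KeyedSoftRef extends SoftReference<ClientLargeObject> {
        final Object key;
        KeyedSoftRef(ClientLargeObject referent, Object key,
                     ReferenceQueue<ClientLargeObject> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    // In the drain thread: get() would return null by now, but the key is intact.
    void handleOne(ReferenceQueue<ClientLargeObject> queue,
                   Map<Object, LargeObject> keyToLarge) throws InterruptedException {
        KeyedSoftRef ref = (KeyedSoftRef) queue.remove();
        LargeObject large = keyToLarge.remove(ref.key);   // hard reference again
        persist(ref.key, large);                          // hypothetical
    }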


--
----------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS!
Are ISV upgrade fees too high? Check our custom product development!
 

Mike Schilling

xarax said:
A WeakHashMap uses a WeakReference for the *KEYS*, not the values.

Drat; I should know better than to write about something without trying it.

Let's see, the purpose of memMap is to hand out the same ClientLargeObject to
all requestors, rather than create N of them. This isn't a memory
optimization, since ClientLargeObjects are small. The point is to detect
when a LargeObject is unreferenced by seeing the corresponding
ClientLORef on the reference queue.

How about this: get rid of memMap and create ClientLargeObjects freely.
Reference count LargeObjects, and persist them when their *last*
ClientLargeObject is enqueued.
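
In outline (a sketch; LargeObjectID is as before, and the persist step is
whatever writes the LargeObject to disk):

    import java.util.HashMap;
    import java.util.Map;

    class LargeObjectRefCounts {
        private final Map<LargeObjectID, Integer> counts =
            new HashMap<LargeObjectID, Integer>();

        // Called whenever a ClientLargeObject is handed out for this id.
        synchronized void acquired(LargeObjectID id) {
            Integer n = counts.get(id);
            counts.put(id, n == null ? 1 : n + 1);
        }

        // Called when a reference for this id comes off the reference queue.
        // Returns true when that was the last ClientLargeObject, i.e. time to persist.
        synchronized boolean released(LargeObjectID id) {
            int n = counts.get(id) - 1;
            if (n == 0) {
                counts.remove(id);
                return true;
            }
            counts.put(id, n);
            return false;
        }
    }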
 

Phillip Lord

Mike> Drat; I should know better than to write about something
Mike> without trying it.


It's very easy to have a HashMap for the values...you just stick the
values into a Reference when you put them, and take them out again
when you finish. It's like storing primitive types as values, except
using Reference objects rather than Integer or whatever.

The point with the WeakHashMap is that you have to be cleverer,
because you need to remove keys (and their values) which have been
GC'd.

I guess every time a get is called on the WeakHashMap, it
checks its reference queue and removes those objects.
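
i.e. something like this inside the map (just a guess at the shape of it, not
the actual JDK source; "queue" is the ReferenceQueue the weak key-references
are registered with):

    private void expungeStaleEntries() {
        for (Reference<?> r; (r = queue.poll()) != null; ) {
            // r was the weak reference to a key that has been collected;
            // remove that key's entry from the internal table here
        }
    }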

Phil
 

Mike Schilling

Phillip Lord said:
It's very easy to have a HashMap for the values...you just stick the
values into a Reference when you put them, and take them out again
when you finish. It's like storing primitive types as values, except
using Reference objects rather than Integer or whatever.

The point with the WeakHashMap is that you have to be cleverer,
because you need to remove keys (and their values) which have been
GC'd.

I guess every time a get is called on the WeakHashMap, it
checks its reference queue and removes those objects.

That's how I'd do it. Every so often I wish for something like a
WeakHashMap on .NET, which doesn't have one, but I don't build it because I
can't decide on an algorithm that doesn't require a ReferenceQueue (which
.NET also lacks).

(I suppose one could search all the hash chains for dead references every N
gets, or search the current hash chain every get, or search every hash chain
after the map grows by N% since the last full search or... )
 

Chris Uppal

Mike said:
How about this: get rid of memMap and create ClientLargeObjects freely.
Reference count LargeObjects, and persist them when their *last*
ClientLargeObject is enqueued.

I may be misreading the OP, but I think that part of the requirement is that:

1 create LargeObject
2 use LargeObject
3 drop LargeObject

should not write LargeObject to disk *unless* the space is required sometime
during (2).

And I may be misunderstanding you, or missing the obvious, but I don't see how
these proposals meet that part of the "spec".

-- chris
 

Mike Schilling

Chris Uppal said:
I may be misreading the OP, but I think that part of the requirement is that:

1 create LargeObject
2 use LargeObject
3 drop LargeObject

should not write LargeObject to disk *unless* the space is required sometime
during (2).

And I may be misunderstanding you, or missing the obvious, but I don't see how
these proposals meet that part of the "spec".

I was trying to address "How do I use weak references to detect that an
object is no longer referenced without losing the information in that
object?" I expect that's not a complete solution to the OP's problem, but
it's a start.
 

Chris Uppal

Mike said:
I was trying to address "How do I use weak references to detect that an
object is no longer referenced without losing the information in that
object?" I expect that's not a complete solution to the OP's problem, but
it's a start.

Fair enough.

BTW, did *anyone* see my attempt at this problem posted 2004/03/02 ? If not
then I'm going to be even more miffed about my ISP's (Nildram) increasingly
crap NNTP service than I already am.

-- chris
 

Mike Schilling

Chris Uppal said:
Fair enough.

BTW, did *anyone* see my attempt at this problem posted 2004/03/02 ? If not
then I'm going to be even more miffed about my ISP's (Nildram) increasingly
crap NNTP service than I already am.

I saw it.

A good way to tell if a posting made it into the world is to check Google
Groups.
 
