Doing one last thing to a WeakReference

Tom Anderson

We recently observed that what seemed like an innocent addition of a
small finalize() method to a class of large objects severely impacted
memory usage. With a finalizer defined, the object's memory can't be
reclaimed as quickly, since it has to stay around until the finalizer
thread can run. Admittedly, this was in a test run where the CPU was
kept arterially busy, but the results were enough to make us find a
different approach.

Interesting. Is your feeling that the problem arose because the objects
were large, rather than numerous? Or because, either way, they were using
a lot of memory?

It would be really nice to have some hard data on this.

I was thinking about this while riding home last night, and it struck me
that you might be able to handle finalizers more efficiently using a write
barrier, as used by a generational garbage collector. You could trap all
writes from the code that runs during finalization, and examine them to
see if they were storing a pointer to the dying objects into the live
heap. If one was, you'd mark the objects live again, and leave them to be
collected later. If no such write occurred, you'd know it was okay to free
the objects there and then, without having to re-scan. There are two ways
to implement write barriers: using OS memory protection functions, and
using the compiler or interpreter to do some special processing on all
writes. The former would probably involve trapping all writes to anywhere
in memory during finalization, since the objects being finalized would
probably be scattered all over memory, and the latter would probably
involve throwing away any compiled code you'd already made and
interpreting or re-compiling the bytecode used during finalization.
Neither is cheap - but both mean that in the usual case, finalizers
wouldn't lead to a big drag on memory use.

Suppose the finalizer tries to allocate memory?

Bad finalizer! I guess there are two options: throw an OOME there and then
in the finalizer, or suspend that finalizer thread and try to finalize and
dispose of some other objects, to free some memory, then return to the
allocating finalizer and finish that one.

tom
 
Tom Anderson

Or suppose the finalizer leaks a hard-reference of itself?

Then you don't end up freeing any memory, and you go back to the
allocation that triggered the finalization and throw an OOME. But at least
you tried.

tom
 
Tom Anderson

A slightly larger issue is,

I note this isn't an answer to my question. May i take it as an implicit
acknowledgement that try-finally is not useful here?

what if your front end code grabs a handle, stuffs it in a session or
application object, and never gives it up? You'd be obliged to maintain
that handle forever, and you'd have a de-facto resource leak.

If the session or application object lives forever, then yes, that would
be a problem. Here, the sessions are released by the servlet container
when they end, so that's not a problem.

So I wouldn't give the handles out at all. I'd keep them in my CRUD/ORM
whatever object, and give the client something else, that wouldn't
matter if the client abuses it or forgets to close it. Don't give state
to your clients, it's bad.

And how does the CRUD/ORM whatever object know when it can close the
handle?

This also lets you handle errors more cleanly. What if your legacy
database coughs up a hairball right when the client needs to use it?
Legacy apps don't always have the nicest error handling characteristics.
Separating the two concerns (an object for your clients, and one for
your connection) lets you deal with events like that independently.

I don't think i ever specified whether i was doing or not doing this in
the handle - if you like, how well-padded the handle is. It would, as you
suggest, be very sensible to do this in a layer between the native
interface and the client interface, so that the client code doesn't have
to deal with the cruft, but that could all be part of the handle.

Of course, I'm not actually writing your app, so I don't know your
requirements, but that's the first thing that occurs to me. Bogart the
handles, keep your client-side connection object separate. Just hand
the client the data, for example, and some sort of key it can check back
in with. Never pass around a reference, pointer or handle from a lower
level layer to an upper level one.

I'm still curious about how the middle layer knows when it can close
handles. I think i have an idea of what you're thinking, but if you could
explain it, that would be most kind.

Anyway, now i'm going to throw a spanner in the works and ask what happens
if i add another detail: if you're booking a hotel, you have to do the
searching and booking with the same handle (rather than open, search,
close, open another, book, close). The system that travel agents use to
book airline flights is actually quite like this, i believe - the search
stakes a claim on some seats, which can then either be booked or released.

tom
 
Mark Space

Tom said:
I'm still curious about how the middle layer knows when it can close
handles. I think i have an idea of what you're thinking, but if you
could explain it, that would be most kind.

Whenever it wants. You've completely decoupled the connection from the
upper level layers. If you meant "give me some advice how to do it,"
then I'd say for each connection you support, read the connection info
from a config file, and allow a time out value for each link. This way
you can arbitrarily set a time out for each connection.

Anyway, now i'm going to throw a spanner in the works and ask what
happens if i add another detail: if you're booking a hotel, you have to
do the searching and booking with the same handle (rather than open,
search, close, open another, book, close). The system that travel agents
use to book airline flights is actually quite like this, i believe - the
search stakes a claim on some seats, which can then either be booked or
released.

If the design requires you to maintain a connection to hold
a lock on seats, then that's what you'll have to do. I'd use a time out
as I mention above. After say 5 minutes of non-use, the connection is
closed. If an agent was holding locks on seats for more than 5 minutes
and walked away from their desk, too bad.

If they complain, you have the option to increase their timeout. If
they still complain, set the time out for 24 hours. Then HUP the server
each night at 2AM for "scheduled maintenance" so you can get some
backups done. ;-)

But the important bit is, you can close the connection if the client
software isn't holding a handle. If an agent locks some seats and then
leaves their terminal and goes home, your design isn't borked. You
still have the option to close their connection anyway, because they
aren't holding the raw connection object, just a key you handed them.
Likewise, if the other side (database, mainframe, etc.) hiccups and
closes the connections on you, you don't have to worry about having a
lot of invalid handles hanging around, you just have old keys that are
no longer valid, which is a normal state for your design.
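
A minimal sketch of the key-plus-timeout idea described above, under stated assumptions: BackendConnection and ConnectionRegistry are hypothetical names, not part of any real library or of anyone's actual code in this thread. The current time is passed in explicitly so the behaviour is easy to follow; a real version would probably read a clock and run the eviction from a scheduled task.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the legacy connection; not from the thread. */
interface BackendConnection {
    void close();
}

/** Hands out opaque keys and keeps the real connections to itself. */
final class ConnectionRegistry {
    private static final class Entry {
        final BackendConnection connection;
        volatile long lastUsedMillis;

        Entry(BackendConnection connection, long nowMillis) {
            this.connection = connection;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    ConnectionRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a connection and returns the key the client may keep. */
    String register(BackendConnection connection, long nowMillis) {
        String key = UUID.randomUUID().toString();
        entries.put(key, new Entry(connection, nowMillis));
        return key;
    }

    /** Middle-layer lookup; an unknown or expired key yields empty. */
    Optional<BackendConnection> use(String key, long nowMillis) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        entry.lastUsedMillis = nowMillis;
        return Optional.of(entry.connection);
    }

    /** Closes connections idle for longer than the timeout; returns how many. */
    int evictIdle(long nowMillis) {
        int closed = 0;
        for (Map.Entry<String, Entry> e : entries.entrySet()) {
            boolean idle = nowMillis - e.getValue().lastUsedMillis > timeoutMillis;
            // remove(key, value) only succeeds if nobody replaced the entry meanwhile
            if (idle && entries.remove(e.getKey(), e.getValue())) {
                e.getValue().connection.close();
                closed++;
            }
        }
        return closed;
    }
}
```

With a 5 minute timeout (300000 ms), a key that has not been used for longer than that simply stops resolving, which is the "old keys are a normal state" property. The sketch ignores the race between use() and evictIdle() on the same entry; that would need a lock or a lease in anything real.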
 
Mike Schilling

Tom Anderson said:
On Wed, 18 Jun 2008, Mike Schilling wrote:


Interesting. Is your feeling that the problem arose because the objects
were large, rather than numerous? Or because, either way, they were using
a lot of memory?

The last of these. (And "arterially" should of course be "artificially".
Misspelling a word that badly requires a spellchecker.)
 
Paul J. Lucas

Mark said:
Try-catch-finally. The "finally" portion is always run.

No it's not:

try {
    // ...
    if ( catastrophe )
        System.exit( -1 );
}
finally {
    // not executed if catastrophe == true
}

- Paul
 
Paul J. Lucas

Tom said:
Yes. Reference counting is a good solution to this problem, probably the
best one.

Too bad Java doesn't provide it. It would be nice if CountedReference existed
and the increment/decrement of the counter were handled automatically by the VM.
It would be nicer still if you could specify a method to be called once the
count reaches 0 and just before the referent is reclaimed.

- Paul
 
Tom Anderson

Too bad Java doesn't provide it. It would be nice if CountedReference
existed and the increment/decrement of the counter were handled
automatically by the VM. It would be nicer still if you could specify a
method to be called once the count reaches 0 and just before the
referent is reclaimed.

Er ...

How would this be any better than proper garbage collection?

I was thinking you'd do reference counting explicitly, using the 'destroy'
or whatever method in the session objects to do a decrement on the
relevant HotelHandle. Leaving it to the VM to do things based on when
objects die isn't a good solution here. You need to work at a higher level
of abstraction.

tom
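
For illustration, explicit reference counting of the kind suggested above might look roughly like this. CountedHandle is a hypothetical name, and the cleanup action is whatever closes the underlying resource; nothing here is done by the VM, so every retain() has to be paired with a release() by hand (for example from a session's destroy method).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Explicitly counted wrapper; the name and shape are illustrative only. */
final class CountedHandle {
    // The creator holds the first reference.
    private final AtomicInteger count = new AtomicInteger(1);
    private final Runnable onRelease;

    CountedHandle(Runnable onRelease) {
        this.onRelease = onRelease;
    }

    /** Takes another reference, e.g. when a second session shares the handle. */
    void retain() {
        for (;;) {
            int current = count.get();
            if (current == 0) {
                // A handle that has already been cleaned up cannot be revived.
                throw new IllegalStateException("already released");
            }
            if (count.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    /** Drops one reference; runs the cleanup exactly when the count hits zero. */
    void release() {
        for (;;) {
            int current = count.get();
            if (current == 0) {
                throw new IllegalStateException("released too many times");
            }
            if (count.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    onRelease.run();
                }
                return;
            }
        }
    }

    int count() {
        return count.get();
    }
}
```

The cleanup runs synchronously in whichever thread drops the last reference, which gives the "as soon as the count reaches 0" behaviour, but only for code that cooperates by calling release().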
 
Stefan Ram

Tom Anderson said:
How would this be any better than proper garbage collection?

One can only assume that the reclaimer is started
if there is insufficient memory.

The handles are an independent resource: The system may
already have run out of handles, when it is not yet out of
memory.

So you need to write a custom garbage collector for
handles. Which is not ...
 
Paul J. Lucas

Tom said:
Er ...

How would this be any better than proper garbage collection?

Because I'd want the VM to *guarantee* that the dispose() method would be called
*as* *soon* *as* the count reaches 0. This is in contrast to WeakReferences
that may never have their referents be reclaimed.

Hence, it's *not* "garbage collection" in the Java sense at all.

I was thinking you'd do reference counting explicitly, using the
'destroy' or whatever method in the session objects to do a decrement on
the relevant HotelHandle.

That's just it: for my use case, I *can't* do it explicitly because I have no
way of knowing if my resource is either currently in use or will be used again
in the not-too-distant future (because some code/object somewhere still has a
hard reference to it).

For the record, the use-case is as follows. I have some class (ImageInfo) that
reads image files (possibly very large image files). As part of the class's
implementation, it opens a RandomAccessFile on the image and reads "chunks" of
the file into memory. (The "chunks" themselves are held by SoftReferences -- a
SoftChunkyByteBuffer -- and this part works fine.)

The class has many methods, each of which accesses different parts of the image,
e.g., get the image's metadata (getMetadata()), the image's color profile
(getColorProfile()), the image's thumbnail (getThumbnail()), and the image
itself (getImage()). All said methods must ultimately read data from the "chunks."

For a given image file F, a new ImageInfo(F) is created. Once created, the rest
of the code will use the various methods of ImageInfo in an arbitrary order.
The first method called (regardless of which method it is) creates the
SoftChunkyByteBuffer and, after creation, it (and the RandomAccessFile it uses)
sticks around.

After the first method is called, I can't just close the RandomAccessFile
because it might be the case that another method is about to be called for the
same image F via the same ImageInfo object. In fact, I can *never* explicitly
close the RandomAccessFile because of this.

So what I currently have to do is an ugly hack and keep global track of all the
RandomAccessFiles open and, at some time(s) when it's essential that they be
closed, manually close them.*

Instead, what I wish Java provided would be some kind of TenuousReference (i.e.,
weaker than WeakReference) that would *guarantee* that the referent would be
reclaimed *as* *soon* *as* it became "tenuously reachable" and its finalizer
would be called. I could then close the RandomAccessFile in the finalizer.

But, given Java as it is, I see no better way to do this.

- Paul

* The times when closing a file is essential, at least on Windows, are when you
want to move or delete it.
 
Lew

Paul J. Lucas said:
But, given Java as it is, I see no better way to do this.

You make it sound like Java's fault. No language nor its library will
have all your use cases coded for you; at some point you will have to
write your own solution.

This is not a flaw in Java.
 
Daniele Futtorovic

That's just it: for my use case, I *can't* do it explicitly because I
have no way of knowing if my resource is either currently in use or will
be used again in the not-too-distant future (because some code/object
somewhere still has a hard reference to it).

For the record, the use-case is as follows. I have some class
(ImageInfo) that reads image files (possibly very large image files).
As part of the class's implementation, it opens a RandomAccessFile on
the image and reads "chunks" of the file into memory. (The "chunks"
themselves are held by SoftReferences -- a SoftChunkyByteBuffer -- and
this part works fine.)

The class has many methods, each of which accesses different parts of
the image, e.g., get the image's metadata (getMetadata()), the image's
color profile (getColorProfile()), the image's thumbnail
(getThumbnail()), and the image itself (getImage()). All said methods
must ultimately read data from the "chunks."

For a given image file F, a new ImageInfo(F) is created. Once created,
the rest of the code will use the various methods of ImageInfo in an
arbitrary order. The first method called (regardless of which method it
is) creates the SoftChunkyByteBuffer and, after creation, it (and the
RandomAccessFile it uses) sticks around.

After the first method is called, I can't just close the
RandomAccessFile because it might be the case that another method is
about to be called for the same image F via the same ImageInfo object.
In fact, I can *never* explicitly close the RandomAccessFile because of
this.

Open the file on-demand then. Keep your chunky buffers as
WeakReferences, so that they stick around a tad bit stronger, and close
the file after, say, a short timeout. By experimenting around, you ought
to find a reasonable trade-off balance for how soon to close.
I mean, there isn't a particularly huge overhead associated with opening
a RAF, is there? One FD. You may even be able to do without the
buffering of significant amounts of data (as opposed to metadata)
altogether.

[Apologies if I missed some important parts of the thread -- I haven't
read it all]
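
A rough sketch of the open-on-demand suggestion above, with assumptions labelled: OnDemandFile is a hypothetical helper, not part of ImageInfo or SoftChunkyByteBuffer, and the clock value is passed in so the idle check is explicit. Something (a timer, or the code about to move or delete the file) would have to call closeIfIdle().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Opens the file lazily and lets a caller close it again once idle. */
final class OnDemandFile {
    private final File file;
    private final long idleMillis;
    private RandomAccessFile raf;   // null while the file is closed
    private long lastUsedMillis;

    OnDemandFile(File file, long idleMillis) {
        this.file = file;
        this.idleMillis = idleMillis;
    }

    /** Reads exactly length bytes at offset, reopening the file if needed. */
    synchronized byte[] read(long offset, int length, long nowMillis) throws IOException {
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
        }
        lastUsedMillis = nowMillis;
        raf.seek(offset);
        byte[] buffer = new byte[length];
        raf.readFully(buffer);   // throws EOFException if the file is too short
        return buffer;
    }

    /** Closes the file if it has been unused for at least idleMillis. */
    synchronized boolean closeIfIdle(long nowMillis) throws IOException {
        if (raf != null && nowMillis - lastUsedMillis >= idleMillis) {
            raf.close();
            raf = null;
            return true;
        }
        return false;
    }

    synchronized boolean isOpen() {
        return raf != null;
    }
}
```

Because a later read() just reopens the file, closing early costs one extra open rather than breaking the caller, which is what makes a short timeout tolerable.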
 
Paul J. Lucas

Lew said:
You make it sound like Java's fault. No language nor its library will
have all your use cases coded for you; at some point you will have to
write your own solution.

This is not a flaw in Java.

Yes it is. In Java, you have very limited ways you can interact with the
garbage collector. If one of the ways, say WeakReferences as is the case here,
doesn't do what you want, you're just plain out of luck.

In C++, however, I could easily code my own reference-counted class and
destructors and get exactly the behavior I want (thanks to, among other things,
stack allocation of objects that permit the "resource acquisition is
initialization" technique).

It's a perfectly reasonable (and common) thing to want to:

a) guarantee an object is destroyed as soon as it's no longer used (and not
"some later time").

b) do one last thing to an object before it's destroyed.

Java simply doesn't let you do those.

- Paul
 
Stefan Ram

Paul J. Lucas said:
It's a perfectly reasonable (and common) thing to want to:
a) guarantee an object is destroyed as soon as it's no longer used (and not
"some later time").
b) do one last thing to an object before it's destroyed.
Java simply doesn't let you do those.

In Java, the operation »close()« can be implemented
and called as soon as an object is »no longer used«.
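
As a small illustration of the explicit close() approach: the sketch below uses try-with-resources, which arrived in Java 7, after this thread; a try/finally block that calls close() does the same job in older code. TrackedResource and CloseDemo are hypothetical names made up for the example.

```java
/** Illustrative resource that records whether it has been closed. */
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    String read() {
        if (closed) {
            throw new IllegalStateException("closed");
        }
        return "data";
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class CloseDemo {
    /** Uses a resource, optionally failing midway, and reports whether it ended up closed. */
    static boolean closedAfterUse(boolean failMidway) {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.read();
            if (failMidway) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IllegalStateException e) {
            // Swallowed only to keep the demo short.
        }
        return resource.isClosed();
    }
}
```

This only helps when the code knows the point at which the object is "no longer used", which is exactly the knowledge Paul says his use case lacks.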
 
Stefan Ram

Paul J. Lucas said:
It's a perfectly reasonable (and common) thing to want to:
a) guarantee an object is destroyed as soon as it's no longer used (and not
"some later time").
b) do one last thing to an object before it's destroyed.

C++ has destructors.

Are there any other (object-oriented) programming
languages with destructors?
 
Mike Schilling

Lew said:
Eschew System.exit(), especially in try blocks.

It doesn't matter really; a JVM can exit prematurely for many reasons,
up to and including the machine being turned off. Try-finally
guarantees resource cleanup only for resources also freed when a
process exits (normally or otherwise.) The same is true of C++
destructors. If you're worried about more persistent resources, you
need something like transactional persistent storage (e.g. a
database.)
 
Paul J. Lucas

Lew said:
No, it isn't.

Real compelling argument there. Java programmers need to stop being in denial
and admit that Java has flaws just like every other language.

- Paul
 
Paul J. Lucas

Lew said:
Eschew System.exit(), especially in try blocks.

That doesn't invalidate my point. And it should be used when appropriate.
That's why Java has shutdownHooks.

- Paul
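
For reference, registering a shutdown hook looks roughly like this. ShutdownCleanup is a hypothetical helper name; Runtime.addShutdownHook and Runtime.removeShutdownHook are the standard calls. Hooks run on normal termination and on System.exit(), but not if the JVM is halted with Runtime.halt() or killed outright, so they narrow the gap left by finally blocks rather than close it.

```java
/** Hypothetical helper that wraps cleanup code in a JVM shutdown hook. */
final class ShutdownCleanup {
    /** Registers the cleanup and returns the hook thread so it can be removed later. */
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    /** Removes a previously registered hook; returns false if it was not registered. */
    static boolean unregister(Thread hook) {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }
}
```

A typical use would be to register a hook that closes any files still open, then unregister it once they have been closed explicitly.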
 
