Nulling an object

Mike Schilling · May 18, 2009

Karl said:
- The stack example (although failure of a stack to truly unreference
a popped element would be unexpected and, I hope, clearly documented)

I think we'd all agree that it's an issue for the stack implementor,
and the stack user should never need to be aware of it, i.e. if he is,
there's a bug in the implementation. Note that although we all keep
saying "stack", ArrayList.remove() has the same issue. (It isn't
sufficient to close the gap and adjust the size.) removed
..

- Objects created near the head of the call stack (say, main) that
remain in scope for a very long time, even though they are no longer
needed. I suspect that there are ways of organizing the code so as to
eliminate that behavior, but perhaps just setting the reference to
null when finished with it is the simplest thing.

Agreed on all counts. I'll add that doing either one requires
understanding the problem. That is, GC in Java is automatic but not
foolproof.
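
The ArrayList.remove() point above can be sketched roughly as follows. This is a hypothetical minimal list, not the real java.util.ArrayList source; the class name TinyArrayList and the rawSlot() helper are assumptions made only for illustration. It shows why closing the gap and adjusting the size is not enough: the slot just past the new end still has to be cleared.

```java
import java.util.Arrays;

// Hypothetical minimal array-backed list, for illustration only.
class TinyArrayList {
    private Object[] elements = new Object[4];
    private int size = 0;

    void add(Object o) {
        if (size == elements.length) {
            // Grow the backing array when it is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = o;
    }

    Object get(int index) {
        checkIndex(index);
        return elements[index];
    }

    Object remove(int index) {
        checkIndex(index);
        Object removed = elements[index];
        int toMove = size - index - 1;
        if (toMove > 0) {
            // Close the gap by shifting the tail one slot to the left.
            System.arraycopy(elements, index + 1, elements, index, toMove);
        }
        // Without this assignment the old last slot would keep a stale
        // reference, so that object could not be collected.
        elements[--size] = null;
        return removed;
    }

    int size() {
        return size;
    }

    // Assumed helper that exposes a raw slot so the clearing can be observed.
    Object rawSlot(int i) {
        return elements[i];
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

The user of the list never sees any of this, which is the point: the nulling belongs to the implementor.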

Mayeul · May 18, 2009

Frank said:
Frank Cisco said:

If you null an object, i.e. obj = null, when is it cleared from memory?
Immediately or at the next garbage collection?

[snip]
I've noticed a lot of null setting in the core
Java APIs - so it can't be that harmful. Anyone from Sun Microsystems about?

I don't think anyone would argue that setting a reference to null is
harmful in any other way than adding unnecessary code and thus
complicating maintainability.

As for the null settings in Java APIs, well:

- Expressing a reference to no object, (yet or anymore, for instance,)
may be handy.
- References in local variables normally do not need to be set to null
to become candidates for garbage collection: this will automatically
happen when leaving the scope of the variable. But member variables may
be a different story if the member's lifecycle should be way shorter
than the holding object's lifecycle.

Roedy Green · May 18, 2009

If you null an object, i.e. obj = null, when is it cleared from memory?
Immediately or at the next garbage collection?

see http://mindprod.com/jgloss/garbagecollection.html
--
Roedy Green Canadian Mind Products
http://mindprod.com

"It wasn’t the Exxon Valdez captain’s driving that caused the Alaskan oil spill. It was yours."
~ Greenpeace advertisement New York Times 1990-02-25

Lew · May 18, 2009

Mayeul said:
I don't think anyone would argue that setting a reference to null is
harmful in any other way than adding unnecessary code and thus
complicating maintainability.

Someone already did, just upthread, twice.

The argument was that unnecessary traversal of objects to null them
out might cause cache misses or interfere with certain GC strategies.

In the article from Brian Goetz that I cited upthread, the author
suggests that explicit nulling can harm GC performance.

Lew · May 18, 2009

Mayeul said:
- References in local variables normally do not need to be set to null
to become candidates for garbage collection: this will automatically
happen when leaving the scope of the variable. But member variables may
be a different story if the member's lifecycle should be way shorter
than the holding object's lifecycle.

Quite often, a member variable needing to be nulled before the death
of its containing object indicates that the member variable should be
down-factored to a method variable.

The principle is that member variables should represent the state of
the object. Of course, it is common that 'null' is a valid attribute
state, but it's also very common that 'null' is not a valid attribute
state.

Member variables should generally only become 'null' if that is a
valid representation of instance state for that attribute, not because
it "helps" the GC. In other words, the member variable should be
valid, 'null' or not, for the entire life of the object.
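
A small sketch of the down-factoring described above, under assumed names (ReportBuilder, addLine and render are hypothetical and not from the thread). The list of lines is real instance state and stays a member; the scratch buffer is only needed during one call, so it lives in a method variable and never has to be nulled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class ReportBuilder {
    // Genuine instance state: valid for the entire life of the object.
    private final List<String> lines = new ArrayList<>();

    void addLine(String line) {
        lines.add(line);
    }

    String render() {
        // Temporary working data is a local, not a member. It becomes
        // unreachable when render() returns, with no explicit nulling.
        StringBuilder scratch = new StringBuilder();
        for (String line : lines) {
            scratch.append(line).append('\n');
        }
        return scratch.toString();
    }
}
```

Had scratch been a field, the class would need either to null it after each render() or to keep the buffer alive as long as the builder itself.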

Lew · May 19, 2009

Peter said:
[...]
I've reviewed my posts in this thread, and they make no special effort
to claim that what you wrote was wrong.

What's your definition of "special effort"? Are you simply saying that
your statement that I was wrong took no extra effort on your part, but
rather was done easily?

I've stated a few times now that I misunderstood you to have understood my
references to implementation of certain structures to mean that I was
referring to implementation, not use, of those structures. Clearly had I been
correct in thinking that you had followed my references and understood what I
was saying, then you would have been wrong when you said, "In fact, even the
example Lew gives of a Stack is in reality unlikely to require specific
handling." Since the example I gave of a Stack actually does require specific
handling, in the implementation thereof, as explained in both the links I
provided relevant to my point, the statement is, on the face of it, wrong.
However, since you explained that I had misapprehended the context whereof you
spoke, and that you in fact were not speaking to the points I made nor in the
context of the examples I provided, but instead to a different point and a
different context, your statement was not, on further analysis, wrong. Such
things happen.

The statement "Joshua Bloch and Brian Goetz and Sun generally disagree
with you" is to me a statement with the clear implication that I am
wrong. The only reasonable inference is that you believe those parties
to be right (otherwise, why bring them up at all?), and if they are
right, and they disagree with me, that implicitly means I must be wrong.

Had you read either of the links I provided to either of those authors, or to
the Sun Tech Tip to which I linked, you would have understood that I
misunderstood that you misunderstood what I had said.

The fact is, they and I are not in disagreement at all. But for you to
say we are is to state that I am wrong.

The only thing about which you were wrong was the context of which I spoke. I
had mistakenly thought that the links, which explained in three different ways
the examples to which I explicitly referred, were sufficient to disambiguate my
comments. Perhaps you couldn't be bothered to follow those links, and thus
misinterpreted my comments. No matter. I was mistaken. These things happen.

Perhaps that's not what you meant and if so, I accept your apology in
saying that what I wrote was not wrong. But it's an important point for
you to understand, if you are to avoid a similar mistake in the future.

This is Usenet. This is human interaction. People miscommunicate. You
failed to understand what I said, I failed to understand that what I said was
subject to misunderstanding. These things happen.

If you're not concerned about that, well...I guess that's your
prerogative. I've offered what insight I can, and don't think there's
anything else productive I might write on the topic.

Your insight is assimilated.

Arne Vajhøj · May 19, 2009

Frank said:
Cheers for the responses. I guess the only way to say whether this is an
effective practice is to see how the GC does its work - maybe helping the
GC out isn't such a bad idea. I've noticed a lot of null setting in the core
Java APIs - so it can't be that harmful. Anyone from Sun Microsystems about?

I would say that it is very bad practice as a general practice (few
exceptions are covered previously).

It is certainly code clutter.

And it is more likely to decrease performance than to increase it.

So it is choosing both the plague and cholera.

Arne

Arved Sandstrom · May 19, 2009

Arne said:
I would say that it is very bad practice as a general practice (few
exceptions are covered previously).

It is certainly code clutter.

And it is more likely to decrease performance than to increase it.

So it is choosing both the plague and cholera.

Arne

Without looking at the source and seeing why various references are
being set to null, we have no idea of whether that's being done in a
(misguided) attempt to help out GC, or whether it's being done simply to
set a reference to null. There are, after all, reasons why you'd set a
reference to null - it's a legitimate value for a reference variable.

In any case, isn't it a bit of a strong statement that setting
references to null is going to decrease performance?

AHS

Arne Vajhøj · May 19, 2009

Arved said:
Without looking at the source and seeing why various references are
being set to null, we have no idea of whether that's being done in a
(misguided) attempt to help out GC, or whether it's being done simply to
set a reference to null. There are, after all, reasons why you'd set a
reference to null - it's a legitimate value for a reference variable.

In any case, isn't it a bit of a strong statement that setting
references to null is going to decrease performance?

Some reasons why it could happen have been given.

I will not consider them that likely.

But I consider them more likely than the setting to null will
help the GC.

Arne

Arved Sandstrom · May 19, 2009

Arne said:
Arved Sandstrom wrote:

[ SNIP ]

Some reasons why it could happen have been given.

I will not consider them that likely.

But I consider them more likely than the setting to null will
help the GC.

Arne

I've been following the thread in detail. I would agree with you and
others that setting references to null in the hopes of helping GC is
misguided. I read Goetz' article that Lew pointed out; there's no
question that it's possible to construct examples where over-zealous
unnecessary nulling of references is a performance hit, but it's
generally a performance hit in the sense that any unnecessary executed
code is a performance hit.

AHS

Seamus MacRae · May 20, 2009

Mike said:
Peter said:

In fact, even the example Lew gives of a Stack is in reality unlikely
to require specific handling. In particular, if you're actually
_done_ with the Stack, simply getting rid of the reference to the
Stack itself is sufficient for releasing all the objects that the
Stack itself references.

The issue with stacks (and ArrayLists) is that a naive implementation
retains unnecessary references to objects. Assume something like:
(not tested or compiled, so please ignore any typos, as well as the
fact that neither overflow nor underflow is handled.)

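public class SimpleStack
{
    // Arbitrary capacity, assumed here only so the example compiles.
    private static final int MAX_SIZE = 100;
    private Object stack[] = new Object[MAX_SIZE];
    int size = 0;

    public void push(Object o)
    {
        // Post-increment so that push and pop agree on the slot used.
        stack[size++] = o;
    }

    public Object pop()
    {
        // The popped slot keeps its reference: this is the problem.
        return stack[--size];
    }
}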

If you push three objects and then pop two of them, the stack still
references all three, and will continue to do so for an indefinite
period of time. The solution is to replace pop()

The solution is a really simple stack. Also gets rid of that MAX_SIZE wart:

import java.util.NoSuchElementException;

public class SimpleStack<T> {
    private final class Entry {
        public T item = null;
        public Entry next = null;
    }

    private Entry top = null;

    public void push (T item) {
        Entry novus = new Entry();
        novus.item = item;
        novus.next = top;
        top = novus;
    }

    public T pop () {
        if (top == null) throw new NoSuchElementException();
        // Unlinking the entry leaves it unreachable from the stack.
        Entry popped = top;
        top = top.next;
        return popped.item;
    }
}

Only slightly longer code, no array, no MAX_SIZE, and no packratting:
pop leaves the former top Entry object unreachable, and therefore the
item in it is no longer reachable through the stack object. If the
caller of pop discards the reference (when done using it) and no other
references to the item linger, both the Entry object and the item become
eligible for garbage collection.

Speed efficiency: assuming decent optimizations, push does a pointer
bump and a couple of assignments to an object header for "new Entry()"
(assuming a decent JVM and a non-full TLAB), and three more pointer
assignments, some stack pushes (for item and novus), and some stack
pops. The other implementation used one push and one pop (o), an integer
addition, an integer comparison and conditional branch (for the bounds
check), a pointer addition and a pointer dereference (indexing into the
array) and a pointer assignment (assignment of o to dereferenced array
cell).

Probably about comparable.

Pop does a pointer comparison, conditional branch, two pointer
assignments, and one push and pop (assuming the optimizer eliminates
"popped" and its output just pushes the return value, does the top =
top.next assignment, then executes RET). The other implementation uses
one push and pop (return value), an integer subtraction, an integer
comparison and conditional branch (bounds check), a pointer addition and
a pointer dereference (array cell), and a pointer assignment (return
value extraction). Avoiding packratting means adding one more pointer
assignment (stack[size] = null; assumes the optimizer gets rid of the
temporary holding the return value and its output just pushes the return
value, sets stack[size] = null, and executes RET).

Again, probably about comparable.
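
For comparison, the array-based version with the extra stack[size] = null assignment mentioned above might look like this sketch. The class name ArrayStack, the capacity constructor, the bounds checks and the rawSlot() helper are assumptions added for illustration; they are not part of the code posted in the thread.

```java
import java.util.NoSuchElementException;

// Hypothetical array-backed stack that does not retain popped elements.
class ArrayStack {
    private final Object[] stack;
    private int size = 0;

    ArrayStack(int capacity) {
        stack = new Object[capacity];
    }

    void push(Object o) {
        if (size == stack.length) {
            throw new IllegalStateException("stack full");
        }
        stack[size++] = o;
    }

    Object pop() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        Object result = stack[--size];
        // The one extra pointer assignment: clear the vacated slot so the
        // stack no longer keeps the popped object reachable.
        stack[size] = null;
        return result;
    }

    // Assumed helper that exposes a raw slot so the clearing can be observed.
    Object rawSlot(int i) {
        return stack[i];
    }
}
```

After pushing three objects and popping two, only slot 0 still holds a reference.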

Space efficiency: The other implementation chews up MAX_SIZE + 1 object
references worth of memory at all times, plus two object headers (stack
and array); mine, 2*(actual size of stack) + 1 object references and
(actual size of stack) + 1 object headers. Mine's better for stacks that
need to be able to get large but spend most of their time small; yours
is better for stacks of known bounded size that tend to stay close to
full (depth-first traversals of balanced trees of known max depth, say).

(The above assumes sensible JITting of the methods, without inlining
which would eliminate some of the pushes and pops. It also does not
consider the items stored in the stack to contribute to the space
complexity of the algorithms, only the stack's infrastructure.)
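
The space figures above can be written out as a small calculation. This is only a sketch of the counts stated in the post; the class and method names are hypothetical, and it counts references and object headers separately rather than assuming any particular size for either.

```java
// Hypothetical helper that restates the counts given in the post.
class StackSpace {
    // Array version: MAX_SIZE slots plus the reference to the array.
    static long arrayStackRefs(long maxSize) {
        return maxSize + 1;
    }

    // Array version: one header for the stack, one for the array.
    static long arrayStackHeaders() {
        return 2;
    }

    // Linked version: item and next per entry, plus the top reference.
    static long linkedStackRefs(long n) {
        return 2 * n + 1;
    }

    // Linked version: one header per entry, plus the stack itself.
    static long linkedStackHeaders(long n) {
        return n + 1;
    }
}
```

Counting references alone, the linked version holds fewer whenever the actual size n is below MAX_SIZE / 2, since 2n + 1 < MAX_SIZE + 1 reduces to n < MAX_SIZE / 2. The per-entry headers move the break-even point lower still.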

Mike Schilling · May 20, 2009

Seamus said:
Mike said:

Peter said:

In fact, even the example Lew gives of a Stack is in reality unlikely
to require specific handling. In particular, if you're actually
_done_ with the Stack, simply getting rid of the reference to the
Stack itself is sufficient for releasing all the objects that the
Stack itself references.

The issue with stacks (and ArrayLists) is that a naive
implementation retains unnecessary references to objects. Assume
something like: (not tested or compiled, so please ignore any typos,
as well as the fact that neither overflow nor underflow is handled.)

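public class SimpleStack
{
    // Arbitrary capacity, assumed here only so the example compiles.
    private static final int MAX_SIZE = 100;
    private Object stack[] = new Object[MAX_SIZE];
    int size = 0;

    public void push(Object o)
    {
        // Post-increment so that push and pop agree on the slot used.
        stack[size++] = o;
    }

    public Object pop()
    {
        // The popped slot keeps its reference: this is the problem.
        return stack[--size];
    }
}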

If you push three objects and then pop two of them, the stack still
references all three, and will continue to do so for an indefinite
period of time. The solution is to replace pop()

The solution is a really simple stack. Also gets rid of that MAX_SIZE
wart:

As I said, it wasn't intended to be a real implementation; it was
just real enough to illustrate the problem.

Seamus MacRae · May 20, 2009

Mike said:
As I said, it wasn't intended to be a real implementation; it was
just real enough to illustrate the problem.

Well, mine was just real enough to illustrate how easily it could be
avoided, without much if any performance loss under most circumstances
and without explicit nulling.

Karl Uppiano · May 20, 2009

Peter Duniho said:

I have a vague recollection of this question coming up previously, but
don't remember exactly what the conclusion was.

For context: I assume the above statement is referring to a situation like
this:

void method()
{
    object ref = ...;

    sub_methodA(ref);
    sub_methodB();
    sub_methodC();
}

With the proposal that one assign "null" to the variable "ref" immediately
after the call to sub_methodA().

The thing I can't remember is whether Java's JITter and GC deal with this
automatically. In .NET (which I'm somewhat more familiar with), they do.
That is, the JIT compiler can tell the variable isn't used past a certain
point, and the GC treats the reference as unreachable past that point. In
fact, assigning "null" to the variable can actually delay collection,
because it causes the variable to be used at a later point in the code
than without such an assignment.

I actually didn't think this through completely. "ref" in the example above
is unused after sub_methodA(), and the GC should realize it. Depending on
usage, the object that ref refers to could actually be collected even before
sub_methodA() returns, and certainly after.

Mike Schilling · May 20, 2009

Seamus said:
Well, mine was just real enough to illustrate how easily it could be
avoided, without much if any performance loss under most circumstances
and without explicit nulling.

You will now demonstrate how to implement ArrayList without using an
array.

Seamus MacRae · May 20, 2009

Mike said:
You will now demonstrate how to implement ArrayList without using an
array.

Easy:

public class ArrayList<T> extends AbstractList<T> {
public int __start_addr;
public native int size ();
public native void add (T obj);
....

*(base+offs) = foo;

....

Okay, it won't be very efficient and will even need an evil finalizer,
but malloc() and pointer arithmetic is the only way I can think of to
get fast random access without an array on short notice.

If you consider this to not count because it could be changed to e.g.
base[offs] = foo; and still work the same, then "aww **** it".
