Short-lived Objects - good or bad?

Andreas Leitgeb

My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.
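Concretely, the two patterns I mean (Point here is a hypothetical
stand-in, as are xs, ys and process()):

    // Pattern A: a fresh, short-lived object per iteration.
    for (int i = 0; i < n; i++) {
        Point p = new Point(xs[i], ys[i]);   // dies young
        process(p);
    }

    // Pattern B: one long-lived object, re-initialized per iteration.
    Point q = new Point(0, 0);
    for (int i = 0; i < n; i++) {
        q.set(xs[i], ys[i]);                 // mutate in place
        process(q);
    }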

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

Did I miss some recent paradigm shift away from short-lived
objects?

Another example of differing judgement is "try-catch inside
loops" (which I'd have seen as dictated by algorithm logic,
rather than as either a good or bad choice).
 
Owen Jacobson

You are right.  Your reviewer needs to learn a few things.

Out of curiosity, I benchmarked this a while ago on some of my own
code [0], first taking a version that allocated new objects relatively
freely and then writing a second version that performed the same
operations but preferred mutating existing objects.

Over ten million iterations, the difference in execution time was on
the order of hundreds of milliseconds - that is, utterly negligible.
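The shape of it, reduced to a toy (Widget is a hypothetical stand-in;
the real code is application-specific):

    // Hypothetical stand-in for the real class.
    static final class Widget {
        private int v;
        Widget(int v) { this.v = v; }
        void reset(int v) { this.v = v; }
        int value() { return v * 2 + 1; }
    }

    // Version 1: allocate freely, a new object on every pass.
    static long allocateFreely(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            Widget w = new Widget(i);
            acc += w.value();
        }
        return acc;
    }

    // Version 2: one object, re-initialized on every pass.
    static long mutateInPlace(int n) {
        long acc = 0;
        Widget w = new Widget(0);
        for (int i = 0; i < n; i++) {
            w.reset(i);
            acc += w.value();
        }
        return acc;
    }

Time each with System.nanoTime() around a call with n = 10000000 and
compare the totals.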

-o

[0] Code extracted from an application whose performance I care about,
that is, real code and not code written for the benchmark.
 
Arved Sandstrom

Andreas Leitgeb said:
My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

And let's keep in mind that no small number of Java performance
recommendations (including some of the most frequent ones in Google
searches etc.) date back 6-8 years, which means that the issues they
address may have much less impact now. Furthermore, performance tuning
recommendations come in at least two flavours: (1) stuff that should be
done from the get-go, and (2) performance tuning that comes into some
conflict with "clean" design. A general comment that relates to (1) is
justifiable in any review; a general comment related to (2) should only
arise if performance is in fact an issue, and the piece of code in
question is a proven offender.

Did I miss some recent paradigm shift away from short-lived
objects?

Depends on your definition of "recent". Reading a bunch of
significantly older articles/books, you'd certainly be inclined to
avoid object creation inside loops.

Another example of differing judgement is "try-catch inside
loops" (which I'd have seen as dictated by algorithm logic,
rather than as either a good or bad choice).

I'd see both situations as being dictated (in 2008) more by algorithm
logic and readability/maintenance than by trying to regain every last
microsecond at design time. In the first case, if your loops aren't
creating millions of objects (*), and a semantically new object is
being created each time through the loop (almost all of the fields are
changing), why not create new objects? If the semantics actually are
for _updating_ an existing object, then that could be considered
instead.

* Note: even if your loop is creating a large number of objects, if the
semantics (program logic) are for creation, not update, and the rest of the
code inside the loop involves significantly more processing than creation,
then why not just create new objects inside the loop?

As far as try-catch inside loops goes, well, it really boils down to:
does the exception stop the loop or not?
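That is, the placement falls out of the semantics. With
Integer.parseInt standing in for the real work (inputs and results
are hypothetical):

    // The exception should skip the bad element: catch inside the loop.
    for (String s : inputs) {
        try {
            results.add(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // log it and carry on with the next element
        }
    }

    // The exception should abort the whole batch: catch outside.
    try {
        for (String s : inputs) {
            results.add(Integer.parseInt(s));
        }
    } catch (NumberFormatException e) {
        // one bad element invalidates the whole batch
    }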

AHS
 
Andreas Leitgeb

Thanks to all for the confirmation. I was a bit intimidated,
because the reviewer was a professional (as in: not just a casual
programmer who had the job dumped on him, but someone whose business
is reviewing).

Not that I could refer him to this thread in my defense, but it
helped me a lot to be told that I'm not entirely off-track on this
topic.

Kenneth P. Turvey said:
I'd have to say that my experience has been different. I think it really
depends on the objects being created. A single new inside an inner loop
can be fine, but if that new results in many, many sub-objects being
created then you might have problems with it.

Yes, I covered that by mentioning the initialization effort compared
to the re-initialization (or update) effort.
 
Arne Vajhøj

Andreas said:
My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

Did I miss some recent paradigm shift away from short-lived
objects?

It is more likely that that paradigm has shifted since that
person learned programming.

I would expect allocating once and reusing to be slightly faster,
but not enough to make a difference in 99.99% of programs. So the
choice should be made based on what fits the code logic. In 90+% of
cases that means allocating inside the loop.

I would be very surprised if effective GC of short-lived objects
would make GC of 1 long-lived object more expensive than GC of 1000
short-lived objects.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
It is more likely that that paradigm has shifted since that
person learned programming.
:)

I would be very surprised if effective GC of short-lived objects
would make GC of 1 long-lived object more expensive than GC of 1000
short-lived objects.

That might be the position of the reviewer, but I've been told at
other times that allocation/GC costs are generally negligible
compared to (re-)initialization costs (the constructor). In the case
of an allocation the fields are nullified en bloc; upon re-init
they'd be nullified one by one.
Exceptions exist, and have been mentioned already.
 
Arne Vajhøj

Andreas said:
That might be the position of the reviewer, but I've been told at
other times that allocation/GC costs are generally negligible
compared to (re-)initialization costs (the constructor). In the case
of an allocation the fields are nullified en bloc; upon re-init
they'd be nullified one by one.
Exceptions exist, and have been mentioned already.

The fact that allocation and GC are very cheap has nothing to do
with what I wrote.

t1 = cost of allocating one foobar
t2 = cost of GC of one short-lived foobar
t3 = cost of GC of one long-lived foobar
t4 = cost of initializing a foobar (constructor or otherwise)
t5 = loop overhead per iteration
t6 = actual work per iteration

outside:
cost = t1 + 1000 * (t4 + t5 + t6) + t3

inside:
cost = 1000 * (t1 + t4 + t5 + t6 + t2)

outside - inside = -999*t1 - 1000*t2 + t3

My postulate was that -1000*t2 + t3 was negative, making the
difference negative.

The fact that t1+t2 is smaller than t4 does not disprove that.

The fact that t1+t2 is much smaller than t5+t6 proves that
outside/inside is very close to 1.
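Toy numbers, purely for illustration: with t1 = t2 = 10 ns,
t3 = 200 ns and t4 + t5 + t6 = 2000 ns per iteration:

    outside = 10 + 1000 * 2000 + 200  = 2000210 ns
    inside  = 1000 * (10 + 2000 + 10) = 2020000 ns

outside/inside is about 0.99 - a one percent difference.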

Which is the argument for writing the code that is most natural.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
The fact that allocation and GC are very cheap has nothing to do
with what I wrote.

t1 = cost of allocating one foobar
t2 = cost of GC of one short-lived foobar
t3 = cost of GC of one long-lived foobar
t4 = cost of initializing a foobar (constructor or otherwise)

I think the point is that t4(new object) != t4(re-init),
and the difference can go in either direction, and is most often
much larger than t1+t2 together.

outside: cost = t1 + 1000 * (t4 + t5 + t6) + t3
inside: cost = 1000 * (t1 + t4 + t5 + t6 + t2)
outside - inside = -999*t1 - 1000*t2 + t3
My postulate was that -1000*t2 + t3 was negative, making the
difference negative.

I do not dispute your postulate, but only its significance
in the sum with 1000*t4 (in each case).
 
Arne Vajhøj

Andreas said:
I think the point is that t4(new object) != t4(re-init)

Why?

If they do the same work, they should take the same time.

And the difference can go in either direction, and is most often
much larger than t1+t2 together.


I would not assume so - it is bad practice to do a lot of work in
constructors.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
Why?
If they do the same work, they should take the same time.

Oh, the argument is correct, but not the premise ... :)

I would not assume so - it is bad practice to do a lot of work in
constructors.

I spoke of "initialization", and Humpty-Dumpty as I am, this is not
limited to the constructor, but includes all that is necessary to
make the object "usable". :)

I think it is obvious that it very much depends on the object
whether the transition to that "initialized" state is more easily
reached from the "all zeroed out" state or from the "fields have
arbitrary values from their previous use" state (which may in turn
mean that some fields need not be changed at all).

As an extreme example, I claim that if my object were e.g. wrapping
a byte[1000], then looping over it and setting each element to zero
likely takes more effort than just dropping it and allocating a new
one. I'm too lazy to code a benchmark for it, and we know that
benchmarks aren't that authoritative anyway. :)
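For anyone less lazy, a minimal sketch of what I mean (unverified):

    byte[] buf = new byte[1000];
    long start = System.nanoTime();
    for (int i = 0; i < 1000000; i++) {
        java.util.Arrays.fill(buf, (byte) 0);   // re-init in place
    }
    long mid = System.nanoTime();
    byte[] fresh = null;
    for (int i = 0; i < 1000000; i++) {
        fresh = new byte[1000];                  // drop and re-allocate
    }
    long end = System.nanoTime();
    // Use 'fresh' so the allocations can't be optimized away entirely.
    System.out.println("fill: " + (mid - start) + " ns, new: "
                       + (end - mid) + " ns (" + fresh.length + ")");

(Keeping in mind that the JVM zero-fills a fresh array too, just
possibly in a cheaper bulk operation.)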
 
Andreas Leitgeb

Lew said:
Back to your question - Sun introduced its generational collector by
1.4, so you can rely on the points Mr. Goetz made.

Thanks a lot!

even with that archæologically interesting version.

good characterization. :-/
 
Arne Vajhøj

Andreas said:
Arne Vajhøj said:
Why?
If they do the same work, they should take the same time.

Oh, the argument is correct, but not the premise ... :)
I would not assume so - it is bad practice to do a lot of work in
constructors.

I spoke of "initialization", and Humpty-Dumpty as I am, this is not
limited to the constructor, but includes all that is necessary to
make the object "usable". :)

I think it is obvious that it very much depends on the object
whether the transition to that "initialized" state is more easily
reached from the "all zeroed out" state or from the "fields have
arbitrary values from their previous use" state (which may in turn
mean that some fields need not be changed at all).

As an extreme example, I claim that if my object were e.g. wrapping
a byte[1000], then looping over it and setting each element to zero
likely takes more effort than just dropping it and allocating a new
one.

Could very well be.

But both "original init" and "re-init" are free to use the faster
of the two.

Arne
 
Roedy Green

Andreas Leitgeb said:
Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Java object allocation (not counting the code in the constructor to
initialize) is relatively quick. What hurts you primarily is the
time per garbage collection and the frequency of garbage collections.

The time for a GC sweep depends on the total size and number of
live objects. If you want to optimize something, optimize the number
of objects floating around at any one time.

Lots of short-lived objects increase the frequency of GC sweeps,
but don't increase the time for a sweep.
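You can watch this with the -verbose:gc switch, e.g. with a
throwaway demo like:

    // Run with: java -verbose:gc GcDemo
    // Expect many brief young-generation collections.
    public class GcDemo {
        public static void main(String[] args) {
            long acc = 0;
            for (int i = 0; i < 10000000; i++) {
                // a short-lived StringBuilder (and its char[]) per pass
                acc += new StringBuilder("x").append(i).length();
            }
            System.out.println(acc);
        }
    }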

I wonder how long it will be before CPUs routinely have hardware
assist for parallel GC.
 
Andreas Leitgeb

Lew said:
Given that pretty much all new PCs are multi-core any more, I'd predict on the
order of now.

I agree on the estimate, but don't understand the usage of "any more" in
the positive context of the first part of the sentence.
 
Andreas Leitgeb

Lew said:
It's a colloquialism here in the United States, at least around here.

I've never ever seen it used before, and I think I read English
texts quite frequently. Thanks for explaining.

Anyway, if a "colloquialism ... at least around here" is OK for
you, why do you always complain so loudly about other colloquial
writing? I suggest *u b* less strict about those, too. :)
 
RedGrittyBrick

Vile expression - I just assumed it was an editing error. Evidently,
beauty is in the eye of the beholder.
It's a colloquialism here in the United States,

Are you sure it isn't a Lewism?

at least around here.
It is best understood by expressing the phrase in the negative:

"No PCs are single-core any more."

Switch the negative to the positive but keep the "any more", and you
have the idiom.

"All PCs are multi-core any more."

In my part of the Anglosphere that would be better expressed as
"All PCs are multi-core these days." But you knew that anyway.
 
blmblm

Vile expression - I just assumed it was an editing error. Evidently,
beauty is in the eye of the beholder.


Are you sure it isn't a Lewism?

US English speaker here, and -- no, I don't think so; I hear/read
it fairly often.
In my part of the Anglosphere that would be better expressed as
"All PCs are multi-core these days." But you knew that anyway.

In my part too -- keeping in mind, however, that "better expressed"
(with an implicit "IMO") and "always expressed" aren't synonymous.
 
