Short-lived Objects - good or bad?

Andreas Leitgeb

My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.
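Concretely, the two patterns I mean (Point here is a hypothetical
stand-in, as are xs, ys and process()):

    // Pattern A: a fresh, short-lived object per iteration.
    for (int i = 0; i < n; i++) {
        Point p = new Point(xs[i], ys[i]);   // dies young
        process(p);
    }

    // Pattern B: one long-lived object, re-initialized per iteration.
    Point q = new Point(0, 0);
    for (int i = 0; i < n; i++) {
        q.set(xs[i], ys[i]);                 // mutate in place
        process(q);
    }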

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

Did I miss some recent paradigm shift away from short-lived
objects?

Another example of differing judgement is "try-catch inside
loops" (which I'd have seen as dictated by algorithm logic,
rather than as either a good or bad choice).
 
Owen Jacobson

You are right.  Your reviewer needs to learn a few things.

Out of curiosity, I benchmarked this a while ago on some of my own
code [0], first taking a version that allocated new objects relatively
freely and then writing a second version that performed the same
operations but preferred mutating existing objects.

Over ten million iterations, the difference in execution time was on
the order of hundreds of milliseconds - that is, utterly negligible.
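The shape of it, reduced to a toy (Widget is a hypothetical stand-in;
the real code is application-specific):

    // Hypothetical stand-in for the real class.
    static final class Widget {
        private int v;
        Widget(int v) { this.v = v; }
        void reset(int v) { this.v = v; }
        int value() { return v * 2 + 1; }
    }

    // Version 1: allocate freely, a new object on every pass.
    static long allocateFreely(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            Widget w = new Widget(i);
            acc += w.value();
        }
        return acc;
    }

    // Version 2: one object, re-initialized on every pass.
    static long mutateInPlace(int n) {
        long acc = 0;
        Widget w = new Widget(0);
        for (int i = 0; i < n; i++) {
            w.reset(i);
            acc += w.value();
        }
        return acc;
    }

Time each with System.nanoTime() around a call with n = 10000000 and
compare the totals.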

-o

[0] Code extracted from an application whose performance I care about,
that is, real code and not code written for the benchmark.
 
Arved Sandstrom

Andreas Leitgeb said:
My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

And let's keep in mind that no small number of Java performance
recommendations (including some of the most frequent ones in Google
searches etc.) date back 6-8 years, which means that the issues they
address may have much less impact now. Furthermore, performance tuning
recommendations come in at least two flavours: (1) stuff that should be
done from the get-go, and (2) performance tuning that comes into some
conflict with "clean" design. A general comment that relates to (1) is
justifiable in any review; a general comment related to (2) should only
arise if performance is in fact an issue, and the piece of code in
question is a proven offender.

Did I miss some recent paradigm shift away from short-lived
objects?

Depends on your definition of "recent". Reading a bunch of
significantly older articles/books, you'd certainly be inclined to
avoid object creation inside loops.

Another example of differing judgement is "try-catch inside
loops" (which I'd have seen as dictated by algorithm logic,
rather than as either a good or bad choice).

I'd see both situations as being dictated (in 2008) more by algorithm
logic and readability/maintenance than by trying to regain every last
microsecond at design time. In the first case, if your loops aren't
creating millions of objects (*), and a semantically new object is
being created each time through the loop (almost all of the fields are
changing), why not create new objects? If the semantics actually are
for _updating_ an existing object, then that could be considered
instead.

* Note: even if your loop is creating a large number of objects, if the
semantics (program logic) are for creation, not update, and the rest of the
code inside the loop involves significantly more processing than creation,
then why not just create new objects inside the loop?

As far as try-catch inside loops goes, well, it really boils down to:
does the exception stop the loop or not?
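That is, the placement falls out of the semantics. With
Integer.parseInt standing in for the real work (inputs and results
are hypothetical):

    // The exception should skip the bad element: catch inside the loop.
    for (String s : inputs) {
        try {
            results.add(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // log it and carry on with the next element
        }
    }

    // The exception should abort the whole batch: catch outside.
    try {
        for (String s : inputs) {
            results.add(Integer.parseInt(s));
        }
    } catch (NumberFormatException e) {
        // one bad element invalidates the whole batch
    }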

AHS
 
Andreas Leitgeb

Thanks to all for the confirmation. I was a bit intimidated,
because the reviewer was a professional (as in: not just a casual
programmer who had the job dumped on him, but someone whose business
is reviewing).

Not that I could refer him to this thread in my defense, but it
helped me a lot to be told that I'm not entirely off-track on this
topic.

Kenneth P. Turvey said:
I'd have to say that my experience has been different. I think it really
depends on the objects being created. A single new inside an inner loop
can be fine, but if that new results in many, many sub-objects being
created then you might have problems with it.

Yes, I covered that by mentioning the initialization effort compared
to the re-initialization (or update) effort.
 
Arne Vajhøj

Andreas said:
My own position (and also what I've gathered in workshops
before my SCJP, and from reading this newsgroup) was that it
is generally better to allocate and drop objects inside a loop
than to allocate them before the loop and re-initialize them
on each iteration. (That is due to how GC separates short-lived
objects from longer-lived ones, which could, on the other hand,
also be seen as a non-guaranteed implementation detail, after all.)
There are of course exceptions, where *re*-initialization cost
would be considerably lower than first initialization, e.g. where
only a fraction of the object's state varies with each iteration.
I dare say these were quite rare in the reviewed code.

Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Judging from his other comments, it doesn't look like he really
analyzed each particular situation; more likely he made a
general statement and counted actual occurrences of certain
patterns. I could of course be wrong here.

Did I miss some recent paradigm shift away from short-lived
objects?

It is more likely that that paradigm has shifted since that
person learned programming.

I would expect allocating once and reusing to be slightly faster,
but not enough to make a difference in 99.99% of programs. So the
choice should be made based on what fits the code logic. In 90+% of
cases that means allocating inside the loop.

I would be very surprised if effective GC of short-lived objects
would make GC of 1 long-lived object more expensive than GC of 1000
short-lived objects.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
It is more likely that that paradigm has shifted since that
person learned programming.
:)

I would be very surprised if effective GC of short-lived objects
would make GC of 1 long-lived object more expensive than GC of 1000
short-lived objects.

That might be the position of the reviewer, but I've been told at
other times that allocation/GC costs are generally negligible
compared to (re-)initialization costs (the constructor). In the case
of an allocation the fields are nullified en bloc; upon re-init
they'd be nullified one by one.
Exceptions exist, and have been mentioned already.
 
Arne Vajhøj

Andreas said:
That might be the position of the reviewer, but I've been told at
other times that allocation/GC costs are generally negligible
compared to (re-)initialization costs (the constructor). In the case
of an allocation the fields are nullified en bloc; upon re-init
they'd be nullified one by one.
Exceptions exist, and have been mentioned already.

The fact that allocation and GC are very cheap has nothing to do
with what I wrote.

t1 = cost of allocating one foobar
t2 = cost of GC of one short-lived foobar
t3 = cost of GC of one long-lived foobar
t4 = cost of initializing a foobar (constructor or otherwise)
t5 = loop overhead per iteration
t6 = actual work per iteration

outside:
cost = t1 + 1000 * (t4 + t5 + t6) + t3

inside:
cost = 1000 * (t1 + t4 + t5 + t6 + t2)

outside - inside = -999*t1 - 1000*t2 + t3

My postulate was that -1000*t2 + t3 was negative, making the
difference negative.

The fact that t1+t2 is smaller than t4 does not disprove that.

The fact that t1+t2 is much smaller than t5+t6 proves that
outside/inside is very close to 1.
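Toy numbers, purely for illustration: with t1 = t2 = 10 ns,
t3 = 200 ns and t4 + t5 + t6 = 2000 ns per iteration:

    outside = 10 + 1000 * 2000 + 200  = 2000210 ns
    inside  = 1000 * (10 + 2000 + 10) = 2020000 ns

outside/inside is about 0.99 - a one percent difference.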

Which is the argument for writing the code that is most natural.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
The fact that allocation and GC are very cheap has nothing to do
with what I wrote.

t1 = cost of allocating one foobar
t2 = cost of GC of one short-lived foobar
t3 = cost of GC of one long-lived foobar
t4 = cost of initializing a foobar (constructor or otherwise)

I think the point is that t4(new object) != t4(re-init),
and the difference can go in either direction, and is most often
much larger than t1+t2 together.

outside: cost = t1 + 1000 * (t4 + t5 + t6) + t3
inside: cost = 1000 * (t1 + t4 + t5 + t6 + t2)
outside - inside = -999*t1 - 1000*t2 + t3
My postulate was that -1000*t2 + t3 was negative, making the
difference negative.

I do not dispute your postulate, but only its significance
in the sum with 1000*t4 (in each case).
 
Arne Vajhøj

Andreas said:
I think the point is that t4(new object) != t4(re-init)

Why?

If they do the same work, they should take the same time.

And the difference can go in either direction, and is most often
much larger than t1+t2 together.


I would not assume so - it is bad practice to do a lot of work in
constructors.

Arne
 
Andreas Leitgeb

Arne Vajhøj said:
Why?
If they do the same work, they should take the same time.

Oh, the argument is correct, but not the premise ... :)

I would not assume so - it is bad practice to do a lot of work in
constructors.

I spoke of "initialization", and Humpty-Dumpty as I am, this is not
limited to the constructor, but includes all that is necessary to
make the object "usable". :)

I think it is obvious that it very much depends on the object
whether the transition to that "initialized" state is more easily
reached from the "all zeroed out" state or from the "fields have
arbitrary values from their previous use" state (which may in turn
mean that some fields need not be changed at all).

As an extreme example, I claim that if my object were e.g. wrapping
a byte[1000], then looping over it and setting each element to zero
likely takes more effort than just dropping it and allocating a new
one. I'm too lazy to code a benchmark for it, and we know that
benchmarks aren't that authoritative anyway. :)
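For anyone less lazy, a minimal sketch of what I mean (unverified):

    byte[] buf = new byte[1000];
    long start = System.nanoTime();
    for (int i = 0; i < 1000000; i++) {
        java.util.Arrays.fill(buf, (byte) 0);   // re-init in place
    }
    long mid = System.nanoTime();
    byte[] fresh = null;
    for (int i = 0; i < 1000000; i++) {
        fresh = new byte[1000];                  // drop and re-allocate
    }
    long end = System.nanoTime();
    // Use 'fresh' so the allocations can't be optimized away entirely.
    System.out.println("fill: " + (mid - start) + " ns, new: "
                       + (end - mid) + " ns (" + fresh.length + ")");

(Keeping in mind that the JVM zero-fills a fresh array too, just
possibly in a cheaper bulk operation.)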
 
Andreas Leitgeb

Lew said:
Back to your question - Sun introduced its generational collector by
1.4, so you can rely on the points Mr. Goetz made.

Thanks a lot!

even with that archæologically interesting version.

good characterization. :-/
 
Arne Vajhøj

Andreas said:
Arne Vajhøj said:
Why?
If they do the same work, they should take the same time.

Oh, the argument is correct, but not the premise ... :)
I would not assume so - it is bad practice to do a lot of work in
constructors.

I spoke of "initialization", and Humpty-Dumpty as I am, this is not
limited to the constructor, but includes all that is necessary to
make the object "usable". :)

I think it is obvious that it very much depends on the object
whether the transition to that "initialized" state is more easily
reached from the "all zeroed out" state or from the "fields have
arbitrary values from their previous use" state (which may in turn
mean that some fields need not be changed at all).

As an extreme example, I claim that if my object were e.g. wrapping
a byte[1000], then looping over it and setting each element to zero
likely takes more effort than just dropping it and allocating a new
one.

Could very well be.

But both "original init" and "re-init" are free to use the faster
of the two.

Arne
 
Roedy Green

Andreas Leitgeb said:
Just recently I was confronted with comments from a reviewer
(with whom I'm not in a position to argue directly) who
criticized the code for (among other things) its rather frequent
use of new inside loops.

Java object allocation (not counting the code in the constructor to
initialize) is relatively quick. What hurts you primarily is the
time per garbage collection and the frequency of garbage collections.

The time for a GC sweep depends on the total size and number of
live objects. If you want to optimize something, optimize the number
of objects floating around at any one time.

Lots of short-lived objects increase the frequency of GC sweeps,
but don't increase the time for a sweep.
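You can watch this with the -verbose:gc switch, e.g. with a
throwaway demo like:

    // Run with: java -verbose:gc GcDemo
    // Expect many brief young-generation collections.
    public class GcDemo {
        public static void main(String[] args) {
            long acc = 0;
            for (int i = 0; i < 10000000; i++) {
                // a short-lived StringBuilder (and its char[]) per pass
                acc += new StringBuilder("x").append(i).length();
            }
            System.out.println(acc);
        }
    }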

I wonder how long it will be before CPUs routinely have hardware
assist for parallel GC.
 
Andreas Leitgeb

Lew said:
Given that pretty much all new PCs are multi-core any more, I'd predict on the
order of now.

I agree on the estimate, but don't understand the usage of "any more" in
the positive context of the first part of the sentence.
 
Andreas Leitgeb

Lew said:
It's a colloquialism here in the United States, at least around here.

I've never ever seen it used before, and I think I read English
texts quite frequently. Thanks for explaining.

Anyway, if a "colloquialism ... at least around here" is OK for
you, why do you always complain so loudly about other colloquial
writing? I suggest *u b* less strict about those, too. :)
 
RedGrittyBrick

Vile expression - I just assumed it was an editing error. Evidently,
beauty is in the eye of the beholder.
It's a colloquialism here in the United States,

Are you sure it isn't a Lewism?

at least around here.
It is best understood by expressing the phrase in the negative:

"No PCs are single-core any more."

Switch the negative to the positive but keep the "any more", and you
have the idiom.

"All PCs are multi-core any more."

In my part of the Anglosphere that would be better expressed as
"All PCs are multi-core these days." But you knew that anyway.
 
blmblm

Vile expression - I just assumed it was an editing error. Evidently,
beauty is in the eye of the beholder.


Are you sure it isn't a Lewism?

US English speaker here, and -- no, I don't think so; I hear/read
it fairly often.
In my part of the Anglosphere that would be better expressed as
"All PCs are multi-core these days." But you knew that anyway.

In my part too -- keeping in mind, however, that "better expressed"
(with an implicit "IMO") and "always expressed" aren't synonymous.
 
