Ulf Nordlund said:
I have seen the suggestion (in, for instance, AppPerfect) that counting
down rather than up when looping through an array will improve
performance. That is, doing it this way:
(1) for (int i=arr.length-1; i>=0; i--) {...}
would be slightly faster than this:
(2) for (int i=0; i<arr.length; i++) {...}
[I think it has to do with testing against zero (which would be more
efficient)].
That is what I've also heard, though it's been quite a while since I last
heard it. IF this is the case, the difference is exactly one single
machine-code instruction per iteration, i.e. (in pseudocode):
ADD counter, -length
TEST counter zero
instead of only
TEST counter zero
There might be two instructions to reset the counter, or one more register
might be needed, or other details. OTOH, many CPUs have a machine
instruction that simply compares two arbitrary values and takes the same
number of machine cycles as a comparison against zero.
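For concreteness, here are the two loop forms from above as complete Java, wrapped in a made-up helper that sums an int array (the class and method names are just for illustration):

```java
public class LoopDirection {
    // Variant (1): count down, so the loop test is a comparison against zero
    static long sumDown(int[] arr) {
        long sum = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            sum += arr[i];
        }
        return sum;
    }

    // Variant (2): count up, so the loop test compares against arr.length
    static long sumUp(int[] arr) {
        long sum = 0;
        for (int i = 0; i < arr.length; i++) {
            sum += arr[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] arr = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(sumDown(arr)); // prints 31
        System.out.println(sumUp(arr));   // prints 31
    }
}
```

Both variants are, of course, observably identical; the whole question is whether the JIT emits a cheaper loop test for one of them.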
My question is: would this be true in a general sense (i.e. for all
JVMs)? To me, it seems that it should be possible to optimize a JVM for
version (2) above as well...(?)
In general it *might* be true. It was, way back when. There is a big
but..., however.
[If it's indeed true, the next question would be: is it also
worthwhile? (Any benchmark tests?)]
And the but... is: which one is faster based merely on the direction of
counting is probably irrelevant on today's (and yesterday's) machines.
What you do IN the loop has much more relevance. If you're handling some
data, which you most likely are, the order of access within the CPU's
cache plays the most important role (unless you do things like disk I/O,
which is many orders of magnitude slower still than increasing or
decreasing counters and comparing them against anything).

There may be cases where accessing your data in memory in increasing
order is way faster because it is the better access pattern for the
cache, or it may be just the other way around, so that decreasing order
is better. The trouble is, some change outside of the loop may flip one
case into the other, because the preceding instructions loaded some stuff
that overwrote part of the cache, making another access order better. Add
to that the fact that today's processors (at least desktop and server
processors from Intel and AMD) reorder instructions, have pipelines,
branch prediction, several execution units with partial parallelism,
out-of-order execution and a lot of other gizmos, and a general statement
on which is better is rather meaningless.
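To illustrate that it is the memory-access pattern, not the counting, that dominates, here is a small (deliberately unscientific) sketch comparing row-major and column-major traversal of a 2D array. Both loops do the same arithmetic; only the access order differs. The timings are naive and will vary by machine, so no numbers are claimed:

```java
public class AccessOrder {
    static final int N = 2048;

    static long rowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                sum += m[i][j];   // walks each row contiguously: cache-friendly
        return sum;
    }

    static long colMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                sum += m[i][j];   // jumps between rows: cache-hostile
        return sum;
    }

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i][j] = 1;

        long t0 = System.nanoTime();
        long a = rowMajor(m);
        long t1 = System.nanoTime();
        long b = colMajor(m);
        long t2 = System.nanoTime();

        System.out.printf("row-major: %d ms, col-major: %d ms, sums equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```

On typical hardware the row-major version tends to win clearly, but as argued above, what happens around the loop can shift such results.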
So my advice is: take the form that suits you best (probably in terms of
readability) and stick with it. Should you ever have a performance
problem, use a profiler and see where you lose the most time. Optimize
there. A simple change from an increasing counter to a decreasing one
probably won't make any noticeable change at all. If it does, it is the
changed memory access that is responsible, not the counting and comparing.
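If you do want to check a suspicion like this yourself, a crude sketch is below. A proper harness such as JMH handles warm-up, dead-code elimination and JIT effects far better; this only shows the rough shape of such a measurement (all names are made up):

```java
import java.util.Arrays;

public class CrudeTiming {
    static long sumUp(int[] arr) {
        long s = 0;
        for (int i = 0; i < arr.length; i++) s += arr[i];
        return s;
    }

    static long sumDown(int[] arr) {
        long s = 0;
        for (int i = arr.length - 1; i >= 0; i--) s += arr[i];
        return s;
    }

    public static void main(String[] args) {
        int[] arr = new int[1_000_000];
        Arrays.fill(arr, 1);
        long sink = 0;
        // warm up so the JIT has compiled both methods before timing
        for (int r = 0; r < 100; r++) sink += sumUp(arr) + sumDown(arr);

        long t0 = System.nanoTime();
        for (int r = 0; r < 100; r++) sink += sumUp(arr);
        long t1 = System.nanoTime();
        for (int r = 0; r < 100; r++) sink += sumDown(arr);
        long t2 = System.nanoTime();

        // sink is printed so the JIT cannot discard the loops as dead code
        System.out.printf("up: %d ms, down: %d ms (sink=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }
}
```

Expect the two numbers to be within noise of each other on a modern desktop JVM; if they differ reproducibly, look at memory access before blaming the comparison.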
If you switch the underlying CPU architecture, it might change
dramatically. Or it won't.
But don't optimize prematurely based simply on the fact that someone
somewhere told you that a comparison against 0 is cheap and a comparison
against 1000 is bloody expensive. It might be, on some small-scale DSP,
but I doubt many people use Java on those.
And let's not forget the golden rule: "Premature optimization is the root
of all evil" (Knuth).
CU
Rene