Counting down faster when looping?

U

Ulf Nordlund

I have seen the suggestion (in, for instance, AppPerfect) that counting
down rather than up when looping through an array will improve
performance. That is, doing it this way:

(1) for (int i=arr.length-1; i>=0; i--) {...}

would be slightly faster than this:

(2) for (int i=0; i<arr.length; i++) {...}

[I think it's got to do with testing against zero (which would be more
effective)].

My question is: Would this be true in a general sense (i.e. for all
JVMs)? To me, it seems that it would be possible to optimize a JVM also
for version (2) above...(?)

[If it's indeed true, the next question would be: Is it also worth
while? (Any benchmark tests?)]

Cheers,
/ulf
 
A

Alex Molochnikov

I think that the "counting down" vs "counting up" philosophy is rooted in
ancient times when decrementing the register was a slightly faster an
operation than the incrementing one.

Even if this were true for the today's HW (which I doubt) , the benefits of
performance improvement would be negligibly small. IMO, the maintainability
and readability of the code is far more important than slicing another
microsecond from the processing time.

Alex Molochnikov
Gestalt Corporation
 
R

Rene

Ulf Nordlund said:
I have seen the suggestion (in, for instance, AppPerfect) that counting
down rather than up when looping through an array will improve
performance. That is, doing it this way:

(1) for (int i=arr.length-1; i>=0; i--) {...}

would be slightly faster than this:

(2) for (int i=0; i<arr.length; i++) {...}

[I think it's got to do with testing against zero (which would be more
effective)].

That is what I've also heard. Though it's quite long since I last heard
that. IF this is the case, the difference is exactly one single
Maschine-Code instruction per iteration, ie (in pseudocode):

ADD counter, -lenght
TEST counter zero

instead of only

TEST counter zero

Might be two instructions to reset the counter or might need one more
register or other details. OTOH, many CPUs have a machine instruction that
simply compares arbitrary values which also takes the same amount machine
cycles as a comparison to zero.
My question is: Would this be true in a general sense (i.e. for all
JVMs)? To me, it seems that it would be possible to optimize a JVM also
for version (2) above...(?)

In general it *might* be true. It was in times way way back. There is a big
but..., however.
[If it's indeed true, the next question would be: Is it also worth
while? (Any benchmark tests?)]

And the but... is: Which one is faster based on merely on the way of
counting is probably so irrelevant on todays (and yesterdays) machines.
What you do IN the loop will have much more relevance. If you're handling
some data, which you most likely do, the order of access within the cache
of a CPU plays the most important role (unless you do things like disk-io
which is even many many orders of magnitudes slower than increasing or
decreasing counters and compare them against anything). There may be cases
where accessing your data in memory in increasing order is way faster
because it is the better access pattern for the cache or it might be just
vice versa that it is better to access it in decreasing order. Trouble is,
some change outside of the loop may change that into the other one, because
the preceeding instructions loaded some stuff which overwrote part of the
cache which makes another access order better. Add to that the fact that
todays processors (at least desktop and server processors from Intel and
AMD) reorder instructions, have piplines, branch-prediction, several
execution units with partial parallelism and out-of-order exection and a
lot of other gizmos, a general statement on which is better is rather
meaningless.

So my advice is: Take the form that suits you best (probably in terms of
readability) and stick with it. Should you ever have a performance problem,
use a profiler and see where you lose most time. Optimize there. A simple
change from an increasing counter to a decreasing one probably won't make
any noticeable change at all. If it does, it is the changed memory access
that is responsible, not the counting and comparing.

If you switch the underlying CPU architecture, it might change
dramatically. Or it won't.

But don't optimize prematurely based simply upon the fact that someone
somewhere told you that a comparison against 0 is cheap and a comparison
against 1000 is bloody expensive. It might be, on some small scale DSP, but
I doubt many people use Java on those.

And let's not forget the golden rule: "Premature optimization is the root
of all evil"

CU

Rene
 
C

Chris Uppal

Rene wrote:

I agree with your comments, but I wanted to point out that:
ADD counter, -lenght

is assuming that getting the length of an array is a simple (one instruction)
operation. That will only be true /if/ the JITer is bright enough to pull the
array bounds into a register (or somewhere else handy) before the start of the
loop.

No one would code a loop like this:

(for int i = 0; i < longSlowComputation(); i++)
{
...

And so, when we code an array loop as:

(for int i = 0; i < array.length; i++)
{
...

we are implicitly trusting the JITer to optimise array.length, or else deciding
that the cost isn't worth bothering about even if that optimisation doesn't
happen (which IMO is typically correct).

-- chris
 
K

Knute Johnson

Chris said:
Rene wrote:

I agree with your comments, but I wanted to point out that:




is assuming that getting the length of an array is a simple (one instruction)
operation. That will only be true /if/ the JITer is bright enough to pull the
array bounds into a register (or somewhere else handy) before the start of the
loop.

No one would code a loop like this:

(for int i = 0; i < longSlowComputation(); i++)
{
...

And so, when we code an array loop as:

(for int i = 0; i < array.length; i++)
{
...

we are implicitly trusting the JITer to optimise array.length, or else deciding
that the cost isn't worth bothering about even if that optimisation doesn't
happen (which IMO is typically correct).

-- chris

I pulled out my 80286 and 80287 Programmer's Reference Manual to see if
that went far enough back to have a difference between loops that count
up and loops that count down. A loop in 80286 assembly code would
require an INC (increment), a CMP (compare) and a JC (jump on
condition). None of these instructions would have different numbers of
clocks if the loop were incrementing or decrementing.

So it could be urban myth or go back even farther to 8086s or even
possibly farther into machine history.
 
A

Antti S. Brax

A loop in 80286 assembly code would
require an INC (increment), a CMP (compare) and a JC (jump on
condition). None of these instructions would have different numbers of
clocks if the loop were incrementing or decrementing.

You forgot JCXZ (jump if cx is zero).
So it could be urban myth or go back even farther to 8086s or even
possibly farther into machine history.

JCXZ dates back to at least 8088.
 
A

Alex

Hard to say.
Of course ">=" is faster then "<".
But "i" is faster then "arr.length".
How much? It depends.

Because I have time I wrote a little complicated application
public class A {

public static void main(String args[]){
int arr[]=new int[100000000];
long time;
time=System.currentTimeMillis();
for (int i=arr.length-1; i>=0; i--) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

time=System.currentTimeMillis();
for (int i=0; i<arr.length; i++) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

System.out.println();
time=System.currentTimeMillis();
for (int i=0; i<arr.length; i++) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

time=System.currentTimeMillis();
for (int i=arr.length-1; i>=0; i--) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

}
}
and run it! Many, amny times. And had a little surpise.
It had pretty stable (statistical) result. (!!!! are mine).
Under Windows:
361
360

301 (!!!!)
350
And under Solaris:
546
674 (!!!!)

539
539

Which literally means that third forces are involved! :)

Alex Kizub.
 
U

Ulf Nordlund

Alex skrev:
Hard to say.
Of course ">=" is faster then "<".
But "i" is faster then "arr.length".
How much? It depends.

Because I have time I wrote a little complicated application
public class A {

public static void main(String args[]){
int arr[]=new int[100000000];
long time;
time=System.currentTimeMillis();
for (int i=arr.length-1; i>=0; i--) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

time=System.currentTimeMillis();
for (int i=0; i<arr.length; i++) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

System.out.println();
time=System.currentTimeMillis();
for (int i=0; i<arr.length; i++) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

time=System.currentTimeMillis();
for (int i=arr.length-1; i>=0; i--) {}
time=System.currentTimeMillis()-time;
System.out.println(time);

}
}
and run it! Many, amny times. And had a little surpise.
It had pretty stable (statistical) result. (!!!! are mine).
Under Windows:
361
360

301 (!!!!)
350
And under Solaris:
546
674 (!!!!)

539
539

Which literally means that third forces are involved! :)

Alex Kizub.

Is this possibly an effect of the Hot Spot VM?
/ulf
 
A

Andy Hill

Considering that your results imply 3 or 4 ns per iteration, I think you've got
other issues. Perhaps the loops are being optimized out?
 
A

Alex

Is this possibly an effect of the Hot Spot VM?
For me it means that it so effected by particular circumstances that it
doesn't cost time which we spent for discussion.
Also it means for me that each solution should be tested in real life
and only then accepted.
But even if you work under J2ME or any other environment where you cont
milliseconds impact is not mentionable and totally overriden by other
factors.

Alex Kizub.
 
A

Alex

Sorry, don't know what is
optimized out
Probably not.
javac A.java
java A

Maybe my development box 1.3 Gh/1 Gb is not good enough.
Or Solaris is not cool. (even it's production).
For me it's good result.

Alex Kizub.
 
D

Dimitri Maziuk

Knute Johnson sez:
.... None of these instructions would have different numbers of
clocks if the loop were incrementing or decrementing.

So it could be urban myth or go back even farther to 8086s or even
possibly farther into machine history.

I seem to recall that decrement loop was one instruction shorter, though.
Hmm... perhaps it was about putting counter in the right register and
doing a JNZ (as in for( i = 10; i--; )) instead of CMP/JNE?
-- I don't remember.

Dima (and I think it was m68k anyway)
 
K

Knute Johnson

Antti said:
You forgot JCXZ (jump if cx is zero).




JCXZ dates back to at least 8088.

JCXZ requires one more clock than all the other JCs. I think I would
write my compiler so it didn't use JCXZ if that were the case. Wasn't
there something about the CX register and using it for an index to a
table? If I remember that is what the JCXZ compare was for.

Below is the C and assembly code generated by my MSC++ 6.0 compiler. I
would guess that the Java compiler generates similar code. In any case
whether you do a subtraction or an addition and then a compare probably
makes no difference on a Pentium processor.

void main() {
int i;

for (i=0; i<10; i++) ;
}


TITLE test.c
.386P
include listing.inc
if @Version gt 510
..model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC _main
_TEXT SEGMENT
_i$ = -4
_main PROC NEAR
; File test.c
; Line 1
push ebp
mov ebp, esp
push ecx
; Line 4
mov DWORD PTR _i$[ebp], 0
jmp SHORT $L466
$L467:
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
$L466:
cmp DWORD PTR _i$[ebp], 10 ; 0000000aH
jge SHORT $L468
jmp SHORT $L467
$L468:
; Line 5
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
 
A

Antti S. Brax

Sorry, don't know what is
Probably not.

Examine the statement below:

for (int i=arr.length-1; i>=0; i--) {}

Does executing this loop have any effects on program state?

The only variable that is modified during the loop is i and it
will be discarded immediately after the loop has executed.

The JIT compiler can and most likely will notice this and will
not execute the loop.

Try changing the loop like below and execute it again.

int j = 0;
for (int i=arr.length-1; i>=0; i--) { j++; }
For me it's good result.

They're good only if you're happy with results that don't have
any practical value.
 
A

Antti S. Brax

JCXZ requires one more clock than all the other JCs.

My reference manual states otherwise (at least for 80(2/3)86).
But even if it did, the idea is that the compare is built into
this jump command. You don't need to execute separate compare
command before the jump command. That is why using JCXZ for
looping down should be faster on a 80(2/3)86.

According to my manual, JCXZ is specifically meant for down
counting loops.


What comes to looping up or down in Java, I've personally
noticed that in a tight loop counting down _is_ faster (while
doing animated real time graphics) on a Pentium processor.

Would I do it in a "normal" application? Hell no...
 
A

Alex

They're good only if you're happy with results that don't have
any practical value.
Practical value is killing the time and enjoy.

Actually thread is about going up or down in array.
Nothing else.
So, everything was excluded even inoriginal post.
To eliminate all other factors.
For example, if I change {} to {arr++;} then increment and array
access is included also. And, again, I have pretty stable results.
Windows:
2193
1612
(!!! up is better!!!)
1663
2193

UNIX:
3011
3249
(!!! similar)
3302
3017

which says that these two incarnations are different too.
But question was about going down and up. Not about other factors.

Alex Kizub.
 
A

Alex

They're good only if you're happy with results that don't have
any practical value.
Actually thread is about going up or down in array.
Nothing else.
So, everything was excluded even inoriginal post.
To eliminate all other factors.
For example, if I change {} to {arr++;} then increment and array
access is included also. And, again, I have pretty stable results.
Windows:
2193
1612
(!!! up is better!!!)
1663
2193

UNIX:
3011
3249
(!!! similar)
3302
3017

which says that these two incarnations are different too.
But question was about going down and up. Not about other factors.

Alex Kizub.
 
J

jonck

This is a side-trip, but I was wondering whether one of you might know
the answer. I was interested in this problem, but when I tried running
the test class:
public class Test {
public static void main(String args[]){
int arr[]=new int[14897979];
long time;
time=System.currentTimeMillis();
int j = 0;
for (int i=arr.length-1; i>=0; i--) {j++;}
time=System.currentTimeMillis()-time;
System.out.println(time);


time=System.currentTimeMillis();
j = 0;
for (int i=0; i<arr.length; i++) {j++;}
time=System.currentTimeMillis()-time;
System.out.println(time);


System.out.println();
time=System.currentTimeMillis();
j = 0;
for (int i=0; i<arr.length; i++) {j++;}
time=System.currentTimeMillis()-time;
System.out.println(time);


time=System.currentTimeMillis();
j = 0;
for (int i=arr.length-1; i>=0; i--) {j++;}
time=System.currentTimeMillis()-time;
System.out.println(time);
}
}

For me the number of values in arr[] could go no larger than 14897979,
14897980 gave me an outOfMemoryError. You guys were talking about
having this value at 100000000, a number about 7 times larger, yet you
were (obviously) not getting an OutOfMemoryError. Any ideas what could
be causing this for me?

I'm on OS X 10.3.7, using Java 1.4.2_05
 
J

jonck

BTW, opening extra applications and thus lowering the amount of
available memory seemed to have no impact whatsoever...
 
A

Antti S. Brax

For me the number of values in arr[] could go no larger than 14897979,
14897980 gave me an outOfMemoryError. You guys were talking about
having this value at 100000000, a number about 7 times larger, yet you
were (obviously) not getting an OutOfMemoryError. Any ideas what could
be causing this for me?

Java VM has a default maximum heap size of 64 megabytes.

You can increase it with -Xmx command line parameter:

java -Xmx256M (this gives Java VM 256 megabytes memory).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,527
Members
44,999
Latest member
MakersCBDGummiesReview

Latest Threads

Top