time factor for loops and arrays

Patrick

Hello,

I noticed a time difference in the following problem:
Example: an array with size 32*100000

Now I iterate through the array with two for-loops:

for (i = 0; i < 32; i++) {
    for (j = 0; j < 100000; j++) {
        // store a value in the array
    }
}

is 3 times slower than:

for (i = 0; i < 100000; i++) {
    for (j = 0; j < 32; j++) {
        // store a value in the array
    }
}

But why? The statement inside the loops is executed 32*100000 times in both cases.
I'm working with JBuilder 10.
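For reference, here's a complete version of what I'm timing. I'm assuming the array is declared double[100000][32]; the stored values are arbitrary:

```java
public class LoopOrder {
    static final int ROWS = 100000, COLS = 32;

    public static void main(String[] args) {
        double[][] a = new double[ROWS][COLS];

        long t0 = System.nanoTime();
        for (int i = 0; i < COLS; i++) {
            for (int j = 0; j < ROWS; j++) {
                a[j][i] = i + j;  // column-wise: jumps between rows
            }
        }
        long t1 = System.nanoTime();

        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++) {
                a[i][j] = i + j;  // row-wise: walks each row sequentially
            }
        }
        long t2 = System.nanoTime();

        System.out.println("column-wise: " + ((t1 - t0) / 1000000) + " ms");
        System.out.println("row-wise:    " + ((t2 - t1) / 1000000) + " ms");
    }
}
```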

Greetings and TIA
 
John B. Matthews

Patrick said:
Hello,

I noticed that there is time difference for the following problem:
Example: Array with size 32*100000

Now I iterate through the array with two for-loops:

for(i=0; i<32;i++){
for(j=0; j<100000; j++){
//store a value in the array
}
}

is 3 times slower than:

for(i=0; i<100000;i++){
for(j=0; j<32; j++){
//store a value in the array
}
}

but why?
the command within the loops will be executed 32*100000 in both cases.
I'm working with JBuilder10

Greetings and TIA

In the latter, it's feasible to unroll the inner loop: only 32
repetitions versus 100,000. In addition, the JVM may be able to
optimize bus bandwidth: initializing 32 bytes may require eight
32-bit store instructions on one processor and four 64-bit
instructions on another. Numerous other optimizations are possible
depending on the architecture.
 
Tom McGlynn

Patrick said:
Hello,

I noticed that there is time difference for the following problem:
Example: Array with size 32*100000

Now I iterate through the array with two for-loops:

for(i=0; i<32;i++){
for(j=0; j<100000; j++){
//store a value in the array
}
}

is 3 times slower than:

for(i=0; i<100000;i++){
for(j=0; j<32; j++){
//store a value in the array
}
}

but why?
the command within the loops will be executed 32*100000 in both cases.
I'm working with JBuilder10

Greetings and TIA


It may well depend upon what the command you've represented as a comment
does. E.g.,

double[][] array = new double[100000][32];

for (int i = 0; i < 100000; i++) {
    for (int j = 0; j < 32; j++) {
        array[i][j] = i + j;
    }
}

will likely have much better memory behavior than the same loops in the
other direction. As written, the inner loop deals with each 32-element
row once and for all, while reversing the loop order makes it stride
through memory 32 times and likely requires many more transfers between
cache and main memory.

If the limits were something like 4096x4096, you might get into a
situation where the inefficient route starts thrashing memory and paging
to disk. Then the differences can be factors of thousands, not just two
or three.

All of this depends upon the underlying architecture of the
system, but generally, if you have multidimensional arrays, you want
the innermost loop to run over the last index.
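One wrinkle worth noting: in Java, a double[100000][32] is really 100000 separate row arrays, so the same advice applies if you flatten it to a single array by hand (a common trick; the layout below is my own illustration):

```java
// Flattened layout: element (i, j) lives at offset i * 32 + j,
// exactly the row-major order a 2D array conceptually has.
double[] flat = new double[100000 * 32];

for (int i = 0; i < 100000; i++) {
    for (int j = 0; j < 32; j++) {
        flat[i * 32 + j] = i + j;  // sequential offsets: 0, 1, 2, ...
    }
}
```

Swapping the loops here would make consecutive writes land 32*8 bytes apart, defeating the cache the same way the 2D version does.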

Regards,
Tom McGlynn
 
Yamin

Tom McGlynn said:

It may well depend upon what the command you've represented as a comment
does. E.g.,

double[][] array = new double[100000][32];

for (int i = 0; i < 100000; i++) {
    for (int j = 0; j < 32; j++) {
        array[i][j] = i + j;
    }
}

will likely have much better memory behavior than the same loops in the
other direction. As written, the inner loop deals with each 32-element
row once and for all, while reversing the loop order makes it stride
through memory 32 times and likely requires many more transfers between
cache and main memory.


Yeah, that's what I was thinking. I'll detail it a bit for curiosity's
sake. It might be cache thrashing. Memory is linear, so suppose
the compiler lays out the array of doubles as follows:
a[x][y]
a[0][0] a[0][1] a[0][2] ... a[0][31] a[1][0] a[1][1] ... a[99999][0] ... a[99999][31]
As we see in this layout, all the y values for a particular x are
laid out sequentially.

Let's take an imaginary processor cache line size of 128 bytes. I'm
being generous for this example and assuming sizeof(double) == 4, so
each cache line can hold exactly 32 doubles :) And let's assume there
is only one line in the cache.

Doing it this way, when the processor loads the value a[0][0], the
values a[0][0] all the way to a[0][31] are cached. So as you go
through the inner loop, you are very nicely accessing all these
values... which are cached. When you move to the next 'x' index, you
throw out these cached results and load in a[1][0] to a[1][31]. It's
very nice for locality. Therefore, you only change the cache line x
times... which in your case is 100000.

Now, for the sake of argument, let's say the array is laid out exactly
the same as above, but you go through the loops in the other direction:
1. you access a[0][0]
2. a[0][0] to a[0][31] are all loaded in
3. you access a[1][0]
4. a[1][0] is not in the cache, so it throws that out and loads in
a[1][0] to a[1][31]
5. you access a[2][0]
6. a[2][0] is not in the cache, so it throws that out and loads in
a[2][0] to a[2][31]

Notice how at every access you're throwing out what's in the cache
and loading a new line back in. How wasteful :) Of course, a real
processor has a much larger cache, and the cache line size is not going
to line up exactly on even bounds with your array... but that's the
gist of the idea. Here, you are swapping the cache line 100000*32
times. That's very bad :)
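Under this toy model (a 128-byte line holding one full 32-element row, a single-line cache), the line-load counts for the two orders work out like this (the arithmetic is mine, following the walkthrough above):

```java
// Toy model: each cache line holds exactly one 32-element row,
// and the cache holds only one line at a time.
int rows = 100000, cols = 32;

// Row-wise order: one line load per row, reused for all 32 accesses.
int rowWiseLoads = rows;

// Column-wise order: every access lands on a different row,
// so every single access loads a new line.
int colWiseLoads = rows * cols;

System.out.println("row-wise loads:    " + rowWiseLoads);
System.out.println("column-wise loads: " + colWiseLoads);
System.out.println("ratio: " + (colWiseLoads / rowWiseLoads));
```

A real cache holds many lines, so the observed gap is nowhere near this 32x worst case, but the direction of the effect is the same.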

Yamin
 
